iIT-Services

Tag: "Crawler"

Scrapy is an open-source, collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.

This free, open-source tool uses web crawlers (called "spiders") to extract information from websites. Using it requires some coding knowledge, but if you are willing to work your way up the learning curve, Scrapy is well suited to large web-extraction projects. Because it is an open-source tool, there is plenty of good community support available to users: https://scrapy.org/.


import.io is a web-based platform for extracting data from websites without writing any code. The tool lets users create an API through its point-and-click interface.

Users navigate to a website and teach the app to extract data by highlighting examples of data on the page; learning algorithms then generalise from these examples to work out how to get all the data on the website. The data that users collect is stored on import.io's cloud servers and can be downloaded as CSV, Excel, Google Sheets or JSON and shared. Users can also generate an API from the data, allowing them to easily integrate live web data into their own applications or third-party analytics and visualization software.

For more technical users, import.io offers real-time data retrieval through JSON REST-based and streaming APIs, integration with several common programming languages and data-manipulation tools, as well as a federation platform which allows up to 100 data sources to be queried simultaneously: https://www.import.io.
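Once data has been downloaded as JSON, converting it to one of the other export formats is straightforward. The sketch below converts a hypothetical extraction result to CSV with only the Python standard library; the `product` and `price` field names are assumptions, since the actual fields depend on what you trained the extractor to capture.

```python
import csv
import io
import json

# Hypothetical JSON export from an extractor; real field names will
# match whatever data you highlighted on the page.
raw = '[{"product": "Widget", "price": "9.99"}, {"product": "Gadget", "price": "19.99"}]'
records = json.loads(raw)

# Re-export the same records as CSV, another of the supported formats.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["product", "price"])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```

The same records could just as easily be loaded into an analytics tool or posted to a third-party API, which is the kind of integration the platform's generated APIs automate.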
