Scraper API is designed to simplify web scraping. A few things to consider before we get started:
Scraper API exposes a single API endpoint. Send a GET request to http://api.scraperapi.com with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape.
curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip"
<html> <head> </head> <body> <pre style="word-wrap: break-word; white-space: pre-wrap;"> {"origin":"176.12.80.34"} </pre> </body> </html>
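The same request can be made from a script. The following is a minimal Python sketch using only the standard library; the function names build_request_url and scrape and the 60-second timeout are illustrative assumptions, not part of Scraper API. It percent-encodes the target URL so that any query string in the target is not mistaken for Scraper API parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
API_ENDPOINT = "http://api.scraperapi.com/"


def build_request_url(api_key: str, target_url: str) -> str:
    """Return the endpoint URL with api_key and url as encoded query parameters."""
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


def scrape(api_key: str, target_url: str, timeout: float = 60.0) -> str:
    """Send the GET request and return the response body as text.

    The timeout value is an assumption chosen for illustration.
    """
    with urlopen(build_request_url(api_key, target_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (performs a real network request):
#   html = scrape("YOURAPIKEY", "http://httpbin.org/ip")
```

Keeping URL construction separate from the network call makes it easy to inspect the exact request before sending it.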
If you are crawling a page that requires JavaScript to be rendered, we can fetch it using a headless browser. This feature is only available on the Business and Enterprise plans. To render JavaScript, set render=true and we will use a headless Google Chrome instance to fetch the page:
curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip&render=true"
<html> <head> </head> <body> <pre style="word-wrap: break-word; white-space: pre-wrap;"> {"origin":"176.12.80.34"} </pre> </body> </html>
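As a rough illustration of how the render parameter fits into the query string, here is a small Python sketch. The helper name build_render_url is hypothetical; only the api_key, url, and render parameters come from the text above.

```python
from urllib.parse import urlencode


def build_render_url(api_key: str, target_url: str, render: bool = False) -> str:
    """Build the request URL, adding render=true only when rendering is requested."""
    params = {"api_key": api_key, "url": target_url}
    if render:
        # Asks the service to fetch the page with a headless browser.
        params["render"] = "true"
    return "http://api.scraperapi.com/?" + urlencode(params)


# Example:
#   build_render_url("YOURAPIKEY", "http://httpbin.org/ip", render=True)
```

Leaving the parameter out entirely when rendering is not needed keeps the request identical to the basic example.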
If you would like to keep the original request headers in order to pass through custom headers (user agents, cookies, etc.), set keep_headers=true. Use this feature only to get customized results; do not use it to avoid blocks, which we handle internally.
curl --header "X-MyHeader: 123" \
  "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/anything&keep_headers=true"
<html> <head> </head> <body> <pre style="word-wrap: break-word; white-space: pre-wrap;"> { "args":{}, "data":"", "files":{}, "form":{}, "header":{ "Accept":"*/*", "Accept-Encoding":"gzip, deflate", "Cache-Control":"max-age=259200", "Connection":"close", "Host":"httpbin.org", "Referer":"http://httpbin.org", "Timeout":"10000", "User-Agent":"curl/7.54.0", "X-MyHeader":"123" }, "json":null, "method":"GET", "origin":"176.12.80.34", "url":"http://httpbin.org/anything" } </pre> </body> </html>
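A custom header can be attached in much the same way from Python. This is a sketch under stated assumptions: the function names build_keep_headers_request and fetch_with_headers and the timeout are made up for illustration, and the header X-MyHeader simply mirrors the curl example above.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_keep_headers_request(api_key: str, target_url: str, headers: dict) -> Request:
    """Create a GET request that sets keep_headers=true and carries custom headers."""
    query = urlencode(
        {"api_key": api_key, "url": target_url, "keep_headers": "true"}
    )
    # No body is attached, so urllib sends this as a GET request.
    return Request("http://api.scraperapi.com/?" + query, headers=headers)


def fetch_with_headers(api_key: str, target_url: str, headers: dict,
                       timeout: float = 60.0) -> str:
    """Send the request and return the response body as text (timeout is an assumption)."""
    request = build_keep_headers_request(api_key, target_url, headers)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (performs a real network request):
#   fetch_with_headers("YOURAPIKEY", "http://httpbin.org/anything", {"X-MyHeader": "123"})
```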