Getting Started

Scraper API is designed to simplify web scraping. A few things to consider before we get started:

  • Each request is retried until it completes successfully, for up to 60 seconds, so set your client timeout to 60 seconds. If every attempt fails within that window, the API returns a 500 error. You can retry the request, and you are not charged for it: only successful requests (200 and 404 status codes) are billed. Make sure to catch these errors; they occur on roughly 1-2% of requests to hard-to-scrape websites.
  • If you exceed your plan's concurrent connection limit, the API responds with a 429 status code. Slow down your request rate to resolve this.
  • No overage is allowed on the free plan. If you exceed 1000 requests per month on the free plan, you will receive a 403 error.
  • Each request returns a string containing the raw HTML of the requested page, along with any headers and cookies.
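
The status codes above can be turned into a small retry policy. The following is a minimal sketch using only the Python standard library, not an official client: the names next_action and fetch, the limit of three attempts, and the backoff delay are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request

TIMEOUT_SECONDS = 60  # matches the 60-second retry window described above


def next_action(status):
    """Map a status code to a hypothetical action name for the caller."""
    if status in (200, 404):
        return "done"        # billed; final answer for this URL
    if status == 500:
        return "retry"       # not billed; safe to send again
    if status == 429:
        return "slow_down"   # over the concurrent connection limit
    if status == 403:
        return "stop"        # free-plan quota used up
    return "error"


def fetch(request_url, max_attempts=3, opener=urllib.request.urlopen):
    """Fetch request_url, retrying on 500 and backing off on 429."""
    for attempt in range(max_attempts):
        try:
            with opener(request_url, timeout=TIMEOUT_SECONDS) as resp:
                return resp.status, resp.read().decode("utf-8", "replace")
        except urllib.error.HTTPError as err:
            action = next_action(err.code)
            if action == "done":  # urllib raises on 404, but the body is usable
                return err.code, err.read().decode("utf-8", "replace")
            if action not in ("retry", "slow_down"):
                raise
            if attempt == max_attempts - 1:
                raise
            if action == "slow_down":
                time.sleep(2 ** attempt)  # arbitrary backoff, an assumption
    raise ValueError("max_attempts must be at least 1")
```

A 403 is re-raised immediately because retrying cannot help once the monthly quota is exhausted.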

Basic Usage

Scraper API exposes a single API endpoint. Send a GET request to http://api.scraperapi.com with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape.

Sample Code

curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip"

Result:

<html>
  <head>
  </head>
  <body>
	<pre style="word-wrap: break-word; white-space: pre-wrap;">
	  {"origin":"176.12.80.34"}
	</pre>
  </body>
</html>							
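
The same request can be built programmatically. This is a sketch using the Python standard library; build_request_url and scrape are hypothetical helper names, not part of any official SDK. It percent-encodes the target URL, which matters once that URL carries its own query string.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "http://api.scraperapi.com/"


def build_request_url(api_key, target_url):
    """Return the GET URL that asks the API to scrape target_url."""
    # urlencode percent-encodes the target URL so that any "&" or "?" in it
    # is not mistaken for part of the API's own query string.
    query = urlencode({"api_key": api_key, "url": target_url})
    return API_ENDPOINT + "?" + query


def scrape(api_key, target_url):
    """Send the request and return the response body as text."""
    with urlopen(build_request_url(api_key, target_url), timeout=60) as resp:
        return resp.read().decode("utf-8", "replace")


print(build_request_url("YOURAPIKEY", "http://httpbin.org/ip"))
# prints http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http%3A%2F%2Fhttpbin.org%2Fip
```

Calling scrape("YOURAPIKEY", "http://httpbin.org/ip") with a real key would perform the network request.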

Rendering JavaScript

If the page you are crawling requires JavaScript to be rendered, we can fetch it using a headless browser. This feature is only available on the Business and Enterprise plans. To render JavaScript, set render=true and we will use a headless Google Chrome instance to fetch the page:

Sample Code

curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip&render=true"

Result:

<html>
  <head>
  </head>
  <body>
	<pre style="word-wrap: break-word; white-space: pre-wrap;">
	  {"origin":"176.12.80.34"}
	</pre>
  </body>
</html>							
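
When building the query string in code, the render flag is easy to get subtly wrong: passing Python's True to urlencode produces render=True, not the lowercase render=true written above. Whether the API accepts other spellings is not stated here, so this sketch (with the hypothetical helper name build_render_url) emits the literal string to match the documentation.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://api.scraperapi.com/"


def build_render_url(api_key, target_url, render=False):
    """Return the request URL, adding render=true only when asked."""
    params = {"api_key": api_key, "url": target_url}
    if render:
        # Literal lowercase string, exactly as the parameter is documented.
        params["render"] = "true"
    return API_ENDPOINT + "?" + urlencode(params)


print(build_render_url("YOURAPIKEY", "http://httpbin.org/ip", render=True))
```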

Custom Headers

To pass custom headers (user agents, cookies, etc.) through to the target site, set keep_headers=true and the API will keep your original request headers. Use this feature only to get customized results. Do not use it to avoid blocks; we handle that internally.

Sample Code

curl --header "X-MyHeader: 123" \
"http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/anything&keep_headers=true"

Result:

<html>
  <head>
  </head>
  <body>
	<pre style="word-wrap: break-word; white-space: pre-wrap;">
		{
		  "args":{},
		  "data":"",
		  "files":{},
		  "form":{},
		  "headers":{
			"Accept":"*/*",  
			"Accept-Encoding":"gzip, deflate", 
			"Cache-Control":"max-age=259200", 
			"Connection":"close", 
			"Host":"httpbin.org", 
			"Referer":"http://httpbin.org",
			"Timeout":"10000",
			"User-Agent":"curl/7.54.0",
			"X-MyHeader":"123"
		  },
		  "json":null,
		  "method":"GET",
		  "origin":"176.12.80.34",
		  "url":"http://httpbin.org/anything"
		}
	</pre>
  </body>
</html>							
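
The curl call above has a direct equivalent in Python's standard library. This is a sketch under the same assumptions as the sample (placeholder API key, httpbin as the target); build_request and send are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "http://api.scraperapi.com/"


def build_request(api_key, target_url, headers):
    """Build a GET request that asks the API to forward the given headers."""
    query = urlencode(
        {"api_key": api_key, "url": target_url, "keep_headers": "true"}
    )
    return Request(API_ENDPOINT + "?" + query, headers=headers)


def send(request):
    """Send the request and return the response body as text."""
    with urlopen(request, timeout=60) as resp:
        return resp.read().decode("utf-8", "replace")


request = build_request(
    "YOURAPIKEY", "http://httpbin.org/anything", {"X-MyHeader": "123"}
)
# urllib stores header names in capitalized form ("X-myheader"); HTTP header
# names are case-insensitive, so the meaning is unchanged.
print(request.header_items())
```

Calling send(request) with a real key would perform the network request.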

Sessions

Scraper API exposes a single API endpoint. Send a GET request to http://api.scraperapi.com with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape.

Sample Code

curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip"

Result:

<html>
  <head>
  </head>
  <body>
	<pre style="word-wrap: break-word; white-space: pre-wrap;">
	  {"origin":"176.12.80.34"}
	</pre>
  </body>
</html>							

Geographic Location

If the page you are crawling requires JavaScript to be rendered, we can fetch it using a headless browser. This feature is only available on the Business and Enterprise plans. To render JavaScript, set render=true and we will use a headless Google Chrome instance to fetch the page:

Sample Code

curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip"

Result:

<html>
  <head>
  </head>
  <body>
	<pre style="word-wrap: break-word; white-space: pre-wrap;">
	  {"origin":"176.12.80.34"}
	</pre>
  </body>
</html>							

Premium Proxy Pools

Scraper API exposes a single API endpoint. Send a GET request to http://api.scraperapi.com with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape.

Sample Code

curl "http://api.scraperapi.com/?api_key=YOURAPIKEY&url=http://httpbin.org/ip"

Result:

<html>
  <head>
  </head>
  <body>
	<pre style="word-wrap: break-word; white-space: pre-wrap;">
	  {"origin":"176.12.80.34"}
	</pre>
  </body>
</html>							

If you have any questions, you can contact support or email us at [email protected]
