How to scrape Google with reCaptcha breaking
Google is a fantastic source of content for any keyword.
SEO Content Machine is one of the few content scrapers that can use Google to find and download content for you.
However, automated scraping will eventually trip Google's anti-bot systems and trigger a captcha event.
For example:
[Screenshot: Google reCaptcha challenge]
Google will show you the reCaptcha screen and attempt to stop you from continuing to scrape content.
Breaking reCaptcha
After some time in the lab, SEO Content Machine can now automatically solve simple reCaptcha events like the one pictured above.
There are no settings to configure and nothing you need to do within SCM to get it to work.
The process is fully automatic; all it requires is that you have purchased captcha-breaking credits beforehand.
How to purchase
First, log in to the SCM members website.
Then purchase credits at http://seocontentmachine.com/members/signup/captcha.
Once purchased, your credit balance is displayed in the SCM main window.
(If it doesn't appear, click the label to force a refresh and show the most up-to-date value.)
Breaking reCaptchas when scraping Google
How do you know if you are being issued Google reCaptchas and need captcha breaking credits?
The SCM application log records every important step that a content task takes. It is the one-stop place for understanding what SCM is doing and what problems or errors it might be running into.
Reading the app log can tell you a lot about what is going right or wrong. In fact, if you send us a support email, we will most likely ask you for a copy of the app log.
While the article creator is running, it logs each step it takes.
When SCM contacts Google, it checks the page for reCaptchas and, if it finds one, logs “Captcha found”.
What follows is automatic: SCM attempts to break the reCaptcha for you.
If you have captcha credits, 1 credit is deducted for each successful solve.
If you don't have any credits, you will see a different log message instead.
The log window will report a “Google ban” and tell you to buy additional solves at http://bit.ly/2cv60R1
Alternatively you can go directly to this link:
http://seocontentmachine.com/members/signup/captcha
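If you want a quick way to check a saved app log for these messages, the Python sketch below counts “Captcha found” and “Google ban” entries. It assumes the log has been copied out as plain text with one entry per line and that the entries contain the wording quoted above; the function and class names are hypothetical and not part of SCM. Since only successful solves cost a credit, the captcha count is an upper bound on the credits used, not an exact figure.

```python
from dataclasses import dataclass


@dataclass
class CaptchaSummary:
    captchas_found: int
    google_bans: int

    @property
    def needs_credits(self) -> bool:
        # Assumption: a "Google ban" entry means a captcha could not be
        # solved because no credits were available.
        return self.google_bans > 0


def summarize_captcha_log(log_text: str) -> CaptchaSummary:
    """Count captcha-related entries in a plain-text app log (hypothetical helper)."""
    found = 0
    bans = 0
    for line in log_text.splitlines():
        lowered = line.lower()  # match regardless of capitalization
        if "captcha found" in lowered:
            found += 1
        if "google ban" in lowered:
            bans += 1
    return CaptchaSummary(captchas_found=found, google_bans=bans)


# Made-up sample log, for illustration only.
sample = """\
Searching Google for "garden hoses"
Captcha found
Captcha found
Google ban - buy more solves at http://bit.ly/2cv60R1
"""
summary = summarize_captcha_log(sample)
print(summary.captchas_found, summary.google_bans, summary.needs_credits)  # 2 1 True
```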
Google retry & proxies
You may also notice a “Retrying Google… 1/5” message in the log.
It means SCM will retry contacting Google up to 5 times before giving up.
With each retry, it rotates through any proxies you have enabled.
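To see how far these retries got in a saved log, the sketch below extracts the attempt counter from each retry message. It assumes the message text matches the “Retrying Google… n/5” wording above (with either a “…” character or three dots); the function names are hypothetical. Reaching the limit (for example 5/5) only shows that the last attempt was used; whether that attempt succeeded has to be read from the log lines that follow.

```python
import re

# Assumed message format, taken from the article: "Retrying Google… 1/5".
RETRY_PATTERN = re.compile(r"Retrying Google(?:…|\.\.\.)\s*(\d+)/(\d+)")


def retry_attempts(log_text: str) -> list[tuple[int, int]]:
    """Return (attempt, limit) pairs for every retry message in the log."""
    return [(int(m.group(1)), int(m.group(2))) for m in RETRY_PATTERN.finditer(log_text)]


def hit_retry_limit(log_text: str) -> bool:
    """True if any retry message reached its limit, e.g. 5/5."""
    return any(attempt >= limit for attempt, limit in retry_attempts(log_text))


log = "Retrying Google… 1/5\nRetrying Google... 2/5\n"
print(retry_attempts(log))   # [(1, 5), (2, 5)]
print(hit_retry_limit(log))  # False
```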
Normally, proxies are not needed, because solving the reCaptchas is cheaper and easier; renting additional proxies is the fallback.
Sometimes a quick IP change using a cheap or free VPN is good enough too.