
How To Add Custom Article Sources

Adding custom article sources lets the content generator pull from a wider range of sites, which makes its output more varied.

It also helps to be creative: the feature works just as well for blogs and information sites as it does for traditional article directories. Watch the video below to get a quick rundown of how I add a new article source.

Before You Begin

You need two bits of information before adding a new source:

  1. The domain name of the site
  2. The XPath to the main content area

The XPath bit is the scariest part if you have never seen XPath before.

Rather than teach it to you here, I will point you to some good sites that already do that.

XPath Learning Resources:

How SCM Scrapes Content

SEO Content Machine (SCM) takes the domain name and runs a search against it using a search engine (such as BING, BING CACHE or GOOGLE).

To avoid bans, you can tell SCM to read the BING CACHE. This way SCM does not download anything from the target site itself, which is good for getting around sneaky sites that don’t like you scraping their content.
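To make the idea of a domain-restricted search concrete, here is a minimal sketch. It assumes the search is limited to the source with the `site:` operator that Bing and Google both accept; how SCM actually builds its queries is not documented here, and `site_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Public search endpoints; SCM's real requests may look different.
ENDPOINTS = {
    "bing": "https://www.bing.com/search",
    "google": "https://www.google.com/search",
}

def site_search_url(domain, keyword, engine="bing"):
    """Build a search URL restricted to one domain (assumed approach)."""
    query = f"site:{domain} {keyword}"
    return f"{ENDPOINTS[engine]}?{urlencode({'q': query})}"

url = site_search_url("example.com", "dog training")
print(url)
```

The search results give a list of article URLs on the source domain, and the XPath you configure is then used to pull the main content out of each page.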

How To Add A New Source

  1. Add new entry in “Edit Sources”
  2. Add domain name
  3. Open site you want to add and find sample article
  4. Use a web page query tool like
  5. Click on the main content area
  6. Find an ID or class attribute
  7. Add xpath expression
  8. Enable the new source by clicking on the checkbox
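Steps 5 to 7 are normally done by clicking around in a browser inspector, but the same idea can be sketched in code. The example below is only a rough heuristic, not how SCM works: it lists every element that has an ID or class, counts how much text sits inside it, and prints a candidate XPath for each. The sample page and the `ContainerFinder` class are made up for illustration.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class ContainerFinder(HTMLParser):
    """Tally how much text sits inside each element that has an id or class."""

    def __init__(self):
        super().__init__()
        self.stack = []    # (tag, xpath or None) for each currently open element
        self.totals = {}   # candidate xpath -> characters of text inside it

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void tags never get a closing tag, so keep them off the stack
        attrs = dict(attrs)
        xpath = None
        if attrs.get("id"):
            xpath = f'//*[@id="{attrs["id"]}"]'
        elif attrs.get("class"):
            xpath = f'//*[@class="{attrs["class"]}"]'
        if xpath:
            self.totals.setdefault(xpath, 0)
        self.stack.append((tag, xpath))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        size = len(data.strip())
        # Credit the text to every labelled element that currently encloses it.
        for xpath in {x for _, x in self.stack if x}:
            self.totals[xpath] += size

# Hypothetical sample article page.
SAMPLE = """
<html><body>
<div id="nav"><a href="/">Home</a></div>
<div id="article-content"><p>A long paragraph of article text goes here.</p>
<p>And a second paragraph follows it.</p></div>
<div class="sidebar">Ads</div>
</body></html>
"""

finder = ContainerFinder()
finder.feed(SAMPLE)
candidates = sorted(finder.totals.items(), key=lambda kv: kv[1], reverse=True)
best_xpath = candidates[0][0]
for xpath, size in candidates:
    print(size, xpath)
```

The element holding the most text is usually the main content area, so the top candidate is a reasonable first guess for step 7. Always check it against the real page, since comment sections or large footers can fool a simple count like this.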

XPath Cheat Sheet

The default XPath selector for most websites looks like this:

//*[@id="article-content"]
or
//*[@class="article"]

Let’s break it down!

  • // – This means select anywhere in the document
  • * – Select any tag
  • @ – Attribute selector
  • id – The attribute we are selecting, i.e. the ID. You can also use class.
  • ="article-content" – The actual ID or class name to match

As you can see, XPath isn’t very hard to understand once you break it down.

The most important thing is to find the class or ID that encapsulates the content we want to scrape.
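
To see the two selectors from the cheat sheet in action, here is a small sketch using Python's built-in `xml.etree.ElementTree`, which understands this simple subset of XPath. The sample page and the `select_text` helper are made up for illustration, and the page is kept well-formed because ElementTree parses XML rather than messy real-world HTML. This is not SCM's own scraping code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page.
SAMPLE_PAGE = """
<html>
  <body>
    <div id="header">Site navigation</div>
    <div id="article-content">
      <p>First paragraph of the article.</p>
      <p>Second paragraph of the article.</p>
    </div>
    <div class="article">Teaser from another layout.</div>
    <div id="footer">Copyright notice</div>
  </body>
</html>
"""

def select_text(page, xpath):
    """Return the whitespace-normalised text of every element the XPath matches."""
    root = ET.fromstring(page.strip())
    # ElementTree expects a relative path, so "//..." becomes ".//...".
    relative = "." + xpath if xpath.startswith("//") else xpath
    return [" ".join("".join(el.itertext()).split()) for el in root.findall(relative)]

by_id = select_text(SAMPLE_PAGE, '//*[@id="article-content"]')
by_class = select_text(SAMPLE_PAGE, '//*[@class="article"]')
print(by_id)
print(by_class)
```

Only the text inside the matched container comes back; the header and footer are ignored. Note that `@class="article"` matches only when the class attribute is exactly `article`. An element with `class="article featured"` needs `contains(@class, "article")` instead, which full XPath engines support but ElementTree does not.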