Tutorials

How to scrape a sitemap.xml for content

You can extract content from any sitemap.xml using the article downloader.

The article downloader's URL box accepts URLs pasted directly from the clipboard.

This is easy because the article downloader picks out only the URLs from whatever content you copy and ignores the rest.
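To give a rough idea of what "finding only the URLs" in pasted text can look like, here is a minimal Python sketch. It is not the article downloader's actual implementation; the pattern `URL_PATTERN` and the function `find_urls` are hypothetical names chosen for illustration, and the regular expression is a simplification that assumes URLs start with http:// or https:// and end at whitespace, quotes, or angle brackets.

```python
import re

# Assumed, simplified pattern: an http(s) URL runs until whitespace,
# a quote character, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

def find_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(text):
        if match not in seen:
            seen.add(match)
            urls.append(match)
    return urls

# Example: mixed content copied from a page, with markup around the links.
copied = 'See <a href="https://example.com/a">A</a> and https://example.com/b for details.'
print(find_urls(copied))
```

A pattern this simple can pick up trailing punctuation when a URL is followed directly by a period or comma, so a real tool would likely need extra cleanup.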

How to extract URLs

The first thing you need to do is extract the URLs from the sitemap.xml. To do this, view the page source.

  1. Open the sitemap in your browser and choose “View page source”.
  2. Copy the entire source, since it contains the URLs.
  3. Paste the entire text into the URL grid using “Paste From Clipboard”.
    The article downloader will find the URLs and import them into the URL grid for you.

You can use this method for anything that has URLs in it.
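If you would rather pull the URLs out of a sitemap with a script before pasting them, the manual steps above can be approximated as follows. This is a sketch under stated assumptions, not part of the article downloader: it assumes the sitemap follows the standard sitemaps.org schema, where each URL sits in a `<loc>` element, and `sitemap_urls` and `sample` are hypothetical names used only for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps (sitemaps.org protocol 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text  # skip empty <loc> elements
    ]

# A tiny made-up sitemap used purely as sample input.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

for url in sitemap_urls(sample):
    print(url)
```

The printed list, one URL per line, can then be copied and pasted into the URL grid in the same way as the raw page source.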