General

Download content from a sitemap.xml

For easier indexing reasons most sites publish a sitemap xml.

This file is an easy way to get a list of all posts on a website.

We can use this file to download content to our hard drive.

Sitemap xmls look like this eg:

https://seocontentmachine.com/post-sitemap1.xml

image 27

As you can see on our site, it lists all the posts.

To download this content inside SEO Content Machine, use the XML Scraper Tool.

XML Scraper Tool

Find the tool under scrapers.

image 30

Create a new XML scraper task.

image 31

Paste in the sitemap xmls.

image 32

You can paste in multiple different sitemaps, one per line.

Check the default settings.

Selecting content

image 33

The scraper will download content using CSS selection.

The default is to download content inside H2 and P tags.

image 34

This is a comma separated list, you can add extra tags like H3, H4, DIV etc.

It will also unwrap any links so they are converted just to text.

image 35

You can also remove tags and attributes.

image 36

Finally, you can shape the output of the saved content via the article string.

image 37

The tool will create an article on your hard drive with a H1 tag and the content of the article below it.

The title of the article is automatically detected by the tool and provided to you via the %title% macro.

The %content% macro is the scraped content from the scrape tags property. eg: H2, P tags

Filtering

You can manipulate the downloaded content.

image 38

The first is to limit the number of items downloaded from the sitemap.

image 40

The default is 5.

There are tooltips as well!

image 41

You can set the value to -1, to make the task download everything it can find.

image 42

You can use regex to find and replace text content.

There is also a toggle to remove all HTML if you need a plain text article.

image 43

You can re-wrap all lines in another set of tags.

image 44

Rewriter

You can use a rewriter on scraped content to make it unique.

image 45

This means you can use AI as well.

image 46

Run it

Click run to start the task.

image 47

Output

Once the task starts running, click on its row to see a task log.

image 48

Posts are found from the sitemap and downloaded to your hard drive.