Download content from a sitemap.xml
To make indexing easier, most sites publish a sitemap XML file.
This file is an easy way to get a list of all posts on a website.
We can use this file to download content to our hard drive.
A sitemap XML looks like this, for example:
https://seocontentmachine.com/post-sitemap1.xml
As you can see, the sitemap for our site lists all the posts.
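To make the structure concrete, here is a minimal sketch of reading post URLs out of a sitemap with Python's standard library. It is not how SEO Content Machine works internally; the sample XML and the function name `list_post_urls` are assumptions made up for illustration. Only the namespace URL comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample; a real post sitemap has the same basic shape.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/first-post/</loc></url>
  <url><loc>https://example.com/second-post/</loc></url>
</urlset>"""


def list_post_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


print(list_post_urls(SAMPLE_SITEMAP))
```

Each URL in the returned list is one post that a scraper could then download.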
To download this content inside SEO Content Machine, use the XML Scraper Tool.
XML Scraper Tool
Find the tool under scrapers.
Create a new XML scraper task.
Paste in the sitemap XML URLs.
You can paste in multiple sitemaps, one per line.
Check the default settings.
Selecting content
The scraper will download content using CSS selection.
The default is to download content inside H2 and P tags.
This is a comma-separated list, so you can add extra tags such as H3, H4 or DIV.
It also unwraps any links, converting them to plain text.
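The selection and link-unwrapping behaviour can be approximated as follows. This is a rough sketch using only Python's standard library, not the tool's actual implementation; `TagScraper` and `scrape` are hypothetical names, and as a simplification it drops every nested tag inside a selected element, not only links.

```python
from html.parser import HTMLParser


class TagScraper(HTMLParser):
    """Keep the text of selected tags; nested tags such as <a> are unwrapped."""

    def __init__(self, scrape_tags: str = "h2,p"):
        super().__init__()
        # Parse the comma-separated tag list, e.g. "H2, P, H3".
        self.wanted = {t.strip().lower() for t in scrape_tags.split(",") if t.strip()}
        self.depth = 0            # greater than 0 while inside a selected tag
        self.current_tag = None   # the selected tag currently being collected
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0 and tag in self.wanted:
            self.current_tag, self.depth, self.buffer = tag, 1, []
        elif self.depth > 0 and tag == self.current_tag:
            self.depth += 1
        # Any other tag inside a selected element is ignored, which
        # leaves only its text behind (this is what unwraps links).

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.current_tag:
            self.depth -= 1
            if self.depth == 0:
                text = "".join(self.buffer).strip()
                if text:
                    self.blocks.append(f"<{tag}>{text}</{tag}>")
                self.current_tag = None

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def scrape(html: str, scrape_tags: str = "h2,p") -> list[str]:
    parser = TagScraper(scrape_tags)
    parser.feed(html)
    parser.close()
    return parser.blocks


sample = (
    '<div><h2>Intro</h2>'
    '<p>Read <a href="https://example.com">this guide</a> first.</p>'
    '<span>skip me</span></div>'
)
print(scrape(sample))
```

With the default `h2,p` list, the heading and paragraph are kept, the link is reduced to its text, and the `span` is skipped because it is not in the list.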
You can also remove tags and attributes.
Finally, you can shape the output of the saved content via the article string.
The tool will create an article on your hard drive with an H1 tag and the content of the article below it.
The title of the article is automatically detected by the tool and provided to you via the %title% macro.
The %content% macro holds the content scraped using the scrape tags property, e.g. the H2 and P tags.
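As an illustration of how the two macros could be filled in, here is a small Python sketch. The template in `ARTICLE_STRING` is an assumption based on the description above (an H1 with the content below it), and `render_article` is a hypothetical helper, not part of the tool.

```python
import re

# Assumed article string: an H1 title followed by the scraped content.
ARTICLE_STRING = "<h1>%title%</h1>\n%content%"


def render_article(article_string: str, title: str, content: str) -> str:
    """Substitute the %title% and %content% macros in a single pass."""
    values = {"title": title, "content": content}
    # A single pass avoids re-expanding macros that appear inside the values.
    return re.sub(r"%(title|content)%", lambda m: values[m.group(1)], article_string)


print(render_article(ARTICLE_STRING, "My Post", "<h2>Intro</h2>\n<p>Hello</p>"))
```

Changing the article string, for example to `%content%` alone, would save the scraped content without a title line.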
Filtering
You can manipulate the downloaded content.
The first option is to limit the number of items downloaded from the sitemap.
The default is 5.
There are tooltips as well!
You can set the value to -1 to make the task download everything it can find.
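The limit setting behaves roughly like the sketch below. `apply_limit` is a hypothetical helper written for illustration; rejecting negative values other than -1 is an assumption, since only -1 is described here.

```python
def apply_limit(urls: list[str], limit: int = 5) -> list[str]:
    """Keep at most `limit` URLs; -1 means keep everything."""
    if limit == -1:
        return list(urls)
    if limit < 0:
        # Assumption: only -1 has a special meaning.
        raise ValueError("limit must be -1 or a non-negative number")
    return list(urls)[:limit]


urls = [f"https://example.com/post-{n}/" for n in range(1, 9)]
print(len(apply_limit(urls)))      # default limit of 5
print(len(apply_limit(urls, -1)))  # no limit
```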
You can use regex to find and replace text content.
There is also a toggle to remove all HTML if you need a plain text article.
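For readers unfamiliar with these two options, the following Python sketch shows the general idea. The function names are hypothetical, the regex-based tag stripping is a deliberate simplification that is not robust against malformed HTML, and the tool's own regex flavour may differ from Python's.

```python
import re
from html import unescape


def find_replace(text: str, pattern: str, replacement: str) -> str:
    """Replace every regex match of `pattern` in `text`."""
    return re.sub(pattern, replacement, text)


def strip_html(text: str) -> str:
    """Remove anything that looks like a tag, then decode HTML entities."""
    without_tags = re.sub(r"<[^>]+>", "", text)
    return unescape(without_tags)


article = "<p>My favourite colour is blue &amp; green.</p>"
article = find_replace(article, r"\bcolour\b", "color")
print(article)
print(strip_html(article))
```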
You can re-wrap all lines in another set of tags.
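Re-wrapping can be pictured as in the sketch below. `rewrap_lines` is a hypothetical helper, and both the `p` default and the skipping of empty lines are assumptions made for this example.

```python
def rewrap_lines(text: str, tag: str = "p") -> str:
    """Wrap each non-empty line of `text` in the given tag."""
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(f"<{tag}>{line}</{tag}>" for line in lines if line)


print(rewrap_lines("First line\n\nSecond line"))
```

This is mainly useful after removing all HTML, when the plain-text lines need a consistent set of tags again.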
Rewriter
You can use a rewriter on scraped content to make it unique.
This means you can also use an AI rewriter.
Run it
Click run to start the task.
Output
Once the task starts running, click on its row to see a task log.
Posts are found from the sitemap and downloaded to your hard drive.