Enabling Content Recommendations for your web site requires that you implement the Maropost Web Tracking script on the site and set the schedule for the Maropost Web Spider to crawl it. If there are portions of your site that you do not want the spider to crawl, add that information to your robots.txt file. Check with your web site administrator to ensure that pages you want included in content recommendations are NOT being blocked from the web spider.
If a page was previously crawled and is later hidden by the robots.txt file, then the next time the web spider crawls the site, that page will no longer be considered “recommendable.”
Maropost’s web spider supports full Go-language regular expression (regex) pattern matching in robots.txt rules, even though the robots.txt specification itself does not. For example, suppose your web site has a plug-in that allows site visitors to post comments on a web page. A link to an individual comment would have a URL similar to https://path-to-your-site.com#comments1234. To make sure the Maropost web spider does not add pages for individual comments to the list of recommendable pages, you can add a matching rule to your robots.txt file.
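A sketch of such a rule, assuming comment links follow the #comments<number> pattern shown above (the regex here is an example only; adjust it to your own site’s URL structure):

```
User-agent: *
Disallow: /#comments[0-9]+
```

Because the spider treats the Disallow value as a Go regular expression, [0-9]+ matches any comment number, so a single rule excludes every individual-comment URL.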
To set the schedule for the web spider, click the “Enable Recommendations” check box in the Create/Edit Web Site dialog box.
- Ignore robots.txt – Enable this option only if your ecommerce storefront platform auto-generates a non-editable robots.txt file that blocks the Maropost web spider from cataloging the proper pages on your site. If you instruct the web spider to ignore your site’s robots.txt file, then all pages will be cataloged. To ensure that only the right pages are selected for recommendations, make sure that (1) the correct pages have the KEYWORDS meta tag with the proper comma-delimited keywords as its value, and (2) the filters you set in the Recommendation Rules select only those pages you want for recommendations.
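For reference, the KEYWORDS meta tag goes in the head of each recommendable page and takes the following general form (the keyword values shown here are placeholders only):

```html
<head>
  <!-- Comma-delimited keywords describing this page; values are examples only -->
  <meta name="keywords" content="running shoes, trail running, footwear">
</head>
```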
- Parallel hits – Specify the number of concurrent threads that will spider your site. The more threads, the faster the site will be cataloged. However, the default setting of 5 concurrent threads is sufficient to spider sites with millions of pages.
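Conceptually, the Parallel hits setting behaves like a bounded worker pool: at most N pages are being fetched at any moment. The Go sketch below is a simplified illustration under that assumption (it is not Maropost’s actual implementation; the URLs and the fetch step are placeholders):

```go
package main

import (
	"fmt"
	"sync"
)

// crawlAll visits every URL using at most parallelHits concurrent
// workers. A real spider would download and catalog each page;
// here the "fetch" step just records the URL.
func crawlAll(urls []string, parallelHits int) []string {
	sem := make(chan struct{}, parallelHits) // caps concurrency at parallelHits
	var wg sync.WaitGroup
	var mu sync.Mutex
	var crawled []string
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a worker slot
			defer func() { <-sem }() // release the slot when done
			mu.Lock()
			crawled = append(crawled, u) // placeholder for fetching/cataloging
			mu.Unlock()
		}(u)
	}
	wg.Wait()
	return crawled
}

func main() {
	urls := []string{"/", "/products", "/blog", "/about", "/contact"}
	pages := crawlAll(urls, 5) // the default of 5 parallel hits
	fmt.Println(len(pages))    // prints 5
}
```

Raising the limit lets more pages download at once, but as noted above, the default of 5 is sufficient even for very large sites.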
- Crawl day – Set the days of the week on which you want the web spider to crawl your site. If you are constantly adding new content, then daily crawling is highly advised.
- Crawl time – Set the hours of the day (in the Eastern Time Zone) when you want the spider to run. You probably do not need to schedule it every hour, but if you are constantly adding new content or products to your site, then running it every 4-6 hours is recommended.