Every serious website owner or webmaster goes through a phase of monitoring their domain traffic. This monitoring can be done with tools like Google Analytics, AWStats, or the raw server logs. Personally, I rely on all of these resources, but for some domains I also record the data in my own database.
Using a simple script, I can then filter the rows to get the information I am interested in.
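As a rough illustration of that kind of filtering script (the column names and sample rows below are assumptions, not my actual database schema), the idea is just to keep the rows whose user-agent string matches a pattern:

```python
# Hypothetical log rows as they might come back from a database query.
# The ip / user_agent / path fields are illustrative, not a real schema.
rows = [
    {"ip": "66.249.66.1", "user_agent": "Googlebot/2.1", "path": "/"},
    {"ip": "46.229.168.1", "user_agent": "SemrushBot/7~bl", "path": "/about"},
    {"ip": "203.0.113.9", "user_agent": "Mozilla/5.0", "path": "/contact"},
]

def filter_rows(rows, needle):
    """Keep only rows whose user-agent contains the given substring."""
    return [r for r in rows if needle.lower() in r["user_agent"].lower()]

# All requests made by Googlebot:
google_hits = filter_rows(rows, "googlebot")
print(len(google_hits))
```

In a real setup the `rows` list would be the result of a `SELECT` against the traffic table, but the filtering logic is the same.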
The problem I face is that some of the accumulated data comes from spiders and bots I am not interested in. When Google, Bing, Yahoo, Twitter, or Facebook crawls a domain you feel a sense of inclusion, but there are other bots that just inflate your website logs and bring no value unless you subscribe to their services.
These spiders include SEO tools that crawl your site and record statistics like backlinks, internal links, PageRank, and other SEO data. While this information is valuable, if you don't use the service it becomes a tedious task to filter their visits out of your data.
To make things simpler, we can block them using a robots.txt file. Copy and paste the following rules into the robots.txt file in the root of your domain:
User-agent: SemrushBot
Disallow: /

User-agent: SemrushBot-SA
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: rogerbot
Disallow: /

User-agent: dotbot
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: SurveyBot
Disallow: /
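If you want to sanity-check the rules before deploying them, Python's standard-library `urllib.robotparser` can parse a robots.txt and tell you whether a given user agent is allowed to fetch a path. This is a minimal sketch using a shortened version of the rules above:

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules above, fed to the parser as lines.
robots_txt = """\
User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Blocked bot: every path is disallowed.
print(rp.can_fetch("SemrushBot", "/about"))   # False
# An agent with no matching rule is allowed by default.
print(rp.can_fetch("Mozilla/5.0", "/about"))  # True
```

In production you would point `RobotFileParser` at the live file with `set_url("https://yourdomain.com/robots.txt")` followed by `read()`, but parsing the text directly is handy for testing edits locally.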
It's important to note that some bots may continue crawling your site for a while; the change only takes effect once the crawler re-reads your robots.txt file and registers the new rules.