After a complete website upgrade I discovered that Google, Bing, Ahrefs,
Moz and a bunch of other bots were constantly crawling the website
looking for old URLs. These old URLs contained PHP variables and would
look something like this:
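(The original example is not shown here, but a made-up URL of the same general shape, with placeholder script and variable names, would be:)

```
http://example.com/index.php?page=products&id=247
```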
The links were nonexistent and had been removed; the content that was once there had been deleted. After the upgrade, the application was doing a soft redirect: it sent every request for a missing URL to a soft 404 page saying the URL could not be found, but it never actually signalled that the content was deleted. The status returned was 200, which meant the bots and crawlers would keep crawling these pages and re-indexing them; the only difference was that the content had changed.
Why a soft 404 was bad in this case
In this particular case, I had roughly 200 URLs indexed by search engines that no longer existed, while the upgraded version of the website only had 20 URLs. The ratio of good content to total crap was not in my favour.
After monitoring crawls of the website by these engines and gauging the results in Google searches, I decided I needed to break the cycle. This was a completely new website attached to an old domain, and I needed to break all ties with the stigma attached to the old website's SEO and ranking. The only way I could do this was to tell the search engines that those old, crappy links are dead, so they would eventually stop using those dead links to evaluate the rank of the website.
For this I decided to do the following: find all requests where the URI contained a particular phrase and redirect them to a page served with one of the status codes below.
There are different ranges of status codes:
- 1xx – Informational: how far along are we?
- 2xx – Success: something was found, we are on our way with the good stuff
- 3xx – Redirection: they were expecting us, gave us a good welcome and politely told us to visit them somewhere else
- 4xx – Client error: the mistake is on the requesting side, the request itself is at fault
- 5xx – Server error: we found the IP and reached the server, but something went wrong before we got to the files, so the error must be on the server
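The ranges above are determined by the first digit of the code, which can be sketched as a tiny lookup (a Python illustration; the class names follow the HTTP spec):

```python
# Map an HTTP status code to its class, based on the first digit.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def status_class(code: int) -> str:
    """Return the broad class an HTTP status code belongs to."""
    return STATUS_CLASSES[code // 100]

print(status_class(200))  # Success
print(status_class(410))  # Client error
```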
Most of the status codes are irrelevant to me, so I will only mention those that I could use in this particular case.
Status Code 410, Gone
A 410 response code is used when the requested URL or data is gone, deleted. You would use this status to completely remove the URLs from the index, to cancel them out. They don't exist! This is what Google suggests using if you want a page removed; the problem is, it shows up as a crawl error.
Status Code 404, Not found
A 404 response code tells the user that the requested page was not found. This will not work for me because it doesn't give the user a clear signal: it could mean the content was here once but is unavailable, or it might have been deleted, or it never existed. This status code is the most popular because it's general and a good fallback, but it should be monitored. In my case, I've already monitored the website's visits and I know for sure that the old URIs are not being entered by mistake; they are indexed links.
Status Code 301, Moved Permanently
A 301 response header tells the user that the URL, page or content they requested is still valid but has moved. You would use this status code if you change domains or if you change the URL of a page. The indexed link will still work but will be redirected to the new URL, and the value of the old URL will pass to the new one. If you don't use a 301 redirect, users coming to your website will see a server error, or a 404 page if you have one set up. Without a 301 status code, you will lose the visitor and whatever value that URL once had. In my case, this could work, but I don't really have the same content as before, so I can't say the content was moved. Also, I don't want to be associated with those old links. They don't have value as it is, and if I do a 301 redirect, they will remain in the search index.
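For comparison, this is roughly what issuing a 301 looks like at the application level, sketched here as a minimal Python WSGI app (the old and new paths are made up for illustration):

```python
# Minimal WSGI sketch: 301-redirect a moved path to its new home.
# The paths in MOVED are hypothetical examples, not real URLs.
MOVED = {"/old-page.php": "/new-page"}

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in MOVED:
        # A 301 tells clients (and crawlers) the content now lives at the
        # new URL, and the old URL's value passes along to it.
        start_response("301 Moved Permanently", [("Location", MOVED[path])])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OK"]
```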
I opted for a 410 response, as this will hopefully help me break away from the website's previous reputation.
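The rule described above — any request whose URI contains one of the old phrases gets a hard 410 — could be sketched like this, again as a Python WSGI app (the phrase list is a placeholder, not my real one):

```python
# WSGI sketch: answer 410 Gone for any request whose URI contains
# one of the phrases that only appeared in the old site's URLs.
# DEAD_PHRASES below is hypothetical; substitute your own patterns.
DEAD_PHRASES = ("index.php", "page=", "id=")

def app(environ, start_response):
    # Check the path plus the query string, since the old URLs
    # carried their PHP variables in the query string.
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    full = path + ("?" + query if query else "")
    if any(phrase in full for phrase in DEAD_PHRASES):
        start_response("410 Gone", [("Content-Type", "text/plain")])
        return [b"This page has been permanently removed."]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OK"]
```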