Preventing your site from being indexed, the right way

It keeps amazing me how many people use robots.txt files to keep their sites from being indexed and, with that, from showing up in the search engines. You know why that amazes me? Because robots.txt doesn’t actually do the latter, even though it does prevent your site from being indexed.

Let’s go through some terms here:

Indexed / Indexing
The process of downloading a site’s or a page’s content to the search engine’s server, thereby adding it to its “index”.

Ranking / Listing / Showing
Showing a site in the search result pages (aka SERPs).

So, while the most common process goes from indexing to listing, a site doesn’t have to be indexed to be listed. If a link points at a page, a domain or anywhere else, the search engine will follow that link. If the robots.txt on that domain prevents the search engine from indexing that page, it’ll still show the URL in the results if it can gather from other signals that the page might be worth looking at.
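To make that distinction concrete, here’s a minimal Python sketch using the standard library’s `urllib.robotparser`. The robots.txt content, the `may_download` helper and the example.com URL are made up for illustration, and the sketch assumes a crawler that honours robots.txt. It shows what robots.txt actually answers: whether a crawler may download a URL’s content. Nothing in it decides whether that URL may be listed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every compliant crawler from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_download(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if a crawler honouring ROBOTS_TXT may fetch `url`.

    `may_download` is a hypothetical helper for this example only.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/private-page.html"
    # The crawler may not download the page's content, so it can't be indexed...
    print(may_download(url))
    # ...but the URL itself is still known from the link that pointed at it,
    # and robots.txt has no say over whether that URL shows up in the results.
```

Running it prints `False`: the page’s content stays off the search engine’s server, yet the URL string is still available to be listed.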

