The image to the left depicts a quark (at least I think it does). I thought it was a good image to lead into this topic with. The reason why will become clear later, but it is related to the title of this post, “Why all search engines are flawed.”
If I weren’t worried about having an ultra-long title, and I wanted to include a secondary pun, albeit a rather weak one, I might have titled this “SEO Quirks – Why all search engines are flawed and will always be able to be manipulated by webmasters.” I didn’t think anyone would want to read a post with such a long-winded title, so I opted for the shorter version. You get the gist, though, and I’d now like to explain what I mean by it.
We are taught to think of search engines (all of them, not just Google) as ultra hi-tech, almost intelligent pieces of software run by ever-advancing algorithms that are not only beyond the comprehension of mere mortals, but probably beyond the grasp of all but a few highly intelligent savants who are on a different level to the rest of us.
But as with all complex systems and ideas, they can be broken down into simple, bite-sized chunks. This is also the case with search engines and, more fundamentally, with the very notion of a search engine.
I think that as time goes by we forget the very meaning of words. It isn’t that long ago that search engines first appeared, and the name describes their function and purpose, but we think more of the end results. So let’s think about a similar term and its meaning: “fire engine”.
We all know what a fire engine is, and what it does, and so we just accept it, but have you ever thought about the words and how odd they seem when combined together?
The origins of “fire engine” probably lie in the 18th or 19th century, when an engine referred almost universally to a steam engine. Steam engines were, of course, also used to pump water from the tin mines of Cornwall and the coal mines of the North East of England, so a mobile “engine” whose purpose was to extinguish fires became a fire engine (or fire pump).
But the two words do seem, upon closer inspection, to be quite a mismatch.
An engine, of course, is a machine that does work, so what is a search engine but a machine whose role is to perform searches and (hopefully) present relevant documents?
And it is this idea of relevance that exposes the gap between what we believe search engines can do and what they can and cannot actually do.
We are all used to hearing the ever familiar words of search engine engineers telling us about the “signals” they use to determine relevancy, and hence results, but we should really examine the notion of these signals in more detail, and think more laterally and literally about their effectiveness.
Can a Layman Determine the Level of Expertise in Any Given Subject?
Suppose that you and I decided to sit down and read 100 books on a subject we know nothing about (let’s say quantum mechanics, so quarks, muons, and all of the rest) and that we will then rank those books in order of relevance and importance to the subject. [I choose books since they are comparable to websites: a book title is like a site title, a chapter heading is like a page meta title, etc.]
How on earth do we begin to do that? Now imagine that this is compounded by the books being written in a foreign language that we cannot speak or read, so that, as with search engines, we cannot understand the actual meaning, the context, or the inference.
We might have a list of “keywords” where we know that “Quark” in English is “Quirk” in SEOish, but if we cannot understand the verbiage, the context, or the meaning, then what can we do but count the number of times a word appears and try to infer something from that?
It is clearly a poor measure.
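To see just how poor, here is a minimal sketch of ranking by keyword count alone. It is only an illustration of the idea above, not how any real search engine works; the function names and the three made-up "books" are my own assumptions.

```python
import re
from collections import Counter


def keyword_score(text, keywords):
    """Count how many times any of the keywords appears in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return sum(counts[k.lower()] for k in keywords)


def rank_by_keyword_count(books, keywords):
    """Return book titles ordered from highest to lowest keyword count."""
    return sorted(
        books,
        key=lambda title: keyword_score(books[title], keywords),
        reverse=True,
    )


# Hypothetical sample "books": one accurate, one stuffed, one off-topic.
books = {
    "Careful Textbook": "A quark is an elementary particle. Six flavours exist.",
    "Stuffed Pamphlet": "Quark quark quark! Buy quark pills. Best quark deals.",
    "Cookbook": "Quark is also a soft cheese used in baking.",
}

ranking = rank_by_keyword_count(books, ["quark"])
print(ranking)
```

The stuffed pamphlet comes out on top simply because it repeats the word most often, and the cookbook scores the same as the textbook even though it is about cheese. The counter has no idea which book is right, or even which one is on topic.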
Since we have no understanding of the subject (and possibly the language) then how can we possibly determine which are the best and most accurate books?
All that we could do is to defer our judgement to other factors, like which is the easiest to understand (to a layman), which has the nicest & clearest graphics, which has titles that contain the “right” words, which has chapters that repeat the foreign keywords, etc, but for all that we know the “facts” inside the book could be totally wrong, or even about a totally different topic.
But since all of this is out of our control, and is under the total control of the author (be it us looking at a book, or a search engine looking at a website), we have to determine to what degree we can trust that author to be truthful in his titles, etc.
In other words, is he trying to deceive us and get a website about “sex pills” (lol sorry) ranking, or is it a website full of gibberish that he wants to rank so he can make money from affiliate schemes, or AdSense, or something else?
What else can we do but defer our judgement even further, to third parties whom we consider to be “experts in the field” and whose word we will necessarily take over our own understanding (or lack of it)?
And this is exactly what search engines attempt to do, but on a much larger scale – they are not trying to determine the most relevant site in ONE specific topic, but rather in all topics, anything you can think of or search for, they want to provide the most relevant results.
And they have a distinct lack of understanding in all of them.
Can you see the impossibility of this if you are not an expert with implicit understanding in all of those subjects?
Remember that a search engine is not an artificially intelligent being. It has no understanding of the words on a webpage, or book page. It can just look for words, perform calculations, and try to infer meaning and relevance from them based on some predetermined set of mathematical parameters.
The Hilltop Algorithm
What Google did way back when was to buy the rights to something called the “Hilltop Algorithm”, which tries to determine the “authorities” in any given niche, then looks at who those people or sites link to, and assigns more value to those links, since authorities are more likely to link to other authorities, especially where the anchor text matches the topic. [Of course, the obvious problem is that whoever you determine to be an authority may not be one, or they may not be in control of their website (hacking etc.), or they may be taking payments for links, or a whole host of other possibilities.]
That is to say, if an authority site in quantum mechanics linked to the local pizza parlour to thank them for their undying supply of nutrition in all of those late hours they’d worked, and they used the anchor text “best pizza in Geneva”, it would have less oomph than if they’d linked to some low grade engineering firm who’d once supplied them with a spare quantum widget with the anchor text “great quantum widgets”.
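The pizza and widget example can be sketched in a few lines. This is a toy model of the idea as described here, not the actual Hilltop algorithm or anything Google has published; the domain names, the list of authorities, and the weights 1.0 and 0.2 are all assumptions made up for illustration.

```python
def link_value(link, authorities):
    """Score one link: zero unless it comes from an assumed authority,
    and higher when the anchor text mentions that authority's topic."""
    topic = authorities.get(link["source"])
    if topic is None:
        return 0.0  # source is not on our list of authorities
    on_topic = topic in link["anchor"].lower()
    return 1.0 if on_topic else 0.2  # arbitrary illustrative weights


def score_targets(links, authorities):
    """Add up the link values received by each target site."""
    scores = {}
    for link in links:
        target = link["target"]
        scores[target] = scores.get(target, 0.0) + link_value(link, authorities)
    return scores


# Hypothetical data: one "authority" whose topic is assumed to be "quantum".
authorities = {"physics-lab.example": "quantum"}
links = [
    {"source": "physics-lab.example", "target": "pizza.example",
     "anchor": "best pizza in Geneva"},
    {"source": "physics-lab.example", "target": "widgets.example",
     "anchor": "great quantum widgets"},
    {"source": "random-blog.example", "target": "pizza.example",
     "anchor": "quantum pizza"},
]

scores = score_targets(links, authorities)
print(scores)
```

The widget firm beats the pizza parlour purely because the anchor text happened to contain the right word. Nothing in the calculation checks whether the widgets are any good, or whether the "authority" deserved to be on the list in the first place.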
So in other words the whole thing is unbalanced. First, the rankings are determined by third parties who are supposed to be experts but may not be, because the search engine lacks any understanding of the topics it is trying to return results for. Second, it tries to infer relevance from a topic match between authority sites and the sites they link to. And that is before the rest of the obvious problems.
So in conclusion, if you or I are not experts in any given field, then we cannot possibly be qualified to determine the best resources for that topic. More to the point, since search engines cannot even understand the language or its context, they are even worse positioned to determine the relevancy or accuracy of information.
Therefore, since search engines need to rely heavily upon what they consider to be authoritative third party testimonials, reviews, or links in general from these sites (which could be false), then they can and will always (until they are independently intelligent & knowledgeable) be capable of being manipulated.
If someone can determine the most important signals a search engine uses, then they can simulate those signals to rank better.
So in a final conclusion, if you want to rank well in Google (or any other search engine) then you need to obtain high value links, which means links from the sites that each individual search engine considers to be important and of high authority.