In June 2006, the Oxford English Dictionary added a new verb to its collection – a word that certainly wouldn’t have been considered 10 years earlier. Still, it had quickly become a common part of the English language:
google (verb): to use the Google search engine to obtain information about (someone or something) on the World Wide Web.
Search engines have taken over from the now-obsolete Yellow Pages, and have become an integral part of our daily lives. But how much do we really know about them? Do they really have information on every single web page? How do they know which ones to show you?
To answer these questions, we need to peek behind the curtains of some of our most well-loved search engines…
Let’s say you’ve just typed a simple question into the Google search bar. As you hit enter, you’re immediately greeted by pages upon pages of (mostly) relevant, high quality results. It seems just like magic, right? In reality, there’s lots going on behind the scenes to provide the perfect answer to your query.
Long before you typed anything, hundreds of billions of pages had already been visited by a web crawler, which scans for content and links and collects information to store in Google’s index. Think of this index like a giant library of URLs, with information stored under each one about a page’s contents, keywords, any recent updates, and more. As you hit ‘search’, Google looks through the index for data gathered by the web crawler, trying to find pages that match your query.
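To make the library idea a little more concrete, here is a minimal sketch of an index in Python. It is only an illustration: the three pages, the `build_index` and `search` names, and the rule that a page must contain every word of the query are all assumptions, and a real search engine's index stores far more than this.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text (illustrative data only).
PAGES = {
    "https://example.com/tea": "how to brew green tea",
    "https://example.com/coffee": "how to brew strong coffee",
    "https://example.com/cake": "easy chocolate cake recipe",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)

index = build_index(PAGES)
print(search(index, "brew tea"))  # → ['https://example.com/tea']
```

Because the index is built ahead of time, answering a query is a matter of looking up a few words rather than re-reading every page.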
With suitable pages identified, it’s time to determine which order to display them in. Though the words used in the query play a big role in determining which ones are most relevant, there are also a number of other factors to consider:
- User’s location: if you have your location turned on, most search engines would aim to display results based in your local area, particularly if you include terms such as ‘near me’ in your query.
- Language detected: naturally, if your query is written in English, you probably wouldn’t expect to see a list of websites in French. Search engines will try to provide results written in the same language as the query, where possible.
- Search history: search engines are able to glean a lot from your previous searches, and can use this data when finding the best pages for you.
- User’s device: for example, if you are searching from a mobile then the engine will aim to produce mobile-friendly pages.
- Page Rank: last, but far from least, is the quality of the page itself, as determined by a search engine’s algorithm.
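As a rough illustration of how signals like these might be combined, the Python sketch below adds them up into a single score. Everything in it is hypothetical: the `Candidate` and `Searcher` fields, the weights, and the sample pages are invented for the example, and no real search engine publishes a formula this simple.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    url: str
    text_match: float      # how well the page matches the query words, 0..1
    language: str
    mobile_friendly: bool
    quality: float         # the page's own quality score, 0..1

@dataclass
class Searcher:
    language: str
    on_mobile: bool

def score(page, user):
    """Combine the signals with made-up weights (assumptions, not real values)."""
    total = 0.5 * page.text_match + 0.3 * page.quality
    if page.language == user.language:
        total += 0.1
    if user.on_mobile and page.mobile_friendly:
        total += 0.1
    return total

def rank(pages, user):
    """Return URLs ordered from highest to lowest score."""
    ordered = sorted(pages, key=lambda p: score(p, user), reverse=True)
    return [p.url for p in ordered]

pages = [
    Candidate("a.example", text_match=0.9, language="fr", mobile_friendly=False, quality=0.8),
    Candidate("b.example", text_match=0.8, language="en", mobile_friendly=True, quality=0.7),
    Candidate("c.example", text_match=0.4, language="en", mobile_friendly=True, quality=0.9),
]
print(rank(pages, Searcher(language="en", on_mobile=True)))
```

The same three pages come back in a different order for a French speaker on a desktop, which is the point: the query alone doesn't decide the results.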
All of these pieces of data help Google to present a search engine results page (SERP) that best answers your query. Each search engine has its own crawler, index and algorithm, but they all work in much the same way. Let’s take a closer look at how it all works.
Building the Index
As mentioned before, it all starts with a web crawler – but what exactly is a web crawler? Well, it’s essentially a bot that scans through pages upon pages on the internet, gathering data as it goes. You may have heard of web crawlers being referred to as spiders – this is because of the way they crawl through a page, travelling along any links that they find to discover new pages, and gradually building up a web of interconnected pages, all stored in the search engine’s index. This is how new pages are crawled – although it is possible to manually submit your page to be crawled if you don’t want to wait.
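The link-following behaviour can be sketched in a few lines of Python. This is a toy version under stated assumptions: instead of fetching real pages, it walks a small hypothetical link graph (`LINK_GRAPH`), and the breadth-first order is just one reasonable choice rather than what any particular crawler does.

```python
from collections import deque

# Hypothetical link graph standing in for the live web:
# each page maps to the links found on it.
LINK_GRAPH = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}

def crawl(seed, links):
    """Follow links breadth-first from the seed, visiting each page once."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)  # a real crawler would fetch and store the page here
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

print(crawl("a.example", LINK_GRAPH))
# → ['a.example', 'b.example', 'c.example', 'd.example']
```

The `seen` set is what stops the spider going round in circles when pages link back to each other, as `c.example` does here.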
So the pages are found, data is stored, and the crawler moves on. That’s it?
Not quite. Let’s travel back to 2006 once more. A great article has just been published on a respected website. The algorithm at the time ranks it highly, and it appears on page one of the SERP. How likely is it that this would still be the most relevant answer to the same query today?
Pages have to be re-crawled periodically in order to keep the index up-to-date. Exactly how often they have to be crawled is determined by the algorithm – some pages will become outdated more quickly than others.
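One simple way to picture this is a schedule where every page has its own re-crawl interval. The Python sketch below assumes exactly that; the `PageRecord` fields, the intervals, and the "most overdue first" ordering are illustrative choices, not a description of how any real search engine schedules its crawler.

```python
from dataclasses import dataclass

@dataclass
class PageRecord:
    url: str
    last_crawled: float   # hours since an arbitrary starting point
    recrawl_every: float  # hypothetical re-crawl interval, in hours

def due_for_recrawl(records, now):
    """Return URLs whose interval has elapsed, most overdue first."""
    overdue = [
        (now - r.last_crawled - r.recrawl_every, r.url)
        for r in records
        if now - r.last_crawled >= r.recrawl_every
    ]
    return [url for _, url in sorted(overdue, reverse=True)]

records = [
    PageRecord("news.example/today", last_crawled=0, recrawl_every=1),
    PageRecord("blog.example/post", last_crawled=0, recrawl_every=24),
    PageRecord("docs.example/history", last_crawled=0, recrawl_every=720),
]
print(due_for_recrawl(records, now=48))
# → ['news.example/today', 'blog.example/post']
```

After two days, the news page and the blog post are both due another visit, while the rarely changing history page can wait.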
We’ve got our rich index of URLs and web pages. Now our search engine algorithm needs to determine what order to display the results in. The first page of the SERP is the holy grail for websites, as anyone entering a query is likely to select one of the first links they see. To decide which pages get the honour of appearing here, the algorithm assigns something called a page rank.
Each search engine has a different algorithm for determining page rank, which is why a SERP for the same query will look different on Google and Bing, for example. Plus, these algorithms are constantly learning and developing, so webmasters need to keep track of any changes and improve their Search Engine Optimisation (SEO).
With that in mind, here are a few things current algorithms are looking for:
Authority / Trustworthiness
Search engines are looking to deliver the best answers to users’ queries, so they’re naturally going to want to offer sites they can trust. But what makes a site trustworthy? To most algorithms, it’s largely about how many pages link to it. For example, Wikipedia is recognised by many as a reliable source of information, with thousands of pages referencing it. A page like this is likely to rank higher on the SERP.
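Counting the pages that link to a site is easy to sketch in Python. Note that this is a deliberate simplification: it treats every link as equally valuable, whereas real ranking algorithms also weigh who is doing the linking. The `OUTBOUND_LINKS` data and the function name are made up for the example.

```python
from collections import Counter

# Hypothetical data: each page maps to the pages it links out to.
OUTBOUND_LINKS = {
    "blog.example": ["wiki.example", "shop.example"],
    "news.example": ["wiki.example"],
    "shop.example": ["wiki.example"],
    "wiki.example": ["news.example"],
}

def inbound_link_counts(links):
    """Count how many distinct pages link to each page."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # count each linking page only once
            if target != source:     # a page linking to itself doesn't count
                counts[target] += 1
    return counts

counts = inbound_link_counts(OUTBOUND_LINKS)
# Most-linked pages first; ties broken alphabetically.
ranked = sorted(counts, key=lambda page: (-counts[page], page))
print(ranked)
# → ['wiki.example', 'news.example', 'shop.example', 'blog.example']
```

The hypothetical wiki page, with three other pages pointing at it, comes out on top.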
One thing algorithms hate is duplicated content. To most people, it makes sense to avoid completely plagiarising another person’s work, but that’s not all. Similarities between pages on the same website can flag up as duplications, as can using a similar template for, say, product descriptions. Content deemed to be unique is more likely to rank highly.
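To see why templated product descriptions can look like duplicates, here is a small Python sketch that compares two texts by the three-word sequences they share (often called shingles) and reports the overlap as a Jaccard similarity. This is one well-known technique for spotting near-duplicates, used here as an assumption; the source doesn't say which method any search engine actually uses, and the sample texts are invented.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Two descriptions built from the same template, and one unrelated page.
red = "this cotton shirt is soft durable and available in red"
blue = "this cotton shirt is soft durable and available in blue"
other = "our returns policy lets you send items back within thirty days"

print(round(jaccard(red, blue), 2))  # → 0.78
print(jaccard(red, other))           # → 0.0
```

Changing a single word at the end of the template still leaves the two descriptions sharing seven of their nine distinct shingles, which is why they score as near-duplicates.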
Google doesn’t want that 2006 page showing up on the first page of the SERP if it hasn’t been updated in years. If a page has been changed recently, search engines will infer that it contains up-to-date content and boost its page ranking.
One thing a web crawler looks for in particular is keywords and phrases, especially when they feature in headings and meta descriptions. Algorithms are also able to use semantic relevance to identify the context in which the keywords are used, as well as how many similar words are used throughout the content. This gives the algorithm a good idea of just how close a match the page is to a particular query.
However, too many keywords crammed into a page can flag up as keyword stuffing, bringing the page ranking back down again.
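A crude way to picture keyword stuffing is to measure what fraction of a page's words are the keyword. The Python sketch below does just that; the 20% threshold and the function names are assumptions picked for the example, since search engines don't publish a cut-off.

```python
def keyword_density(text, keyword):
    """Fraction of the words in text that are the keyword (case-insensitive)."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

def looks_stuffed(text, keyword, threshold=0.2):
    """Flag text whose keyword density exceeds a hypothetical threshold."""
    return keyword_density(text, keyword) > threshold

natural = "Our guide explains how to choose running shoes for long distances."
stuffed = "Shoes shoes shoes! Buy shoes, cheap shoes, best shoes, shoes online."

print(looks_stuffed(natural, "shoes"))  # → False
print(looks_stuffed(stuffed, "shoes"))  # → True
```

In the natural sentence the keyword is one word in eleven; in the stuffed one it is seven in eleven.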
Now that you’ve crawled through this page, storing the information in your own personal index (AKA brain), you’ll never look at a search engine the same way again.