
Google

How it Works

To use search engines efficiently, it helps to understand how they work. Here is a quick explanation.

Search engines are made up of four main components:

  1. Spider: A program that crawls the web and feeds web pages to the search engine. The spider moves from site to site via the links on the pages, starting at heavily used sites and following links outward to new pages. Search engines employ multiple spiders to gather information more quickly.

  2. Index: The spider feeds the pages it collects to the index. The index takes the words from the web pages and stores them along with their location on the webpage.

  3. Ranking algorithm: The algorithm ranks the words in the index by looking at how many times each word appears on a web page and how prominently it appears there. Words near the top of the page or in the header, and words that appear more frequently, are given higher priority by the ranking algorithm. Because each search engine uses its own ranking algorithm, you will receive different results from different search engines. Google’s claim to fame is its unique and successful PageRank algorithm. In addition to evaluating the words in a website by where and how often they appear, PageRank looks at the website as a whole and at who links into it. Sites that are linked to more often are ranked higher than sites that are linked to less frequently.

  4. Results list: When you type a search phrase into Google, Google will check its index to see which web pages have a high ranking for the terms. Google then displays the results in order of ranking.
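The four components above can be sketched together as a toy search engine over a tiny in-memory "web." This is only an illustration of the pipeline described here, not how any real engine is implemented: the page names, the scoring formula, and the breadth-first crawl are all made up for the example.

```python
from collections import defaultdict

# Hypothetical in-memory "web": page name -> (text, outgoing links).
# A real spider would fetch pages over HTTP instead.
PAGES = {
    "home": ("search engines crawl the web", ["about", "docs"]),
    "about": ("spiders follow links between pages", ["home"]),
    "docs": ("the index stores words and their locations", ["home", "about"]),
}

def crawl(start):
    """Spider: start at one page and follow links, collecting each
    page's text exactly once."""
    seen, queue, collected = set(), [start], {}
    while queue:
        page = queue.pop(0)
        if page in seen or page not in PAGES:
            continue
        seen.add(page)
        text, links = PAGES[page]
        collected[page] = text
        queue.extend(links)
    return collected

def build_index(pages):
    """Index: word -> {page: [word positions on that page]}."""
    index = defaultdict(lambda: defaultdict(list))
    for page, text in pages.items():
        for pos, word in enumerate(text.split()):
            index[word][page].append(pos)
    return index

def rank(index, term):
    """Ranking: score by frequency, with a bonus for words that
    appear near the top (here, early in the text)."""
    scores = {}
    for page, positions in index.get(term, {}).items():
        scores[page] = len(positions) + 1.0 / (1 + min(positions))
    return sorted(scores, key=scores.get, reverse=True)

# Results list: pages containing the term, best-ranked first.
pages = crawl("home")
index = build_index(pages)
print(rank(index, "the"))  # prints ['docs', 'home']
```

In this toy run, "docs" outranks "home" for the word "the" because the word appears at the very start of the "docs" text, matching the prominence rule described above.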
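The PageRank idea mentioned in step 3 can also be sketched in a few lines. The core intuition is that each page shares its score equally among the pages it links to, and this is repeated until the scores settle. The three-page link graph, the damping factor of 0.85, and the iteration count below are illustrative assumptions, not Google's actual parameters.

```python
# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

def pagerank(links, damping=0.85, iterations=50):
    """Iteratively redistribute scores: each page splits its current
    score evenly among its outgoing links, damped toward a uniform
    baseline so the scores converge."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            share = rank[page] / len(outgoing)
            for target in outgoing:
                new[target] += damping * share
        rank = new
    return rank

scores = pagerank(LINKS)
# "c" is linked to by both "a" and "b", so it ranks highest
print(max(scores, key=scores.get))  # prints c
```

This matches the article's point: "c" wins not because of its own words but because more pages link to it.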

Limitations of search engines

While search engines are able to index millions of web pages, they are limited in a number of ways. Because websites are constantly changing, with information being added and deleted, it is impossible for the spider to keep up with all the changes. There are also many places spiders cannot go: they do not read .pdf or Flash files, they cannot index all the data on sites with dynamic, real-time information, and they are unable to reach many databases accessible via the Internet. According to the OCLC Online Computer Library Center's Office of Research, only 35% of the Internet is publicly available and searchable via search engines; the other 65% consists of private or provisional sites, or sites that are not accessible to search engines. The information on the web that is not accessible to search engines is part of the Hidden Web, or Deep Web.

Information taken from HowStuffWorks.com, “How Internet Search Engines Work.”