Anatomy of a Search Engine Spider

The term “search engine” is often used generically to describe both crawler-based search engines and human-powered directories. These two types of search engines gather their listings in radically different ways.

Everything the crawler finds goes into the second part of the search engine: the index. The indexer takes every word on a web page, logs it, categorizes it and then stores the results in a huge database. Indexing every word, along with where it appears on the page, lets most search engines go beyond simple keyword matching and support proximity searching, which finds words that appear close to each other.
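
To make the idea concrete, here is a minimal Python sketch of an index that records every word together with its position on the page, which is what makes proximity searching possible. The sample pages and the names tokenize, build_positional_index and proximity_search are hypothetical; real search engines store far more information and use much more compact data structures.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase the text and keep runs of letters and digits as words
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(pages):
    """Map each word to {page_id: [positions where the word occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word][page_id].append(position)
    return index


def proximity_search(index, word_a, word_b, max_distance):
    """Return ids of pages where word_a and word_b occur within max_distance words."""
    postings_a = index.get(word_a, {})
    postings_b = index.get(word_b, {})
    hits = []
    # Only pages containing both words can match
    for page_id in postings_a.keys() & postings_b.keys():
        if any(abs(pa - pb) <= max_distance
               for pa in postings_a[page_id]
               for pb in postings_b[page_id]):
            hits.append(page_id)
    return sorted(hits)


# Hypothetical sample pages
pages = {
    "p1": "The spider crawls the web and the indexer stores every word.",
    "p2": "A spider is an arachnid; the web it spins is sticky.",
}
index = build_positional_index(pages)
print(proximity_search(index, "spider", "web", 3))  # ['p1']
print(proximity_search(index, "spider", "web", 5))  # ['p1', 'p2']
```

Storing positions costs extra space, but without them the index could only say that both words are on a page, not how close together they are.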

Some indexers also index the HTML code, which lets the search engine search by page elements such as URLs or titles. Most of these special search features are available in the advanced search areas of nearly all the major search engines, and the Help section of each search tool explains how to get the best results from that specific engine.
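
The following Python sketch shows one way an indexer might keep the URL and the HTML title of a page as separate, searchable fields, similar in spirit to operators such as Google's intitle: and inurl:. It uses the standard library's html.parser module; the record layout, the sample pages and the function names are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text inside a page's <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it
        if self.in_title:
            self.title += data


def index_fields(url, html):
    """Build a hypothetical per-page record holding searchable URL and title fields."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return {"url": url.lower(), "title": parser.title.strip().lower()}


def field_search(records, field, term):
    """Return URLs whose chosen field contains the term (simple substring match)."""
    term = term.lower()
    return [record["url"] for record in records if term in record[field]]


records = [
    index_fields("https://example.com/spiders",
                 "<html><head><title>How Spiders Crawl</title></head><body>...</body></html>"),
    index_fields("https://example.com/index-basics",
                 "<html><head><title>Building an Index</title></head><body>...</body></html>"),
]
print(field_search(records, "title", "spiders"))  # ['https://example.com/spiders']
print(field_search(records, "url", "index"))      # ['https://example.com/index-basics']
```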

There is a time lag between when a web page is crawled and when it is indexed. Until a page is indexed, it is unavailable to search engine users: it exists in the engine's system but is not yet accessible to you. This is why you should be skeptical of some search tools’ boasts. For example, when Google announced in February 2004 that it had increased its total number of pages to 4.28 billion, it did not mention that a portion of those pages were un-indexed. Yes, you still had access to billions of Google’s pages, just not all 4.28 billion!

If a crawler finds changes on a web page, it updates the index to include the new information. The word “index” suggests categorization and classification, activities that require human assessment and interpretation. In reality, a search engine’s indexing is done by computer software, and the rankings of the responses, or hits, are also calculated by mathematical formulas.
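
A crawler can tell that a page has changed without comparing it word by word, for example by storing a fingerprint (hash) of the content it last indexed. The minimal sketch below assumes SHA-256 fingerprints and a simple word-to-URL index; the SimpleIndex class and the example URL are hypothetical, not how any particular engine works.

```python
import hashlib
import re
from collections import defaultdict


class SimpleIndex:
    """A toy word -> URLs index that re-indexes a page only when its content changes."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> URLs containing it
        self.page_words = {}              # URL -> words currently indexed for it
        self.fingerprints = {}            # URL -> SHA-256 of the last indexed content

    def update(self, url, text):
        """Index a freshly crawled page; return True if the index changed."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self.fingerprints.get(url) == digest:
            return False  # unchanged since the last crawl, nothing to do
        # Remove the page's old words before adding the new ones
        for word in self.page_words.get(url, set()):
            self.postings[word].discard(url)
            if not self.postings[word]:
                del self.postings[word]
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        for word in words:
            self.postings[word].add(url)
        self.page_words[url] = words
        self.fingerprints[url] = digest
        return True

    def lookup(self, word):
        return sorted(self.postings.get(word.lower(), set()))


index = SimpleIndex()
index.update("https://example.com/sale", "Spring sale on garden tools")
index.update("https://example.com/sale", "Summer sale on garden tools")
print(index.lookup("spring"))  # []
print(index.lookup("summer"))  # ['https://example.com/sale']
```

Re-crawling an unchanged page costs only a hash comparison, while a changed page has its old words removed before the new ones are added, so a search no longer returns the page for words it has dropped.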

To improve performance, many search engines eliminate certain common words such as “is,” “and,” “or,” and “of.” These are called “stop words” because they add little real benefit to a search. Search engines have also taken other steps to focus their searches, such as removing punctuation and converting all letters to lowercase. Remember that each search engine has its own rules and ways of working.
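
The normalization steps described above (lowercasing, stripping punctuation and dropping stop words) can be sketched in a few lines of Python. The stop-word list here is a small illustrative assumption; each engine uses its own.

```python
import string

# A tiny, illustrative stop-word list; every engine chooses its own list
STOP_WORDS = {"is", "and", "or", "of", "the", "a", "an"}

# Translation table that deletes every ASCII punctuation character
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lowercase the text, delete punctuation, and drop stop words."""
    words = text.lower().translate(_STRIP_PUNCTUATION).split()
    return [word for word in words if word not in STOP_WORDS]


print(normalize("What is the Anatomy of a Search Engine Spider?"))
# ['what', 'anatomy', 'search', 'engine', 'spider']
```

Deleting punctuation outright also joins hyphenated words (“e-mail” becomes “email”); whether to do that is exactly the kind of rule that varies from one engine to another.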

Query Process

The third part of a search engine is its query processor, the most complicated part of the process. The search engine takes the query, searches the index and weighs many different factors to decide what is relevant and what is not, all before the results are returned. The exact process differs from engine to engine, and search engine companies closely guard the specific mathematical algorithms they use for these calculations. The biggest difference between engines lies in how relevance is calculated.
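
Because the real ranking formulas are secret, the sketch below uses TF-IDF, a classic textbook weighting, purely to illustrate how a query can be scored against indexed pages: a term counts for more when it is frequent on a page (term frequency) and rare across all pages (inverse document frequency). The "+ 1" smoothing, the sample pages and the function names are assumptions, and no major engine is claimed to rank this way.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, pages):
    """Score pages against a query with TF-IDF and return (page_id, score), best first."""
    docs = {pid: Counter(tokenize(text)) for pid, text in pages.items()}
    n_docs = len(docs)
    terms = set(tokenize(query))
    # Document frequency: how many pages contain each query term
    df = {t: sum(1 for counts in docs.values() if t in counts) for t in terms}
    scores = {}
    for pid, counts in docs.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for t in terms:
            if counts[t] == 0:
                continue  # term absent from this page
            tf = counts[t] / total
            # Add 1 so a term found on every page still counts a little
            idf = math.log(n_docs / df[t]) + 1.0
            score += tf * idf
        if score > 0:
            scores[pid] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


pages = {
    "p1": "spider spider web crawler",
    "p2": "web directory editors",
    "p3": "garden spider",
}
print([pid for pid, _ in rank("spider web", pages)])  # ['p1', 'p3', 'p2']
```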

Crawler-Based Search Engines

Crawler-based search engines, such as Google, create their listings automatically. They “crawl” or “spider” the web, and people then search through what they have found. If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.
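
Crawling is essentially a graph traversal: start from a known URL, read the page, and queue every link not seen before. The sketch below performs a breadth-first crawl over a small, hypothetical in-memory web (the WEB dictionary and its example.com URLs are made up) so that it runs without network access; a real spider would fetch pages over HTTP and parse the links out of the HTML.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to (page text, outgoing links)
WEB = {
    "https://example.com/": ("Home page", ["https://example.com/about", "https://example.com/blog"]),
    "https://example.com/about": ("About us", ["https://example.com/"]),
    "https://example.com/blog": ("Blog index", ["https://example.com/blog/post-1"]),
    "https://example.com/blog/post-1": ("First post", []),
}


def crawl(start_url, max_pages=100):
    """Breadth-first crawl: visit each reachable URL once, collecting its text."""
    seen = {start_url}
    queue = deque([start_url])
    found = {}
    while queue and len(found) < max_pages:
        url = queue.popleft()
        if url not in WEB:
            continue  # dead link, nothing to fetch
        text, links = WEB[url]
        found[url] = text
        for link in links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return found


print(list(crawl("https://example.com/")))
```

The seen set is what keeps the spider from looping forever on pages that link back to each other, such as the home and about pages here.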

Human-Powered Directories

A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description of your entire site to the directory, or editors write one for sites they review. A search looks for matches only in the submitted descriptions.
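
By contrast, a directory search only ever sees the submitted descriptions. The Python sketch below, with hypothetical listings, returns a site only when every query word appears in its description; nothing on the site's own pages is searched.

```python
import re

# Hypothetical directory listings: only the URL and a short, human-written description
DIRECTORY = [
    {"url": "https://acme.example/",
     "description": "Hand tools and supplies for home gardeners."},
    {"url": "https://spiders.example/",
     "description": "An illustrated guide to common house spiders."},
]


def words(text):
    # Lowercase and split into words, ignoring punctuation
    return re.findall(r"[a-z0-9]+", text.lower())


def directory_search(query):
    """Return URLs of listings whose description contains every query word."""
    query_words = words(query)
    results = []
    for entry in DIRECTORY:
        description_words = set(words(entry["description"]))
        if all(w in description_words for w in query_words):
            results.append(entry["url"])
    return results


print(directory_search("house spiders"))  # ['https://spiders.example/']
```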

Hello, I am Jackie Smijames, an SEO expert, website designer and article writer. I have been writing articles for nearly two years. My new interest is website development, so come visit my latest websites, Download Mp3 for free and Miley Cyrus Mp3. I hope you will enjoy my articles and the websites.
