
How Search Engines Find Documents

 

Every document on the Web is associated with a URL (Uniform Resource Locator). In this context, we will use the terms document and URL interchangeably. This is an oversimplification, as some URLs return different documents to the user depending on factors such as their location, browser type, or form input, but this terminology suits our purposes for now.

Finding every document on the Web would mean more than finding every URL on the Web. For this reason, search engines do not currently attempt to locate every possible unique document, although research in this area is ongoing. Instead, crawling search engines focus on unique URLs: although some dynamic sites display different content at the same URL (via form inputs or other dynamic variables), a search engine treats that URL as a single page.
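To make the "one URL, one document" simplification concrete, here is a minimal sketch of a crawl frontier that queues each unique URL exactly once, regardless of what content the server might return for it on different requests. The class and method names (`UrlFrontier`, `add`, `next_url`) are hypothetical illustrations, not any search engine's actual code, and the sketch compares raw URL strings; a production crawler would typically also normalize URLs before comparing them.

```python
from __future__ import annotations

from collections import deque
from typing import Optional


class UrlFrontier:
    """Crawl frontier keyed on URL strings: each URL is queued at most once."""

    def __init__(self) -> None:
        self._queue: deque[str] = deque()  # URLs waiting to be crawled, FIFO order
        self._seen: set[str] = set()       # every URL ever queued

    def add(self, url: str) -> bool:
        """Queue `url` unless it was seen before; return True if it was queued."""
        if url in self._seen:
            # Same URL means same "document" here, even if a dynamic page
            # would serve different content on a later request.
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def next_url(self) -> Optional[str]:
        """Pop the next URL to crawl, or return None if the frontier is empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self) -> int:
        return len(self._queue)


if __name__ == "__main__":
    frontier = UrlFrontier()
    print(frontier.add("http://example.com/results?q=shoes"))  # True
    print(frontier.add("http://example.com/results?q=shoes"))  # False: already seen
    print(len(frontier))                                       # 1
```

Keying the `seen` set on the URL string alone is exactly what makes a dynamic page count as one document: the frontier never looks at the content behind the URL.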

The typical crawling search engine draws on three main sources to build its list of URLs to crawl, although not every search engine uses all three:

Hyperlinks on existing Web pages

The bulk of the URLs found in the databases of most crawling search engines consists of links found on Web pages that the spider has already crawled. Finding a link to a document on one page implies that someone found that link important enough to add it to their page.
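The following Python sketch shows how links might be harvested from a page the spider has already fetched, using only the standard library's `html.parser` and `urllib.parse`. It assumes the page HTML is already in memory. Resolving relative links against the page URL, dropping `#fragment` parts, and keeping only `http`/`https` links are illustrative choices, not a description of any particular engine's rules; `LinkExtractor` and `extract_links` are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            # Resolve relative links against the page's own URL, then drop any
            # "#fragment", which only points to a spot inside the same document.
            absolute, _fragment = urldefrag(urljoin(self.base_url, value))
            # Skip mailto:, javascript:, and other non-Web schemes.
            if absolute.startswith(("http://", "https://")):
                self.links.append(absolute)


def extract_links(page_url: str, html: str) -> list[str]:
    """Return the outgoing links found in `html`, which was fetched from `page_url`."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<a href="/about">About us</a> <a href="mailto:info@example.com">Mail</a>'
    print(extract_links("http://example.com/index.html", page))
    # ['http://example.com/about']
```

Each URL returned here would then be handed to the crawl list, which is how already-crawled pages end up supplying most of the new URLs.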

Submitted URLs

All crawling search engines have some process that lets users or Website owners submit URLs to be crawled. In the past, every search engine offered free manual submission, but many now accept only paid submissions. Google is a notable exception, with no apparent plans to stop accepting free submissions, although it is widely doubted whether submitting a URL actually has any effect.

XML data feeds

Paid inclusion programs, such as the Yahoo! Site Match system, include trusted feed programs that allow sites to submit XML-based content summaries for crawling and inclusion. As the Semantic Web begins to emerge and more sites offer RSS (RDF Site Summary) news feed files, some search engines have begun reading these files to find fresh content.
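Reading a news feed to discover fresh URLs can be sketched in a few lines of Python with `xml.etree.ElementTree`. This assumes the feed is RSS 1.0 (RDF Site Summary, whose elements are namespaced) or RSS 2.0 (un-namespaced), and that each `<item>` carries a `<link>` child; Atom feeds and the trusted-feed formats used by paid inclusion programs are not handled. `feed_item_links` is a hypothetical helper name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def _local_name(tag: str) -> str:
    """Drop an XML namespace: '{http://purl.org/rss/1.0/}item' -> 'item'."""
    return tag.rsplit("}", 1)[-1]


def feed_item_links(feed_xml: str) -> list[str]:
    """Return the <link> URL of each <item> in an RSS 1.0 or RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    links = []
    for element in root.iter():
        if _local_name(element.tag) != "item":
            continue
        # Only look at the item's direct children, so the channel-level
        # <link> (the site's home page) is not mistaken for new content.
        for child in element:
            if _local_name(child.tag) == "link" and child.text:
                links.append(child.text.strip())
    return links


if __name__ == "__main__":
    rss = ("<rss version='2.0'><channel><link>http://example.com/</link>"
           "<item><title>New post</title><link>http://example.com/new-post</link></item>"
           "</channel></rss>")
    print(feed_item_links(rss))  # ['http://example.com/new-post']
```

Matching on local names rather than full namespaced tags is what lets one function cover both RSS flavors.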

Search engines run multiple crawler programs, and each crawler program (or spider) receives instructions from the scheduler about which URL (or set of URLs) to fetch next. We will see how search engines manage the scheduling process shortly, but first, let's take a look at how a search engine's crawler program works.
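The division of labor between the scheduler and several crawler programs can be sketched with a thread-safe queue that plays the scheduler, handing each crawler thread the next URL to fetch. This is a simplification under stated assumptions: `fetch` is a caller-supplied stand-in for a real HTTP fetcher (so the sketch runs offline), the scheduler is a plain first-in, first-out queue with no prioritization or politeness rules, and `run_crawlers` is a hypothetical name.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def run_crawlers(
    urls: Iterable[str], fetch: Callable[[str], str], num_crawlers: int = 3
) -> Dict[str, str]:
    """Let several crawler threads pull URLs from one shared scheduler queue."""
    scheduler: "queue.Queue[str]" = queue.Queue()
    for url in urls:
        scheduler.put(url)

    results: Dict[str, str] = {}
    lock = threading.Lock()  # guards `results` across crawler threads

    def crawler() -> None:
        while True:
            try:
                # Ask the scheduler which URL to fetch next.
                url = scheduler.get_nowait()
            except queue.Empty:
                return  # no more work: this crawler stops
            page = fetch(url)
            with lock:
                results[url] = page

    threads = [threading.Thread(target=crawler) for _ in range(num_crawlers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    pages = run_crawlers(
        [f"http://example.com/{n}" for n in range(5)],
        fetch=lambda url: f"<html>contents of {url}</html>",  # fake fetcher
    )
    print(len(pages))  # 5
```

Because every URL is queued before the crawlers start, a crawler that finds the queue empty can simply stop. A scheduler that keeps feeding in newly discovered URLs would need a different shutdown rule.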

Author: Kamlesh Patel
 
Author Bio:
Kamlesh Patel is a renowned writer. Kamlesh likes to compose articles about this field.
Copyright © 2008 www.curlyleaf.com