Any goal needs a solid foundation to build upon. In the world of internet marketing, online advertising, and search engine optimization, that foundation is an understanding of how a search engine functions.
A search engine is basically a mathematical machine that interprets certain types of code and text. It then generates many data points that it uses to produce the results you see when you perform your searches online.
While we use the words “they” and “search engines” generally, we are going to focus mostly on Google in order to teach you about the most widely used technology. As of late 2007 and early 2008, Google had roughly 60% of the worldwide market share for online searching. In some areas of the world that percentage was closer to 90%. Worldwide, Google averaged between 25 and 31 billion searches per month in 2007. With those numbers, there are a lot of eyes on Google to see what it does. Since Google also supplies data to many other search engines, what works for Google should, and almost always does, work for everyone else.
As you learn more about how a search engine works, you’ll start to see that much of the mystery and hype generated by online “gurus”, “experts” and so on comes down to mastering these basic ideas. It is this knowledge that we build upon to produce our online strategies for search engine optimization, online sales, and internet marketing.
In this brief article we’ll discuss the following points:
- What are these “Spiders”?
- The importance of Sitemaps and Robots.txt
- How do search engines find you?
- What do search engines index?
- How are pages ranked?
- How humans interact with search engines.
What are these “Spiders”?
Spiders, robots, crawlers, indexers: we are going to refer to all of these as Spiders. They will become some of your best friends from here on out. Becoming well attuned to what they like, dislike and don’t care about is very important.
A spider is a software program that the search engines use to find web pages across the internet. While there are various types of spiders, we are only interested in the ones that actually crawl websites. We use the term crawl because they basically load a webpage, read the content, and follow the hyperlinks on the page to continue crawling through the internet.
Spiders work in a fairly methodical way. They start off by fetching a page (crawling to it). The next step is analyzing the webpage code and content and breaking it down into words. Those words are fed into a massive database that records each word and the sites it occurs on.
The last step is to move on to other pages: the URLs found in the page’s links are fed back into the spider program so that it can crawl those pages and continue the process.
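The fetch, break-into-words, and follow-links cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-page FAKE_WEB dictionary stands in for the real internet, and the names PageParser and crawl are made up for this example. It is not how any particular search engine implements its spider.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser

# Hypothetical three-page "web" so the sketch runs without network access.
FAKE_WEB = {
    "/home": '<a href="/about">About us</a> Welcome to our home page',
    "/about": '<a href="/contact">Contact</a> We sell garden tools',
    "/contact": '<a href="/home">Home</a> Call us about garden tools',
}


class PageParser(HTMLParser):
    """Collects the words and the link targets found on one page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.words.extend(word.lower() for word in data.split())


def crawl(start):
    """Fetch a page, index its words, then follow its links."""
    index = defaultdict(set)  # word -> set of pages the word occurs on
    queue = deque([start])
    seen = {start}
    while queue:
        url = queue.popleft()
        parser = PageParser()
        parser.feed(FAKE_WEB[url])       # step 1: "fetch" the page
        parser.close()
        for word in parser.words:        # step 2: store each word
            index[word].add(url)
        for link in parser.links:        # step 3: queue links not yet seen
            if link in FAKE_WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("/home")
print(sorted(index["garden"]))
```

The index maps each word to the pages it occurs on, which is the "words and what sites they occur on" database in miniature. The seen set keeps the spider from crawling the same page twice when pages link back to each other.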
The Importance of Sitemaps and Robots.txt
When a spider comes to your website, the first thing it does is read a file called “robots.txt”. This file holds instructions that the search engine spiders are supposed to follow. It belongs in the root directory of your website, so that it can be reached at an address such as www.example.com/robots.txt. In it you can designate which parts of the site you want crawled and which areas you want spiders to skip. While there are many search engines, the majority follow the same rules here.
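To see how a well-behaved spider might interpret these instructions, here is a small sketch using Python’s standard urllib.robotparser module. The robots.txt content, the www.example.com address, and the crawler name “ExampleSpider” are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

# "ExampleSpider" is a made-up crawler name used only for this sketch.
for path in ("/products/shovel.html", "/admin/login.html"):
    allowed = robots.can_fetch("ExampleSpider", "http://www.example.com" + path)
    print(path, "->", "crawl" if allowed else "skip")
```

Here the “User-agent: *” line applies the rules to every spider, and each “Disallow” line names a part of the site to skip. Keep in mind that the file is a request, not a lock: it only works for spiders that choose to honor it.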
Standardization is starting to make its way into the world of search engine optimization. One great example that is currently widely overlooked is the use of a properly formatted sitemap. For information on how to correctly format a sitemap for use by a search engine, visit www.sitemaps.org. A correctly formatted sitemap can tell the search engines which pages are more important than others, when each page was last modified, and how often it is expected to change. Sitemaps can be highly effective at helping the search engines index your site quickly, properly and thoroughly.
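As a rough illustration, the following Python sketch builds a small sitemap using the element names defined by the sitemaps.org protocol (urlset, url, loc, lastmod, changefreq, priority). The page addresses, dates and priority values are made up for the example; in practice many sites write this file by hand or generate it with a tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages: (URL, last modified, change frequency, priority).
SITE_PAGES = [
    ("http://www.example.com/", "2008-01-15", "weekly", "1.0"),
    ("http://www.example.com/about.html", "2007-11-02", "yearly", "0.3"),
]

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
for loc, lastmod, changefreq, priority in SITE_PAGES:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc                # page address
    ET.SubElement(url, "lastmod").text = lastmod        # last change date
    ET.SubElement(url, "changefreq").text = changefreq  # expected update interval
    ET.SubElement(url, "priority").text = priority      # importance within this site

sitemap_xml = ET.tostring(urlset, encoding="unicode")
print(sitemap_xml)
```

The priority value only ranks pages relative to other pages on the same site; it is a hint to the spider, not a way to outrank other websites.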
How do Search Engines Find You?
When search engine optimization was in its infancy the majority of search engines had submit forms and other ways for you to manually add a page to be indexed. It would often then be put into a queue and indexed within several weeks.
Today the game is much different. Only a small handful of search engines still operate with this manual form of entry. As we just learned, a major part of what a spider does is look at the hyperlinks on a page and follow them to new webpages, so new pages are now discovered through links.
There are some companies selling services that submit your website automatically to all the search engines. While some of these services are free, we wouldn’t recommend them. The return on investment (whether of time or money) will be slim. Since the vast majority of traffic comes from only 4 or 5 major search engines, none of which rely on manual entry, it makes little sense to focus on other search engines.
There are some search engines such as Yahoo! that offer paid submission. These services don’t offer special advantages other than the assurance that your page will be indexed in a certain amount of time and that it is up to date.
Basically, we suggest that you let the search engines find your website on their own. Today, a new webpage rarely takes more than a couple of weeks to be crawled and indexed. It will typically take only a week or so for the spider to find you; however, it may take another couple of weeks before someone can actually find the page through that search engine.
What Do Search Engines Index?
When a spider comes along and indexes your webpage, the entire page is not physically stored for use. As stated earlier, the spider takes in a lot of information about the words and text on your website.
Some of this information covers which words appeared on your website, where they appeared, whether they were hyperlinks, and so on. Some search engines will read images, scripts and other forms of rich media. These other media types are typically returned only in specific searches such as Google’s image search, although search engines such as Google are starting to return images in regular results when they deem it useful to the user.
Overall, search engines focus on the text. Since you enter text to perform a search, they have naturally built their algorithms around it. This is a primary reason why content built with Flash is generally not read by the spiders.
Code is indexed in a different way since it’s not content. Some search engines will index code that is specifically designed to tell the spider something about the page. One example is the META tags. META tags can be used to give a description of the page as well as keywords that describe the content. Today, however, few search engines place a high value, if any, on these tags because they were widely abused in the past. The “description” META tag is probably the most useful, as it is often the description that is displayed when a viewer sees your page in the results after performing a search.
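For a sense of how a spider might pick up META tags, here is a minimal Python sketch that reads the name and content attributes from a page’s head section. The MetaReader class and the sample HTML are invented for this example; real spiders handle many more cases.

```python
from html.parser import HTMLParser


class MetaReader(HTMLParser):
    """Collects META tags that carry a name, such as description or keywords."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                self.meta[name.lower()] = attributes.get("content", "")


# Hypothetical head section of a page.
SAMPLE_HEAD = """
<head>
<title>Garden Tools | Example Shop</title>
<meta name="description" content="Hand-forged garden tools shipped nationwide.">
<meta name="keywords" content="garden tools, shovels, rakes">
</head>
"""

reader = MetaReader()
reader.feed(SAMPLE_HEAD)
reader.close()
print(reader.meta["description"])
```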
Other forms of code that are indexed by various search engines are file names in URLs, the ALT attributes of images, and comments left in the code for programmers. Generally, spending a lot of time on these elements will have a very low ROI, if any. Cover them quickly and move on to other, more important factors.
The last major piece that is read and indexed is the hyperlinks. Links are widely used to determine what a page is about. Both the destination page and the text used in the link, which we call “anchor text”, provide clues.
Analyzing links is one of the most important tasks the spider has to do. “Link Popularity” is heavily used to determine the overall value and importance of a webpage. The idea is that if there are a lot of links to a page, then that page must have some authority on the subject matter. In turn, if this page of authority links to another site, the spider takes notice of that as well. The relevancy of the linked pages is another point of interest, which helps keep spamming and abuse in check. For example, if you have an auto-insurance website being linked to from a D/FW Social Happenings website, there is an obvious disconnect in subject matter and that link will be valued less.
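A toy version of link popularity can be sketched by simply tallying inbound links and the words used in their anchor text. The link list below is entirely made up, and real search engines weigh links in far more sophisticated ways (including the relevancy check just described, which this sketch leaves out).

```python
from collections import Counter, defaultdict

# Hypothetical links: (source page, destination page, anchor text).
LINKS = [
    ("blog.example.org/review", "insure.example.com", "cheap auto insurance"),
    ("cars.example.net/tips", "insure.example.com", "auto insurance quotes"),
    ("events.example.org/dfw", "insure.example.com", "click here"),
    ("cars.example.net/tips", "blog.example.org/review", "car review"),
]

inbound_count = Counter()             # destination -> number of inbound links
anchor_words = defaultdict(Counter)   # destination -> words used to link to it

for source, destination, anchor in LINKS:
    inbound_count[destination] += 1
    anchor_words[destination].update(anchor.lower().split())

print(inbound_count.most_common())
print(anchor_words["insure.example.com"].most_common(2))
```

In this sample the insurance page has the most inbound links, and the words that link to it most often are “auto” and “insurance”, which is the kind of hint anchor text gives about a page’s topic.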
How Are Pages Ranked?
There isn’t any one solution that will account for every search engine. Each one is slightly different. Each one has its own “algorithm”, which is basically the set of factors it uses to weigh and interpret content and data. While they all share many of the same rules, you will notice that the results change from search engine to search engine.
We divide what influences your rank into two different categories. The first is termed “On-Page Factors” and the second is “Off-Page Factors”.
When we talk about on page factors, we are generally speaking of what’s going on directly on the page itself. This includes things like TITLE tags, textual content, headlines, and the use of the H1, H2, and H3 tags. Styling text with italics, bold, and underlining also helps the search engines determine what’s important on the page.
Keywords on the page are analyzed closely. While the details vary between search engines, they generally take in similar information points: the position of each keyword on the page is indexed, and its proximity to other textual content is measured. Keywords appearing in bold, in italics, or inside heading tags also signal importance. These factors help tell the spider that this page is more important for a particular keyword than another page that merely mentions that keyword once.
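To make the idea concrete, here is a toy Python scorer that gives a keyword more credit when it appears in a title, heading or bold tag. The weights in TAG_WEIGHTS are pure guesses chosen for illustration (search engines do not publish theirs), and the names KeywordScorer and keyword_score are hypothetical.

```python
from html.parser import HTMLParser

# Illustrative weights only; real search engines do not publish theirs.
TAG_WEIGHTS = {"title": 5, "h1": 4, "h2": 3, "h3": 2,
               "b": 2, "strong": 2, "i": 2, "em": 2}


class KeywordScorer(HTMLParser):
    """Adds up weighted occurrences of one keyword on a page."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.open_tags = []   # emphasis tags we are currently inside
        self.score = 0

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            self.open_tags.remove(tag)

    def handle_data(self, data):
        hits = data.lower().split().count(self.keyword)
        # Plain body text counts once; emphasized text uses the largest weight.
        weight = max((TAG_WEIGHTS[t] for t in self.open_tags), default=1)
        self.score += hits * weight


def keyword_score(html, keyword):
    scorer = KeywordScorer(keyword)
    scorer.feed(html)
    scorer.close()
    return scorer.score


page_a = ("<title>Garden tools</title><h1>Quality garden tools</h1>"
          "<p>We sell <b>tools</b>.</p>")
page_b = "<title>Welcome</title><p>Our shop also has tools somewhere.</p>"

print(keyword_score(page_a, "tools"), keyword_score(page_b, "tools"))
```

The first page mentions the keyword in its title, main heading and bold text, so it scores well above the second page, which mentions it once in plain body text.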
Off page factors can actually be more important than on page factors. Off page factors are primarily focused on links coming into the webpage. The reasoning is that excellent pages will, or hopefully should, have more pages linking to them, while pages of little value will most likely have few, if any, links to them.
Off page factors are growing in importance because on page factors can be put in place fairly easily without offering the viewer any additional value. Gaining links to your website, however, requires that your content be of high quality, making it worthwhile to link to so that it can be shared with others. It adds a bit of a human touch to an otherwise math-based algorithm used to determine importance.
Google has a proprietary system that it calls “PageRank”. It is the best documented and most discussed of the ranking systems. Essentially, it is a metric used to measure the quality of a given webpage, just as inches are one way to measure length. Every page has some PageRank once it is indexed by Google.
The amount of PageRank a page has is determined by its inbound links (links to your webpage) and outbound links (links from your webpage to another). PageRank flows from one page to another through links. The amount of PageRank that a page passes is divided among all the links going out from it, so the more links you have on your webpage, the less PageRank is passed through each link.
The more inbound links coming into your site, the more PageRank you have flowing into your site. This tells Google that your site is more popular, more valuable and has more viewers. The more pages that you have across your entire website, the more PageRank you have to begin with.
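The flow described above can be sketched with the simplified formula from the original published PageRank paper, where each page receives a small baseline plus a share of the rank of every page linking to it. The four-page link graph, the damping value of 0.85, and the number of iterations are assumptions made for this example. Google’s production ranking is not public, so treat this as an illustration of the idea and not of what Google actually runs.

```python
# Hypothetical link graph: page -> pages it links out to.
LINK_GRAPH = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
    "orphan": ["home"],   # links out, but nothing links to it
}


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: baseline (1 - d) plus d times the incoming shares."""
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_ranks = {}
        for page in links:
            # Each linking page splits its rank evenly among its outbound links.
            incoming = sum(
                ranks[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - damping) + damping * incoming
        ranks = new_ranks
    return ranks


ranks = pagerank(LINK_GRAPH)
for page, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {rank:.3f}")
```

Note how the page with no inbound links keeps only its baseline amount, while the page that every other page links to ends up with the most. A page with two outbound links passes half of its share through each, which is the dividing effect described above.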
Another slightly lesser known measure for ranking pages is user feedback. Every search engine has some way of measuring user feedback. Some of these are as simple as seeing which results viewers actually click on, and others are a bit more detailed. For example, Google’s local listings allow users of Google’s email service (Gmail) to post reviews, recommendations and ratings for businesses that are in those listings. The way to account for user feedback is to design for the visitor, not the search engine. Designing for the visitor will ensure that your strategy works across the different search engines.
How Humans Interact with Search Engines
We’ve mentioned a few times in this article that designing around the needs of the human visitor is critical to sustaining long term success. Designing for a search engine is a fruitless endeavor. The algorithms used change every two to three months. Designing around them means your rankings will shift with each revision, forcing you to rework your search engine optimization goals and plans every time.
When web surfers go to a search engine, they typically search in one of two ways. These are termed “Drill Down” searches and “Targeted” searches.
In a drill down search, the viewer finds a high quality website that is an authority on the desired topic. The viewer will most likely not return to the search engine. This could be because they found what they were looking for immediately, or because the website linked to other pages that provided the information they were searching for, eliminating the need to do another search.
In a targeted search, the viewer is usually looking for something very specific. This type of search is sometimes referred to as a “long tail search” as well, because it can be a string of several words such as “Fort Worth historic home real estate agent”.
Knowing the common types of searches is key to deciding how to present the information that will be displayed in the listings when a viewer searches. If you’re going to operate an online store, you’ll probably be more interested in targeted searches, where the viewer is looking for a particular product or service. For viewers who use the drill down process, a listing that demonstrates a wealth of knowledge, with resources leading to even more, will help attract them.
As viewers scan the results for the first listing that appeals to them, they read the title, description and any links that accompany each result. Pages with poorly worded titles, or with no titles at all, will appear to provide little value. Similarly, a page with a poor description in the results listing gives little incentive to visit it.
Closing Words
Remember, the search engine’s goal is to provide the user with the best results possible. If viewers constantly get poor results, they will have little desire to return to that search engine in the future.
Likewise, in terms of advertising with the search engines, if ads constantly appeared in unrelated results, providing little relevancy and most likely poor conversions, why would advertisers return to do business with that search engine?
Attempting to outrun the search engines by using “black hat” techniques or other ways of gaming the system will be short lived for the reasons just stated. While such techniques may work in the short term, the search engines will develop ways of closing those holes.
The principles of search engine optimization have not changed much. While a lot has changed online, the basic principle of giving quality information in a value-adding way that makes things easy for the viewer will not. We base our strategies on these ageless techniques to produce foundations to build upon into the future.
Written By George Gdovin