When Google stores links to your site, it looks at the full URL. Humans know that 99% of the time http://www.example.com/home and http://example.com/home are the same page, but Googlebot cannot be sure. It may therefore store them as two different pages with the same content. You may have read about duplicate content and its negative effects on your site's rankings. How much this actually hurts is debatable, but it is still common sense to provide one unique URL for each page.
What I am saying is: pick a domain format for your site and stick to it. It doesn't matter whether you use the www subdomain or not, but choose one and stay consistent. To set this up you may be able to use your web host's control panel, or you may need to configure it yourself, for example with an Apache .htaccess file.
Let's say you want people to only see pages on your site using the www.example.com URL. If you are using your web host's control panel, tell it to redirect all requests for example.com to www.example.com.
If you are using .htaccess, then you need to add a mod_rewrite rule like the following:
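# turn on the rewrite engine (requires mod_rewrite)
RewriteEngine On
RewriteBase /
# force trailing slash on paths that do not look like a file (no dot in the last part)
RewriteCond %{REQUEST_URI} /+[^\.]+$
RewriteRule ^(.+[^/])$ /$1/ [R=301,L]
# redirect nonwww to www (the dot is escaped so it only matches a literal ".")
RewriteCond %{HTTP_HOST} ^yourdomainname\.com$ [NC]
RewriteRule ^(.*)$ http://www.yourdomainname.com/$1 [L,R=301]
# send anything that is not an existing file or directory to index.php
# (only needed for front-controller setups such as WordPress)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]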
You will find your .htaccess file in your site's root directory. If you don't have one, just create a file called .htaccess and add the above code. Otherwise, integrate the code into the existing file. Remember to change yourdomainname to your site's domain name.
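If you would rather settle on the non-www form, the same idea works in reverse. The following is only a sketch, assuming Apache with mod_rewrite enabled and the same placeholder domain yourdomainname.com; use it in place of the non-www rule, not alongside it, or the two redirects will loop.

```apache
# Hypothetical variant: redirect www to the bare domain instead.
# "yourdomainname.com" is a placeholder; replace it with your own domain.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.yourdomainname\.com$ [NC]
RewriteRule ^(.*)$ http://yourdomainname.com/$1 [L,R=301]
```

Whichever direction you pick, it is worth requesting both forms of a URL afterwards and confirming that only one of them answers with a 301.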
Notice that the code also includes a rule that adds a trailing slash to a URL if one was not typed. This means that if someone types http://www.example.com/home into the browser, the web server will redirect to http://www.example.com/home/ (the bare domain needs no such rule, since a browser always requests the root as /).
The reason for this is similar: to avoid duplicate content. Google and other search engines may view the two variations of the URL as different pages with the same content. Again, it is safer and neater to stick with the same convention on all URLs, so why not add this rule as well? Some web hosts already have this rule configured, so it may not be necessary for your site.
(The .htaccess config was found at Earners Blog. That post has some other .htaccess tips if you're interested.)
I suggest you check all your sites for this issue, as it could be causing some of your pages to fall into the supplemental index.