ZOMDir > Blog

Saturday, 14 September 2013

What is in the CNAME?

For the ZOMDir project a few CNAME records are used to redirect a subdomain like blog.zomdir.com, via ghs.google.com, to blogger.com.

A few days ago I discovered that it was not possible to set the crawl rate for Googlebot in Google Webmaster Tools for any of the CNAME-redirected subdomains.


It might be a coincidence, but I doubt it. Therefore I started a test with CNAMEs and the possibility to change the crawl rate.


The result is that I'm a little confused now. I thought it was easy to add a CNAME. However, I got these results:

  1. cname.zomdir.com should show the content of original.helenahoeve.nl
  2. instead, the page shown is www.zomdir.com
And vice versa:
  1. cname.helenahoeve.nl should show the content of original.zomdir.com
  2. instead, the page shown is www.zomdir.com
Sorry, I don't understand what's happening. 

Update: The problem described above occurs because both sites are hosted by hosting2go. They state that a CNAME will not work for a site within the same network, which is the case here.

The strange thing is that all the other CNAMEs point to ghs.google.com. In the Google Admin console I have to configure, for each App Engine site, which URL is expected (e.g. websitequality.zomdir.com). Hmmm, that feels like an indication that the configuration mentioned above will not work. I have to figure out what to do next ...

Tip: You can use mxtoolbox.com to find out what the actual CNAME is for cname.zomdir.com.
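In case you want to script this check instead, a minimal Node.js sketch along these lines should work too; cname.zomdir.com is the test host from this post, and dns.resolveCname is part of Node's standard dns module:

// Minimal sketch: look up the CNAME target of a host.
var dns = require('dns');

dns.resolveCname('cname.zomdir.com', function (err, targets) {
  if (err) {
    console.error('No CNAME record found: ' + err.code);
    return;
  }
  // targets is an array of canonical names, e.g. [ 'ghs.google.com' ]
  console.log('CNAME target(s): ' + targets.join(', '));
});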

Tuesday, 10 September 2013

Site settings for subdomains

The ZOMDir project has several subdomains. The following overview lists each subdomain and whether it is possible to set the crawl rate for it in Google Webmaster Tools.

  1. about.zomdir.com: Ok, Let Google optimize for my site
  2. blog.zomdir.com: Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  3. browsersize.zomdir.com: Ok, Let Google optimize for my site
  4. example.zomdir.com: Ok, Let Google optimize for my site
  5. faq.zomdir.com: Ok, Let Google optimize for my site
  6. onion.zomdir.com: Ok, Let Google optimize for my site
  7. pagerank.zomdir.com: Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  8. safebrowsing.zomdir.com: Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  9. setextbrowser.zomdir.com: Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  10. thedarksideof.zomdir.com: Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  11. try.zomdir.com: Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  12. websitequality.zomdir.com: Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  13. www.zomdir.com: Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  14. zomdir.com: Ok, Let Google optimize for my site
Initially I thought that it was not possible to change the settings for www.zomdir.com because it is a subdomain of zomdir.com. However, now I think it has something to do with the way a website is redirected.

The subdomains for which it is not possible to set the crawl rate are all redirected via a CNAME record to ghs.google.com. Probably that's the reason why I'm not able to change this setting.

I'll try to sort this out.


Monday, 9 September 2013

Crawl speed too low?

As stated in Taming the bots, I have to slow down the bots. This evening I concluded that Googlebot is again visiting ZOMDir.com multiple times a second. The result is that the site is down again.

I have changed my robots.txt, and for Googlebot I have the following crawl rate setting in Google Webmaster Tools, according to this instruction:


0.02 requests per second (50 seconds between visits)

[Screenshot: the crawl rate setting in Google Webmaster Tools (in Dutch)]
Since there are 24*60*60 = 86400 seconds in a day, I expect that Googlebot will retrieve at most 0.02 * 86400 = 1728 pages a day with this setting.

Maybe Google thinks this makes no sense, and maybe they are right. I will change this setting to 0.2 (actually 0.198) requests per second, which comes down to roughly 17000 pages a day, and hope that Googlebot respects this setting.


Note: for the other bots I will keep the robots.txt setting the same for the moment (Crawl-delay: 50).
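For reference, a minimal robots.txt sketch with that directive could look like this; note that Googlebot does not honour Crawl-delay, which is why its rate has to be set in Google Webmaster Tools instead:

# Minimal robots.txt sketch; Crawl-delay only affects bots that honour it
# (Googlebot ignores it, so its rate is set in Google Webmaster Tools).
User-agent: *
Crawl-delay: 50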


I am pretty sure that it really is Googlebot visiting ZOMDir.com. The IP addresses used by this bot include, for example:


66.249.73.151
66.249.73.142
66.249.73.158
66.249.73.137
66.249.73.138
66.249.73.157

Verifying these addresses via a reverse DNS lookup gives results like this:



66.249.73.142 resolves to
"crawl-66-249-73-142.googlebot.com"
Top Level Domain: "googlebot.com"
Country IP Address: UNITED STATES
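
If you want to script this verification, a minimal Node.js sketch like the one below should do; the IP address is one of those listed above, and the check follows the usual advice of a reverse lookup followed by a forward lookup to confirm the name maps back to the same IP:

// Minimal sketch: verify that an IP address really belongs to Googlebot.
var dns = require('dns');

function isGooglebot(ip, callback) {
  // Step 1: reverse DNS lookup of the IP address.
  dns.reverse(ip, function (err, names) {
    if (err) return callback(err, false);
    var name = names[0];   // e.g. 'crawl-66-249-73-142.googlebot.com'
    if (!/\.googlebot\.com$|\.google\.com$/.test(name)) return callback(null, false);
    // Step 2: forward lookup to confirm the name maps back to the same IP.
    dns.resolve4(name, function (err, addresses) {
      if (err) return callback(err, false);
      callback(null, addresses.indexOf(ip) !== -1);
    });
  });
}

isGooglebot('66.249.73.142', function (err, ok) {
  console.log(ok ? 'verified Googlebot' : 'not verified');
});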

The only hope I have at the moment is that someone wrote that it can take weeks before Googlebot follows your instructions. I hope that that person was wrong ...

Javascript and a broken back button

For the ZOMDir project I use as little JavaScript as possible. The site should always work and not depend on a feature like JavaScript. Nevertheless, I use JavaScript to smooth the browsing experience.

In the JavaScript I use code like:


window.location.href = "http://example.zomdir.com";


However, this code results in a broken back button in Internet Explorer 8. After such a redirect, pressing the back button gives the user the message "The webpage is expired". To solve this I have coded the redirect like this:


window.location.replace("http://example.zomdir.com");


It is a small change, but it fixed the problem.
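
As a small aside, a minimal sketch of a reusable helper built around this fix could look like the following; the function name redirectTo is just illustrative:

// Minimal sketch of a reusable redirect helper (the name redirectTo is illustrative).
// location.replace() swaps the current history entry instead of adding a new one,
// so the back button skips the redirecting page and the "expired" message.
function redirectTo(url) {
  window.location.replace(url);
}

redirectTo("http://example.zomdir.com");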



A lesson from the bots

As described in Taming the bots, ZOMDir has a lot of initially blank pages, so the bots have a lot of pages to visit. In the past, and again recently, the bots were very active. The result was an unexpected bug.

ZOMDir has separate server limits for reading data from and writing data to the database. When the bots visited ZOMDir these limits were reached.


When I designed ZOMDir I wanted to reduce the number of reads and writes. I also wanted a fast-loading website. So I decided that when a page was visited and it wasn't stored yet, I would create it and write it to the database. It seemed clever, but it wasn't ...


Unfortunately, I reached the read limit first. So the program concluded for every page that a new page was being visited, and wrote a new, blank page to the database. When that happens, all links on that page are wiped. Oops, that wasn't part of the plan.
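
A hypothetical sketch of that create-on-first-visit logic, just to illustrate the failure mode; the function and datastore names are made up and not ZOMDir's actual code:

// Hypothetical sketch of the create-on-first-visit pattern (names are illustrative).
function getOrCreatePage(db, pageId, callback) {
  db.read(pageId, function (err, page) {
    // Once the read limit is hit every read fails, so an existing page
    // looks exactly like a missing one ...
    if (err || !page) {
      page = { id: pageId, links: [] };        // a fresh, blank page
      db.write(pageId, page, function () {     // ... and the stored page, links and all, is overwritten
        callback(page);
      });
      return;
    }
    callback(page);
  });
}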


Luckily, the information wasn't gone completely. To achieve a very responsive site I had stored information about the links redundantly. This made it possible to analyse the damage and restore the links.


So thanks to the bots I learned how the site reacts when it becomes very busy.