Thursday, 5 December 2013

6 Months

Today, ZOMDir is 6 months old, or more exactly 183 days. The average number of links added per day is 45.
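
The 183 days are easy to check. A minimal sketch, assuming the launch date of June 5, 2013; the names are only chosen for this example:

```python
from datetime import date

# Assumption: ZOMDir was launched on June 5, 2013.
LAUNCH = date(2013, 6, 5)
TODAY = date(2013, 12, 5)

# Rounded average taken from this post.
AVERAGE_LINKS_PER_DAY = 45


def days_online(launch: date, today: date) -> int:
    """Number of days between the launch date and today."""
    return (today - launch).days


age_in_days = days_online(LAUNCH, TODAY)
print(age_in_days)  # 183

# Only a rough estimate of the total, because the average is rounded.
print(age_in_days * AVERAGE_LINKS_PER_DAY)
```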

See the stats for the current situation.

Monday, 18 November 2013

ZOMDir's logo

I have always stated that ZOMDir's logo is a salamander, based on Escher's reptiles. Recently I discovered that a gecko is a reptile and a salamander is an amphibian. I think I need to update my biology knowledge ;-)

I also discovered that Gecko is a layout engine used by the Mozilla Foundation and the Mozilla Corporation. That is funny, because the Open Directory Project (dmoz.org) was originally named: Directory MOZilla.

ZOMDir is named ZOMDir because I wanted to re-invent DMOZ: reverse the bold letters and you have ZOMDir. Despite that, it is pure coincidence that a salamander is used as the logo for ZOMDir.

ZOMDir's salamander
Netscape's Gecko logo

Sunday, 3 November 2013

HTTP Errors

Since the launch of ZOMDir on June 5, 2013, I have been adding sites to the directory myself on a regular basis. This is great for discovering small "bugs".

In the past I used a time-out of 7 seconds before concluding that a given site isn't responding. I discovered that a time-out of 12 seconds is better. Some sites are really slow ...

I also discovered that there are websites which show up fine in the browser. However, when I try to fetch these websites, I get the result "None" and an HTTP error (code 403 or code 412). Very strange indeed.

According to the Hypertext Transfer Protocol these codes have the following meaning:

403 Forbidden
The request was a valid request, but the server is refusing to respond to it.

412 Precondition Failed
The server does not meet one of the preconditions that the requester put on the request.

I really don't understand why browsers show these websites correctly while ZOMDir is not able to fetch them.

If you have a clue, please let me know.



Note: Recently I found a website with the status 300 Multiple Choices. That one is also difficult to handle.
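
One possible explanation, not confirmed for these particular sites, is that some servers refuse requests that don't look like they come from a browser, for example because of the default User-Agent header a script sends. A 412 is tied to conditional request headers such as If-Match, so it is worth checking which headers the fetch sends. The sketch below uses Python's standard library. It sends browser-like headers, uses the 12 second time-out, and reports the status code instead of only "None". The header values and the function names are assumptions made up for this example; this is not ZOMDir's actual code.

```python
import urllib.error
import urllib.request
from http import HTTPStatus

TIMEOUT_SECONDS = 12  # the time-out mentioned above

# Assumption: browser-like headers; these values are made up for this sketch.
BROWSER_LIKE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; ExampleFetcher/0.1)",
    "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
}


def build_request(url):
    """Create a GET request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_LIKE_HEADERS)


def describe_status(code):
    """Return e.g. '403 Forbidden' for a known HTTP status code."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        return f"{code} Unknown"


def fetch(url):
    """Return (status description, body); body is None when the fetch fails."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=TIMEOUT_SECONDS) as response:
            return describe_status(response.status), response.read()
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 403 or 412.
        return describe_status(error.code), None
    except OSError as error:
        # No usable answer at all: DNS failure, refused connection or time-out.
        return f"no response ({error})", None
```

Calling `fetch()` with the address of a problematic site returns the status description next to the body, which makes it easier to see whether the server refused the request or simply did not answer in time.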

Thursday, 3 October 2013

Google Web Designer, first impressions

Google Web Designer is a wonderful tool for creating great ads.

The tool seems to focus on pixel-perfect animations and 3D rotation of panes. Unfortunately, the learning curve is steep. You still have to know your CSS. When you want to use things like letter-spacing, you have to add this code manually. I didn't find an option in the interface to adjust this.

I also didn't get the concept of animations, although I have watched the provided instruction videos like "Using the advanced timeline". For the moment I give up. The type of ads you are able to create with Google Web Designer is out of place when compared with the style of ZOMDir.com.

So I don't think there will soon be a match between Google Web Designer and ZOMDir, although it is of course always possible that someone wants to use ZOMDir to create a collection of nice Google Web Designer examples ;-)

Saturday, 14 September 2013

What is in the CNAME?

For the ZOMDir project a few CNAME records are used to redirect a subdomain like blog.zomdir.com, via ghs.google.com, to blogger.com.

A few days ago I discovered that for all subdomains redirected with a CNAME record it was not possible to set the crawl rate for Googlebot in Google Webmaster Tools.

It might be coincidence, but I doubt it is. Therefore I started a test with CNAME records and the possibility to change the crawl rate.

The result is that I'm a little confused now. I thought it was easy to add a CNAME record. However, I get these results:

  1. Expected: cname.zomdir.com shows the content of original.helenahoeve.nl
  2. Actual: the page shown is www.zomdir.com

Vice versa:

  1. Expected: cname.helenahoeve.nl shows the content of original.zomdir.com
  2. Actual: the page shown is www.zomdir.com

Sorry, I don't understand what's happening.

Update: The problem described above occurs because both sites are hosted by hosting2go. They state that a CNAME record will not work for a site in the same network, which is the case here.

The strange thing is that all other CNAME records point to ghs.google.com. In Google Admin I have to specify for each App Engine site which URL is expected (e.g. websitequality.zomdir.com). Hmmm, that feels like an indication that the configuration mentioned above will not work. I have to figure out what to do next ...

Tip: You can use mxtoolbox.com to find out what the actual CNAME is for cname.zomdir.com.
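
The same check can be scripted. The sketch below asks the system resolver for the canonical name of a host with Python's `socket.gethostbyname_ex`. What the resolver reports depends on the platform and on the DNS setup, so treat it as a rough alternative to an online lookup tool; the function names are made up for this example.

```python
import socket


def canonical_name(hostname, resolver=socket.gethostbyname_ex):
    """Return the canonical host name the resolver reports for hostname."""
    canonical, _aliases, _addresses = resolver(hostname)
    return canonical


def is_alias(hostname, resolver=socket.gethostbyname_ex):
    """True when the resolver reports a different canonical name."""
    reported = canonical_name(hostname, resolver).rstrip(".").lower()
    return reported != hostname.rstrip(".").lower()


# Usage (needs a network connection, so it is left as a comment):
# print(canonical_name("cname.zomdir.com"))
```

The `resolver` parameter only exists so the functions can be tried without a network connection, by passing in a stand-in that returns a fixed answer.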

Tuesday, 10 September 2013

Site settings for subdomains

For the ZOMDir project there are several subdomains. The following table gives an overview of the subdomains and the possibility to set the crawl rate in Google Webmaster Tools.

  1. about.zomdir.com Ok, Let Google optimize for my site
  2. blog.zomdir.com Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  3. browsersize.zomdir.com Ok, Let Google optimize for my site
  4. example.zomdir.com Ok, Let Google optimize for my site
  5. faq.zomdir.com Ok, Let Google optimize for my site
  6. onion.zomdir.com Ok, Let Google optimize for my site
  7. pagerank.zomdir.com Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  8. safebrowsing.zomdir.com Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  9. setextbrowser.zomdir.com Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  10. thedarksideof.zomdir.com Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  11. try.zomdir.com Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  12. websitequality.zomdir.com Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  13. www.zomdir.com Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate
  14. zomdir.com Ok, Let Google optimize for my site
Initially I thought that it was not possible to change the settings for www.zomdir.com because it is a subdomain of zomdir.com. However, now I think it has something to do with the way a website is redirected.

The subdomains for which it is not possible to set the crawl rate are all redirected via a CNAME record to ghs.google.com. That's probably the reason why I'm not able to change this setting.

I'll try to sort this out.

Monday, 9 September 2013

Crawl speed too low?

As stated in Taming the bots, I have to slow down the bots. This evening I concluded that Googlebot is again visiting ZOMDir.com multiple times a second. The result is that the site is down again.

I have changed my robots.txt, and for Googlebot I have the following setting in Google Webmaster Tools, according to this instruction:

0.02 requests per second (50 seconds between visits)

The crawl rate in Google Webmaster Tools (in Dutch)

There are 24*60*60 = 86400 seconds in a day, so with 50 seconds between visits I expect that Googlebot will retrieve a maximum of 86400 / 50 = 1728 pages a day with this setting.

Maybe Google thinks this makes no sense, and maybe they are right. I will change this setting to 0.2 (actually 0.198) requests per second and hope that Googlebot respects this setting.
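
The effect of both settings can be worked out in a few lines. A small sketch, assuming the bot sticks exactly to the configured rate; the function names are only chosen for this example:

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def max_pages_per_day(requests_per_second):
    """Upper bound on the pages fetched in a day at a constant request rate."""
    return math.floor(SECONDS_PER_DAY * requests_per_second)


def seconds_between_visits(requests_per_second):
    """Average pause between two requests at the given rate."""
    return 1 / requests_per_second


# Old setting (0.02) and new setting (0.198) in requests per second.
for rate in (0.02, 0.198):
    print(rate, round(seconds_between_visits(rate), 1), max_pages_per_day(rate))
```

With 0.02 requests per second the maximum is 1728 pages a day. With 0.198 requests per second it is roughly ten times as much, about 17107 pages a day, with about 5 seconds between visits.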

Note: For the moment I will keep the setting in robots.txt the same (Crawl-delay: 50) for other bots.

I am pretty sure that Googlebot is visiting ZOMDir.com. The IP addresses used by this bot are for example:

Verifying these addresses via a reverse DNS lookup gives results like this:
Top Level Domain: "googlebot.com"
Country IP Address: UNITED STATES

The only hope I have at this moment is that someone wrote that it can take weeks before Googlebot follows your instructions. I hope that that person was wrong ...
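
The reverse DNS check can also be done in code. The usual recipe is a reverse lookup of the IP address, a check of the domain of the resulting host name, and then a forward lookup to confirm that the host name points back to the same address. Below is a minimal sketch of that recipe with Python's `socket` module. The list of accepted domains is an assumption, and the function name is made up for this example.

```python
import socket

# Assumption: genuine Googlebot host names end in one of these domains.
GOOGLEBOT_DOMAINS = (".googlebot.com", ".google.com")


def is_googlebot(ip_address,
                 reverse_lookup=socket.gethostbyaddr,
                 forward_lookup=socket.gethostbyname_ex):
    """Check an IP address with a reverse lookup plus a forward confirmation."""
    try:
        host_name, _aliases, _addresses = reverse_lookup(ip_address)
    except OSError:
        return False  # no reverse DNS record at all
    if not host_name.lower().endswith(GOOGLEBOT_DOMAINS):
        return False
    try:
        _name, _aliases, forward_addresses = forward_lookup(host_name)
    except OSError:
        return False
    # The host name must resolve back to the address we started with.
    return ip_address in forward_addresses
```

The two lookup parameters default to the real resolver functions. They are only there so the check can be tried out with stand-in functions, without a network connection.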

JavaScript and a broken back button

For the ZOMDir project I use as little JavaScript as possible. The site should always work and not depend on a feature like JavaScript. Nevertheless, I use JavaScript to smooth the browsing experience.

In the JavaScript I use code like:

window.location.href = "http://example.zomdir.com";

However, this code gives a broken back button in Internet Explorer 8. After such a redirect, pressing the back button gives the user the message "The webpage is expired". To solve this I have coded the redirect like this:


It is a small change, but it fixed the problem.

A lesson from the bots

As described in Taming the bots, ZOMDir has a lot of initial blank pages. So the bots have a lot of pages to visit. In the past, and recently, the bots were very active. The result was an unexpected bug.

ZOMDir has separate server limits for reading and writing data to the database. When the bots visited ZOMDir, the limits were reached.

When I designed ZOMDir I wanted to reduce the number of reads and writes. I also wanted a fast-loading website. So I decided that when a page was visited and it wasn't stored yet, I would create it and write it to the database. It seemed clever, but it wasn't ...

Unfortunately, the read limit was reached first. Because no page could be read anymore, the program concluded for every visited page that it was a new page, and wrote a new, blank page to the database. When that occurred, all available links on that page were wiped. Oops, that wasn't part of the plan.

Luckily, the information wasn't gone completely. To achieve a very responsive site I had to store information about the links redundantly. This made it possible to analyse the damage and restore the links.

So thanks to the bots I learned how the site reacts when it becomes very busy.
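
The bug described above comes down to treating "the read failed" the same as "the page does not exist". The sketch below shows one way to keep the two apart. Everything in it is hypothetical: the in-memory `PageStore`, the `ReadLimitReached` exception and the function name only stand in for the real database code, which is not shown here.

```python
class ReadLimitReached(Exception):
    """Raised by the (hypothetical) store when the read quota is used up."""


class PageStore:
    """Tiny in-memory stand-in for the real database."""

    def __init__(self):
        self.pages = {}
        self.reads_left = 1000

    def read(self, key):
        if self.reads_left <= 0:
            raise ReadLimitReached(key)
        self.reads_left -= 1
        return self.pages.get(key)  # None means: really not stored yet

    def write(self, key, links):
        self.pages[key] = links


def get_or_create_page(store, key):
    """Return the links on a page; create a blank page only when it is truly missing."""
    try:
        links = store.read(key)
    except ReadLimitReached:
        # A failed read says nothing about whether the page exists,
        # so nothing is written and the caller can show an error instead.
        return None
    if links is None:
        links = []
        store.write(key, links)
    return links
```

With this split, reaching the read limit results in a temporary error for the visitor instead of a blank page overwriting the stored links.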

Friday, 30 August 2013

Taming the bots

Even when nobody adds a link to ZOMDir.com, it is a huge site. The reason is that there are 184 languages, 9 initial subjects and circa 200 initial locations (all countries plus the continents). The result is 184 x 9 x 200 or circa 330000 available webpages.

I was glad when Googlebot visited the website, but ... the load of Googlebot was too heavy. In practice I am able to handle circa 30000 pages a day. So it shouldn't be a surprise that Googlebot brought the website down.
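
To put these numbers next to each other, here is a quick calculation. It is only an estimate, because the number of locations is "circa 200"; the constant names are chosen for this example.

```python
LANGUAGES = 184
INITIAL_SUBJECTS = 9
INITIAL_LOCATIONS = 200  # "circa 200", so the results below are estimates
PAGES_PER_DAY_CAPACITY = 30000  # what the site can handle in practice

total_pages = LANGUAGES * INITIAL_SUBJECTS * INITIAL_LOCATIONS
days_for_full_crawl = total_pages / PAGES_PER_DAY_CAPACITY

print(total_pages)                    # 331200
print(round(days_for_full_crawl, 1))  # 11.0
```

So even a single bot that wants to see every initial page needs about 11 days if it stays within what the site can handle.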

It wasn't easy to find where to set the crawl rate in Google Webmaster Tools. Initially I was looking at the menu on the left, while the option is available under the "settings icon" shown at the top right. The next hurdle was that it was not clear to me that this setting is only available for the domain name zomdir.com and not for the subdomains www.zomdir.com or thedarksideof.zomdir.com.

After changing the crawl speed I concluded that, over time, Googlebot is indeed slowing down. I hope that the other bots will follow these instructions in the robots.txt file:
User-agent: *
Crawl-delay: 50
At this moment it seems that it worked to tame the bots. For the last few days the site has continued to respond while several bots are visiting ZOMDir.com. I hope it stays that way.
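
For anyone who wants to check how a well-behaved crawler reads these two lines, Python's standard `urllib.robotparser` can parse them. The sketch below feeds it the same robots.txt content; "ExampleBot" is a made-up user agent. Keep in mind that Crawl-delay is not part of the original robots.txt standard and not every crawler honours it.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as in the robots.txt file above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 50
"""


def crawl_delay_for(user_agent, robots_txt=ROBOTS_TXT):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


print(crawl_delay_for("ExampleBot"))  # 50
```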

Friday, 26 July 2013

50,000 pages tested

Recently the number of pages tested at websitequality.zomdir.com reached 50,000. That's great. Even better is the distribution of the scores.

3% of the tested webpages got 1 star
35% of the tested webpages got 2 stars
40% of the tested webpages got 3 stars
18% of the tested webpages got 4 stars
4% of the tested webpages got 5 stars

I'm especially glad that only 4% of the tested webpages got 5 stars. This is important to me, because I also use this technology to judge webpages for ZOMDir.com. When a webpage deserves 5 stars, that webpage will be highlighted with the word "Tip".

These stats tell me that not every webpage will be highlighted. That's great, because otherwise the word "Tip" wouldn't have any value.
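
In absolute numbers the distribution looks roughly like this. The percentages are rounded, so the page counts in this small sketch are estimates, and the names are only chosen for this example.

```python
TESTED_PAGES = 50000

# Percentage of tested webpages per number of stars, as listed above.
SHARE_PER_STARS = {1: 3, 2: 35, 3: 40, 4: 18, 5: 4}


def estimated_pages(percent, total=TESTED_PAGES):
    """Estimated number of pages behind a (rounded) percentage."""
    return total * percent // 100


for stars, percent in SHARE_PER_STARS.items():
    print(stars, percent, estimated_pages(percent))
```

The five percentages add up to 100, and the 4% with 5 stars corresponds to about 2000 of the 50,000 tested pages.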

To be able to tune the algorithm, "Website Quality at a Glance" was the first spin-off I created*. In the past 2.5 years I have made only one minor tweak to get a slightly better distribution of the percentages. For the moment, I'm very satisfied with the results.

* At the time I created Website Quality at a Glance a lot of people thought that ZOMDir was only about testing other sites. It was difficult and fun to keep my lips sealed ;-)