ZOMDir > Blog

Tuesday, 30 December 2014

Whichloadsfaster

In the past I regularly used whichloadsfaster.com. Recently I discovered that this website doesn't work any more. Luckily Ryan Witt, who conceived and wrote whichloadsfaster, has stored the original code on GitHub. Besides that, Ryan states on the about page that:
whichloadsfaster is open source, written in HTML and JavaScript and runs entirely on the client-side. You can host the files on your own site and tweak them to suit your own nefarious plans. We promise not to tell.
Although it is not the core business of ZOMDir.com, I created my own version of whichloadsfaster.

whichloadsfaster.zomdir.com is slightly different from the GitHub version.

  • The order of the menu items is more logical;
  • There is now a copyright statement according to the license;
  • Some texts are changed (mainly on the splash screen);
  • The stats (Google Analytics and Clicky) are removed;
  • The default behavior is now serial instead of parallel;
  • There is a favicon now.

The most important change, however, is the accuracy. The duration is now rounded to 100 ms, because repeated tests of really fast webpages like example.zomdir.com and example.com showed a huge deviation.
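To illustrate the rounding, here is a minimal sketch. It is written in Python purely for illustration (whichloadsfaster itself is HTML and JavaScript), and the function name round_duration, the round-half-up choice and the sample values 212 ms and 238 ms are assumptions, not the actual code or measurements.

```python
def round_duration(duration_ms: float, step_ms: int = 100) -> int:
    """Round a non-negative duration to the nearest multiple of step_ms."""
    # Adding 0.5 before truncating rounds halves upwards (650 -> 700).
    return int(duration_ms / step_ms + 0.5) * step_ms


# Two hypothetical runs of the same fast page that differ by a few dozen
# milliseconds end up in the same 100 ms bucket after rounding.
print(round_duration(212), round_duration(238))  # 200 200
```

With coarser buckets, small run-to-run differences no longer flip the outcome of a comparison between two fast pages.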


I hope that this tool gives you a clue about the performance of your website.

Hans

--
ZOMDir.com is a dynamic directory and a wiki
Everyone is able to add a link in 10 seconds

To learn more view this Slideshare presentation 




Monday, 8 December 2014

How is DMOZ doing?

Recently I analysed the Yahoo! directory and concluded that it is logical that it will shut down. Today I'm analysing DMOZ.org, the Open Directory Project.

Structure

Like Yahoo! the structure is a mix of language, location and subject. For example the Dutch page about hotels in Amsterdam has the following breadcrumb path: Top: World: Nederlands: Regionaal: Nederland: Noord-Holland: Gemeenten: Amsterdam: Reizen en Toerisme: Accommodatie: Hotels

While the English page about hotels in Amsterdam has the following breadcrumb path: Top: Regional: Europe: Netherlands: North Holland: Amsterdam: Travel and Tourism: Lodging: Hotels

When you select a category you will notice a horizontal rule between the categories. See also the example below. For me it is unclear why this rule is there. Like Yahoo!, cross references are followed by the @ sign. However, categories mentioned under "See also" don't have this @ sign. It is nice that there are links to the same category in other languages.


Description

On each page there is a description of the current category and the sub categories. That's great. That way you (and the volunteers, the editors) know what to expect.

Editors

In principle each category has an editor. When you Google for: site:dmoz.org "category editor" you will find circa 4,800 results. However when you Google for site:dmoz.org "volunteer to edit this category" you will find circa 20,700 results.

That's strange because according to the homepage of DMOZ.org there are 1,023,591 categories and 89,957 editors. So I think it is better to take a sample myself.

When you take a look at the category Arts and open the sub categories (including the cross references) you will find that 18 of the 63 categories have no editor. So roughly 29% of the categories don't have an editor. Not bad when you realise that the ratio of categories to editors is 11 to 1.
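The two figures above can be checked with a few lines of Python. This is only a worked calculation on the numbers quoted in this post; the variable names are mine.

```python
# Figures quoted in the post
sampled_categories = 63
without_editor = 18
total_categories = 1_023_591
total_editors = 89_957

share_without_editor = without_editor / sampled_categories
categories_per_editor = total_categories / total_editors

print(f"{share_without_editor:.1%} of the sampled categories lack an editor")
print(f"{categories_per_editor:.1f} categories per editor")
```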

Quality of the links

I try to use the same samples as I did with Yahoo!

Sample 1. Top: Society: Genealogy: Heraldry
This page needs an editor. There are 17 links (although the header suggests that there are 24 links). All these links are working, although 1 site is in French and 1 site is in Polish.

Sample 2. Top: Reference: Museums: Arts and Entertainment
This page doesn't need an editor, however it is not clear who the editor is. There are 18 links. 17 links are working and there is 1 broken link.

Sample 3. Top: Society: Subcultures: Hip-Hop: Breakdancing 
This page needs an editor. There are 14 links. All links are working, however 1 link is to the official Website of James Madison University Strength & Conditioning.

Sample 4. Top: Science: Environment: Forests and Rainforests 
This page needs an editor. There are 58 links, of these links there are:

  • 53 working links
  • 3 broken links
  • 1 redirected link (which doesn't redirect automatically)
  • 1 duplicate link
Based on these findings I conclude that the quality of DMOZ.org is much better than Yahoo! 

It is clear to me that nearly 90,000 editors make a difference. Well done DMOZ.
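To put a number on the comparison, the sketch below adds up the four DMOZ samples from this post and the four Yahoo! samples from the post of 17 November 2014. It assumes that only links reported as working count as working (so the redirected and the duplicate link in sample 4 are counted as not working); the function name working_share is hypothetical.

```python
# (working links, total links) per sample, as counted in the two posts
dmoz_samples = [(17, 17), (17, 18), (14, 14), (53, 58)]
yahoo_samples = [(8, 13), (33, 38), (5, 11), (10, 14)]


def working_share(samples):
    """Return the fraction of working links over all samples together."""
    working = sum(w for w, _ in samples)
    total = sum(t for _, t in samples)
    return working / total


print(f"DMOZ:   {working_share(dmoz_samples):.1%} working")
print(f"Yahoo!: {working_share(yahoo_samples):.1%} working")
```

Four samples per directory is of course a small basis, so these percentages are an indication rather than a measurement.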

Hans


Monday, 17 November 2014

The Directory Yahoo!

A few days after the news that Yahoo! will stop with their directory I read a tweet that suggested copying Yahoo! and making it available as a subdomain of ... another link directory. 

I wondered if that makes sense, so I decided to find out what the quality of Yahoo! is. Here are my findings.

Structure

Yahoo! has a structure based on subjects which is sometimes mixed with locations. The generic term used is categories. The mix of subjects and locations is available in two flavors. For example:

1. regional > countries >  belgium > [subjects | provinces | regions]
2. news and media > newspapers > by region > countries > belgium

In this structure there are many cross references. When you go to the category Arts you will find some categories ending with the @ sign. This is an indication that this subject belongs to a different category. See the example below:


I have tried to figure out how many unique categories there are. My estimate is that there are 55,000 to 75,000 categories.

Links per category

The number of links per category varies. There are a lot of categories with only a few links. For example the following category has only two links.

Regional > U.S. States > North Carolina > Cities > Durham > Education > College and University > Private > Duke University > Departments and Programs > Center for Documentary Studies

However there are also categories with nearly 1,000 links like the category:

Arts > Visual Arts > Painting > Artists > Personal Exhibits > Oil

To handle this large number of links Yahoo! used pagination. For the above mentioned category there are 49 pages with 20 links per page.

Due to this variation it is very difficult to estimate the total number of unique links on Yahoo! Therefore my estimate also has a wide range. I think that there are 1,000,000 to 3,000,000 links mentioned on Yahoo!

Yahoo is probably smaller than DMOZ

For reference, the homepage of DMOZ.org states that there are 4,148,153 sites in 1,023,413 categories. That said, the conclusion is that Yahoo! is smaller than the Open Directory Project.

Quality of the links

To get an idea of the quality of the links I took a few samples. 

Sample 1. Category: Arts > Humanities > History > Genealogy > Heraldry

In this category there are 13 sites mentioned. Of these 13 sites there are:
  • 8 working links
  • 3 broken links
  • 2 links to a parking site

Sample 2. Category: Arts > Humanities > Literature > Museums

In this category there are 38 sites mentioned. Of these 38 sites there are:
  • 33 working links
  • 3 broken links
  • 2 links to a parking site

Sample 3. Category: Arts > Performing Arts > Dance > Contemporary > Breakdance

In this category there are 11 sites mentioned. Of these 11 sites there are:
    • 5 working links (although 2 sites stated that they are closed)
    • 5 broken links
    • 1 link to a parking site

    Sample 4. Category: Science > Ecology > Tropical Ecology

    In this category there are 14 sites mentioned. Of these 14 sites there are:
      • 10 working links
      • 3 broken links
      • 1 link to a parking site
      The percentage of working links depends on the chosen category, but it is clear that you will find broken links everywhere. 

      Based on these findings it is clear that the quality of Yahoo! has vanished. There are simply too many non-working links. 

      It is a pity, but it is logical that the Yahoo directory will shut down.

      Hans




      Monday, 3 November 2014

      604, that's not an error

      The internet is often a dangerous place. As an innocent surfer you will never know for sure what the result of a click on a link will be. Even normally trusted sites might be infected by malware due to malvertising (see also anti-malvertising.com), bot attacks or other nasty tricks. 

      ZOMDir tries to minimize the probability that you visit a site that has wrong intentions. Therefore all links at ZOMDir are checked with Safebrowsing at a Glance.

      A content check runs on a regular basis. When a linked page doesn't load, is redirected or is probably unsafe to visit, the link is removed automatically. When this occurs you will find the corresponding error code in the overview of removed links. In this list you might also find status code 604.

      Status code 604 indicates that it is probably unsafe to follow the related link. The reason might be that the link is marked as suspicious by Google Safebrowsing or that it is marked as unreliable at the Web Of Trust.
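The decision described above can be sketched as a small function. This is a minimal sketch and not the code of ZOMDir: the function name removal_code, its parameters and the order of the checks are assumptions, and the actual lookups at Google Safebrowsing and the Web Of Trust are left out (their outcome is passed in as flagged_unsafe).

```python
from typing import Optional

UNSAFE_STATUS = 604  # ZOMDir's own code for "probably unsafe to follow"


def removal_code(http_status: int, redirected: bool,
                 flagged_unsafe: bool) -> Optional[int]:
    """Return the code to log when a link must be removed, else None."""
    if flagged_unsafe:
        # Marked as suspicious or unreliable by a reputation service.
        return UNSAFE_STATUS
    if redirected or http_status != 200:
        # The page is redirected or doesn't load normally.
        return http_status
    return None  # The link can stay.


print(removal_code(200, False, False))  # None
print(removal_code(404, False, False))  # 404
print(removal_code(200, False, True))   # 604
```

Because 604 is not an official HTTP status code, it can't be confused with a code that a web server returned.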

      That's all I can tell you now about this status code.

      Hans


      Monday, 27 October 2014

      Why are these bots visiting ZOMDir.com?

      Roughly a month ago I discovered that ZOMDir.com is visited by a botnet. In the past the number of link updates was almost the same as the number of new links. See this screenshot of the stats on April 1, 2014:



      Recently I discovered, by taking a look at the stats, that the number of updated links (257,066) is much higher than the number of new links (15,568).


      Immediately I thought of bots, so I created honeypots to trap them. When I trap a bot, I log this. When it is a new IP address I add it to the page Bots.
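How such a honeypot could work is sketched below. This is an assumption about one common technique, not the code of ZOMDir: the form contains an extra field (here hypothetically named "website") that is hidden for humans, so only a bot will fill it in. The function names and the in-memory set are also just for illustration.

```python
trapped_ips: set[str] = set()  # stand-in for the list on the page Bots


def is_bot(form: dict[str, str], honeypot_field: str = "website") -> bool:
    """A submission that fills in the hidden field is treated as a bot."""
    return bool(form.get(honeypot_field, "").strip())


def handle_submission(ip: str, form: dict[str, str]) -> bool:
    """Return True when a new bot IP address is added to the list."""
    if not is_bot(form):
        return False
    if ip in trapped_ips:
        return False  # trapped again, but the IP address is already known
    trapped_ips.add(ip)
    return True


print(handle_submission("192.0.2.1", {"description": "x", "website": "http://spam"}))
print(handle_submission("192.0.2.1", {"description": "y", "website": "http://spam"}))
```

The first call reports a new IP address, the second call does not, because the address (a documentation address, not a real bot) is already on the list.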


      So far I have collected more than 200 unique IP addresses.


      These bots are visiting ZOMDir regularly. To monitor this in more detail I have added some extra logging. Here are the results.



      Frequency

      The bots try to manipulate in salvos of 5 attacks. That is, within a few seconds they try 5 different texts from the same IP address.

      For example at these times (UTC) bot 37.187.144.114 visited ZOMDir.com:



      • 2014-10-25 18:26:10.183
      • 2014-10-25 18:26:11.600
      • 2014-10-25 18:26:12.328
      • 2014-10-25 18:26:13.306
      • 2014-10-25 18:26:13.814


      These salvos were at a more or less regular interval.


      See these times, rounded to the minute, of the 10 most recent salvos (at the moment I started writing this blog):



      • 2014-10-25 17:33 (62.210.78.209)
      • 2014-10-25 17:38 (62.210.122.209)
      • 2014-10-25 17:50 (62.210.167.213)
      • 2014-10-25 17:53 (94.23.251.211)
      • 2014-10-25 18:03 (37.187.144.114)
      • 2014-10-25 18:15 (62.210.152.149)
      • 2014-10-25 18:20 (94.23.251.211)
      • 2014-10-25 18:21 (62.210.78.209)
      • 2014-10-25 18:24 (37.59.32.148)
      • 2014-10-25 18:26 (37.187.144.114)

      There are roughly 10 to 12 salvos per hour. In other words, more than 1000 times a day these bots try to manipulate ZOMDir.
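The grouping of log lines into salvos can be sketched as follows. This is not the logging code of ZOMDir; the function name group_salvos and the threshold of 5 seconds between two attempts are assumptions. The timestamps are the ones of bot 37.187.144.114 listed above.

```python
from datetime import datetime, timedelta


def group_salvos(events, max_gap=timedelta(seconds=5)):
    """Group (ip, timestamp) events into salvos per IP address.

    An event joins the current salvo of its IP address when it follows the
    previous event from that address within max_gap, else it starts a new one.
    """
    salvos = []   # list of (ip, [timestamps]) in order of first event
    current = {}  # ip -> timestamps of the latest salvo of that ip
    for ip, ts in sorted(events, key=lambda event: event[1]):
        timestamps = current.get(ip)
        if timestamps and ts - timestamps[-1] <= max_gap:
            timestamps.append(ts)
        else:
            timestamps = [ts]
            current[ip] = timestamps
            salvos.append((ip, timestamps))
    return salvos


FORMAT = "%Y-%m-%d %H:%M:%S.%f"
visits = [
    ("37.187.144.114", datetime.strptime(text, FORMAT))
    for text in (
        "2014-10-25 18:26:10.183",
        "2014-10-25 18:26:11.600",
        "2014-10-25 18:26:12.328",
        "2014-10-25 18:26:13.306",
        "2014-10-25 18:26:13.814",
    )
]
salvos = group_salvos(visits)
print(len(salvos), len(salvos[0][1]))  # 1 5

# 10 to 12 salvos per hour, 5 attempts each, 24 hours a day
attempts_per_day = [per_hour * 5 * 24 for per_hour in (10, 12)]
print(attempts_per_day)  # [1200, 1440]
```

So the estimate of more than 1000 attempts a day is on the safe side.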


      Texts

      Often the bots try to change the description of the link. I guess that the reason is that this is a textarea. This description will be shown as the tooltip text of the link.

      Per salvo they try several texts. Examples of these texts are:



      Jonny was here http://www.ohword.com/chemistry-homework-help-online/ cheap college history papers He has not spoken to his son since the former National Security Agency contractor left the United States for Hong Kong before news broke in June of the disclosures he made about U.S. surveillance programs.


      I'd like to speak to someone about a mortgage <a href=" http://www.ohword.com/writing-a-research-proposal/#naturalists ">buy essay australia</a> Romney's critics scrutinized his investment record and often portrayed Bain as a corporate raider which profits at the expense of average Americans. They also combed through Bain's private equity portfolio to date to see how Romney benefits.


      Please call back later http://www.ohword.com/someone-to-do-school-work-for-you/ can money buy love essay Although Facebook unblocked the link to "Unstoppable," Cameron's film still remains blocked on YouTube. So, he has called on his fans to rally around his cause once again, according to his Facebook page.


      How do you know each other? <a href=" http://ziplinegear.biz/essay-writing-english/#repent ">buy college paperws</a> Part of the danger is how swimmers can disappear under the surface. Even in a clear pool, a swimmer&#8217;s movement can blur their presence. In a murky water pond, it&#8217;s even more dramatic. But the Wahooo system also helps lifeguards locate a downed swimmer, using a tracking device. Before, the best way to find a lost swimmer was to form a rescue line, sweeping the area step-by-step.


      What do you like doing in your spare time? http://www.cafsowrag4development.org/do-essay-writing-services-work/ bestessayservices "There was another case of PAM possibly connected with Willow Springs in 2010. Based on the occurrence of two cases of this rare infection in association with the same body of water and the unique features of the park, the ADH has asked the owner of Willow Springs to voluntarily close the water park to ensure the health and safety of the public."


      URLs

      A lot of different URLs are used in these texts. Examples of these URLs are:


      • http://ziplinegear.biz/writing-a-college-term-paper/
      • http://barcelonaconsensus.org/by-a-research-paper-cheap-for-jean-piaget/
      • http://ziplinegear.biz/essay-writing-service-recommendation/
      • http://weimar.edu/essay-writing-service-in-il/
      • http://www.jacquelot.com/money-manage#engaged
      • http://buffalonavalpark.org/calculate-interest-rate/#flowing
      • http://buffalonavalpark.org/calculate-interest-rate/#peaceful
      • http://www.adexsus.com/site2/?p=get-payday-loan-bad-credit#solid
      • http://corkfilmcentregallery.com/about-us/
      • http://www.cafsowrag4development.org/do-essay-writing-services-work/
      • http://www.ohword.com/someone-to-do-school-work-for-you/
      • http://www.pensionfreedom.ie/best-fast-loans-no-credit-check
      • http://bikinginbarcelona.net/cash-keywords/
      • http://lawmt.com/lowest-apr-loan/

      I have tested these URLs with Safe Browsing at a Glance and they are all probably safe. I was not that sure about that, so I started Sandboxie to take a look at some of these URLs.

      The results are strange. Half of these links are not working (404 error, database error, just an empty page or an incomplete, non-working form), and the other half seem to be random normal sites. So I wonder: why are these bots doing this? Just trying, peeking and poking to see if something gets broken?


      If you have a better idea, or a suggestion for how to stop them, please let me know.


      Thanks,

      Hans


      Monday, 13 October 2014

      ZOMDir's Inception deck

      When you develop software with SCRUM it might be useful to create an Inception deck before the project starts. Consider it as a summary on one A4 sheet of your new software project.

      Unfortunately I didn't know about the Inception deck when I started the ZOMDir project. Bearing in mind the saying "Better late than never" I decided to make an Inception deck anyway. I can't say that the created Inception deck produced new insights. However, it gives a nice summary of the ZOMDir project.



      Good luck with the preparation of your project,
      Hans


      Monday, 23 June 2014

      "IF THEN" but what "ELSE"

      Recently I discovered a very stupid bug in the code of ZOMDir.com

      The code was used to set a variable and was something like this:

      if parentStr == "":
        newParentStr = newStr
      else:
        # ParentStr isn't empty
        if newStr not in parentStr:
          newParentStr = addStrToParent(newStr, parentStr)

      A better look at the code should have taught me that there is something wrong. The second if-statement doesn't have an else clause.

      Oops: on rare occasions (when newStr is already part of parentStr) the variable newParentStr isn't set and an error is raised. 

      Luckily I was the first one to find out. Even better, the solution is very simple. The second test isn't necessary at all. It was only required for some early testing. So the code is now:

      if parentStr == "":
        newParentStr = newStr
      else:
        newParentStr = addStrToParent(newStr, parentStr)

      This is better, isn't it? 
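To see the difference, both versions can be wrapped in a function and tried out. This is a sketch for illustration only: the function names buggy and fixed are mine, and add_str_to_parent is a hypothetical stand-in for addStrToParent, which isn't shown in this post.

```python
def add_str_to_parent(new_str, parent_str):
    # Hypothetical stand-in: the real addStrToParent isn't shown in the post.
    return parent_str + "," + new_str


def buggy(new_str, parent_str):
    if parent_str == "":
        new_parent_str = new_str
    else:
        if new_str not in parent_str:
            new_parent_str = add_str_to_parent(new_str, parent_str)
        # No else branch: nothing is assigned when new_str is in parent_str.
    return new_parent_str


def fixed(new_str, parent_str):
    if parent_str == "":
        return new_str
    return add_str_to_parent(new_str, parent_str)


print(fixed("c", "a,b"))  # a,b,c
try:
    buggy("a", "a,b")
except UnboundLocalError as error:
    print("buggy version failed:", error)
```

In the fixed version every path assigns a value, so the error can't occur any more.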

      Hans


      Monday, 16 June 2014

      Bulk PageRank Checkers compared

      Update March 2016. Pagerank checkers aren't relevant anymore. Google has removed the public Pagerank.

      A long time ago I wrote about Pagerank checkers. Recently a Dutch SEO specialist became enthusiastic about PageRank at a Glance. So I thought it was time to take a look at the currently available Pagerank checkers.

      In 2011 I didn't find any bulk Pagerank checkers at all. Nowadays I have found the following 7 bulk Pagerank checkers:

      The differences between these Pagerank checkers are relatively small. My findings are:
      From these three checkers, you have to try for yourself which Pagerank checker you prefer. It will not surprise you that I still prefer PageRank at a Glance.


      Monday, 24 February 2014

      There is a hole in this blog

      Today I discovered that there is a hole in this blog. Between 26 February 2012 and 26 July 2013 I posted nothing. There was a reason for that. I was working on the real site ZOMDir.com and was afraid that someone would pick up my ideas and realise them earlier than I could.

      Now I think that it is a silly idea to keep your mouth shut for 17 months. So I want to catch up and tell a lot more about the ZOMDir project.



      The first idea

      The first idea was formed in the summer of 2010. I found some sketches I made that summer and scanned them. Here is the first one.


      The first thoughts about what is now ZOMDir.com

      From this sketch it is very clear that I believed that it should be possible to create a better directory than DMOZ.org. At the upper left corner you will see that at that moment in time there were 17 campsites listed for the Dutch province Zeeland. At the same time I had logged around 70 campsites in Zeeuws-Vlaanderen (which is a part of Zeeland). See this page and this page


      At the upper centre you see that I wanted to use the database of Google App Engine.


      At the upper right you see the initial goal. I wanted to create the Dutch DMOZ/Yahoo!


      At the middle right you see that I want to use the salamander as logo, and some brainstorming about names. Recently I checked salamandir.com and mandir.com (formerly the names I loved the most). Both are still for sale. Salamandir.com could nowadays be yours for $1795. 


      At the lower right you see the initial structure of the URL and a note that I want to show Google Ads when there are at least 20 links at a page.


      At the lower centre you see that I believed that the optimum number of links on a page is 7 to 40 and that I wanted to use social media to let the world know that a link is added.


      At the lower left you will see all the stuff that I wanted to collect about a website. I wanted to warn the owner of a site by mail when the link was changed. You also see that people should be able to judge a link.


      At the middle left you will see that it should be a smart Wiki and that there shouldn't be a (human) moderator. 


      In the middle you will see the initial title wjijw.nl. As far as I can remember it was a short way of saying "What you want". It was the initial title because "wjijw" is a palindrome. Due to the fact it wasn't international enough it didn't make it.



      The results

      A lot of the first ideas are realised. 
      • I think that all the Dutch campsites are linked at ZOMDir.com. In Cadzand (a village in Zeeland) there are 12 campsites nowadays. See: Campings, Cadzand;
      • I have used Google App Engine and the associated datastore; 
      • The initial goal is the same. I still want to gain ground in the Netherlands first;
      • The salamander is indeed the logo;
      • The structure of the URL is almost the same as initially conceived; 
      • Currently there are no ads at all. ZOMDir is programmed so efficiently that they aren't necessary yet;
      • I used to tweet the results of Website Quality at a Glance, however at a given moment something changed and the tweet functionality broke. See https://twitter.com/wqaag. So I didn't think of tweeting from ZOMDir.com itself;
      • I collect as little information about a website as possible; the URL, a text and a description are enough;
      • ZOMDir.com is indeed a smart Wiki. Links are added or refused without a human moderator;
      • The owner of a website will not be warned by e-mail when something changes, instead a subscription to a feed is possible.
      So for me it was very helpful to commit my thoughts to paper. It took some effort, but the result is there.

      Wednesday, 19 February 2014

      About humans and bots

      ZOMDir is a human edited directory. Everyone is able to edit almost everything: texts, subjects, locations and of course links. As a consequence everyone is also able to remove a link. This is very powerful in the fight against linkrot.

      Because linkrot is a huge problem there is an automatic link removal tool. Once in a while links are tested. When the linked webpage doesn't respond normally, the link will be removed directly.


      Such a tool seems to be necessary. For example DMOZ.org has pmoz.info and Robozilla to fight linkrot. 


      Recently I learned that Wikipedia uses bots to fight vandalism. For ZOMDir that isn't necessary yet, however I will add an anti-vandalism bot to the product backlog with, for the moment, a very low priority.


      Wednesday, 12 February 2014

      Linking is the essence of the Web

      Recently I read an article by Gerry McGovern called "Content Paupers". In this article Gerry used the phrase: "Linking is the essence of the Web". My first thoughts were: 
      1. "Indeed", 
      2. "That is what ZOMDir is about" and 
      3. "I should ask Gerry if it is ok to quote him". 
      So I asked if it is ok to quote him, and that was fine. So now you will find on English and Dutch pages without any link this image:

      I hope it is an extra stimulus to add a link to the directory.


      Monday, 27 January 2014

      ZOMDir on YouTube

      For the ZOMDir project I have created some movies about adding and removing a link. These movies are:
      1. Add a link in a blink
      2. Remove a link in a blink
      3. Drag and drop to add a link
      4. Drag and drop to remove a link
      Have fun.

      Disclaimer: I made these short movies when I thought that the site was stable. Currently the site might look slightly different, but the functionality is still the same.



      Friday, 17 January 2014

      The results of optimizing the performance

      The result of minimizing the number of requests by ZOMDir.com is huge. I have minimized the number of images by using CSS sprites and I have merged all CSS code into one file.

      According to tools.pingdom.com there were in the past 14 requests for a typical page:

      • 7 images
      • 4 css files
      • 2 html files (due to an Ajax call there is an extra html file)
      • 1 javascript file
      After I optimized the code there are 7 requests:
      • 4 images
      • 1 css file
      • 1 html file
      • 1 javascript file
      With GTmetrix.com I have compared the old and the new version. The old version has:
      The new version has:
      • Page Speed Grade: A (99%)
      • YSlow Grade: A (97%)
      • Total page size: 34.9KB
      According to whichloadsfaster.zomdir.com the new version is 2.3x faster (the average over 100 runs). The average loadtime of the new version is 670ms instead of 1511ms for the old version.
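The factor can be verified with the two averages mentioned above. This is just a worked calculation on the numbers from this post; the variable names are mine.

```python
old_ms = 1511  # average load time of the old version
new_ms = 670   # average load time of the new version

speedup = old_ms / new_ms
time_saved = 1 - new_ms / old_ms
requests_saved = 14 - 7

print(f"{speedup:.1f}x faster, {time_saved:.0%} less load time")
print(f"{requests_saved} requests fewer per page")
```

Put differently, halving the number of requests cut the average load time by more than half.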



      A faster website with CSS media queries

      Recently I have been working on the performance of ZOMDir.com. This time I wanted to reduce the number of http requests by reducing the number of CSS files. This can be done by using CSS media queries.

      Previously I used 4 CSS files. 

      1. s.css for normal and large computer screens;
      2. m.css for mobiles, and handhelds;
      3. t.css for the television;
      4. p.css for the printer.
      With media queries I only have one CSS file, named c.css

      I have experimented with several Methods of choosing a responsive design on several devices (while writing this post I realise that I haven't tested the pages on a television). The result so far is that the best option is: Basic media selectors like screen, handheld, print and tv with css queries in one single css file.

      Note: This holds for all current browsers. Older versions of Internet Explorer, in particular, do not support media queries.



      Viewport

      When I was coding the above mentioned experimental pages, I almost forgot to add the code:

      <meta name="viewport" content="width=device-width" />
        
      With this tag you tell the browser to use the width of the device as the width of the viewport, so it is essential for a responsive webpage.

      CSS

      The basic code of the c.css file is:

      @media screen and (min-width: 731px), projector and (min-width: 731px) {

        /* A normal (large) screen, width > 730px    */
        /* Hide code not intended for large screens  */
        .mo,.po,.to{display:none}
      }

      @media handheld, screen and (max-width: 730px), projector and (max-width: 730px) {

        /* A mobile device or handheld, width <= 730px */
        /* Hide code not intended for small screens    */
        .so,.po,.to{display:none}
      }

      @media tv {

        /* A television                               */
        /* Hide code not intended for televisions     */
        .mo,.po,.so{display:none}
      }

      @media print {

        /* A printer                                  */
        /* Hide code not intended to be printed       */
        .mo,.so,.to{display:none}

      }

      Monday, 6 January 2014

      A faster website with CSS sprites

      The last few days I've been working on the performance of ZOMDir.com. That was necessary due to the fact that some extra functionality slowed down the site a little.


      CSS Sprites

      To guarantee a good user experience it is necessary to have a quick responding website. I have done almost everything to speed up the site, but I've never used CSS sprites.

      First I created a sprite with instantsprite.com and experimented with jsfiddle.net to get some experience. 
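To get a feeling for what such a sprite generator produces, the sketch below builds the CSS rules for a sprite by hand. It is not what instantsprite.com does internally: it assumes a horizontal strip of icons that all have the same width, and the file name sprite.png, the icon names and the function name sprite_css are made up for this example.

```python
def sprite_css(icon_names, icon_width, sprite_url="sprite.png"):
    """Return CSS rules for icons placed side by side in one sprite image."""
    rules = [
        ".sprite{background-image:url(%s);background-repeat:no-repeat}"
        % sprite_url
    ]
    for index, name in enumerate(icon_names):
        # Shift the sprite to the left so that icon number `index` is visible.
        offset = -index * icon_width
        rules.append(".%s{background-position:%dpx 0}" % (name, offset))
    return "\n".join(rules)


print(sprite_css(["home", "mail", "feed"], icon_width=16))
```

Every icon shares the same image file, so the browser needs only one request for all icons together.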



      Pros and cons of CSS Sprites

      In practice it was not possible for me to sprite all images used. Soon I concluded that:
      1. I was not able to repeat a CSS sprite image;
      2. You can't scale a CSS sprite image (100% for the desktop version, 50% for the mobile version of the site);
      3. The CSS sprite image is not shown when the page is printed.
      Therefore I did not sprite all images. Nevertheless I was able to improve the performance. The new version of the website loads circa 12.5% faster than the old version.

      However there are some drawbacks. With CSS sprites the user is not able to save the image. This implies that these sprited images will not be indexed by Google. Probably this has also consequences for the partially sighted. A sprited image doesn't have an "Alt" text.

      An alternative, to handle this last drawback, is to use code like this:


      <div class="sprite iconname" title="This text will be shown in the tooltip"><span class="displaynone">Alternate text describing the image you will see</span></div>



      Instead of this standard image code:

      <img src="images/iconname.png" title="This text will be shown in the tooltip" alt="Alternate text describing the image you will see" />




      Friday, 3 January 2014

      10,000 links

      Today, there are 10,000 links on ZOMDir.com. This fantastic number was reached in 212 days. The average number of links added per day is 47.


      See the stats for the current situation.