Monday, 30 July 2018

Linking out isn't about risk. It's about uncertainty.

Recently I wrote that you should check redirected links.

The Ambler Warning by Robert Ludlum explains exactly why this is important.

From the previous post you might get the idea that it is an acceptable risk that one percent of the removed links point to hacked sites.

However, Caston, my favorite character in The Ambler Warning, explains the difference between risk and uncertainty. Let me quote him:
'This isn't about risk. That's what people like you never understand. It's about uncertainty. You think you can assign a probability metric to future events like this. For technical reasons we do that all the time. But it's bullshit - nothing more than a convention, an accounting conceit. Risk suggests measurable probability. Uncertainty is when likelihood of future events is simply incalculable. Uncertainty is when you don't even know what you don't know. Uncertainty is humility in the presence of ignorance.'
I don't know how sites get hacked or how it will be done in the future (Will the hackers use artificial intelligence? Are hacking instructions for sale?), so I realise that the figure of 1% in the previous post is indeed bullshit.

However, the advice to check redirected links is still valid. It was based on risk; I now think it should be based on uncertainty.

Again, good luck with checking and repairing your broken and redirected links.
Hans

--
ZOMDir.com is a dynamic directory and a wiki
Everyone is able to add a link in 10 seconds
To learn more view this Slideshare presentation

Monday, 23 July 2018

Why you should check redirected links (and 5 facts regarding broken links)


Everyone understands that broken external links harm the user experience.

Not everyone knows that the percentage of broken links on your website tells search engines how often you check for broken links.


A lot of broken links? Poor maintenance!

The half-life of an external link is two years. Hence the number of broken links indicates when you last checked and repaired them.

Repairing broken links on a regular basis is the difference between a neglected site and a smooth-running website.

Redirected links are dangerous

Almost everyone thinks that redirected links won't do any serious harm. Believe me, that is a huge mistake. Redirected links are dangerous.

Recently I analyzed the reasons why links are automatically removed from ZOMDir.com (you will find the numbers below).

One of the main findings is that one percent of the removed links point to hacked sites.

How to find links to hacked sites?

Almost the only way to discover links to hacked sites is to follow the redirected link. When you see an online shoe store where you expected information about a fitness center, you know the site has been hacked.

Of course there are a lot of redirected links. Many websites are switching from http to https these days. Make life easier for yourself and link to the redirect target instead of the original address. Otherwise you have to check every redirected link again and again when you check and repair broken links.
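
To separate the harmless redirects from the ones that need a human look, a rough first pass could compare the original address with the address you end up at. The sketch below is only an illustration: the function name and the three labels are made up for this example, and it assumes you already know the final address (for instance from your link checker).

```python
from urllib.parse import urlsplit


def _host(url):
    """Return the lower-cased host name without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify_redirect(original_url, final_url):
    """Label a redirect so you know whether it needs a manual check.

    The labels are hypothetical, chosen for this example only.
    """
    old, new = urlsplit(original_url), urlsplit(final_url)
    if _host(original_url) != _host(final_url):
        # Another domain: the site may have moved, been parked, sold or hacked.
        return "other-site"
    same_page = old.path.rstrip("/") == new.path.rstrip("/") and old.query == new.query
    if old.scheme == "http" and new.scheme == "https" and same_page:
        # Plain http to https switch: safe to update the link.
        return "https-upgrade"
    # Same host, but a different page: check that the content still matches.
    return "same-site"
```

Only the "https-upgrade" case can be updated without looking. Everything else still has to be followed by hand, because a script can't tell a fitness center from a shoe store.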

Other findings

1. Half of the broken and redirected links could be fixed. That is, the target website still works fine although it has been rebuilt. Depending on the quality of the rebuilding team, you are redirected, get a neat 404 error or get a blunt server error;

2. Ten percent of the broken links are due to programming errors. These vary from very slow-loading webpages, to pages without any content, to incorrect database connections. So always check your entire site with a broken link checker when someone has changed something on your site;

3. Another ten percent of the broken links are due to the classic 404 page not found error. Most of the time this error indicates that the website was rebuilt and the rebuilding team didn't redirect the old webpage;

4. When a site owner gives up a domain name, around 25% of the old domains are parked by a domain broker;

5. 30% of the removed links were removed because of redirections. Nearly all redirected webpages redirect to a secure (https) site.

Raw research results

Here are the raw results of my research.


Error codes broken links

ZOMDir has a built-in link checker (similar to "Broken Links at a Glance") and keeps track of when and why a link is removed.

I analyzed the last 5000 removed links. These links were removed from July 30, 2016 to July 16, 2018.

On average 7 links are removed from ZOMDir.com every day.


Manually removed links

These links were removed by hand, probably because they had been added to an inappropriate category.

Manual        408     8%


Server side errors

Error 500    2361    47%
Error 503     169     3%
Error 502      10     0%
Error 504       1     0%

Redirections

Error 301    1027    21%
Error 302     374     7%
Error 303      25     1%


Client side errors

Error 404     479    10%
Error 400      30     1%
Error 401      30     1%
Error 402      20     0%
Error 403      18     0%
Error 410      16     0%


Security errors

When ZOMDir checks for broken links, it also checks whether the linked page is still safe to visit. When that's not the case, ZOMDir reports a 604 error.

Error 604      32     1%


Manually checked links

The latest 564 removed links were checked by hand. The formal error codes aren't that relevant here; I wanted to know in depth why a link was removed. Here are the findings:

Server not found            149  26%
Temporary error             110  20%
Redirected                  101  18%
Page not found               76  13%
Domain parking               45   8%
Programming error            32   6%
Slow website (no response)   19   3%
End of free hosting          15   3%
Site in maintenance mode     10   2%
Hacked                        7   1%

Thanks for your attention, happy linking, and please check and repair your links on a regular basis.
Hans

Wednesday, 22 November 2017

How often should I check for broken links?

How often you should check for broken links depends on the percentage of broken links that is acceptable to you.

The lower the percentage of broken links you allow, the more frequently you should check and repair broken links.

With the tool Maintenance Frequency at a Glance you can find out how often you should check for broken links.

This tool is based on research into the half-life of links in a copy of the former Yahoo! directory.

Some findings are:
  • When 3% broken links is acceptable, you should check your site every month.
  • When you check your site every 3 months, you might expect 8% broken links.
  • When you check your site every 6 months, you might expect 16% broken links.
  • When you check your site every year, you might expect 29% broken links.
  • When you check your site every 2 years, you might expect 50% broken links.
I think that you should check for broken links at least every 3 months, although I often advise checking your site every month.
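
These percentages can be approximated with a few lines of code. The sketch below is not the code behind Maintenance Frequency at a Glance. It simply assumes that a fixed share of the working links breaks every day, and it uses the daily figure of 0.093670021% from the half-life post of 12 October 2017. The function name and the day counts per period are choices made for this example.

```python
# Assumed constant share of working links that breaks per day
# (0.093670021%, the figure from the half-life post).
DAILY_ROT_RATE = 0.00093670021


def expected_broken_percent(days_between_checks):
    """Percentage of links expected to be broken after the given number of days."""
    surviving = (1 - DAILY_ROT_RATE) ** days_between_checks
    return 100 * (1 - surviving)


for label, days in [("1 month", 30), ("3 months", 91), ("6 months", 182),
                    ("1 year", 365), ("2 years", 730)]:
    print(f"{label:>8}: {expected_broken_percent(days):.1f}% broken")
```

Rounded to whole percentages, the output should be close to the findings listed above.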

For relatively small sites I advise Broken Links at a Glance. For larger sites I advise Xenu's Link Sleuth.

Happy broken link hunting,
Hans

Saturday, 14 October 2017

Dead Link City - A comparison of 8 Free Online Link Checkers

The site DeadLinkCity.com is a test site used by DeadLinkChecker.com.

I used this site to improve the link checker Broken Links at a Glance. Now it is as good as DeadLinkChecker or, in my humble opinion, even better.

Take a look at these results.

1. Broken Links at a Glance
Tests 86 unique links, finds 75 broken links

2. DeadLinkChecker
Tests 95 links, finds 74 broken links

The difference between these link checkers is a link to the page http://www.deadlinkcity.com/disallowed/disallowed.html

This page should be blocked by robots.txt but isn't due to malformed code.

Instead of:
User-agent: *
Disallow: /disallowed/disallowed.html

this robots.txt file contains the code:
User-agent: *
Disallow: disallowed/disallowed.html

A single "/" is the difference between blocked and not blocked. See for yourself what a difference a "/" makes with this Robots.txt Testing Tool.

You might also use this Robots.txt Test Tool to see live if this "disallowed" page is really disallowed.

So, at the moment of writing this blog post, DeadLinkChecker interprets the robots.txt file of DeadLinkCity incorrectly.
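
If you don't want to rely on an online tester, you can try both versions of the file locally. This small sketch uses the robots.txt parser from Python's standard library, which matches a rule against the start of the URL path. That is one strict interpretation and other parsers may be more forgiving. The helper name is made up for this example.

```python
from urllib.robotparser import RobotFileParser

URL = "http://www.deadlinkcity.com/disallowed/disallowed.html"

# The rule as it was probably intended, with a leading "/".
INTENDED = "User-agent: *\nDisallow: /disallowed/disallowed.html\n"
# The rule as quoted above, without the leading "/".
AS_PUBLISHED = "User-agent: *\nDisallow: disallowed/disallowed.html\n"


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True when the given robots.txt text allows fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print("intended rule allows the page: ", is_allowed(INTENDED, URL))
print("published rule allows the page:", is_allowed(AS_PUBLISHED, URL))
```

With this parser the intended rule blocks the page, while the rule without the "/" never matches, because every URL path starts with a "/".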

3. Online Domain Tools
Tests 85 links, finds 68 broken links

This link checker misses the following broken links:
  • disallowed/disallowed.html
  • error-page.asp?e=401
  • images/missing-command-icon.jpg
  • images/missing-video-poster.jpg
  • missing-button-formaction.asp
  • missing-head-profile.txt
  • missing-html-manifest.txt

4. W3C Link checker
Tests 79 links, finds 68 broken links

This link checker misses some of the same broken links. The missed broken links are:
  • disallowed/disallowed.html
  • images/missing-input-src.jpg
  • missing-button-formaction.asp
  • missing-form-action.asp
  • missing-input-formaction.html
  • missing-object-classid.html 
  • missing-object-codebase.html 

5. Dr. LinkCheck
Tests 51 links, finds 46 broken links

6. BrokenLinkCheck
Tests 49 links, finds 43 broken links

7. Internet Marketing Ninjas
Tests 50 links, finds 39 broken links

8. WebToolHub
Tests 54 links, finds 9 broken links

Good luck with your links, choose your link checker wisely and don't forget to check your links on a regular basis.

Hans

Thursday, 12 October 2017

The half-life of a link is two years

The half-life of a link is two years. Or rather, the half-life of an external link is two years.

That is, when you create a website today with 100 working external links and check it after two years with a broken link checker, you will discover that roughly 50 links are broken.


How do you know?

I can almost hear you thinking "How do you know?". Well, I will explain below.

In the past I copied as much data as possible from the Yahoo! directory. I did this because Yahoo! stopped its directory, I had created a directory myself, and I wanted to analyse the links and structure of this famous directory.

On January 4, 2016 I analysed the data I had and concluded that 77% (more exactly 76.8387682%) of the links were fine.

Recently (October 9, 2017) I analysed the data again. Now 42% (42.0219319%) of the links are fine.

Based on this data I concluded that on an average day 0.093670021% of external links get broken. That does not seem much. However, that is 30 x 0.093670021% = 2.81% link rot per month.
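
For those who want to check the arithmetic, here is a minimal sketch. It assumes that between the two measurements the same fraction of the still-working links breaks every day (compound decay). Under that assumption the two measurements give the daily percentage mentioned above, and the half-life follows from it. The variable names are chosen for this example.

```python
import math
from datetime import date

first_check = date(2016, 1, 4)
second_check = date(2017, 10, 9)
working_first = 76.8387682    # % of links fine at the first check
working_second = 42.0219319   # % of links fine at the second check

days = (second_check - first_check).days

# Assumption: every day the same fraction of the working links survives.
survival_per_day = (working_second / working_first) ** (1 / days)
daily_rot_rate = 1 - survival_per_day

# Number of days after which half of the links have broken.
half_life_days = math.log(0.5) / math.log(survival_per_day)

print(f"days between checks: {days}")
print(f"daily link rot: {daily_rot_rate * 100:.9f}%")
print(f"half-life: {half_life_days:.0f} days, about {half_life_days / 365:.1f} years")
```

The monthly figure of 2.81% is the daily percentage multiplied by 30. Compounding the daily rate over 30 days gives a slightly lower value, but the difference is small.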


Consequences

After half a year one sixth of the links are broken.
After a year 30% of the links are broken.
After two years 50% of the links are broken. Hence the half-life of a link is two years.

See also this graph below



So when you think 3% broken links is acceptable, you should check for broken links every month.

When 5% is acceptable, check every two months, and when 10% is acceptable, check every four months.
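
The same reasoning can be turned around: given the percentage of broken links you accept, how many days can you wait between two checks? The sketch below assumes the constant daily link rot of 0.093670021% mentioned earlier in this post. The function name is made up for this example and this is not the code of the tool.

```python
import math

# Assumed constant share of working links that breaks per day (0.093670021%).
DAILY_ROT_RATE = 0.00093670021


def days_between_checks(acceptable_percent):
    """Days until the expected share of broken links reaches the given percentage."""
    return math.log(1 - acceptable_percent / 100) / math.log(1 - DAILY_ROT_RATE)


for percent in (3, 5, 10):
    days = days_between_checks(percent)
    print(f"{percent:>2}% acceptable: check about every {days:.0f} days")
```

With these assumptions 3% corresponds to roughly one month, 5% to almost two months and 10% to a little under four months.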

Tip: Use the tool Maintenance Frequency at a Glance to find your optimal maintenance frequency. 

Be wise, and check and repair your links on a regular basis,
Hans

Update: After writing this blog post I discovered that the document "A longitudinal study of Web pages continued: a consideration of document persistence" states that the half-life of a random web page is about 2.0 years. Great, that's exactly what I concluded.

Monday, 9 October 2017

How DeadLinkCity improved "Broken Links at a Glance"

DeadLinkCity is a test site of www.deadlinkchecker.com.

The site contains several types of broken links, similar to httpstat.us.

At DeadLinkCity it is stated that:
(...) There are 74 known bad links in DeadLinkCity.com, and one additional link which should not be reported if the tool obeys robots.txt directives. (...) The perfect score is 74 - the closer the number of reported errors is to 74, the more accurate the tool is. (...)

When I tested DeadLinkCity with "Broken Links at a Glance", the link checker discovered 56 broken links. That is far from perfect.

I quickly discovered that "Broken Links at a Glance" only checked links mentioned in the href and src attributes.

So I decided to modify "Broken Links at a Glance". Now it also checks links which might be mentioned in the attributes:
  • action
  • archive
  • background
  • cite
  • classid
  • codebase
  • data
  • formaction
  • icon
  • longdesc
  • manifest
  • poster
  • profile
  • srcset
  • usemap

This list is based on these overviews of HTML 4 attributes and HTML 5 attributes.
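
To give an idea of what checking these attributes involves, here is a minimal sketch with the HTML parser from Python's standard library. It is an illustration only, not the code of "Broken Links at a Glance". The class and function names are made up, and it simplifies a few things (for example, an archive attribute with several URLs is kept as one value).

```python
from html.parser import HTMLParser

# href and src plus the additional attributes listed above.
URL_ATTRIBUTES = {
    "href", "src", "action", "archive", "background", "cite", "classid",
    "codebase", "data", "formaction", "icon", "longdesc", "manifest",
    "poster", "profile", "srcset", "usemap",
}


class LinkCollector(HTMLParser):
    """Collect the values of all attributes that may contain a URL."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in URL_ATTRIBUTES or not value:
                continue
            if name == "srcset":
                # srcset holds comma-separated "URL descriptor" pairs.
                for candidate in value.split(","):
                    parts = candidate.split()
                    if parts:
                        self.links.append(parts[0])
            else:
                self.links.append(value.strip())


def collect_links(html):
    """Return the candidate links found in the given HTML text, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

Each collected value still has to be resolved against the address of the page before it can be requested and checked.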

Retesting DeadLinkCity with "Broken Links at a Glance" gives the almost perfect score of 72 broken links found.

Nice, but it seems that I still missed the 3 URLs mentioned in the CSS code. So I updated "Broken Links at a Glance" so that it also checks for links in the CSS code. The tool now finds 75 broken links.
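
Links in CSS code are written as url(...). A simple way to find them is a regular expression, as in the sketch below. This is an assumption about how it could be done, not the actual code of the tool, and a regular expression will not cover every corner of the CSS syntax (escaped characters, for instance).

```python
import re

# Matches url(...), with or without single or double quotes around the address.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_urls(css_text):
    """Return the addresses mentioned in url(...) expressions, in order."""
    return [match.group(2) for match in CSS_URL.finditer(css_text)]
```

Apply it to the contents of style elements, style attributes and linked stylesheets.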

That's one too many.

According to DeadLinkCity, disallowed/disallowed.html shouldn't be tested, because it is mentioned in their robots.txt file.

However, the robots tester I use is very strict. It does allow this link because the Disallow rule doesn't start with a "/".

Hopefully "Broken Links at a Glance" will be added to the Comparison Table of DeadLinkCity soon (and their robots.txt file will be corrected).

Hans

Monday, 18 September 2017

How to fix broken links

When I detect a redirected or broken link with Broken Links at a Glance, I generally follow these steps:

  • Follow the link to analyse it 
  • Try another URL of the same site 
  • Try to contact the website owner
  • Search for an alternative webpage 
  • Remove the link


1. Follow the link to analyse it

Always follow the link to see what happens. 

When you get a 404 Page not found, go to the next step.

When you link to a website that doesn't exist anymore you might get the error "Server not found".

If that's the case and the HTTP error code wasn't 503, assume this link is broken and go to step 4 or, depending on the situation, to step 5.

2. Try another URL of the same site

When you thought you linked to the homepage of a website and you got a 404 error then it is often easy to navigate to the new homepage of that website.

Update your webpage by replacing the old link with the new address of the homepage.

When you linked to a specific page, the information may still be available at another location.

So you have to search at that site for the same information. 

When found, you should update your webpage by replacing the old link with the new address. 

When not found, take the next step.

3. Try to contact the website owner

As you probably have experienced, websites aren't as static as you want. However that doesn't always mean that the information is gone. 

Due to a website reorganisation the information you want to link to might be at another address. 

When you are not able to find it yourself, you might contact the website owner. Almost every website has a contact information page. 

If you can't find an e-mail address you might try the e-mail address info@websitename.com.

Make clear that you linked to a webpage with information regarding ... and ask what the new address is for this information because the webpage you linked to has vanished.  

4. Search for an alternative webpage

Linking out is good practice, so I prefer and advise you to keep linking. 

When all previous steps have failed, search for another webpage with the information you want to link to.

Simply use your favorite search engine and hunt for the information you want to link to.

Often you will find an alternative. 

When found, update your webpage by replacing the old link with the new address of the alternative webpage found. 

Otherwise, you should remove the link as described in the next step.

5. Remove the link

Too bad, the webpage you linked to doesn't exist anymore, and you can't find an alternative webpage to link to.

When that's the case you should update the webpage where you linked from and remove the link completely. 

Mind that this might have the consequence that you should rewrite your text.



That said, the process of fixing broken links is relatively straightforward.

For the best results, check your links on a regular basis, for example every month.

Finally, I'd like to tell you something about redirected links and broken links.


Why check redirected links?

Redirected links are often indicated by the HTTP status code 301.

In general an HTTP status code which has the format 3XX indicates a redirected link.

Often people think -incorrectly- that a redirected link isn't a problem. Okay, sometimes it isn't a problem, but sometimes it is. 

To find out you have to click these links to see where they redirect to.


From http to https

When the redirect is logical, for example from http:// to https://, it is advised to update your webpage by replacing the old location (http://...) with the new location (https://...).

By doing this, the next time you check your website for broken links, you have to check fewer redirected links.


Another system

It might happen that the website you link to now uses another content management system, with the side effect that old pages are redirected. When the redirect is logical, update your webpage and replace the old location with the new location. However, when the redirect isn't logical, consider this a broken link.


For sale or sold

It might happen that the website you link to is gone and a domain name speculator redirects your link to a "domain for sale" landing page. You should consider this a broken link.

It might happen that the website you link to is now owned by someone else who works in a completely different business. In that case you should also consider this a broken link.


Hijacked

It also might happen that the website you link to has been hijacked and is now selling shoes instead of ... whatever you were linking to. In this case, too, you should consider this a broken link. You might warn the original website owner by sending a mail to info@websitename.com. Explain what you have discovered and ask politely to be informed when the website is restored, so you are able to restore your link.



What are broken links?

Broken links are often indicated by the HTTP status code 404.

However, other status codes also might indicate a broken link. When the status code has the format 4XX or 5XX the link is probably broken.

I have experienced that websites which respond with a 408 or a 500 HTTP status code still might work, although they may be a little slow.

When that's the case you have to decide for yourself whether you consider this a broken link or not.

When you link to a small website which probably doesn't get many visitors, it might occur that the first time someone visits that website (the broken link checker) the response is slow, while at a second visit (you checking the links marked as broken) the response is reasonable.

A website which responds with a 503 HTTP status code is usually in maintenance mode. You can ignore this broken link for the moment, as you may assume that the website will work again in a few days.
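
The rules of thumb above can be summarised in a small function. This is only a sketch of my own triage, not an official classification. The labels are made up for this example and you are free to treat 408 and 500 differently.

```python
def classify_status(status_code):
    """Translate an HTTP status code into a (hypothetical) link-maintenance label."""
    if 300 <= status_code < 400:
        return "redirected"   # follow the link and see where it ends up
    if status_code == 503:
        return "maintenance"  # try again in a few days
    if status_code in (408, 500):
        return "recheck"      # may just be slow; visit the page yourself
    if 400 <= status_code < 600:
        return "broken"       # probably broken
    return "ok"


for code in (200, 301, 404, 408, 500, 503):
    print(code, classify_status(code))
```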

Hope this helps,
Hans
