The Limitations of Automated Link Checkers

If you’re planning to test a content-rich website that includes numerous pages, images, and links between them, you might consider using an automated link checking tool. An automated link checker, such as Xenu’s Link Sleuth or Fast Link Checker, spiders your website, typically starting from the home page, and follows all links on your website to ensure that they return something other than web errors. An automated link checker probably will also check the JavaScript, Cascading Style Sheets (CSS), image, and other files your web pages reference. The link checker then provides a report that includes the names of pages, the links on the pages, whether each link successfully resolved, and perhaps more detailed reporting options.
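To make the spidering step concrete, here is a minimal sketch of how a link checker might gather the references on a single page. It is only an illustration built on Python’s standard library, not how any particular tool works; the names LinkCollector and collect_links and the example.com URLs are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the URLs a page references through a few common tags."""

    # Tag name -> attribute that carries the referenced URL.
    URL_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def collect_links(html, base_url):
    """Return every referenced URL on the page, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

A real checker would then request each collected URL, record the response, and repeat the process for every page on the same site that it has not visited yet.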

The automated link checker is but one tool in testing websites. You need to recognize some of the limitations as you choose an automated link checker or as you prepare to run one.

Basic Limitations
Your automated link checker probably has high-level limitations to consider. Most importantly, your link checker will lack basic common sense and context. At root, it only checks to see if the web server returns something when the link checker sends a request. The link checker won’t know if the web server is returning the correct thing. For example, if your link text says ‘Home’ but the link’s target is ‘about.html’, the link checker will consider that a success as long as ‘about.html’ exists, even though the ‘Home’ link leads to the About page.

Secondly, your link checker typically cannot fill out and submit forms. Therefore, the link checker cannot check the following types of pages:

Search results pages

Thank you pages that display when the user submits a form

You can get around this limitation by specifying the URLs of these pages and running the link checker against them. That requires extra effort on your part, and it requires you to know the URLs of these pages, which takes some of the “automated” out of automated link checking.

Clickable Events
End users, particularly users who do not work in the computer industry and have only a working knowledge of the web, don’t know, really, what a link is. We, of course, recognize a link as a particular type of clickable event that takes the user to another page using an <a href=""> HTML tag. The user, though, thinks in terms of clickable events, places where the mouse cursor changes and something happens when she clicks the button. So I’m going to talk about clickable events instead of links for a while because the automated link checker might not adequately check some clickable events.

There are many intra-page types of clickable events that your automated link checker might not test. One example is a slide show presentation in which the user sees an image and then can click to see another image that was already loaded with the web page. Because these are JavaScript functions (or similar), automated link checkers will not ensure that, when the user clicks 4, the fourth image displays. Similarly, some websites use JavaScript functions to open web pages instead of HTML links. You can find these sorts of links frequently in footers or leading to terms and conditions or contest rules. Instead of the browser’s status bar displaying a URL when you mouse over the link, you will see javascript:newWindow('/terms.aspx') or something to that effect. How well will your link checker recognize those links? It depends on your link checker and your website’s JavaScript.
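If you are not sure how your link checker treats javascript: links, you can at least list them so you know what to test by hand. The sketch below is one hedged way to do that with Python’s standard library. The helper names (ScriptLinkFinder, find_script_targets) are invented for this example, and pulling out every quoted argument is only a guess at where the real target URL lives.

```python
import re
from html.parser import HTMLParser

# Anything in quotes inside the javascript: call is treated as a possible URL.
QUOTED_ARG = re.compile(r"['\"]([^'\"]+)['\"]")


class ScriptLinkFinder(HTMLParser):
    """Collects <a> tags whose href is a javascript: pseudo-URL."""

    def __init__(self):
        super().__init__()
        self.script_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.strip().lower().startswith("javascript:"):
            self.script_links.append(href)


def find_script_targets(html):
    """Return (href, quoted_arguments) pairs for manual review."""
    finder = ScriptLinkFinder()
    finder.feed(html)
    return [(href, QUOTED_ARG.findall(href)) for href in finder.script_links]
```

The quoted arguments are candidates only. Someone still has to click the link in a browser to confirm that the function opens the right page.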

Your automated link checker might not handle embedded media, such as Flash or Silverlight applications, correctly. Many websites, particularly websites targeted to consumers, use Flash extensively and do not have HTML equivalents, so the links inside those applications can go unchecked. If your site embeds objects or uses iframes, your link checker might not handle error codes for those objects. That is, if your site embeds a YouTube video, your link checker probably won’t flag a problem if YouTube returns an error message instead of your video, especially if it returns the error message within the context of the embedded object. If your website uses widgets or banner ads, you need to experiment with the link checker to see how it reacts and interacts with them.

If your web designers and producers are clever, they might use clicks to show and hide content within the web page. FAQs use this technique to expand or collapse answers when the user clicks the question. Other sites use this to display different panels of information, such as a tabbed view that shows ingredients, serving suggestions, or product sizes depending upon what the user clicks. An automated link checker probably won’t report to you whether that content displays correctly when the user clicks.

Aside from those non-HTML clickable events, you do need to consider two types of HTML anchors that your link checker might or might not check.

(1.) Intrapage anchors that move the user down the page or, more often, back to the top. Does your link checker ensure that the target anchors exist within the page so that the user does go back to the top?

(2.) Another type of HTML anchor tag to consider is the <a href="mailto:"> link. This link should spawn a new email window using the default mail program on the client machine with a To: address and perhaps a subject line filled out. Your automated link checker probably does not check these links and certainly cannot determine whether the values provided to the email client are correct. It cannot verify the email address is correct or that the words in the subject line are spelled correctly.
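Both kinds of anchors can be pulled out of a page for review if your link checker skips them. The following is a rough sketch, assuming only Python’s standard library. AnchorAudit and audit_anchors are hypothetical names, and the script can only tell you what the mailto: link contains, not whether the address or the spelling is right.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, unquote, urlsplit


class AnchorAudit(HTMLParser):
    """Records in-page targets, "#fragment" links, and mailto: links."""

    def __init__(self):
        super().__init__()
        self.targets = set()   # id="..." and <a name="..."> values
        self.fragments = []    # fragments used by in-page links
        self.mailtos = []      # raw mailto: hrefs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag != "a":
            return
        if attrs.get("name"):
            self.targets.add(attrs["name"])
        href = attrs.get("href") or ""
        if href.startswith("#") and len(href) > 1:
            self.fragments.append(href[1:])
        elif href.lower().startswith("mailto:"):
            self.mailtos.append(href)


def audit_anchors(html):
    """Return (missing_fragments, mail_links) for one page."""
    audit = AnchorAudit()
    audit.feed(html)
    # Compare only after the whole page is parsed, so targets that
    # appear later in the page than the link still count.
    missing = [f for f in audit.fragments if f not in audit.targets]
    mail_links = []
    for href in audit.mailtos:
        parts = urlsplit(href)
        subject = parse_qs(parts.query).get("subject", [""])[0]
        mail_links.append({"to": unquote(parts.path), "subject": subject})
    return missing, mail_links
```

The list of missing fragments points you at “back to top” style links with no matching target, and the mail_links list gives a human reader the addresses and subject lines to proofread.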

The Environment
Both the web server configuration and the location from which you run the automated link checker might yield false success results or otherwise impede your tests. For example, if your website uses content management system (CMS) software or search engine optimization techniques that alter the URLs for the same pages, your link checker can think the same piece of content is a new page each time it encounters the content. In this case, your link checker might go on forever. Additionally, if your site makes extensive use of parameters in the query string (that is, the part of the URL after the question mark), you need to determine how your link checker will handle that.
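One common defense against the endless crawl is to reduce each URL to a canonical form before deciding whether the page has been seen, and to cap the total number of pages. The sketch below shows the idea under some loud assumptions: the parameter names in IGNORED_PARAMS are hypothetical examples, and whether reordering or dropping parameters is safe depends entirely on how your site uses them.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change which content is served.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url):
    """Normalize a URL so trivially different spellings compare equal."""
    parts = urlsplit(url)
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # The fragment is dropped because it never reaches the web server.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def should_visit(url, seen, max_pages=5000):
    """Return True once per canonical URL, until max_pages is reached."""
    key = canonical_url(url)
    if key in seen or len(seen) >= max_pages:
        return False
    seen.add(key)
    return True
```

If your link checker offers settings for ignored parameters or a maximum page count, those settings play the same role as IGNORED_PARAMS and max_pages here.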

If your website uses a custom error page, such as CustomError.aspx, instead of returning a 404 error, your link checker might not identify this as a broken link. Instead, the link checker might just think that all the broken links on your site are valid links to CustomError.aspx. You might be able to search your link checker’s report to find whether any links resolve to CustomError.aspx, but you need to be aware of the possibility. Additionally, if all of the links on your site are valid, the link checker will not find CustomError.aspx on its own, so if that page contains links of its own, you’ll have to test it independently of a spidered web crawl.
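If your link checker’s report includes the final URL each link resolved to, you can flag these disguised failures yourself. Here is a minimal sketch of that check. The classify function and its three labels are invented for illustration, and it only catches the case where the server redirects to the error page. If the server shows the error content at the original URL with a success status, you would have to inspect the page body instead.

```python
from urllib.parse import urlsplit

# Paths of the site's custom error pages, lowercased for comparison.
ERROR_PAGE_PATHS = {"/customerror.aspx"}


def classify(requested_url, status, final_url):
    """Label one checked link as 'broken', 'soft-404', or 'ok'."""
    if status >= 400:
        return "broken"
    requested_path = urlsplit(requested_url).path.lower()
    final_path = urlsplit(final_url).path.lower()
    # A success status that ends on the error page, when the error page
    # was not what the link asked for, is a broken link in disguise.
    if final_path in ERROR_PAGE_PATHS and requested_path not in ERROR_PAGE_PATHS:
        return "soft-404"
    return "ok"
```

A direct request for the error page itself is reported as ok, which lets you test CustomError.aspx and its own links separately.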

If your website uses redirects extensively, you’ll have to investigate whether the link checker follows those redirects on its own.

If your website is large enough to warrant load balancing, you should run the link checker against all web servers by name. To accommodate heavy traffic loads, sometimes organizations copy exact duplicates of the website to multiple web servers. When a user visits the “website,” the load balancer points that user to an individual web server hosting that website. So if you run an automated link checker against a load-balanced website, you’re really only running the automated link check against a single web server and are not checking the others. If you think that your tech, development, or deployment team could not possibly fail to copy all of the files to all of the servers, well, you’re new here, aren’t you?
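Checking every server by name mostly comes down to repeating the same check with a different host and then comparing the results. The sketch below assumes each server answers under its own hostname. The names web1.example.com and web2.example.com are hypothetical, as are the two helper functions, and some server configurations will need a different approach, such as sending the public host name in the request.

```python
from urllib.parse import urlsplit, urlunsplit


def per_server_urls(public_url, server_hosts):
    """Rewrite one public URL so it targets each individual server."""
    parts = urlsplit(public_url)
    return [
        urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
        for host in server_hosts
    ]


def inconsistent_paths(status_by_server):
    """Find paths whose result differs between servers.

    status_by_server maps host -> {path: status}; a path that a server
    never reported shows up with a status of None.
    """
    all_paths = set()
    for statuses in status_by_server.values():
        all_paths.update(statuses)
    report = {}
    for path in sorted(all_paths):
        by_host = {
            host: statuses.get(path)
            for host, statuses in status_by_server.items()
        }
        if len(set(by_host.values())) > 1:
            report[path] = by_host
    return report
```

Any path that appears in the inconsistency report is a good candidate for the file somebody forgot to copy to one of the servers.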

On the client side, if you run an automated link checker on a machine that has access to internal, testing, or staging versions of the website, any links to those internal environments will resolve and report success. Users outside your network would not have access to these internal web servers, so any URLs that point to http://staging1.mysite.com/about.html will fail for them even though the link checker reported success. You can work around this by searching for the strings of your internal environments within the link checker’s report, but again you need to know the danger and to account for it.
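That search of the report is easy to script if you can export the report as pairs of page and link. The following is a hedged sketch only: the prefixes in INTERNAL_HOST_PREFIXES are hypothetical and must be replaced with the naming scheme your own environments use, and the (page, link) row format is an assumption about what your tool exports.

```python
from urllib.parse import urlsplit

# Hypothetical prefixes for the first label of internal host names.
INTERNAL_HOST_PREFIXES = ("staging", "test", "dev", "localhost")


def internal_links(report_rows):
    """Return the (page, link) rows whose link points at an internal host."""
    flagged = []
    for page, link in report_rows:
        host = (urlsplit(link).hostname or "").lower()
        first_label = host.split(".")[0]
        if first_label.startswith(INTERNAL_HOST_PREFIXES):
            flagged.append((page, link))
    return flagged
```

Every flagged row is a link that passed on your machine but would fail for a user outside your network.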

Conclusion

Automated link checkers, in spite of their shortcomings, are handy tools to use when testing websites. When an automated link checker finds something missing, it does find some sort of problem. However, when the automated link checker does not find something missing, that does not mean a problem does not exist. It only means that the automated link checker did not find it. The more you know about how your automated link checker works and how your website works, the better you can account for the gaps in the link checker and test those areas manually.

About the Author

Brian J. Noggle has worked in quality assurance and technical writing for over a decade, working with a variety of software types and industries. He currently works as a freelance software testing consultant through his own company, Jeracor Group LLC, and has recently published a novel set in the IT world, John Donnelly’s Gold.
