Web applications are vulnerable right through the front door: hackers subvert data entry fields, hijack URLs and grab the file system. Here’s how to stop ’em.

Enterprise software faces a wide variety of threats and security risks, but none is as serious as an attacker who goes straight for the heart and directly attacks your application code. By manipulating program input, attackers can often trick the server into revealing customer data, granting access to unauthorized files or executing program code on the server itself. Indeed, insecure code is the source of countless intrusions.

The risks of insecure code are great for several reasons:

  • There are so many different ways to exploit insecure code.
  • There is no need to obtain a password because the code is already running in the context of an authenticated user.
  • The attacker gains access to anything that the Web application can access, which usually includes sensitive user data.
  • Most Web applications are not properly configured to detect and prevent these types of attacks.

We’ll examine some of those threats in more detail, and also discuss what you can do about them. We’ll use ASP.NET as the sample platform, but many of these techniques can be mapped to other Web application server platforms.

Handling Malicious Data

Before an attacker can exploit your application with malicious input, the attacker has to get the input to your code. And that is where Web developers have the advantage. By carefully identifying and controlling input, you can prevent the attacks before they ever get to the sensitive code.

Half the challenge of stopping malicious input is identifying the numerous ways your application accepts input. All attacks on the application itself are based on some form of manipulation of permitted user input, so if you handle that input properly, you can eliminate most, if not all, of these vulnerabilities. Form fields and query strings are the obvious sources, but you must also consider any other data that an attacker can modify. Often overlooked are indirect sources of input and data that you might think an attacker cannot access.

Centralizing Code. Controlling variables at some point involves filtering or sanitizing the data they contain. Rather than writing new filtering code every place you accept user input, it is good practice to centralize that code.

As you build an application, apply centralized filtering functions to every source of user input. Centralizing keeps your code organized and reduces complexity. It also reduces the attack surface by reducing the amount of code, and it lets you make quick fixes to deal with future attacks as they surface.

Complexity is the enemy of security. By keeping your code organized and under control, you reduce the likelihood of application vulnerabilities. In general, less code means fewer bugs, and simple, reused code presents fewer attack vectors.

Centralized code also lets you easily adjust your filtering functions to address new attacks as security knowledge and research evolve.
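
As an illustration, here is a minimal C# sketch of such a centralized filter; the class and method names are hypothetical, not from the book:

  using System;
  using System.Text.RegularExpressions;

  // Hypothetical centralized filter class; every page funnels its input
  // through functions like this instead of repeating ad hoc checks.
  public static class InputFilter
  {
      // Strip every character outside the allowed set, then cap the length.
      public static string CleanString(string input, string allowedChars, int maxLength)
      {
          if (string.IsNullOrEmpty(input))
              return string.Empty;

          string cleaned = Regex.Replace(input, "[^" + allowedChars + "]", "");
          return cleaned.Length > maxLength ? cleaned.Substring(0, maxLength) : cleaned;
      }
  }

  // Usage on a page:
  // string name = InputFilter.CleanString(Request.Form["LastName"], @"a-zA-Z'\- ", 40);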

Testing and Auditing. Due to the complexity and variety of application-level attacks, it is easy to overlook simple mistakes. You should always test your security code to verify that it actually does what you expect. For example, one commercial Web application used a regular expression to restrict access to certain administration pages so that only users on the local system could browse them. To do this, it checked the client’s IP address against the regular expression “127.*”.

Since any IP address that begins with 127 refers to the local host, the programmer expected that this expression would properly restrict access. However, because the programmer did not use the ^ anchor to force matching from the beginning of the string, and because the .* portion of the expression means zero or more occurrences of any character, the regular expression in fact matches any IP address that contains 127 in any position, such as 192.168.1.127. It would not be difficult for an attacker to find an IP address with a 127 and completely bypass this restriction.

By building a proper audit plan and testing with different IP addresses, the programmer could have prevented this flaw.
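
A short test program, sketched in C#, makes both the flaw and the fix easy to verify:

  using System;
  using System.Text.RegularExpressions;

  class IpCheckTest
  {
      // Flawed: no ^ anchor and an unescaped dot, so "192.168.1.127" also matches.
      static bool FlawedIsLocal(string ip) => Regex.IsMatch(ip, "127.*");

      // Fixed: anchored to the start of the string, with the dot escaped.
      static bool IsLocal(string ip) => Regex.IsMatch(ip, @"^127\.");

      static void Main()
      {
          Console.WriteLine(FlawedIsLocal("192.168.1.127"));  // True -- the bug
          Console.WriteLine(IsLocal("192.168.1.127"));        // False
          Console.WriteLine(IsLocal("127.0.0.1"));            // True
      }
  }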

Using Explicit References. Many programming languages allow programmers to take shortcuts to save typing by allowing certain implicit defaults. For example, if you do not provide a fully qualified path when accessing a file, the system assumes that the file is in the current working directory.

This is important when it comes to filtering user input because ASP.NET allows you to reference items in the Request object without explicitly naming a specific collection. For example, many developers write Request(“Password”) as shorthand for Request.Form(“Password”).

When you refer to the generic Request object, ASP.NET searches the QueryString, Form, Cookies, ClientCertificate and ServerVariables collections, in that order, to find a match. Therefore, by not explicitly stating the collection, you could inadvertently take input from the wrong source. The problem is that QueryString is the first collection searched, so an attacker can override a posted form value simply by appending a parameter of the same name to the URL.
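
A sketch of the difference in a code-behind file; the page and field names are hypothetical:

  using System;
  using System.Web.UI;

  public partial class Login : Page
  {
      protected void Page_Load(object sender, EventArgs e)
      {
          // Ambiguous: the Request indexer searches QueryString before Form,
          // so ?Password=... appended to the URL would win over the posted field.
          string ambiguous = Request["Password"];

          // Explicit: take the value only from the collection you intend.
          string password = Request.Form["Password"];
      }
  }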

Constraining Input

The key to protecting your application from malicious data is to validate all user input. There are numerous actions in your code that user input may affect, and therefore many different techniques for validating this input. For example, those actions might access a database, read the file system, allow users to upload or save pages, or process a shopping cart.

Fortunately, there are many techniques your development team can use to counter these threats.

Bounds Checking. Bounds checking is a quick and easy way to prevent many application-level attacks. Check input values to be sure they comply with the expected data type, string length, string format, set of characters and range of values. ASP.NET provides several easy ways to check input data, including validator controls, type conversion and SqlParameters.

The most powerful of these validators is the RegularExpressionValidator, which allows complex pattern matching to ensure that input falls within very specific parameters.

Other platforms will have other methods available for you to use; the concepts are essentially the same.
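
For input that does not come through a validator control, a minimal bounds-checking sketch might look like this; the quantity field and its limits are hypothetical:

  using System;

  static class BoundsCheck
  {
      // Reject anything that is not the expected type, length and range.
      public static int ParseQuantity(string input)
      {
          if (string.IsNullOrEmpty(input) || input.Length > 4)
              throw new ArgumentException("Quantity has an invalid length.");

          int quantity;
          if (!int.TryParse(input, out quantity))          // type check
              throw new ArgumentException("Quantity must be numeric.");

          if (quantity < 1 || quantity > 100)              // range check
              throw new ArgumentException("Quantity is out of range.");

          return quantity;
      }
  }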

But it is important to note that although validator controls are powerful, they do have some limitations:

  • You can use them only to validate form controls.
  • You can validate form controls only when the page posts back to itself, not to another page.
  • They work only with server-side controls.
  • ASP.NET does not perform any validation with validator controls before it fires the Page_Load event.
  • They tend to decentralize validation code, moving it to individual pages rather than having a centralized mechanism for input filtering.

Because validator controls focus exclusively on form input, it is easy to neglect filtering other forms of user input. To deal with these limitations, you will need to develop custom functions for validating other input. Nevertheless, because of their automated error messages and the addition of client-side code, you should still always use validator controls for form input.

Pattern Matching. The most common and effective method for addressing malicious input is to apply pattern matching through regular expressions. With pattern matching, you block input that contains specific malicious characters or permit only input that contains a set of known safe characters. Under most circumstances, the latter is the preferred method for checking input.

Because it is difficult to anticipate the numerous ways one could exploit your application, it is usually best to establish which characters you will allow and then block everything else. Using this method, however, does require some forethought. Users will quickly get frustrated if you do not allow certain characters, such as apostrophes or hyphens, in a “last name” field.
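
A sketch of such an allowlist check for a last-name field; the 40-character limit is an arbitrary choice:

  using System.Text.RegularExpressions;

  static class NameCheck
  {
      // Permit only letters, apostrophes, hyphens and spaces, up to 40
      // characters; everything else is rejected rather than trying to
      // enumerate every possible bad character.
      public static bool IsValidLastName(string input) =>
          !string.IsNullOrEmpty(input) && Regex.IsMatch(input, @"^[a-zA-Z' \-]{1,40}$");
  }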

But pattern matching is more than blocking and allowing individual characters. Some attacks might use no invalid characters but still be malicious. For example, consider a Web application that saves data to a file and selects the filename based on user input. To prevent directory traversal or file access attacks, you might allow users to input only alphanumeric data, which you can enforce with a regular expression.

But what happens if the user selects a filename using a reserved DOS device name such as COM1, PRN or NUL? Although these device names contain nothing but alphanumeric characters, accessing these devices might cause a denial of service or facilitate some other kind of attack.

Sometimes you will allow only known good data, and other times you will filter out known bad data, but for input like this you should perform both checks: first permit only the characters you expect, then follow up to make sure the result is not a known bad value.
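
A sketch that performs both checks on a user-supplied filename; the reserved-name list is deliberately partial and for illustration only:

  using System;
  using System.Text.RegularExpressions;

  static class FileNameCheck
  {
      // Known bad values that still pass an alphanumeric filter.
      static readonly string[] ReservedNames =
          { "CON", "PRN", "AUX", "NUL", "COM1", "COM2", "LPT1", "LPT2" };

      public static bool IsSafeFileName(string input)
      {
          // Known good: 1 to 32 alphanumeric characters and nothing else.
          if (string.IsNullOrEmpty(input) || !Regex.IsMatch(input, "^[a-zA-Z0-9]{1,32}$"))
              return false;

          // Known bad: reserved DOS device names must still be rejected.
          foreach (string name in ReservedNames)
              if (input.Equals(name, StringComparison.OrdinalIgnoreCase))
                  return false;

          return true;
      }
  }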

Escaping Data. Sometimes you want users to be able to enter special characters without restrictions. But allowing these characters might expose your application to attacks such as SQL injection.

You might, for example, want to allow users to enter an apostrophe in their last name to allow for names such as O’Brien, but this character has special meaning in a SQL statement. Allowing this character might make the application vulnerable to SQL injection.

The fix is easy: Replace every single quote with two single quotes. This allows you to build SQL statements without having to worry about users passing in single quotes. By escaping (or quoting) the single quote character, it no longer has any special meaning to the SQL interpreter.
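
In code, the escape is a single replacement, as this sketch shows:

  static class SqlEscape
  {
      // Double up single quotes so they lose their special meaning in a SQL literal.
      public static string EscapeLiteral(string input)
      {
          return input.Replace("'", "''");
      }
  }

  // "O'Brien" becomes O''Brien inside the statement:
  // SELECT * FROM Users WHERE LastName = 'O''Brien'

Parameterized queries built with SqlParameters, mentioned earlier, avoid the problem entirely by passing the value separately from the SQL text.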

Data Reflecting

When Microsoft first released Windows 2000, security was a long-neglected issue that rapidly gained attention. Security researchers found numerous holes in the operating system, particularly in Microsoft’s Internet Information Services (IIS). Some of the most serious flaws allowed the viewing of protected files and traversing the file system to access files outside the Web content directories.

Security researchers found ways to fool IIS into thinking it was retrieving a file with a different extension or a file in the same directory when it was in fact pulling a file from a parent directory. While these techniques fooled IIS, the operating system itself used a different mechanism to access files and therefore accessed them correctly. By discovering subtle differences between how IIS interpreted file paths and how the OS interpreted file paths, researchers exposed some serious vulnerabilities.

Unauthorized File Access. One of the early vulnerabilities discovered in IIS 5 was the ability to view portions of server-side source code by simply appending the string “+.htr” to any URL. Instead of processing the server-side script normally, IIS would return the source code of the file itself, often revealing sensitive information such as database connection strings and passwords.

To exploit this vulnerability, an attacker could enter a URL such as this: www.example.com/global.asa+.htr

Normally, IIS does not return the source of files with .asp or .asa extensions, but adding the extra characters fooled IIS into treating the request as one for a file with the .HTR extension. The ISAPI filter that handled .HTR requests, however, discarded the extra characters and returned the contents of the original file.

Microsoft quickly released a hotfix to address this vulnerability, but another researcher found that you could still fool IIS by simply adjusting the string to “%3F+.htr” like this: www.example.com/global.asa%3F+.htr

Once again, the server returned the source code for global.asa rather than blocking the request. Although Microsoft fixed the specific known vulnerability the first time around, it failed to address the underlying weakness that made it possible to fool IIS in the first place.

IIS was also vulnerable to various directory traversal vulnerabilities. In these, an attacker requests files outside the bounds of the Web application.

Normally, IIS will not allow requests outside the Web root, but by disguising the double dots (“..”) through encoding and other techniques, researchers found ways to trick IIS into thinking it was accessing a file within the Web root when it was in fact accessing a file in a parent directory. These turned out to be very serious vulnerabilities because they usually allowed attackers to execute commands and quickly gain control of the server.

Furthermore, many Internet worms such as Code Red and Nimda exploited these vulnerabilities to propagate themselves from server to server.

Reflecting the Data. To prevent directory traversal and server-side code access, developers usually check file extensions and watch for paths that contain double dots. However, this is not always effective because there are techniques, such as encoding, that attackers use to disguise these characters.

Rather than attempting to anticipate every way an attacker can fool your code, a more effective technique is data reflection. With this technique, you take the user input and pass it to a trusted system function. You then read back the system interpretation of that data and compare it to the user input.

The advantage of this technique is that because the operating system will ultimately decide which file to access, you have the system tell you which file it intends to access based on the given user input. You validate the path and use that same path when actually accessing the file.
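
A sketch of the technique for file paths, assuming the Web content lives under a known root directory; Path.GetFullPath asks the system for its interpretation of the path:

  using System;
  using System.IO;

  static class PathReflection
  {
      public static string ResolveContentPath(string userFile, string contentRoot)
      {
          // Make sure the root ends with a separator so the prefix check below
          // cannot be fooled by a sibling directory with a similar name.
          string root = contentRoot.TrimEnd(Path.DirectorySeparatorChar)
                        + Path.DirectorySeparatorChar;

          // Ask the system how it interprets the combined path; GetFullPath
          // resolves "..", "." and similar tricks the same way the OS will.
          string reflected = Path.GetFullPath(Path.Combine(root, userFile));

          if (!reflected.StartsWith(root, StringComparison.OrdinalIgnoreCase))
              throw new UnauthorizedAccessException("Path escapes the content directory.");

          // Use the reflected path, not the raw user input, for the file access.
          return reflected;
      }
  }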

Encoding Data

Sometimes hackers are not trying to break into your Web site but instead want to exploit your Web application to target other users or glean user data.

For example, an attacker may want to gain access to another user’s online bank account or personal e-mail. Using a technique called cross-site scripting (sometimes referred to as XSS), an attacker injects active HTML content into a Web page to exploit other users. This content may contain malicious HTML markup, including deceptive links, HTML form tags, client-side scripts and ActiveX components.

At the heart of this attack is the abuse of trust that results from the malicious content running on a trusted Web site. Attackers can exploit cross-site scripting vulnerabilities to carry out a large number of attacks, perhaps by stealing client cookies, gathering the user’s IP address, modifying the behavior of links or forms, or by redirecting users to an untrusted Web site.

Indeed, many developers underestimate the seriousness of cross-site scripting attacks.

Cross-site scripting vulnerabilities occur when a Web application dynamically displays HTML output to one user based on input from another user, such as displaying the unfiltered results of a guestbook or feedback system. Attackers can exploit this by injecting HTML tags that modify the behavior of the Web page.

For example, an attacker might inject JavaScript code that redirects a user to another site or steals a cookie that contains authentication information. Web-based e-mail services such as Hotmail have long been a target of cross-site scripting attacks because they display HTML content in e-mail messages. An attacker simply has to send the target a specially crafted e-mail to execute the attack.

For cross-site scripting to work, the attacker must send HTML markup through some form of input. This might include an HTML form, a cookie, a QueryString parameter, or even an HTTP header. For example, there are many login pages that pass error messages back to the user like this: www.example.com/login.aspx?err=Invalid+username+or+password

The page checks the err parameter and, if it exists, displays its contents back to the user as an error message. If the page does not filter this input, an attacker can inject malicious code.
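
A sketch of one defense: HTML-encode the value before writing it back into the page, so injected markup is displayed as text rather than executed. The page and its ErrorLabel control are hypothetical:

  using System;
  using System.Web;
  using System.Web.UI;
  using System.Web.UI.WebControls;

  public partial class LoginPage : Page
  {
      protected Label ErrorLabel;   // hypothetical Label control on the page

      protected void Page_Load(object sender, EventArgs e)
      {
          string err = Request.QueryString["err"];
          if (!string.IsNullOrEmpty(err))
          {
              // HtmlEncode turns <, >, " and & into entities, so injected
              // script tags render as text instead of running in the browser.
              ErrorLabel.Text = HttpUtility.HtmlEncode(err);
          }
      }
  }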

Encapsulating. Sometimes you need to act on user input but you may not care about the actual value of the input. For example, you might want a unique identifier based on user input or want to store a value such as a password for later comparison. You can use a hash to encapsulate the data in a safe string format while still maintaining a link to the original data.

Good hashing functions have some properties that make them useful for encapsulating data. They produce long, random digests that make use of the entire key space. A well-designed hash produces few collisions; it would be extremely rare for two input strings to generate the same hash. They always produce digest strings of the same length. And you shouldn’t be able to reverse-engineer or otherwise derive the original data from a properly computed hash value.

With a hash you can neutralize any malicious content because the hash mangles the string into a safe format. You can then format the hash as a hex string for safe handling. If you hash a password before saving it, you never need to bother with checking it for invalid characters because the hash will not contain malicious content. This allows users to enter characters they want in a password without your having to worry about the impact of special characters in the string.

Another example is when you must create a file based on user input. Because any file operation based on user input could be dangerous, you might want to first convert the input to a safe hash string.

Hashes are also effective at disguising data to reduce vulnerability to guessing attacks.

One e-commerce application used temporary XML files for its shopping cart. The filenames were based on the user ID and the current date. However, a flaw in the application often left these files orphaned rather than deleted at the end of the session, leaving a directory full of temporary files containing private user information, including customer credit card details. An attacker simply needed to guess filenames to gain access to this information. A filename based on a hash, by contrast, would have been unpredictable and drawn from a key space large enough to prevent guessing.
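
A minimal sketch of that approach, assuming SHA-256 as the hash; the inputs are whatever the application already uses to name the file:

  using System;
  using System.Security.Cryptography;
  using System.Text;

  static class SafeNames
  {
      // Encapsulate arbitrary input as a fixed-length hex digest that is
      // always safe to use as a filename.
      public static string ToCartFileName(string userId, DateTime date)
      {
          string source = userId + "|" + date.ToString("yyyy-MM-dd");
          using (SHA256 sha = SHA256.Create())
          {
              byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(source));
              // 64 hex characters: no special characters, no hint of the contents.
              return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant() + ".xml";
          }
      }
  }

If the inputs themselves are guessable, mixing in a random per-session value keeps the resulting name unpredictable.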

Parameterizing. Parameterizing is a technique in which you take user input and place it within a fixed context so that you can control the scope of access. Consider a Web application in which you access files based on a selected link. A link may take you to a URL such as this: www.example.org/articles.aspx?xml=/articles/a0318.xml

The first problem with this URL is that it is immediately apparent to attackers that you are accessing the file system. This might prompt them to experiment to find ways to break the application. What happens if you pass a filename with a different extension? Or what if you add additional path information?

To prevent abuse of your Web application, accept only the minimal amount of information required and insert this as a parameter to a full path. If the path and filename are fixed, a better version of the URL may be this:

www.example.org/articles.aspx?article=a0318

The code then takes the /articles path and appends the article parameter, followed by the .xml extension. No matter what the user enters, the resulting path starts in /articles and ends with an .xml extension.
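
A sketch of that lookup; the ID format and length limit are illustrative:

  using System;
  using System.Text.RegularExpressions;

  static class ArticleStore
  {
      public static string GetArticlePath(string articleId)
      {
          // Constrain the parameter to the expected format before using it.
          if (string.IsNullOrEmpty(articleId) || !Regex.IsMatch(articleId, "^[a-z0-9]{1,12}$"))
              throw new ArgumentException("Invalid article ID.");

          // Whatever the user sends, the result stays under /articles
          // and carries an .xml extension.
          return "/articles/" + articleId + ".xml";
      }
  }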

Parameterizing is not just for file access; it is an effective technique for limiting many types of attacks.

Double Decoding. Double decoding is a technique specifically designed to counter a type of encoding attack called double encoding. Vulnerability to this type of attack occurs because your application may decode an encoded string more than once from different areas of the application.

Attackers can take advantage of this by creating multiple layers of encoded strings, usually in a path or query string. In other words, the attacker encodes a string and then encodes the result again. This might allow an attacker to bypass pattern matching or other security checks in your application code.

Because it is difficult to anticipate a string being decoded twice in your application, a more effective strategy is to initially check user input for multiple layers of encoding.

By decoding a string twice, you can detect multiple layers of encoding, but what happens if someone uses more than two levels of encoding? How do you know how many times to decode to get to the final layer? Could someone cause a denial of service by encoding a string a hundred times?

The solution is that you only decode the string twice, comparing the first result with the second result. If these do not match, then you know that the string contains two or more levels of encoding and is likely not a valid request. If you encounter this, simply reject the request and return an error to the client.
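
A sketch of the check using HttpUtility.UrlDecode:

  using System;
  using System.Web;

  static class DecodeCheck
  {
      // Decode twice; if the results differ, the value contained at least
      // two layers of encoding and the request should be rejected.
      public static string DecodeOnceOnly(string rawValue)
      {
          string once = HttpUtility.UrlDecode(rawValue);
          string twice = HttpUtility.UrlDecode(once);

          if (!string.Equals(once, twice, StringComparison.Ordinal))
              throw new ArgumentException("Multiple layers of encoding detected.");

          return once;
      }
  }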

Syntax Checking. After accepting user input and applying one or more of the techniques described in this chapter, you will eventually need to do something with the data. You may, for instance, build a SQL statement to look up account information based on a given username. You might use one or more techniques in this chapter to check user input, but before executing that SQL statement on the server, you might want to perform a final check to be sure that the SQL syntax follows the format you expect.

For example, you don’t want to send a SQL statement with multiple verbs such as two SELECT statements or a SELECT and a DELETE. Passing the final string through a pattern-matching function can be extremely effective in stopping attacks, albeit at the cost of some additional processing overhead.

Syntax checking serves as a last line of defense against those attacks that get past all your other filters. Examples of syntax checking are:

  • Ensuring that a shelled command does not contain piping, redirection, command-concatenation characters or carriage returns.
  • Ensuring that e-mail address strings contain only a single address.
  • Ensuring that file paths are relative to a Web content directory and do not contain drive designators, UNC paths, directory traversal characters or reserved DOS device names.
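
A sketch of such a final check for the SQL case described above; the verb list is illustrative, not exhaustive:

  using System;
  using System.Text.RegularExpressions;

  static class SqlSyntaxCheck
  {
      // Final check on a built statement: one SELECT, no statement
      // separators or comment markers, and exactly one SQL verb.
      public static void AssertSingleSelect(string sql)
      {
          string s = sql.Trim();

          bool startsWithSelect = Regex.IsMatch(s, @"^SELECT\b", RegexOptions.IgnoreCase);
          bool hasSeparator = s.Contains(";") || s.Contains("--");
          int verbCount = Regex.Matches(s, @"\b(SELECT|INSERT|UPDATE|DELETE|DROP|EXEC)\b",
                                        RegexOptions.IgnoreCase).Count;

          if (!startsWithSelect || hasSeparator || verbCount != 1)
              throw new InvalidOperationException("SQL statement failed the syntax check.");
      }
  }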

Exception Handling. Hackers don’t exploit normal operations of your Web application; they go after the exceptions that you failed to anticipate.

Properly handling exceptions is a powerful defense in stopping a large percentage of Web application vulnerabilities. Although your code might fail to catch malicious user input, an exception handler might catch an error before an attacker can exploit it.

Exception handling is a long-standing best practice, but limited error-handling capabilities in classic ASP have resulted in many programmers failing to properly deal with exceptions. ASP.NET provides a much more robust error-handling system that you should take advantage of.

Exception handling involves much more than catching errors. Some components do not raise an error but provide error information through events or properties. Furthermore, sometimes no error occurs at all, yet the results are not what you would expect.

For example, if you perform a database query to look up a particular user’s record, you would expect only that record to be returned. If it returns more than one record, you have reason to be suspicious. You should always check results to be sure they are as you would expect them to be.
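
A sketch that combines both ideas, wrapping a parameterized lookup in an exception handler and verifying that exactly one record comes back; the Users table and its columns are hypothetical:

  using System;
  using System.Data;
  using System.Data.SqlClient;

  static class AccountLookup
  {
      public static DataRow LookupUser(SqlConnection conn, string username)
      {
          try
          {
              using (SqlCommand cmd = new SqlCommand(
                  "SELECT * FROM Users WHERE Username = @name", conn))
              {
                  cmd.Parameters.AddWithValue("@name", username);

                  DataTable table = new DataTable();
                  new SqlDataAdapter(cmd).Fill(table);

                  // A unique username should return exactly one record;
                  // anything else is reason to be suspicious.
                  if (table.Rows.Count != 1)
                      throw new InvalidOperationException("Unexpected number of records returned.");

                  return table.Rows[0];
              }
          }
          catch (SqlException)
          {
              // Log the details server-side; never echo raw database
              // errors back to the client.
              throw new ApplicationException("Account lookup failed.");
          }
      }
  }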

Think Like a Hacker

A common mistake Web developers make is assuming that server-side code is protected from intruders. Although it is meant to be protected, experience has shown us that this is not always the case.

You should work with the assumption that this code is not safe, and therefore take appropriate precautions with what you include in these files.

Server-side code is not an appropriate place to store secrets such as passwords, database connection strings or other sensitive information. Sometimes something as simple as a comment could reveal vital information for an intruder to further an attack.

Look at your server-side code from the hacker’s perspective to see what information might be a security risk.

This article was adapted from “Hacking the Code: ASP.NET Web Application Security” (Syngress, 2004), by Mark M. Burnett.

SIDEBAR: COMMON THREATS

Here are some of the threats caused by poor input filtering:

SQL injection
Manipulating user input to construct SQL statements that execute on the database server.

Directory traversal
Accessing files outside the bounds of the Web application by manipulating input with directory traversal characters. This is also known as the double dot attack.

Server-side code access
Revealing the content of server-side code or configuration files by manipulating input to disguise the true file extension.

File system access
Manipulating input to read, write or delete protected files on disk.

Denial of service
Causing the application to consume system resources excessively or to stop functioning altogether.

Information leakage
Intentionally sending invalid input to produce error messages with information that may facilitate an attack.

Cross-site scripting
Injecting HTML or script commands, causing the Web application to attack other users.

Command injection
Injecting special shell metacharacters or otherwise manipulating input to cause the server to run code of the attacker’s choice.

Buffer overflows
Overwriting a buffer by sending more data than a buffer can handle, resulting in the application crashing or executing code of the attacker’s choice.

Despite the wide variety of input injection attacks, Web developers have one great advantage: These attacks are completely preventable through careful input filtering and smart coding practices.

