Common Web Application Vulnerabilities - Part 6
November 03, 2014
In this series of posts, my colleagues and I will dig into some specific, common web application vulnerabilities we observe regularly while performing network and application pentests. The intention of this series is to further expand upon a lot of the great information that already exists on the topic while preemptively addressing common questions we receive from our customers.
Part 6: Directory Traversal
Directory traversal, also known as path traversal, is another vulnerability we sometimes run into during our Security Assessment engagements. A path traversal vulnerability allows attackers to access files and directories that were not intended to be exposed to users. Depending on the severity of the vulnerability, an attacker may be able to gain access to web application configuration files, operating system files, passwords and other forms of sensitive data.
It is a vulnerability that is fairly well known, and while many developers believe they have implemented secure coding mechanisms to prevent exploitation, these measures are often insufficient against a skilled attacker. For this reason, it is important to understand the underlying causes, risks and mitigation strategies for directory traversal vulnerabilities in order to be certain that your application is safe from this potentially devastating class of vulnerabilities.
Before we get into the details, I’d like to spend some time discussing what directory traversal vulnerabilities are and what allows them to occur. Directory traversal vulnerabilities belong to a class of user-input validation flaws that occur when user-supplied input is not properly sanitized before being processed by the web application. The overwhelming majority of web application vulnerabilities fall into this category, primarily because input validation is a hard problem to solve in most applications. User input is needed for any meaningful web application to function, and every piece of user input must be treated as potentially malicious or there will be a risk of web vulnerabilities appearing.
Directory traversal relates to the “../” or “dot dot slash” operator that operating system file systems interpret as “the directory that contains the current directory.” For example, if the current directory on a Linux system is ‘/var/www’, the “../” operator will refer to the ‘/var/’ directory. On Windows, if the current directory is ‘C:\www\root\’ then the “../” operator will refer to the ‘C:\www\’ directory. Furthermore, the path ‘/var/www/../www’ will resolve to ‘/var/www’. Got it? Good. It’s important to understand because this is the fuel that drives these types of vulnerabilities and exploits.
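The resolution rules above can be checked with a short sketch. It is written in Python rather than PHP purely for illustration, and it uses the standard posixpath and ntpath modules as a stand-in for the operating system: normpath works on the string alone, so it does not touch the disk or follow symbolic links.

```python
import ntpath
import posixpath

# normpath collapses ".." segments textually, without any filesystem access.
linux_parent = posixpath.normpath("/var/www/..")
linux_round_trip = posixpath.normpath("/var/www/../www")
windows_parent = ntpath.normpath("C:\\www\\root\\..")

print(linux_parent)      # /var
print(linux_round_trip)  # /var/www
print(windows_parent)    # C:\www
```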
In many applications, users are allowed to name files that are uploaded or retrieved by the web application. Think of the many different popular file management applications that exist. These applications allow users to upload and retrieve files of their choosing, often by manually entering the filename. This filename is then passed to the web application, which creates a local pathname to access this file. Web applications are executed from a directory often called a “content directory.” This is the directory that contains the majority of the files that the web application needs to function.
Imagine an application that allows a user to read log files. For the following examples, I’ll be using the following snippet of vulnerable code on a page called ‘log_viewer.php’ hosted on a Linux web server:
// Vulnerable: user input is concatenated into a file path without validation.
$logFile = $_GET['userProvidedLogfile'];
$logData = file_get_contents('/var/www' . $logFile);
echo $logData;
In plain English, this code:
- Reads in the user-controlled variable ‘userProvidedLogfile’ and assigns it to $logFile.
- Constructs a file path using the $logFile value appended to the web application’s content directory, ‘/var/www’.
- file_get_contents sends this newly constructed log file path to the underlying operating system and opens it for reading, then assigns the content of the log file to $logData.
- The log file content is then echoed, or printed, to the page.
A valid user would trigger this piece of code by issuing a GET request through their browser to ‘log_viewer.php’, with the ‘userProvidedLogfile’ parameter naming the log file to read.
The web application snippet takes the value ‘/october20log.txt’ and appends it to the content directory, making the full file path ‘/var/www/october20log.txt’. It then passes this path to file_get_contents and echoes the contents of october20log.txt back to the user. This is expected output and is not malicious.
As long as the user provides a valid log file name the application functions as is expected. But what happens if this isn’t the case? What happens if the user is malicious? What happens if the user includes a directory traversal string? Let’s find out.
Imagine an attacker finds this page and decides to try to access areas of the web server that were not intended to be accessible. This attacker triggers the same piece of vulnerable code, but issues a malicious GET request through their browser, setting ‘userProvidedLogfile’ to a directory traversal string that points at ‘/etc/passwd’.
What happens here? The page would return the contents of the ‘/etc/passwd’ file of the underlying Linux-based system and expose the username of every account on the Linux operating system. Why? Let’s find out.
What happens when the attacker provides the value ‘/../../../../../etc/passwd’ as the ‘userProvidedLogfile’ variable? The application takes this value and, in the call to file_get_contents, appends it to ‘/var/www’, thus creating the value ‘/var/www/../../../../../etc/passwd’. Based on what we know about the “../” operator, this produces a path that the web application’s developers did not intend to allow access to. The “..” operators are resolved, ultimately resulting in the file path being ‘/etc/passwd’.
It’s important to note that neither Windows nor Linux cares about redundant “..” operators once the path has reached the root of the filesystem, which is normally ‘C:\’ in Windows and ‘/’ in Linux. Because of this, it is possible to submit more “../” operators than are strictly needed and still end up with a valid request for the ‘/etc/passwd’ file.
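The behavior described above can be reproduced without a web server. The following is a minimal sketch in Python (not the PHP of the original snippet); the helper names naive_log_path and resolve are hypothetical, and posixpath.normpath is used as an approximation of how a Linux filesystem collapses “..” segments.

```python
import posixpath

CONTENT_DIR = "/var/www"  # content directory from the example above


def naive_log_path(user_value: str) -> str:
    """Mimic the vulnerable concatenation: user input is used unvalidated."""
    return CONTENT_DIR + user_value


def resolve(path: str) -> str:
    """Collapse ".." segments textually, approximating the OS resolution."""
    return posixpath.normpath(path)


benign = resolve(naive_log_path("/october20log.txt"))
attack = resolve(naive_log_path("/../../../../../etc/passwd"))
# Far more "../" segments than needed: the extras at the root are ignored.
excessive = resolve(naive_log_path("/" + "../" * 20 + "etc/passwd"))

print(benign)     # /var/www/october20log.txt
print(attack)     # /etc/passwd
print(excessive)  # /etc/passwd
```

This is why attackers tend to send a generous number of “../” segments: they do not need to know how deep the content directory sits.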
Let’s see this type of attack in a real browser. In these examples, I will be using the bWAPP vulnerable web application from itsecgames that was purposefully created to be vulnerable.
As we can see, the above request in the browser is similar to our previous example. The ‘message.txt’ value is controlled by the user and is passed to a vulnerable PHP application. The contents of ‘message.txt’ are included on the page. If we replace the value of the ‘page’ variable with something malicious, we can see the contents of the /etc/passwd file.
As you can see, this vulnerability has the potential to be devastating. A skilled attacker would be able to read any file that the web server software has access to. In some situations, malicious users are able to upload a file and, using directory traversal, add malicious scripts to a directory that has execution rights. This allows the attacker to execute malicious code on the server.
How do you detect these vulnerabilities?
At FishNet Security, we begin by finding any forms or fields that are directly controlled by the user. We then perform input manipulation tests against these fields, checking for SQL injection, Cross-Site Scripting and the other common web application vulnerabilities, including directory traversal. We pay special attention to fields or web application logic that performs language selection, allows users to upload or view files, or performs service or system monitoring. This type of functionality often passes user input directly to the underlying filesystem APIs and may have directory traversal vulnerabilities. We also take special note if any user-supplied data field appears to be a file name, such as “image.jpg” or “log.txt”.
A common test that we run is to check whether the “../” operator is interpreted when it is passed in. For instance, if the application takes in ‘logfile=docs/test.txt’, we can try to change this value to ‘logfile=docs/asdf/../test.txt’. If the application hands the value to the filesystem without sanitizing it, this will typically result in the exact same output from the web application, because ‘docs/asdf/../test.txt’ resolves to ‘docs/test.txt’ once the directory traversal string is interpreted. If the request fails, it may be that the application has a blacklist of directory traversal strings. In this case, it is sometimes possible to bypass the blacklist by using URL-encoded or Unicode-encoded versions of the directory traversal string, e.g., ‘%2e%2e%2f’ instead of ‘../’. It is also a common test to insert the following two values into fields to see if operating system files are trivially available:
Linux systems: ../../../../etc/passwd
Windows systems: ../../../../windows/win.ini
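The probe values described above are easy to generate by hand, but a small helper makes the idea concrete. This is a sketch in Python under stated assumptions: the function names insert_noop_traversal and percent_encode_all are hypothetical, and the dummy directory name ‘asdf’ is simply the one used in the example above.

```python
import posixpath
from urllib.parse import unquote


def insert_noop_traversal(value: str, dummy: str = "asdf") -> str:
    """Insert "<dummy>/../" before the file name; the path resolves to the same file."""
    directory, name = posixpath.split(value)
    return posixpath.join(directory, dummy, "..", name)


def percent_encode_all(text: str) -> str:
    """Percent-encode every byte, including characters a URL encoder would leave alone."""
    return "".join(f"%{byte:02x}" for byte in text.encode("utf-8"))


probe = insert_noop_traversal("docs/test.txt")
encoded_traversal = percent_encode_all("../")

print(probe)              # docs/asdf/../test.txt
print(encoded_traversal)  # %2e%2e%2f
```

If the application returns the same content for the probe as for the original value, the traversal string is probably reaching the filesystem untouched.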
In a situation where you have the source code of the application available, find any locations where files are retrieved from the operating system based on user input. If the user input is not being properly validated, it is likely that you have a directory traversal vulnerability of some sort.
How do you prevent these types of vulnerabilities?
Defending against directory traversal vulnerabilities relies on the same types of mechanisms that are used to defend against other user-input sanitization vulnerabilities. Setting up or tuning a security event monitoring system is also worthwhile. If user input is detected that contains directory traversal strings in a field that normally should not include them, you should alert on it, block the request, and possibly ban the IP address, as this request is almost certainly malicious. This is a good reactive step, but preventative measures should be taken as well. Ideally, a web application will be architected so that user input is never passed to filesystem APIs, but many web applications require this functionality to work correctly. In these cases, there are some methods that can be used to mitigate the risk of directory traversal, but it is important that they are implemented correctly or they can be bypassed by a skilled attacker.
Blacklisting known bad input is one technique that can be used. Checking for directory traversal strings in user input using regex or pattern matching is a rudimentary form of mitigation that can help detect and block simple attacks, but like any blacklisting method, it runs the risk of being bypassed by a string that is not on the list. As noted above, a web application may filter any requests that include “..” but allow the encoded version ‘%2e%2e’. For this reason, blacklisting by itself is often not sufficient to prevent a skilled attacker from exploiting directory traversal vulnerabilities.
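To make the blacklist weakness concrete, here is a small sketch in Python (chosen for illustration; the article’s own snippet is PHP). Both function names are hypothetical. The first filter inspects only the raw value, so an encoded traversal string slips past it. The second decodes until the value stops changing before checking, which closes this particular gap but is still a blacklist and should not be treated as a complete defense.

```python
from urllib.parse import unquote


def naive_blacklist_allows(raw_value: str) -> bool:
    """Flawed filter: looks for ".." only in the raw, still-encoded value."""
    return ".." not in raw_value


def decoded_blacklist_allows(raw_value: str) -> bool:
    """Decode repeatedly until the value is stable, then look for ".."."""
    previous, current = None, raw_value
    while previous != current:
        previous, current = current, unquote(current)
    return ".." not in current


plain = "../etc/passwd"
encoded = "%2e%2e%2fetc%2fpasswd"
double_encoded = "%252e%252e%252fetc%252fpasswd"

print(naive_blacklist_allows(plain))             # False (blocked)
print(naive_blacklist_allows(encoded))           # True  (bypassed)
print(decoded_blacklist_allows(encoded))         # False (blocked)
print(decoded_blacklist_allows(double_encoded))  # False (blocked)
```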
Many language libraries include a function that checks the path to ensure that only allowed file types are passed to the filesystem API. In an image management application, for example, a request to ‘/var/www/images/new.jpg’ would succeed, but a request to ‘/var/www/images/../../../etc/passwd’ would fail this check, since ‘passwd’ does not end with a valid image file extension. Unfortunately, it is sometimes possible to bypass this type of check by appending a null byte character, ‘%00’, followed by an allowed extension (e.g. ‘/var/www/images/../../../etc/passwd%00new.jpg’). The web application checks the extension, sees ‘.jpg’ and lets the path through the filter. When this path is passed to the filesystem API, however, the null byte is treated as the end of the string, so everything after it is ignored. The directory traversal strings are then resolved, and ‘/var/www/images/../../../etc/passwd%00new.jpg’ becomes ‘/etc/passwd’.
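The mismatch between the extension check and the filesystem call can be sketched as follows. This is an illustration in Python, not a reproduction of any particular library: the allow-list of extensions is hypothetical, and as_c_string only simulates an API that stops reading at the first null byte (Python’s own file functions reject such paths instead of truncating them).

```python
import posixpath

ALLOWED_EXTENSIONS = (".jpg", ".png", ".gif")  # hypothetical allow-list


def extension_check_passes(path: str) -> bool:
    """Naive check: only looks at how the string ends."""
    return path.lower().endswith(ALLOWED_EXTENSIONS)


def as_c_string(path: str) -> str:
    """Simulate a C-style API that treats the first null byte as end of string."""
    return path.split("\x00", 1)[0]


def safer_extension_check(path: str) -> bool:
    """Reject embedded null bytes before looking at the extension."""
    return "\x00" not in path and extension_check_passes(path)


# "%00" in the URL becomes a literal null byte after URL decoding.
payload = "/var/www/images/../../../etc/passwd\x00new.jpg"
opened = posixpath.normpath(as_c_string(payload))

print(extension_check_passes(payload))  # True
print(opened)                           # /etc/passwd
print(safer_extension_check(payload))   # False
```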
Some libraries also allow a check to determine whether the path being passed to the filesystem begins with the correct content directory, such as ‘/var/www/’. On its own this is easily bypassed, because a path such as ‘/var/www/../../etc/passwd’ starts with the expected directory and passes the filter, yet resolves to ‘/etc/passwd’.
Another solution is to use the functions that most language libraries provide to check the path that was constructed from user input against the file system before retrieving the file. These functions pass the constructed file pathname to the filesystem and then return the canonicalized version of the pathname that will be used if the file is opened. For example, the PHP call realpath('/var/www/../../etc/passwd') will return ‘/etc/passwd’. The web application should then validate that the canonical file path points to the expected directory and file. If it doesn’t, deny the request and do not open the file. Some useful path canonicalization functions include:
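A short sketch shows why the order of operations matters for this check. It is written in Python for illustration, with hypothetical function names, and uses posixpath.normpath as a purely textual stand-in for path resolution.

```python
import posixpath

CONTENT_DIR = "/var/www/"


def prefix_check_passes(path: str) -> bool:
    """Flawed: inspects the path before any ".." segments are resolved."""
    return path.startswith(CONTENT_DIR)


def prefix_check_after_normalizing(path: str) -> bool:
    """Resolve ".." segments first, then compare against the content directory."""
    return posixpath.normpath(path).startswith(CONTENT_DIR)


traversal = "/var/www/../../etc/passwd"
legitimate = "/var/www/images/new.jpg"

print(prefix_check_passes(traversal))              # True  (bypassed)
print(prefix_check_after_normalizing(traversal))   # False (blocked)
print(prefix_check_after_normalizing(legitimate))  # True
```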
Path.GetFullPath() in .NET (including ASP.NET)
realpath() in PHP
File.getCanonicalPath() in Java
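The canonicalize-then-validate approach can be sketched as below. This is a minimal example in Python rather than one of the languages listed above, using os.path.realpath as the canonicalization function; the helper name resolve_inside and its error handling are assumptions made for illustration, not a drop-in implementation.

```python
import os


def resolve_inside(base_dir: str, user_value: str) -> str:
    """Return the canonical path for user_value, or raise if it escapes base_dir."""
    if "\x00" in user_value:
        raise ValueError("null byte in path")
    base = os.path.realpath(base_dir)
    # Strip leading separators so os.path.join cannot discard the base directory.
    relative = user_value.lstrip("/\\")
    # realpath resolves ".." segments and symbolic links against the filesystem.
    candidate = os.path.realpath(os.path.join(base, relative))
    # The canonical path must still sit underneath the canonical base directory.
    if os.path.commonpath([base, candidate]) != base:
        raise PermissionError(f"path escapes {base!r}")
    return candidate
```

A caller would open the file only if resolve_inside returns without raising, for example resolve_inside('/var/www', user_value), and would deny the request otherwise. Comparing with os.path.commonpath rather than a plain string prefix avoids accepting sibling directories such as ‘/var/www-backup’.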
Directory traversal vulnerabilities are well known by developers but the methods for protecting against these types of attacks are sometimes implemented incorrectly or are missing corner cases that can be exploited by a skilled attacker. Review your code base to ensure that user input being passed to filesystem APIs is kept to a minimum, and in the cases where it is occurring, be sure to validate the input prior to processing.