Web Application Logging
Standardizing application logging across the enterprise is an important, yet typically forsaken, task. Too often, the logging style varies from application to application, from developer to developer. This results in myriad log files in disparate locations, with no way to get a cohesive understanding of the overall performance of the applications across the enterprise. This post is designed to give a high-level overview for addressing application logging across the enterprise. A successful application logging program requires addressing three things:
- What to log
- Where to log
- What to do with logs
What to log
If you log every page access of every application the same way, you will quickly end up with an unmanageable mess that no one has the time to read or monitor. Additionally, you will likely duplicate much of the data that is already recorded in the web server’s access log. Instead, decide whether a given piece of functionality needs logging and how much it needs. This depends in part on the nature and number of web applications in the organization, and on the scope of each application. An application handling sensitive data (e.g., financial or health information) has different logging needs than a one-page “contact us” application. For example, any application that accesses protected information, whether or not it is covered by compliance requirements or legislation, will need to create logs when a user authenticates, accesses protected information, or takes action involving protected information. A simple contact page may only need logging to determine trends or usage information. Ultimately, what to log should be decided at the enterprise level and documented in your organization’s coding standards. At a minimum, you should be able to track the following types of events:
- Authentication attempts (failed and successful) and sessions (starting and terminating)
- Application errors and warnings
  - For application errors (i.e., HTTP status code 500), include details of the error (stack trace, line number, etc.)
- Security violations (e.g., attempts to access a resource without authorization, bypassed client-side controls, invalid input detected by the application, etc.)
- Total request time
  - This can highlight both DoS/performance issues and some timing-based attacks like blind SQL injection.
- Sensitive data access
  - For data that should be controlled, log the user name, the time accessed, the action performed (read, delete, etc.) and some sort of identifier for which sensitive data was accessed (e.g., account password changed, credit card number accessed).
  - This logging should occur as far down the stack as is practical. For example, log that a user accessed Social Security numbers from the underlying web service or stored procedure that controls the access, not from the web page that displays the data. This facilitates:
    - spotting an attacker who finds a way to access items in the database without using the intended web functionality
    - logging access when a new front end (like a mobile application) is used with the same back end.
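As one illustration of the "total request time" item above, the following is a minimal sketch of a timing wrapper for a Python WSGI application. The class name, the logger name, and the two-second threshold are all assumptions made for the example; they are not part of any standard or of the approach described in this post.

```python
import logging
import time

# Hypothetical logger name; use whatever your logging standard prescribes.
logger = logging.getLogger("request_timing")


class RequestTimingMiddleware:
    """Wraps a WSGI app and logs how long each call to it took."""

    def __init__(self, app, slow_threshold_seconds=2.0):
        self.app = app
        # Assumed threshold: requests at or above it are logged as warnings.
        self.slow_threshold_seconds = slow_threshold_seconds

    def __call__(self, environ, start_response):
        started = time.perf_counter()
        try:
            return self.app(environ, start_response)
        finally:
            # Note: this covers the app call only, not the time spent
            # streaming the response body back to the client.
            elapsed = time.perf_counter() - started
            if elapsed >= self.slow_threshold_seconds:
                level = logging.WARNING
            else:
                level = logging.INFO
            logger.log(
                level,
                "event_type=request_time path=%s seconds=%.3f",
                environ.get("PATH_INFO", ""),
                elapsed,
            )
```

Logging slow requests at a higher level is one simple way to let a monitoring system pick out the outliers without reading every entry.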
Also design your logs to have enough metadata to allow them to be categorized into event types effectively. This could be an “event type” column in a flat file or DB, or it could be simply logging the different types of events to different locations. Similarly, you will need to be able to differentiate the logs from different applications if they are all feeding into the same storage mechanism.
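To make the metadata idea concrete, here is a rough sketch of a log record that carries both an event type and an application name, serialized as JSON. The field names and the `build_event` helper are hypothetical; your own standard may call for different fields or a different format.

```python
import json
from datetime import datetime, timezone


def build_event(app, event_type, user, action, resource_id):
    """Return one log entry as a JSON string (field names are illustrative)."""
    return json.dumps(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "app": app,                 # distinguishes applications sharing one store
            "event_type": event_type,   # e.g. "auth", "error", "sensitive_access"
            "user": user,
            "action": action,           # e.g. "read", "delete"
            "resource_id": resource_id, # identifies the data, never the data itself
        },
        sort_keys=True,
    )


# Example: a user read a protected record in a hypothetical "billing" app.
entry = build_event("billing", "sensitive_access", "alice", "read", "account-1042")
```

With the event type and application name in every record, filtering by either becomes a simple query no matter where the logs end up.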
Where to log
If there are a large number of web applications, you probably don’t want all of them writing to the system logs (the Windows event log or syslog), though wherever they do write should be standardized. For example, for an organization with hundreds of applications of varying sizes, throwing the application logs of all of those into a centralized logging server or event log could create a lot of noise. However, any application that pertains to compliance or handles sensitive data (e.g., financial accounts, Social Security numbers, health information) should be logged to your central logging server and/or SIEM (which may or may not include a stop in the syslog or Windows event log) to ensure reliability and accessibility of the information. A SIEM is also more likely to be able to include rules to mask potential credit card and Social Security numbers to limit information leakage and PCI scope creep. (Sensitive data should not be included in logs anyway, but such rules help eliminate the occasional interloper.) Less sensitive applications can still log to your SIEM if you have the bandwidth, or they can log to a dedicated flat file or database.
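If your logging destination has no masking rules of its own, a similar safety net can be approximated in the application. The sketch below uses Python's standard `logging.Filter` with two deliberately rough patterns; the patterns, the replacement text, and the names `mask_sensitive` and `MaskingFilter` are assumptions for illustration. The card pattern will also mask unrelated 13 to 16 digit numbers, and neither pattern is a substitute for keeping sensitive data out of logs in the first place.

```python
import logging
import re

# Rough, illustrative patterns; real rules would need tuning and validation.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def mask_sensitive(text):
    """Replace anything that looks like an SSN or card number."""
    text = SSN_PATTERN.sub("[SSN REDACTED]", text)
    return CARD_PATTERN.sub("[CARD REDACTED]", text)


class MaskingFilter(logging.Filter):
    """Masks the fully formatted message before any handler sees it."""

    def filter(self, record):
        record.msg = mask_sensitive(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True       # never drop the record, only rewrite it


# Usage: attach the filter to a handler so everything it emits is masked.
handler = logging.StreamHandler()
handler.addFilter(MaskingFilter())
```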
For non-SIEM logging, flat files and databases each have their own strengths and weaknesses, and neither is inherently right or wrong. Determining the best option for your organization requires evaluating several criteria: site traffic level, existing infrastructure, developer experience, etc. A database’s intrinsic ability to categorize, filter and search makes it a natural choice, but it carries more setup overhead and more security implications. For teams needing less overhead, flat files may be the better option. If an application logs directly to the syslog or Windows event log, take care that the event log settings won’t allow logs to be easily overwritten or filled up. Data retention requirements may also be harder to guarantee for something like the Windows event log.
Also consider who needs access to the logs. If the developers need the logs for debugging, would they have access to the SIEM or the server event logs? If managers need the logs for reporting, would they have access to the flat files (and be able to parse useful information from them)?
Regardless, for consistency, logging should be abstracted out so the developers don’t have to remember what gets logged where or what the format is. If you decide to start logging a different way later, you don’t have to change all of your code, just a function or two. This should be reflected in the organization’s coding standards as well: identify the data that should and should not be logged in the standard, and require that logging be handled by the abstraction layer rather than directly by the app.
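A minimal sketch of such an abstraction layer in Python might look like the following. The `log_event` function, the routing table, and the logger names are all hypothetical; the point is only that application code names an event type and the layer decides the destination and format.

```python
import logging

# Hypothetical routing table: event type -> logger name. Handlers for each
# logger (file, database, SIEM forwarder) are configured once, elsewhere.
ROUTES = {
    "auth": "security",
    "sensitive_access": "security",
    "error": "app",
}
DEFAULT_ROUTE = "app"


def log_event(event_type, message, level=logging.INFO, **fields):
    """Single entry point for application logging."""
    target = logging.getLogger(ROUTES.get(event_type, DEFAULT_ROUTE))
    parts = [f"event_type={event_type}", message]
    # Sort the extra fields so every entry has a predictable layout.
    parts += [f"{key}={value}" for key, value in sorted(fields.items())]
    target.log(level, " ".join(parts))


# Application code never touches files, databases, or the SIEM directly.
log_event("auth", "login failed", level=logging.WARNING, user="alice")
```

Switching destinations later then means changing the routing table or the handler configuration, not every call site.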
Whatever logging method is used, ensure that it is not publicly accessible. Logging to a flat file in the webroot of your server is a very kind gift to give attackers.
What to do with logs
Regardless of whether you log to flat files, a database, or a SIEM, you need to have a system in place to monitor the logs. Admins do not have enough time to manually read logs to look for problems or compile statistics for trend analysis, so you’ll need some automation to help them. For example, a SIEM is designed to allow you to watch trends and set up alerts (e.g., an unusual number of failed logins may indicate a brute force attack). You should consider the ease of setting this up in a SIEM compared to the measures you would need to take for effective monitoring of flat files or databases. Logging to flat files alone may require extensive parsing and a custom alerting/trend system. A database used for logging could have a combination of triggers and scheduled jobs to handle alerts and trends. A hybrid approach, where databases and/or flat files feed into a SIEM for alerting and trend analysis, might be the best choice for many organizations, as it can combine flexibility of storage with mature reporting and alerting functionality. As a bonus, the extra abstraction will make it easier to switch components when necessary. For example, if you decide to switch SIEMs, you will still have existing log data from flat files and databases to load into the new system for testing and tuning.
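For teams without a SIEM, the failed-login alert mentioned above could be approximated with a sliding-window counter like the sketch below. The class name, the thresholds, and the idea of feeding it timestamps parsed from your logs are assumptions for illustration; a real deployment would also need to decide what "alerting" actually does.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Flags a user once too many failures fall inside a time window."""

    def __init__(self, max_failures=5, window_seconds=300):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # user -> recent failure times

    def record_failure(self, user, timestamp):
        """Record one failure; return True if the threshold is reached.

        `timestamp` is seconds since any fixed reference point and is
        assumed to arrive in non-decreasing order for each user.
        """
        recent = self._failures[user]
        recent.append(timestamp)
        # Discard failures that have aged out of the window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) >= self.max_failures


monitor = FailedLoginMonitor(max_failures=3, window_seconds=60)
for seconds in (0, 10, 20):
    if monitor.record_failure("alice", seconds):
        print("possible brute force attack against alice")
```

The same counting logic could live in a scheduled database job or a log-parsing script; only the source of the timestamps changes.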
Finally, make sure that the systems you use to store and view log items are themselves properly secured. If set up incorrectly, a database logging system might be vulnerable to SQL injection, or a log viewer web application might be vulnerable to cross-site scripting. Be sure to employ the same security best practices on your logging infrastructure as you would on the applications themselves.