The Future of Forensics
I have wanted to write this article for a couple of years now. I have been talking about enterprise forensics problems and limitations for about 10 years, and those problems become more relevant with each passing day.
The more I look at the “big data” problem (the proliferation of terabyte and larger hard drives at the endpoint, massive centralized data repositories, and the logistics of collecting and processing the data centrally), the more evident it becomes that the future of computer forensics must be based on four primary principles:
- Deduplication of data prior to searching, collection or other invasive functions
- Distributed processing at each endpoint with smart agents that are capable of decoding, decrypting, interpreting and searching all data types
- Cloud-based agent instruction and correlation of processed data indexes
- Centralized viewing of data and results from all distributed data sources
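To make the first principle concrete, here is a minimal sketch of deduplication before searching. It assumes that "duplicate" means byte-identical content and uses SHA-256 as the fingerprint; the function names (`sha256_of`, `group_duplicates`, `unique_representatives`) are hypothetical and not taken from any existing forensic tool.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map each content digest to every file under root with that content."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return dict(groups)


def unique_representatives(groups: dict[str, list[Path]]) -> list[Path]:
    """Pick one file per digest (assumption: searching one copy is enough)."""
    return [paths[0] for paths in groups.values()]
```

Only the representatives would then be searched or collected, while the full path list per digest records where every copy lives.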
If you look at the ever-expanding scope of data being created and disseminated, it is clear that traditional, centralized computer forensic methods do not scale. I have been doing enterprise forensics since 2003, a time when slow WAN speeds and 40GB hard drives were already too much for the technology, and the user experience wasn’t very good.
Fast forward to today. It is true that WAN speeds have improved (which theoretically should help), but the size of hard drives and data repositories has grown even faster. In the end, the user experience of trying to view data over a WAN isn’t any better now than it was then. In fact, it is probably worse. You just can’t reasonably collect, process and view the quantity of data that exists in modern environments using the current centralized approach employed by most enterprise forensic tools. It times out and fails.
To combat this, there is a workaround (really more of a hack). You can deploy a forensic examiner (physical or virtual) to each physical location so you have LAN-based speeds for your investigation. Or you can do some kind of remote/phone-home collection to bring all of the data back to the central location and search it locally. But this is not a reasonable approach for every situation, and it is almost always clunky because it requires multiple attempts to collect data. It seldom results in a comprehensive data set, as it generally precludes obtaining data from small offices and remote users.
Considering the increased capabilities of modern endpoints, it is now reasonable to push more of the workload down to the local machine, sending only the metadata associated with the search results back to a centralized location. That metadata could then be centrally indexed, queried and viewed almost instantaneously from the central console in the cloud.
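As a rough illustration of that split, the sketch below runs a keyword search on the endpoint and returns only metadata about each hit (path, size, hash and byte offsets), never the file content. Everything here is an assumption for illustration: the `Hit` record, `search_endpoint` and `CentralIndex` are hypothetical names, the index lives in memory, and the search is a plain byte match rather than the decoding and decrypting a real agent would need.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Hit:
    """Metadata about one matching file; the content itself stays on the endpoint."""
    endpoint_id: str
    path: str
    size: int
    sha256: str
    offsets: tuple[int, ...]


def find_offsets(data: bytes, needle: bytes) -> tuple[int, ...]:
    """Return the byte offset of every occurrence of needle in data."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return tuple(offsets)


def search_endpoint(endpoint_id: str, root: Path, keyword: str) -> list[Hit]:
    """Runs on the endpoint: search local files and build metadata-only results."""
    needle = keyword.encode("utf-8")
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()  # whole-file read keeps the sketch short
        offsets = find_offsets(data, needle)
        if offsets:
            digest = hashlib.sha256(data).hexdigest()
            hits.append(Hit(endpoint_id, str(path), len(data), digest, offsets))
    return hits


class CentralIndex:
    """In-memory stand-in for the central index that correlates endpoint results."""

    def __init__(self) -> None:
        self._by_digest: dict[str, list[Hit]] = {}

    def ingest(self, hits: list[Hit]) -> None:
        for hit in hits:
            self._by_digest.setdefault(hit.sha256, []).append(hit)

    def endpoints_holding(self, sha256: str) -> list[str]:
        """Which endpoints reported a matching file with this content hash?"""
        return sorted({h.endpoint_id for h in self._by_digest.get(sha256, [])})
```

Because each hit carries a content hash, the central side can correlate the same document across many machines without any of them transferring it.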
At this point in the conversation, the logical question is: If native data that exists only on the endpoint needs to be viewed, how do you get to it? The answer: It could be requested by the interface in the cloud and provided by the endpoints. The endpoints would need to check in at rapid intervals for instructions if their data were being reviewed, and the endpoints could serve up requested data to be viewed by the central console. Most live preview functionality works similarly to this now, with the exception that the console reaches out to the endpoint, instead of the other way around. Either way will work if the endpoints check in frequently enough.
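The check-in model can be sketched in a few lines. This is only an illustration under stated assumptions: the `Console` class is an in-memory stand-in for the cloud interface (a real one would sit behind an authenticated network API), and `Instruction`, `request_bytes` and `agent_check_in` are hypothetical names.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """A console request for a byte range of a file on one endpoint."""
    request_id: int
    path: str
    offset: int
    length: int


class Console:
    """In-memory stand-in for the cloud console; it never contacts endpoints."""

    def __init__(self) -> None:
        self._pending: dict[str, deque] = defaultdict(deque)
        self._results: dict[int, bytes] = {}
        self._next_id = 1

    def request_bytes(self, endpoint_id: str, path: str, offset: int, length: int) -> int:
        """Queue a request; it waits until the endpoint next checks in."""
        request_id = self._next_id
        self._next_id += 1
        self._pending[endpoint_id].append(Instruction(request_id, path, offset, length))
        return request_id

    def check_in(self, endpoint_id: str) -> list[Instruction]:
        """Called by the endpoint: hand over and clear its pending instructions."""
        queue = self._pending[endpoint_id]
        instructions = list(queue)
        queue.clear()
        return instructions

    def submit(self, request_id: int, data: bytes) -> None:
        self._results[request_id] = data

    def result(self, request_id: int):
        """Return the served bytes, or None if the endpoint has not answered yet."""
        return self._results.get(request_id)


def agent_check_in(console: Console, endpoint_id: str) -> int:
    """One check-in cycle: fetch instructions, read the bytes, send them back."""
    instructions = console.check_in(endpoint_id)
    for instruction in instructions:
        with open(instruction.path, "rb") as handle:
            handle.seek(instruction.offset)
            console.submit(instruction.request_id, handle.read(instruction.length))
    return len(instructions)
```

A real agent would call something like `agent_check_in` in a loop, with a short interval while its data is under review and a longer one otherwise. Since the endpoint always initiates the connection, this direction also tends to be easier for small offices and remote users behind NAT or firewalls.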
Considering my background in forensic software design, there is a temptation to turn this post into a series of feature requests and architecture diagrams, but I probably shouldn’t go any deeper than this. I am happy to work with any forensic software company and whiteboard out an architecture diagram and set of features for a system such as this. I would love to see someone make it and to be a part of it. It would make my life in enterprise forensics a lot easier.