There's Gold in Them Thar Metadata!!

  • Many electronic documents posted online contain metadata in some form.
  • Metadata can reveal an organization's username structure and deployed technologies, which can help an attacker guess passwords and successfully breach the perimeter.
  • This post explains how it works and offers advice on how to prevent it.


Simply put, metadata is information that describes a document and is generated when the document is created. When first targeting an organization, an attacker will scour document metadata, as it may expose internal usernames, software versions, GPS data, and document creation dates. While its purpose is benign, for an attacker it can open the door to more successful attacks against the target organization.


Many electronic documents contain metadata in some form. For a penetration tester, it is often the organization's username structure and deployed technologies that are the most interesting.


For the purpose of this discussion, we will focus on username enumeration, as it is paramount to conducting a successful password attack. When planning a password attack, the most valuable piece of information is the target organization's username format. While many organizations follow common username formats, that is not always the case. It is becoming more common for organizations to set user IDs to values that do not match employee email addresses, which makes discovering internal username conventions more challenging. Often, after harvesting leaked data and trying statistically likely usernames, only a few valid users are found. This is a particular concern when the target organization has hundreds, if not thousands, of users. Thus, the journey to harvest the gold in a target organization's metadata begins.


During the discovery phase, login portals such as Outlook Web App (OWA) are sometimes uncovered. Login portals are a prime target for a password guessing attack, since access to these types of systems allows an attacker to read email and any sensitive information contained therein. To give this attack the highest chance of success, the username format for the target organization must be determined.


Figure 1: OWA Login Page


The internal username convention of the target organization can often be found by browsing recent breach data containing possible email addresses. To validate usernames against an OWA portal, a Client Access Server (CAS) timing attack can be performed using tools such as the Metasploit Framework or Burp Suite. This attack takes advantage of the way OWA responds to valid and invalid email addresses. The authentication response times contain noticeable deltas, allowing valid user accounts to be distinguished from invalid ones. This sometimes yields only a few valid accounts, which can indicate that another username convention is in use.
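The comparison of response times can be sketched in a few lines of Python. This is only an illustration of the idea, not the logic used by Metasploit or Burp Suite: it assumes the response times have already been recorded, and it assumes a baseline taken from usernames known not to exist. The function name, the `min_delta` threshold, and the sample timings are all hypothetical.

```python
from statistics import median


def classify_by_timing(baseline_times, candidate_times, min_delta=0.5):
    """Flag usernames whose median response time differs from the baseline.

    baseline_times: response times (seconds) for usernames known not to exist.
    candidate_times: dict mapping username -> list of response times (seconds).
    min_delta: hypothetical threshold in seconds; it would need tuning per target.
    """
    baseline = median(baseline_times)
    likely_valid = []
    for username, times in candidate_times.items():
        # A noticeable delta from the known-invalid baseline suggests the
        # server handled this username differently.
        if abs(median(times) - baseline) >= min_delta:
            likely_valid.append(username)
    return sorted(likely_valid)


# Made-up timings for illustration only (not measured from any real system).
baseline = [2.9, 3.1, 3.0]
samples = {
    "jsmith": [0.4, 0.5, 0.45],
    "zzzunknown": [3.0, 3.2, 2.95],
}
print(classify_by_timing(baseline, samples))  # ['jsmith']
```

Using the median of several samples per username, rather than a single request, helps keep ordinary network jitter from being mistaken for a real delta.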


Enter FOCA and Pymeta, document metadata gathering tools that analyze files found on the target domain. As shown below, document metadata recovered with FOCA can uncover the unusual username format the target organization has deployed. These new usernames can then be validated using Metasploit.


Figure 2: FOCA Results for Word Document


Figure 3: Username Validation of Usernames Recovered Via FOCA
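To make concrete where this kind of metadata lives, the sketch below reads the author fields from a Word document using only the Python standard library. It is not how FOCA or Pymeta work internally; it relies on the fact that a .docx file is a ZIP package whose core properties are stored in `docProps/core.xml`. The helper names and the sample usernames are hypothetical, and the sample document is built in memory for demonstration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def docx_authors(docx_bytes):
    """Return the creator and last-modified-by values from a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    return {
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "last_modified_by": root.findtext(
            "cp:lastModifiedBy", default="", namespaces=NS
        ),
    }


def make_sample_docx(creator, modifier):
    """Build a minimal in-memory package containing only core properties."""
    core = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        f"<dc:creator>{creator}</dc:creator>"
        f"<cp:lastModifiedBy>{modifier}</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
    return buffer.getvalue()


# Hypothetical usernames, chosen only to show a non-email-style format.
sample = make_sample_docx("ab12345", "cd67890")
print(docx_authors(sample))
```

Run against documents harvested from a target domain, the creator and last-modified-by values are often where an internal username convention first shows up.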


Next, recent breach information and sites such as LinkedIn can be used to gather names of employees and potential email addresses. Leveraging the collected data, a unique username list can be generated using the discovered format. Again, the CAS timing attack can be used to validate accounts prior to performing a password guessing campaign.


Figure 4: CAS Authentication Timing Attack – Final User List Enumeration
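Generating the username list from collected employee names can be sketched as below. The template fields and the example formats are hypothetical; in practice the template would be replaced with whatever convention the document metadata revealed.

```python
def build_usernames(full_names, template="{first_initial}{last}"):
    """Apply a username template to a list of 'First Last' names.

    The available fields (first, last, first_initial, last_initial) and the
    default template are illustrative assumptions, not a known convention.
    """
    usernames = set()  # a set removes duplicates from overlapping sources
    for name in full_names:
        parts = name.lower().split()
        if len(parts) < 2:
            continue  # skip entries without both a first and a last name
        first, last = parts[0], parts[-1]
        usernames.add(
            template.format(
                first=first,
                last=last,
                first_initial=first[0],
                last_initial=last[0],
            )
        )
    return sorted(usernames)


# Made-up names; "jane doe" is repeated to show de-duplication.
names = ["Jane Doe", "John Smith", "jane doe"]
print(build_usernames(names))                             # ['jdoe', 'jsmith']
print(build_usernames(names, template="{last}.{first}"))  # ['doe.jane', 'smith.john']
```

The resulting list is what would then be passed through the timing-based validation step before any password guessing begins.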


Finally, passwords such as Password1!, Summer2020!, Fall2020! and CompanyName2020 can be used for the initial automated authentication attack. It's important to limit the number of attempts made to avoid locking out accounts. To rapidly test the usernames against weak passwords, the owa_login module in the Metasploit Framework can be used to automate the attack process. In the test case, a valid login is shown below.


Figure 5: Successful Password-guessing Attempt


While existing and archived documents on the internet may be challenging to control, organizations should strive to implement Data Loss Prevention (DLP) tools to scrub metadata from any document before it is posted online. Taking these steps will further hamper attackers and help protect the organization from the next password attack.
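As a rough illustration of what scrubbing means for a Word document, the sketch below empties the author fields in a .docx package before publication. It is not a substitute for a DLP product, it assumes Python 3.8 or later, and it touches only the creator and last-modified-by fields in `docProps/core.xml`; other metadata (for example in `docProps/app.xml` or embedded images) is left alone. All function names and sample values are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
}
# Keep the familiar prefixes when the XML is written back out.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

AUTHOR_FIELDS = ("dc:creator", "cp:lastModifiedBy")


def scrub_docx_authors(docx_bytes):
    """Return a copy of a .docx package with the author fields emptied."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == "docProps/core.xml":
                root = ET.fromstring(data)
                for field in AUTHOR_FIELDS:
                    for element in root.findall(field, NS):
                        element.text = ""
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(name, data)  # every other part is copied unchanged
    return out.getvalue()


def read_authors(docx_bytes):
    """Read the author fields back, to check the result of scrubbing."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    return [root.findtext(field, namespaces=NS) for field in AUTHOR_FIELDS]


def make_sample_docx(creator, modifier):
    """Build a minimal in-memory package for demonstration purposes."""
    core = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        f"<dc:creator>{creator}</dc:creator>"
        f"<cp:lastModifiedBy>{modifier}</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
        archive.writestr("word/document.xml", "<placeholder/>")
    return buffer.getvalue()


original = make_sample_docx("ab12345", "cd67890")
print(read_authors(original))                      # ['ab12345', 'cd67890']
print(read_authors(scrub_docx_authors(original)))  # ['', '']
```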


So the next time you're on the warpath and the perimeter has you down, remember there's gold in them thar metadata.

Jeffery L. Wright
Manager, Demand and Delivery | Optiv
Jeffery Wright is a Demand and Delivery Manager in Optiv’s Threat Management practice. He has over twenty-five years of experience with a background in enterprise network administration, engineering, and security. Prior to his role in Demand and Delivery, he was a senior security consultant on Optiv's Attack and Pen team. His experience on both the offensive and defensive sides of security gives him a unique perspective when approaching customers' security questions.