Automating Captcha Attacks

Captchas: the go-to solution for keeping bots away from sensitive forms. The problem is that they often don’t work.

 

It’s not uncommon for applications to protect sensitive forms exposed to unauthenticated users by showing an image of text, usually with extra lines drawn through the writing, some letters blown up large, others shrunk down small, and any number of other distortions applied. Ideally, these should be easy for humans to solve but not for computers. Unfortunately, this has proven very difficult to achieve, and many of these captchas are now difficult for humans to read. Other solutions, such as reCaptcha, work by requiring people to select from various images using contextual knowledge.

 

Captcha bypasses are not new, but some applications still rely on captchas as a primary defense against automated attacks. When used as a defense-in-depth control to supplement other security measures, they can provide significant protection. When used alone, they can frequently be defeated, which allows attackers to target sensitive application functionality or data.

 

During a recent mobile application assessment, I found a login form that used a captcha to protect sensitive user data - debit card numbers - from enumeration by attackers. Unfortunately for the app, this captcha turned out not to be as strong as the developers hoped, and I was able to defeat it with a few Python modules and a free Optical Character Recognition (OCR) program. The scripts below require only the following:

 

  • Python, and the following Python modules:
    • Pillow
    • Numpy
    • pytesseract
  • Tesseract OCR

 

This blog post demonstrates how I developed a script to solve the captchas I encountered. The final script is not the most robust captcha solver, but it was sufficient to attack this application. More importantly, it shows the process I used to attack this captcha, highlights some of the difficulties image captchas face and shows why they should not be considered a primary security control.

 

 

What’s unique about my text?

The first thing we should look at is what makes the text different from the rest of the image. The captchas below were taken from a mobile app during a real assessment. Defeating them was all that stood between me and automated harvesting of debit card numbers.

 

[Image: captcha_img1]

 

The text we want to recover is red, which doesn’t match the two lines meant to obscure the text. One of the first things we might try is to split these images up by their color channels: red, green and blue.

 

We can actually do this quite easily with a Python script, using the libraries above. Here’s the one I put together to try to achieve this:

 

from PIL import Image

im = Image.open('captcha.png')
# split() returns the bands in (R, G, B) order
(red, green, blue) = im.split()

red.save('red.png')
green.save('green.png')
blue.save('blue.png')

 

This produces the following images:

 

Red: [Image: captcha_imgset 2a]
Green: [Image: captcha_imgset 2b]
Blue: [Image: captcha_imgset 2c]

 

At first glance, it might seem surprising that the text is completely absent in the red channel. However, in the RGB colorspace, the color white is represented by setting the red, green and blue channels to their maximum value. As a result, both white and red have the same value for the red channel.
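
As a quick sanity check, we can inspect a text pixel and a background pixel directly. This is just a sketch; the pixel coordinates below are hypothetical and would need to be picked from the actual image:

from PIL import Image

im = Image.open('captcha.png').convert('RGB')

# Hypothetical coordinates: one pixel inside a red letter, one in the
# white background. Pure red is (255, 0, 0) and white is (255, 255, 255),
# so both carry the maximum value in the red channel.
print(im.getpixel((30, 20)))   # expect something close to (255, 0, 0)
print(im.getpixel((5, 5)))     # expect something close to (255, 255, 255)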

 

Perhaps there are other colorspaces in which this red text is more recoverable? Pillow allows us to do a lot of conversions between color spaces. The Image.convert method can convert between some of the most common colorspaces, but we can perform some less common ones with only a little more effort:

 

from PIL import Image, ImageCms

im = Image.open('captcha.png')
(red, green, blue) = im.split()

red.save('red.png')
green.save('green.png')
blue.save('blue.png')

hsv = im.convert('HSV')
(hue, sat, val) = hsv.split()
hue.save('hue.png')
sat.save('sat.png')
val.save('val.png')

cmyk = im.convert('CMYK')
(cyan, magenta, yellow, key) = cmyk.split()
cyan.save('cyan.png')
magenta.save('magenta.png')
yellow.save('yellow.png')

# Converting to the L*a*b colorspace is more complex:
rgb = ImageCms.createProfile(colorSpace='sRGB')
lab = ImageCms.createProfile(colorSpace='LAB')
transform = ImageCms.buildTransform(inputProfile=rgb, outputProfile=lab, inMode='RGB', outMode='LAB')
lab_im = ImageCms.applyTransform(im=im, transform=transform)
l, a, b = lab_im.split()
l.save("l.png")
a.save("a.png")
b.save("b.png")

 

Hue: [Image: captcha_imgset 3a]
Saturation: [Image: captcha_imgset 3b]
Value: [Image: captcha_imgset 3c]
Cyan: [Image: captcha_imgset 3d]
Magenta: [Image: captcha_imgset 3e]
Yellow: [Image: captcha_imgset 3f]
L: [Image: captcha_imgset 3g]
a: [Image: captcha_imgset 3h]
b: [Image: captcha_imgset 3i]

 

Two of the most promising images are the “Yellow” channel image and the “a” channel. During the assessment, I chose to use the “a” channel. The strategy below could likely be applied to the yellow channel with similar results.

 

 

Removing Distractions

To “fix” this obscured image, we first need to identify a strategy that retains the text and discards the lines. But before we dive into this problem, we need to start thinking about our OCR library. In particular, Tesseract likes images with at least 300 dots per inch (DPI). These images were meant to be displayed immediately in a mobile application, so their DPI is a little dubious. If the DPI isn’t encoded in the image metadata, we’ll make a conservative assumption that it’s 72 DPI and scale up accordingly. The method below implements this:

 

def rescale(im):
    # Assume 72 DPI if we don't know, as this is
    # one of the lowest common DPI values.
    try:
        dpi = im.info['dpi'][0]
    except KeyError:
        dpi = 72

    target_dpi = 300
    factor = target_dpi / dpi

    return ImageOps.scale(im, factor)

 

Now we’re ready to return to the problem of removing the non-text lines. One of the simplest ways to do this is thresholding: if a pixel is “bright” enough in the “a” channel, we’ll set it to full white; if it falls below our threshold, we’ll set it to black. Within the images, pixel values range from 0 to 255. I initially tried a few threshold values starting with 128, but found a value of 180 worked best:

 

from PIL import Image, ImageCms, ImageOps
import numpy as np

def rescale(im):
    # Assume 72 DPI if we don't know, as this is
    # one of the lowest common DPI values.
    try:
        dpi = im.info['dpi'][0]
    except KeyError:
        dpi = 72

    target_dpi = 300
    factor = target_dpi / dpi

    return ImageOps.scale(im, factor)


im = Image.open('captcha.png')

# Converting to the L*a*b colorspace is more complex:
rgb = ImageCms.createProfile(colorSpace='sRGB')
lab = ImageCms.createProfile(colorSpace='LAB')
transform = ImageCms.buildTransform(inputProfile=rgb, outputProfile=lab, inMode='RGB', outMode='LAB')
lab_im = ImageCms.applyTransform(im=im, transform=transform)

lab_im = rescale(lab_im)
l, a, b = lab_im.split()

# Convert to a numpy array and apply the threshold to remove the lines
np_a = np.array(a)

threshold = 180
np_a[np_a < threshold] = 0
np_a[np_a > threshold] = 255

# Invert the image: we want black text on a white background
np_a = 255 - np_a

a = Image.fromarray(np_a)
a.save('a.png')

 

The code above produces the following image:

 

[Image: captcha_img4]

 

We’ve successfully removed the obscuring lines and retained the text, but we’re missing some pieces of the letters. One way we might recover this missing data is by “expanding” the dark pixels. With Pillow, we can do this with a MinFilter. (Black has a value of 0, so expanding the black area means causing more pixels to become 0.) How much to expand the text was difficult to figure out in advance; by trial and error, a filter size of 11 pixels did a decent job of closing up some of those gaps:

 

[Image: captcha_img5]

 

Next, we’re going to apply a MaxFilter to “contract” the text, because the expanded text is so thick on some images that letters run together. The MaxFilter helps fix this. Here’s the result:

 

[Image: captcha_img6]

 

Here’s our new code after adding the filters:

 

from PIL import Image, ImageCms, ImageOps, ImageFilter
import numpy as np

def rescale(im):
    # Assume 72 DPI if we don't know, as this is
    # one of the lowest common DPI values.
    try:
        dpi = im.info['dpi'][0]
    except KeyError:
        dpi = 72

    target_dpi = 300
    factor = target_dpi / dpi

    return ImageOps.scale(im, factor)


im = Image.open('captcha.png')

# Converting to the L*a*b colorspace is more complex:
rgb = ImageCms.createProfile(colorSpace='sRGB')
lab = ImageCms.createProfile(colorSpace='LAB')
transform = ImageCms.buildTransform(inputProfile=rgb, outputProfile=lab, inMode='RGB', outMode='LAB')
lab_im = ImageCms.applyTransform(im=im, transform=transform)

lab_im = rescale(lab_im)
l, a, b = lab_im.split()

# Convert to a numpy array and apply the threshold to remove the lines
np_a = np.array(a)

threshold = 180
np_a[np_a < threshold] = 0
np_a[np_a > threshold] = 255

# Invert the image: we want black text on a white background
np_a = 255 - np_a

a = Image.fromarray(np_a)

# Expand the dark text to close up "gaps" in letters, then shrink it
# again so letters don't run together
a_filtered = a.filter(ImageFilter.MinFilter(11))
a_filtered = a_filtered.filter(ImageFilter.MaxFilter(5))
a_filtered.save('a-filtered.png')

 

 

OCR: solving the captcha

Finally, we need to actually run the image through Tesseract to get the text back. The Tesseract documentation indicates that it works best when images have a border, so we’ll have to add one to our image. This, too, is pretty straightforward with Pillow. We’ll add the function below, with the understanding that it expects an even border_size (an odd value simply produces a slightly thinner border than requested):

 

def border(im, border_size=4):
    # Add a white inner border, then a black outer border
    im = ImageOps.expand(im, border=int(border_size/2), fill='white')
    im = ImageOps.expand(im, border=int(border_size/2), fill='black')

    return im

 

With the pytesseract module, running the OCR is actually quite straightforward: the module works by starting the tesseract program and passing it our image data. Here’s the new code, with the border function applied just before OCR:

 

from PIL import Image, ImageCms, ImageOps, ImageFilter
import pytesseract
import numpy as np

def rescale(im):
    # Assume 72 DPI if we don't know, as this is
    # one of the lowest common DPI values.
    try:
        dpi = im.info['dpi'][0]
    except KeyError:
        dpi = 72

    target_dpi = 300
    factor = target_dpi / dpi

    return ImageOps.scale(im, factor)


def border(im, border_size=4):
    # Add a white inner border, then a black outer border
    im = ImageOps.expand(im, border=int(border_size/2), fill='white')
    im = ImageOps.expand(im, border=int(border_size/2), fill='black')

    return im


im = Image.open('captcha.png')

# Converting to the L*a*b colorspace is more complex:
rgb = ImageCms.createProfile(colorSpace='sRGB')
lab = ImageCms.createProfile(colorSpace='LAB')
transform = ImageCms.buildTransform(inputProfile=rgb, outputProfile=lab, inMode='RGB', outMode='LAB')
lab_im = ImageCms.applyTransform(im=im, transform=transform)

lab_im = rescale(lab_im)
l, a, b = lab_im.split()

# Convert to a numpy array and apply the threshold to remove the lines
np_a = np.array(a)

threshold = 180
np_a[np_a < threshold] = 0
np_a[np_a > threshold] = 255

# Invert the image: we want black text on a white background
np_a = 255 - np_a

a = Image.fromarray(np_a)

# Expand the dark text to close up "gaps" in letters, then shrink it
# again so letters don't run together
a_filtered = a.filter(ImageFilter.MinFilter(11))
a_filtered = a_filtered.filter(ImageFilter.MaxFilter(5))

# Add the border that Tesseract prefers
a_filtered = border(a_filtered)
a_filtered.save('a-filtered.png')

# Run OCR and get the result
result = pytesseract.image_to_string(a_filtered)

# strip() helps remove some whitespace (like \n) that the OCR returns
print(result.strip())

 

And the result:

 

$ python solve-captcha.py
kwbkc

 

We can refactor the code above into a function and test it against three of the images above. This gives us some confidence that our code will solve the captchas the server sends rather than just this one image. Here are the results for the three images above:

 

[Image: captcha_img7]

 

The script correctly solves two of the three, and misses the middle one by only a single letter.
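
For reference, a minimal sketch of that refactoring might look like the following. The solve_captcha name and the example file names are my own, and the function assumes the rescale() and border() helpers defined above are in scope:

from PIL import Image, ImageCms, ImageFilter
import pytesseract
import numpy as np

def solve_captcha(path):
    # Same pipeline as above, wrapped so it can run on any image file
    im = Image.open(path)

    rgb = ImageCms.createProfile(colorSpace='sRGB')
    lab = ImageCms.createProfile(colorSpace='LAB')
    transform = ImageCms.buildTransform(inputProfile=rgb, outputProfile=lab,
                                        inMode='RGB', outMode='LAB')
    lab_im = rescale(ImageCms.applyTransform(im=im, transform=transform))
    l, a, b = lab_im.split()

    # Threshold, invert and filter the "a" channel as before
    np_a = np.array(a)
    np_a[np_a < 180] = 0
    np_a[np_a > 180] = 255
    np_a = 255 - np_a

    a = Image.fromarray(np_a)
    a = a.filter(ImageFilter.MinFilter(11))
    a = a.filter(ImageFilter.MaxFilter(5))
    a = border(a)

    return pytesseract.image_to_string(a).strip()

# Hypothetical file names for the three sample captchas
for path in ('captcha1.png', 'captcha2.png', 'captcha3.png'):
    print(path, solve_captcha(path))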

 

 

Making an educated guess

I then pulled 100 captcha images from the server and ran the script against them. The script correctly identified 29 of the 100 images. This is actually good enough for an attack, although not as fast as it could be. If this is the application’s only rate-limiting control, we’ve effectively circumvented it: roughly one in four requests will pass the captcha, and I can attack the application endpoint effectively. At this point the captcha has been defeated, and since it’s the only security control preventing automated attacks, the debit card numbers are now at risk.
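
Measuring that success rate only takes a small harness, reusing the solve_captcha() sketch above. This is a sketch of my own; the captchas/ directory and the convention of naming each file after its hand-labeled solution are assumptions, not details from the assessment:

import glob
import os

# Assumes the downloaded captchas live in ./captchas/ and that each file
# has been labeled by hand and named after its solution, e.g. "f8kh7.png".
paths = glob.glob('captchas/*.png')
correct = 0

for path in paths:
    expected = os.path.splitext(os.path.basename(path))[0]
    if solve_captcha(path) == expected:
        correct += 1

print(f'{correct} / {len(paths)} solved')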

 

However, one-in-four odds mean we’re frequently resending the same request with a different captcha solution. With only a few more changes, we can do significantly better and create a more effective attack. Let’s take a look at some of the cases where we’re wrong, but close.

 

Script Result    Correct Answer
-------------    --------------
f8kh/            f8kh7
f/6d7            f76d7
6/5eX            675ex
b8n&3s           b8n83
hmn/2            hmn72
6k&g6            6k8g6
X4mxp            x4mxp
/XCWY            7xcwg
&dhx2            8dhx2

 

We can see three common patterns in the mistakes. By far the most common was confusing a “7” in the image with a forward slash (“/”). Because I had 100 captcha images to sort through, I was confident that a forward slash would never appear in a solution. The next two most common errors were confusing an “8” with an ampersand (“&”) and returning capital versions of letters whose upper- and lowercase forms differ very little. Replacing any forward slash with “7,” any ampersand with “8” and converting the string to lowercase meant that the script recognized 44 of the 100 images.

 

We can make a few other corrections because we know some facts about every captcha solution: all captchas were five characters long, consisting only of lowercase letters and numbers. Applying all of the corrections below to the OCR result improved the success rate to 52 out of 100 images.

 

def apply_corrections(result):
    result = result.strip()
    result = result.replace('/', '7')
    result = result.replace('&', '8')
    result = result.replace('S', '5')
    result = result.replace(' ', '')
    result = result[:5]
    result = result.lower()

    return result
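
In the solver, this simply wraps the OCR output; for example, the final OCR lines of the script above become something like:

# Run OCR, then clean up the result before submitting it
result = apply_corrections(pytesseract.image_to_string(a_filtered))
print(result)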

 

At this point, I launched the attack. Since Optiv’s Application Security testing tries to avoid denial-of-service conditions, I made no attempt at a multithreaded, high-volume attack; still, I was able to recover many valid debit card numbers in a few minutes. This demonstrates that a captcha is not a robust single line of defense against automated attacks.

 

When the captcha solution was correct, the remote API would either indicate that the card number wasn’t valid or prompt the user for a passcode to finish logging in. After five login failures, the account would lock, and it would only unlock if the customer called the bank. This is an effective rate-limiting control that prevents access to the accounts. The application could have implemented a similar protection for debit card numbers, or rejected repeated debit card number requests from the same IP address within a short span. The captcha would then have enhanced that protection: since I failed the captcha about half of the time, I would have made, on average, twice as many requests to identify a single debit card number and would have been locked out sooner.

 

 

Captchas, Huh! What are they good for?

Captchas provide defense-in-depth protection at the cost of an easy user experience (UX). My final captcha-solving script is under 70 lines of Python code and uses a collection of free software. This wasn’t the strongest set of captcha images, but the solver took only about six hours of hands-on-keyboard work to put together and cost nothing. If the reward is worth enough, there are more advanced paid captcha-solving tools attackers might use, and simply paying humans to solve captchas may be a viable approach. Applications (mobile and web) should not rely on captchas as a primary form of defense. Even solutions like reCaptcha aren’t a panacea, although they’re quite difficult to solve with computers.

 

Instead, consider more robust approaches. Locking user accounts after a series of failed logins dramatically limits the effectiveness of password-guessing attacks. These lockouts should be logged and monitored: a sudden spike in account lockouts may indicate that an attacker is targeting the application, and security personnel can take action against the attack. Password reset endpoints, which take a user identifier and send an email with a reset link, can perform rate-limiting based on IP address. If a client makes more requests in a short period than is reasonable for a single user trying to reset their password, future requests from the same IP can be rejected without processing them. A captcha might be added as a defense-in-depth measure to make these controls even stronger, but it should not be relied upon as a primary security control.
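
To illustrate the IP-based approach, here is a minimal in-memory sketch of my own (not something from the assessed application; a real service would more likely lean on its framework’s or API gateway’s rate-limiting features and persistent storage):

import time
from collections import defaultdict, deque

WINDOW_SECONDS = 600   # look at the last ten minutes
MAX_REQUESTS = 5       # more than this from one IP gets rejected

_recent_requests = defaultdict(deque)

def allow_request(ip):
    # Return True if this IP is still under the limit for the window
    now = time.time()
    recent = _recent_requests[ip]

    # Drop timestamps that have fallen outside the window
    while recent and now - recent[0] > WINDOW_SECONDS:
        recent.popleft()

    if len(recent) >= MAX_REQUESTS:
        return False

    recent.append(now)
    return True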

 



Steven Hartz
Senior Security Consultant | Optiv
Steven Hartz is a senior security consultant in Optiv’s Threat Management practice specializing in Application Security. His role is to provide in-depth adversarial review services to Optiv’s clients with expertise in performing web application penetration tests, mobile application penetration tests, source code reviews and threat modeling assessments.

Prior to joining Optiv, he worked as a network penetration tester for the U.S. Department of Defense. In addition, he has performed assessments for Fortune 500 companies across many industry verticals, including Technology, Healthcare, Financial and Retail for both national and international companies.

Steven earned a bachelor’s degree in computer engineering from Michigan State University.