VP of Intelligence Operations
Mark Maxey is VP of intelligence operations with Optiv. In this role, Mark manages our global threat intelligence center, SIEM, threat analysis and endpoint security teams.
Testing Web App CAPTCHA controls
CAPTCHA ("Completely Automated Public Turing test to tell Computers and Humans Apart") is a type of challenge-response test used by many web applications to ensure that the response is not generated by a computer. CAPTCHA implementations are often vulnerable to various kinds of attacks even if the generated CAPTCHA is unbreakable.
I've had a few questions on testing CAPTCHAs as of late and decided to do a quick write-up on how I test the strength of a CAPTCHA or, in some cases, write a CAPTCHA breaker. I will start with a quick test that I use to gauge the initial strength of a CAPTCHA implementation (Microsoft OneNote has excellent handwriting detection and is very easy to use for this purpose):
- Copy the CAPTCHA image to the clipboard
- Open OneNote (or your favorite OCR tool)
- Paste the image onto a OneNote page
- Right-click the image and choose "Copy Text from Picture"
- The recognized text is now on your clipboard. Paste it into Notepad and compare it with the text shown in the CAPTCHA.
- If there is noise in the middle of the text, such as a curved line, make the image very large and stretch the image vertically. Then pass this through a handwriting detection library. The stretching appears to make the noise in the middle less prominent. Note: This is based on my own personal tests and not concrete science.
- Convert the image to black and white (this, for whatever reason, filters out a ton of background noise).
- Many CAPTCHAs use a static piece of noise, such as a curved line through the middle of the word. You can often get around this with a static crop of a region of the image.
- Cut the image up into a grid. This can easily be done with a Photoshop script or ImageMagick, although I have not gone through the trouble of making one in a long time. See the example in Figure 2. To build the grid, examine each pixel in the image, take the leftmost black pixel as the starting point, and mark the right boundary of each letter where the run of continuous black pixels ends. This assumes, however, that there is a clear boundary between letters. It may be easier to treat each CAPTCHA as a series of images rather than a single image.
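The black-and-white conversion, static crop, and letter splitting described above can be sketched in a few lines. This is a minimal illustration in Python rather than a full solver. It assumes the image has already been decoded into rows of 0-255 grayscale values, and the function names, the threshold of 128, and the toy image are all made up for the example.

```python
from typing import List, Tuple

Gray = List[List[int]]   # rows of 0-255 grayscale values
Mask = List[List[bool]]  # True where a pixel counts as "ink"


def to_black_and_white(gray: Gray, threshold: int = 128) -> Mask:
    """Mark every pixel darker than the (assumed) threshold as ink."""
    return [[value < threshold for value in row] for row in gray]


def static_crop(mask: Mask, top: int, bottom: int) -> Mask:
    """Keep rows top..bottom-1, e.g. to cut away a fixed band of noise."""
    return mask[top:bottom]


def split_letters(mask: Mask) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, for each run of
    columns that contains at least one ink pixel."""
    if not mask:
        return []
    width = len(mask[0])
    inked = [any(row[x] for row in mask) for x in range(width)]
    spans = []
    start = None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x                 # left edge of a letter
        elif not has_ink and start is not None:
            spans.append((start, x))  # first blank column closes the letter
            start = None
    if start is not None:
        spans.append((start, width))  # letter touches the right edge
    return spans


# Toy image: row 0 is a solid noise line, rows 1-3 hold two "letters".
W, B = 255, 0
sample = [
    [B, B, B, B, B, B, B, B, B],
    [W, B, B, W, W, B, B, B, W],
    [W, B, W, W, W, B, W, B, W],
    [W, B, B, W, W, B, B, B, W],
    [W, W, W, W, W, W, W, W, W],
]

mask = to_black_and_white(sample)
with_noise = split_letters(mask)                      # noise row joins everything
without_noise = split_letters(static_crop(mask, 1, 5))  # noise row cropped away
print(with_noise, without_noise)
```

With the noise row left in place, the whole image comes back as a single span. After the crop, the two letters separate cleanly, and each span can then be handed to the OCR step as its own image.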
In many ways, automating CAPTCHA strength testing is very similar to handwriting detection, and simple tools are widely available for this task, including FOSS libraries.
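If you collect a handful of CAPTCHAs and record what the OCR tool returned for each, the manual comparison can be turned into a rough number. The following Python sketch is one way to do that and is not part of the original workflow. The function name `score_ocr` and the sample strings are invented for illustration, and it uses the standard library's `difflib` to measure how close each guess was.

```python
from difflib import SequenceMatcher
from typing import Iterable, Tuple


def score_ocr(samples: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Given (expected_text, ocr_text) pairs, return
    (exact-match rate, average character similarity), both in 0.0-1.0."""
    pairs = list(samples)
    if not pairs:
        return 0.0, 0.0
    exact = sum(1 for expected, guessed in pairs if expected == guessed)
    similarity = sum(
        SequenceMatcher(None, expected, guessed).ratio()
        for expected, guessed in pairs
    )
    return exact / len(pairs), similarity / len(pairs)


# Hypothetical results: one perfect read, one near miss, one total failure.
results = [
    ("x7kq2", "x7kq2"),
    ("m4tpz", "m4tp2"),
    ("hello", ""),
]
solve_rate, avg_similarity = score_ocr(results)
print(f"solved {solve_rate:.0%}, average similarity {avg_similarity:.2f}")
```

A high exact-match rate from an off-the-shelf OCR tool suggests the CAPTCHA is weak. A low exact-match rate with high similarity suggests that a little more preprocessing may be enough to break it.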
A couple of other CAPTCHA solver libraries are out there, including the somewhat dated PWNtcha, which was recently open sourced. Here is a list of a few other helpful tools that you can use to make your own CAPTCHA solvers:
- Perl OCR Libraries - http://search.cpan.org/search?query=ocr&mode=all
- Ruby OCR Libraries - http://code.google.com/p/ocropus/
- Perl ImageMagick Image Manipulation Library (PerlMagick) - http://www.imagemagick.org/script/perl-magick.php
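#!/usr/bin/perl -w
# CAPTCHA Solver v1 - A simple tool for image transformations and OCR to solve CAPTCHA
# Author: Mark Maxey - firstname.lastname@example.org
# Version 1.0
use strict;
use Image::Magick;

# read in the image (captcha.png is a placeholder; pass your own file as the first argument)
my $file  = @ARGV ? $ARGV[0] : 'captcha.png';
my $image = Image::Magick->new;
my $err   = $image->Read($file);
die "$err\n" if "$err";

# turn the image to black and white (the 50% threshold is an assumed starting point)
$image->Quantize(colorspace => 'gray');
$image->Threshold(threshold => '50%');

# cropping the image to eliminate static noise
# (placeholder geometry WIDTHxHEIGHT+X+Y; uncomment and adjust for the CAPTCHA under test)
# $image->Crop(geometry => '180x40+10+10');

# resize the image
my $img_width  = 2000;
my $ratio_main = 1;
my $img_height = 2000;
$image->Resize(width => $img_width * $ratio_main, height => $img_height * $ratio_main);

# OCR Code here
# if you can't figure this part out you shouldn't be doing this
# end OCR

Some key things to remember when testing a CAPTCHA: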
1. Eliminate as much noise as you can, which is generally easy by just converting the image to black and white
2. Identify areas where static noise can be eliminated by cropping
3. Some OCR toolkits can limit the character set to specific characters (no special characters and all lowercase for example). Use this where applicable to improve the accuracy of the test
4. Turning the CAPTCHA into a grid will often make it very easy to solve by clearly defining word boundaries
5. If the CAPTCHA does not involve text you probably can't solve it using the methods I described above
6. Increase the size of the image. This will help you home in on where the boundaries are and makes a lot of the noise much easier to deal with
7. Sometimes a CAPTCHA, if there are parameters available for tampering, can be used to DoS a site or cause other problems. Quite often you will see parameters like width=200&height=350, so what happens if you make these 999999999999 x 99999999999999999?
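To try the parameter tampering idea from the last item, it helps to generate the candidate URLs up front. The Python sketch below only builds URLs and sends nothing. The host, the `width`/`height`/`id` parameter names, and the list of test values are hypothetical, so substitute whatever the application under test actually exposes.

```python
from typing import Iterable, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tampered_urls(
    url: str,
    params: Iterable[str] = ("width", "height"),
    values: Iterable[int] = (0, -1, 99999, 999999999999),
) -> List[str]:
    """Return one URL per test value, with every listed query parameter
    replaced by that value and all other parameters left untouched."""
    targets = set(params)
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    results = []
    for value in values:
        new_query = [
            (key, str(value) if key in targets else old)
            for key, old in query
        ]
        results.append(urlunsplit(parts._replace(query=urlencode(new_query))))
    return results


# Hypothetical CAPTCHA image endpoint that takes its dimensions from the query string.
urls = tampered_urls("https://example.test/captcha?width=200&height=350&id=abc")
for candidate in urls:
    print(candidate)
```

Feed the resulting URLs to whatever request tool you already use, and watch the response times and error pages as the dimensions grow.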