Image Spam: The Email Epidemic of 2006
End-users around the world are reporting an increase in spam. Much of this increase can be attributed to a resurgence of spam in 2006 — driven by the emergence of new, more sophisticated forms of image spam.
Image spam is a technique with which spammers advertise the "call to action" of their message as part of an embedded file attachment (like a .gif or .jpeg) rather than in the body of the email. These images are automatically displayed to end-users, yet the content of the image itself remains hidden from most spam filters.
The increase in more complex image spam attacks has caused spam capture rates across the email security industry to decline, resulting in wasted productivity and end-user frustration as more spam gets delivered to their inboxes. The sheer increase in the volume of spam, combined with a higher percentage of larger-sized spam, is also clogging the email infrastructure as many mail systems are unable to keep up with these spam volumes.
This document summarizes (1) the recent trend in image spam, (2) why it is difficult to detect and (3) how IronPort protects customers from this increasing threat.
Chart: Fueled by a worldwide increase in image spam, overall spam volumes surged in the second quarter of 2006.
Trends & Solutions:
According to IronPort's SenderBase Network, spam volumes leveled off in 2005, but surged again in the second quarter of 2006. As illustrated on the left axis in the chart above, SenderBase shows that worldwide spam volumes grew from approximately 30 billion messages per day to over 50 billion over the last 12 months. IronPort saw a 40 percent increase in spam volumes during Q2 alone. This means that, even if the spam capture rate is held constant, the average end-user will have noticed 40 percent more spam in their inbox since April.
Much of this increase in overall spam volume can be attributed to the growth in image spam. As illustrated by the right-hand axis in the chart above, image spam rose from around 3 percent of spam a year ago to over 20 percent today. When overall spam volumes spiked in Q4 '05 and Q2 '06, image spam was fueling the increase.
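The arithmetic behind these figures can be checked with a short sketch. It uses only the rounded numbers quoted above; the capture rate is an assumed value chosen for illustration and does not come from this report.

```python
def delivered_spam(volume: float, capture_rate: float) -> float:
    """Spam reaching inboxes when a filter catches `capture_rate` of `volume`."""
    return volume * (1.0 - capture_rate)


CAPTURE_RATE = 0.95  # assumption for illustration only, not a reported figure

before = delivered_spam(100.0, CAPTURE_RATE)  # arbitrary baseline volume
after = delivered_spam(140.0, CAPTURE_RATE)   # 40 percent more spam sent
inbox_growth = after / before - 1.0           # the capture rate cancels out

# Approximate daily image spam volume, in billions of messages,
# from the rounded totals and image spam shares quoted above.
image_spam_year_ago = 30.0 * 0.03
image_spam_today = 50.0 * 0.20

print(f"Growth in delivered spam: {inbox_growth:.0%}")
print(f"Image spam a year ago: about {image_spam_year_ago:.1f} billion per day")
print(f"Image spam today: about {image_spam_today:.1f} billion per day")
```

Because the capture rate cancels out of the ratio, any constant rate gives the same 40 percent growth in delivered spam. On these rounded figures, daily image spam volume grew roughly elevenfold in a year.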
The root cause behind this sharp increase in spam volumes is money. Spammers are single-minded: they send spam to make money. The more messages that are delivered to inboxes, the better the chances recipients take action on the messages, resulting in more income for spammers.
As illustrated in the next section, randomized image spam is especially difficult for most spam filters to detect, so more of it gets delivered. Spammers can also make their images appear quite normal and compelling to users, resulting in higher response rates. Since neither of these factors is likely to change in the near term, IronPort expects image spam to remain a problem for the foreseeable future. IronPort has also seen spammers innovate rapidly in their use of image spam, suggesting that image spam will soon become even more challenging to detect.
Why Image Spam Is Difficult To Detect
Image spam has been around for years. It was originally created in order to get past "heuristic" filters, which block messages containing words and phrases commonly found in spam. Since image files are in an entirely different format than the text found in an email, heuristic filters never "see" the content of the message. Therefore, these filters were easily defeated by this type of spam.
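A minimal sketch of why a phrase-based filter misses image spam, using Python's standard email library. The phrase list, function names and placeholder image bytes are all hypothetical; real heuristic filters are far more elaborate, but they share the property shown here of inspecting only the text parts of a message.

```python
from email.message import EmailMessage

SPAM_PHRASES = ("buy now", "hot stock")  # hypothetical phrase list


def visible_text(msg: EmailMessage) -> str:
    """Collect only the plain-text parts, as a text-only filter would."""
    parts = [
        part.get_content()
        for part in msg.walk()
        if part.get_content_type() == "text/plain"
    ]
    return "\n".join(parts).lower()


def heuristic_is_spam(msg: EmailMessage) -> bool:
    text = visible_text(msg)
    return any(phrase in text for phrase in SPAM_PHRASES)


# Traditional text spam: the call to action is in the body.
text_spam = EmailMessage()
text_spam.set_content("Hot stock tip: buy now!")

# Image spam: empty body, call to action rendered inside the image.
# The bytes below are a placeholder, not a real GIF.
image_spam = EmailMessage()
image_spam.set_content("")
image_spam.add_attachment(
    b"GIF89a-placeholder-pixels",
    maintype="image",
    subtype="gif",
    filename="offer.gif",
)

print(heuristic_is_spam(text_spam))
print(heuristic_is_spam(image_spam))
```

The filter flags the first message but not the second, because the attachment is never examined as text.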
To deal with this problem, anti-spam vendors developed "fuzzy signature" technologies. These signature-based technologies collect samples of known spam and then classify "near-identical" messages as spam. These signatures were sometimes written against just the message attachment, so that messages with different content but the same attachment would still be marked as spam.
Signature-based defenses remained effective for several years. In 2006, however, spammers began randomizing images to appear the same to the human viewer but totally different to spam filters. For example, some spammers are sending messages advertising the purchase of stocks with an attached .gif file that has random "dots" inserted in the image and borders with subtly different color and width. The signatures that most anti-spam vendors rely on to detect these attacks vary dramatically, based on these small changes to the image. This means that anti-spam vendors may publish a rule that stops one instance, but this rule doesn't stop all the rest of the spam messages in the attack.
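The fragility of signatures can be illustrated with a deliberately simplified sketch. An exact SHA-256 hash stands in for a vendor signature here; this is an assumption made for illustration, since commercial fuzzy signatures tolerate some variation and their algorithms are not described in this document. The helper names are hypothetical.

```python
import hashlib


def exact_signature(pixels: bytes) -> str:
    """Stand-in signature: an exact hash of the raw pixel bytes."""
    return hashlib.sha256(pixels).hexdigest()


def with_random_dot(pixels: bytes, position: int, value: int) -> bytes:
    """Return a copy of the image with a single pixel changed."""
    changed = bytearray(pixels)
    changed[position] = value
    return bytes(changed)


original = bytes([255] * 64)                # tiny 8x8 all-white "image"
variant = with_random_dot(original, 10, 0)  # one black dot added

differing = sum(a != b for a, b in zip(original, variant))
same_signature = exact_signature(original) == exact_signature(variant)

print(f"Pixels changed: {differing} of {len(original)}")
print(f"Signatures match: {same_signature}")
```

One changed pixel out of 64 yields an unrelated hash, so a rule written against the first image does not match the second even though a viewer would see nearly the same picture.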
There is an almost infinite number of ways that spammers can randomize images. In addition to inserting dots, spammers have recently used techniques such as varying the colors used in an image, changing the width and pattern of the border, altering the font style, and "slicing" images down into smaller pieces (which are then reassembled to appear as a single image to the recipient). The two examples below illustrate some of the many techniques recently used by spammers to get past signature-based defenses.
An embedded .gif file containing all "text" with dots randomly inserted in the image to make every message appear unique to spam filters
"Slice & Dice"
Images are broken down into many smaller files of varying sizes and then reassembled in the mail client so as to appear as a single image to the email recipient. The rectangle highlighted represents the border of one of over a dozen image files used to construct this message. This technique is used to defeat signature-based defenses and break up words that could be found by OCR.
Some vendors have recently introduced Optical Character Recognition (OCR) as a means of detecting image spam. OCR is a technology used to extract typewritten text from an image. While more effective than signature-based solutions alone, OCR has several limitations. First, OCR is very computationally expensive. Fully rendering each message and then looking for word matches against different character set libraries can take as long as several seconds per message. This lowers system throughput below levels acceptable to most ISPs and enterprises. Second, OCR is extremely vulnerable to obfuscation. While modern OCR technology can reliably detect typed letters and numbers, it can be easily fooled by basic techniques used by spammers. For example, OCR is ineffective at detecting image spam that includes hand-written text, graphics or any abstract data.
Protecting Against Image-based Threats With IronPort Anti-Spam
IronPort Anti-Spam™ uses a unique, multi-layered approach that stops over 98 percent of image-based spam, with near-zero false positives. The first layer of defense is powered by IronPort's Context Adaptive Scanning Engine™ (CASE). This is followed by an inner layer of image spam protection powered by IronPort's patent-pending Multidimensional Pattern Recognition™ (MPR) technology.
Context Adaptive Scanning:
Most anti-spam filters depend heavily on content analysis for stopping spam. This is like building a house on a weak foundation. These filters all share a common weakness: they rely heavily on something that spammers themselves can easily manipulate. Image spam is just one instance where content-based filters fall short. As in the examples above, the "content" of the spam is invisible to many filters because it is embedded in the image itself.
To detect image spam, IronPort has augmented traditional content-based techniques with techniques that analyze the full context in which the message was received. Specifically, CASE detects threats by analyzing four broad areas:
- Who sent the message and what do we know about this sender?
- Where does the call to action in the message take you?
- What is the nature of the message content?
- How was the message technically constructed?
Instead of generating a signature based on the content of the message, IronPort creates a specific spam profile for an image-based spam attack that combines the "who, where, what and how" of a message.
For example, one profile might be created for a message that originated from a dynamic IP address, contains a certain header pattern, has an embedded image of a specific size range and type, and contains little or no text in the body of the email itself. None of these factors alone is likely to indicate with certainty that a message is spam, but they are highly accurate when combined. Context adaptive scanning allows IronPort to filter the majority of image-based spam attacks without decoding the image file. The second layer of protection is provided by Multidimensional Pattern Recognition (MPR).
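The idea of a profile that fires only when several weak signals coincide can be sketched as follows. This is not IronPort's CASE implementation, whose internals are not described here; the class names, fields and thresholds are hypothetical and chosen to mirror the example profile in the text.

```python
from dataclasses import dataclass


@dataclass
class MessageContext:
    """Hypothetical summary of the 'who, where, what and how' of a message."""
    from_dynamic_ip: bool
    matches_header_pattern: bool
    image_type: str
    image_size_bytes: int
    body_text_chars: int


@dataclass
class SpamProfile:
    """Hypothetical profile: every condition must hold for a match."""
    image_type: str
    min_image_bytes: int
    max_image_bytes: int
    max_body_text_chars: int

    def matches(self, msg: MessageContext) -> bool:
        return (
            msg.from_dynamic_ip
            and msg.matches_header_pattern
            and msg.image_type == self.image_type
            and self.min_image_bytes <= msg.image_size_bytes <= self.max_image_bytes
            and msg.body_text_chars <= self.max_body_text_chars
        )


# Illustrative thresholds only; none of these values come from the report.
profile = SpamProfile("gif", 8_000, 20_000, 20)

spam_like = MessageContext(True, True, "gif", 12_500, 0)
newsletter = MessageContext(False, False, "gif", 12_500, 1_800)

print(profile.matches(spam_like))
print(profile.matches(newsletter))
```

The newsletter carries a GIF of the same size, yet it does not match because the other conditions fail. Note that the image bytes are never decoded; only metadata about the message is used.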
Multidimensional Pattern Recognition:
To the human eye, image spam is extremely recognizable. In fact, this is one of the properties of image spam that make it attractive to the spammer — they don't have to go to nearly the same lengths to obfuscate their content when sending image spam to avoid filtering as they do with traditional text spam. But, if this spam is so obvious to the end-user, why can't spam filters identify it?
The challenge is that humans interpret the content of messages using a much richer data set than just the text displayed. Attributes such as image color, shape, font size and type, graphics and many other characteristics also shape a reader's perception of a message. This information is entirely hidden from traditional content filters — and technologies like OCR only capture a fraction of this information.
IronPort developed a patent-pending technology called Multidimensional Pattern Recognition (MPR) to address this problem. After decoding the binary image files, IronPort Anti-Spam uses MPR to analyze the decompressed image data across more than 13 dimensions to determine whether or not the message is spam.
Color is an example of a dimension that provides rich information about the content of a message. IronPort analyzes the distribution of colors found in each message to establish the likelihood that the message is spam. For example, MPR can scan a .gif file to look for pixel patterns indicating that the image file is displaying "all text" to the user, a pattern that is common in spam but rare in legitimate email (most legitimate .gif files contain pictures, not text). MPR can also detect anomalous "dots" in images that don't fit the "smoother" gradients of light typically found in legitimate email (these dots may represent attempts by the spammer to defeat signatures).
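As a rough illustration of color-based features, the sketch below computes two simple measurements on decoded pixel data: the share of pixels covered by the most common colors (high for a flat "all text" image) and the number of isolated single-pixel dots. These two features, their names and the synthetic test image are assumptions made for illustration; the actual MPR dimensions are not disclosed in this document.

```python
from collections import Counter

Pixel = tuple[int, int, int]  # (red, green, blue)


def top_color_share(image: list[list[Pixel]], top_n: int = 2) -> float:
    """Fraction of pixels covered by the `top_n` most common colors."""
    counts = Counter(pixel for row in image for pixel in row)
    total = sum(counts.values())
    return sum(n for _, n in counts.most_common(top_n)) / total


def isolated_dots(image: list[list[Pixel]]) -> int:
    """Count pixels that differ from all of their 4-connected neighbors."""
    height, width = len(image), len(image[0])
    dots = 0
    for y in range(height):
        for x in range(width):
            neighbors = [
                image[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width
            ]
            if all(n != image[y][x] for n in neighbors):
                dots += 1
    return dots


WHITE, BLACK, RED = (255, 255, 255), (0, 0, 0), (255, 0, 0)

# Synthetic 8x8 image: white background, one black "text" stroke,
# and a single red dot of the kind inserted to vary signatures.
image = [[WHITE] * 8 for _ in range(8)]
for x in range(1, 6):
    image[3][x] = BLACK
image[6][6] = RED

print(f"Share of top two colors: {top_color_share(image):.3f}")
print(f"Isolated dots: {isolated_dots(image)}")
```

A photograph would typically spread its pixels over many colors and contain few isolated dots, so both measurements would look quite different from this synthetic example.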
To make this level of inspection possible, without compromising performance, IronPort applies the concept of "early exit". This means that the more intensive MPR process is only applied to messages with images that have already passed through the regular context adaptive scanning process. This same concept is applied within MPR as well. If part of the image file has been analyzed and there is sufficient data to determine that the message is spam, the full image file will never be analyzed. The end result is a process that is not only more accurate, but also several times faster than traditional OCR technologies. Critical to the effectiveness of this technology is the real-time nature of IronPort Anti-Spam. Updates to the system are made every five minutes, ensuring immediate and accurate protection from image-based threats.
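The "early exit" idea within image analysis can be sketched as a loop that stops as soon as the accumulated evidence crosses a threshold. The function name, the toy scoring function and the threshold are hypothetical; the sketch shows only the control flow, not IronPort's scoring.

```python
from typing import Callable, Iterable


def scan_with_early_exit(
    chunks: Iterable[bytes],
    score_chunk: Callable[[bytes], float],
    threshold: float,
) -> tuple[bool, int]:
    """Return (is_spam, chunks_analyzed), stopping once the threshold is met."""
    total = 0.0
    analyzed = 0
    for chunk in chunks:
        total += score_chunk(chunk)
        analyzed += 1
        if total >= threshold:
            return True, analyzed  # enough evidence; skip the rest
    return False, analyzed


def toy_score(chunk: bytes) -> float:
    """Toy evidence score: fraction of zero bytes in the chunk."""
    return chunk.count(0) / len(chunk)


spammy = [b"\x00\x00\x00\x01", b"\x00\x00\x00\x00",
          b"\x01\x01\x01\x01", b"\x01\x01\x01\x01"]
clean = [b"\x01\x01\x01\x01"] * 4

print(scan_with_early_exit(spammy, toy_score, threshold=1.5))
print(scan_with_early_exit(clean, toy_score, threshold=1.5))
```

In the first case the scan stops after two of four chunks; in the second, every chunk is examined and the message is not flagged. The cost saving comes from never touching the remaining chunks once a verdict is reached.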
Image spam has exploded in 2006, as spammers have found it to be an effective means of bypassing traditional spam filters. The flood of image spam is frustrating end-users and taxing the already strained email infrastructures of many companies.
Spammers have rendered traditional anti-spam technologies ineffective by hiding content in embedded images and subtly randomizing these images so that each message appears unique to spam filters. Some anti-spam vendors are looking towards introducing OCR technology to stop this problem. Unfortunately, this technology is too slow for many customers and can easily be defeated by simple changes in spammer tactics.
IronPort has taken a fundamentally different approach to the problem. By interpreting image content more along the lines of how a human would interpret the image, using Multidimensional Pattern Recognition, IronPort has turned the spammers' own techniques against them. In their efforts to defeat traditional anti-spam systems, image spammers are leaving behind subtle traces that IronPort Anti-Spam is using to stop over 98 percent of their messages.
IronPort Anti-Spam is available on IronPort's email security appliances. IronPort technology protects the infrastructures of organizations worldwide — not only from today's threats, but from those certain to evolve in the future.