Dr. Neal Krawetz takes a look at the numbers behind reports of laptop thefts and phishing attacks, showing inconsistent metrics and the difficulty in using numbers to determine the real level of threat.
Security is about evaluating risks. And who knows more about evaluating risks than insurance companies? For example, the automobile insurance industry invests in studies about driver safety, likelihood of an accident, estimated amount of damage, and the average cost of repair. This is how they measure risk.
In the computer field, risk is based on attributes such as ease of exploitation, required skillset to conduct the exploit, number of impacted systems, estimated loss, and amount of damage. It doesn't make sense to spend $10,000 on a high-end firewall to protect a $2,000 computer containing little intellectual property.
Whether it is car, medical, or liability coverage, insurance companies have very specific metrics. My insurance agent can quickly look up my chances of being in a serious auto accident based on my occupation, distance from work, number of miles driven per year, and type of car - and that's before adding in my driving history. Banks have similar metrics and in-depth understandings of their risks. However, few computer organizations have equivalent metrics. What are your odds of being attacked? What is the likelihood of a successful attack? What is the estimated loss from an attack? Many of the metrics we use today are based on half-truths and floating numbers - random statistics without context. When we hear that a laptop was stolen and that it contained thousands of pieces of personal information, should we be worried? What is the likelihood of the compromised information actually being used?
Just as fear, uncertainty, and doubt (FUD) can sway opinions about our security, these random statistics also influence our opinion about how safe we are on-line. But exactly how safe are we?
Playing with numbers
In September 2006, the Washington Post reported that 1,137 government laptops had been stolen since 2001 from the Commerce Department. That's a big number... However, it is a number without context. How many laptops has the Commerce Department had since 2001?
The U.S. Commerce Department employs about 36,000 people. So if we assume that they all have laptops, then 1,137 lost laptops becomes about 3% of the workforce. Now we have context - and it seems like a high number. The percentage increases if we assume that only 10% of the people have laptops (roughly 30% lost), and decreases if we count replacement laptops. For example, few people use a laptop for longer than three years. Between dead batteries, damage from long-term use, and an inability to run the latest-and-greatest software, laptops get replaced. If we assume a replacement every three years, then every laptop at the Commerce Department would have been replaced twice, tripling the number of laptops that could be stolen. That initial assumption of a 3% loss rate suddenly drops to 1%, and the 30% assumption drops to 10%.
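The back-of-the-envelope arithmetic above can be sketched in a few lines of Python. The employee count, laptop share, and replacement cycle are the article's illustrative assumptions, not official figures:

```python
# Rough loss-rate estimates for the Commerce Department laptop thefts
# (1,137 stolen since 2001). All inputs are the article's assumptions.

STOLEN = 1137
EMPLOYEES = 36000

def loss_rate(stolen, employees, laptop_share=1.0, generations=1):
    """Percent of laptops lost, given what fraction of employees have a
    laptop and how many hardware generations (replacements) to count."""
    total_laptops = employees * laptop_share * generations
    return 100.0 * stolen / total_laptops

# Everyone has a laptop, no replacements: about 3%
print(round(loss_rate(STOLEN, EMPLOYEES), 1))           # 3.2
# Only 10% of employees have laptops: about 30%
print(round(loss_rate(STOLEN, EMPLOYEES, 0.10), 1))     # 31.6
# Everyone has a laptop, replaced twice (three generations): about 1%
print(round(loss_rate(STOLEN, EMPLOYEES, 1.0, 3), 1))   # 1.1
```

The point of the exercise is that the same raw number (1,137) maps to anywhere from 1% to 30% depending on which denominator you assume.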
Now, 10% (and even 1%) sounds like a lot, and it accounts for a significant amount of lost personal information. However, I don't know anyone with a laptop who doesn't have some kind of personal or sensitive information on the hard drive. If a laptop is stolen, then personal or sensitive information will be stolen. The only real question is whether the information is useful to the thief. If the data is obscured or encrypted, then the answer is "maybe not." Remember: most laptops are believed to be stolen for the hardware and not the data.
Retailers, big companies, universities, and non-profit organizations expect "shrinkage" - they know that a percentage of merchandise and equipment will be lost, stolen, or broken. Knowing that every missing laptop contains something of importance, we can then start asking: Is "1%" an unexpected loss rate? Unfortunately, I cannot find any laptop-loss statistics for any big companies - we hear about individual laptop losses, but not the total percentage. However, I have worked for a couple of Fortune 500 companies and universities. Every few years (or every year, depending on the company), they do an inventory of equipment (PDF). The inventory is almost always followed by an obligatory email saying, "Does anyone know where the <equipment name> is? We're looking for the one with <tracking number>. We're also looking for <long list>."
Shrinkage. It always seems worse after a large round of layoffs. Some of the missing equipment can be physically big, like computers the size of Volvos - these are usually found. However, many items are small, such as laptops, cameras, projectors, and other portable devices. These small items rarely turn up. And remember: every missing computer contains some kind of sensitive information - the only question is whether the data is valuable to the thief. Yet, these data losses are rarely reported, even in publicly traded companies.
All of this loss adds to the amount of information potentially compromised. However, the general public does not know these numbers and cannot measure this risk.
By the way, according to law enforcement officers at JustStolen.net, one in ten laptops will be stolen. That is 10%, so the Commerce Department doesn't look that bad by comparison.
Phishing for Numbers
The percentage of lost items is not the only number regularly taken out of context. For example, consider the question: how much email is spam? In 2005, values from respected experts ranged from 70% to 95%. There was no consensus among experts, but all of the numbers sounded "bad." Today, some companies no longer report the "percentage of spam" - they only report raw values (PDF). The only thing we really know is that it is a big number. But we don't know what the number actually is (is it 86%?) or its margin of error (plus or minus 5%?). We actually have better numbers and statistics for American Idol voting than for spam volume. The same issue arises when we ask where the spam comes from. The general consensus is that today's botnets generate a majority of spam. However, we do not actually know how big that majority is.
This counting problem also shows up in reports on phishing. Every few months the Anti-Phishing Working Group (APWG) releases their Phishing Trends Report. For example, the APWG Sept-Oct 2006 report (PDF) shows an increase in phishing emails. In fact, their reports over the last few years have shown a nearly steady increase intermixed with a few sharp increases in volume.
The problem with the APWG numbers is that they don't match other sightings. For example, Usenet's "news.admin.net-abuse.sightings" (NANAS) is a high-volume newsgroup where people post their spam messages. NANAS receives thousands of postings per day - approximately 40,000 spam postings just for December 2006. The postings are sample spam emails submitted by people all over the world, and the samples appear to match the distribution of world-wide spam. If you don't have access to hundreds of honeypot accounts for collecting spam and want to do spam research, then NANAS is the next best thing.