Question: Consider a disease that is fatal if left untreated, but for which there is a diagnostic test that is 99% accurate: for every 100 people with the disease it detects 99 and misses 1 (a 1% false negative rate), and for every 100 people who do not have the disease it incorrectly identifies only 1 as having it (a 1% false positive rate). The treatment for the disease is high risk: it kills 10% of those treated (whether or not they have the disease) but is 100% successful in curing the disease (if you survive the treatment). The disease afflicts 1 person in 1,000. If the test costs $100 per person and a life saved is worth $2,000,000, what is the approximate maximum cost per treatment for testing and treating to be cost effective? $100, $500, $1,000, $10,000, $100,000, ...

Think about this before proceeding. What are your intuitions?

Doing the math: Start with a population of 1 million.
    1,000 people with the disease
    999,000 people without

After testing:
    990 people with disease identified
    10 people with disease not identified - will die from being untreated
    9,990 people without disease falsely identified as having it

After treatment:
    99 people with disease die from treatment
    891 people with disease survive and are cured
    999 people falsely identified as having the disease die from treatment

Number of deaths:
    Do nothing: 1,000 people with disease die because they are untreated
    Test and treat: 1,108 people die (10 from being untreated, and 1,098 from treatment)

Alternate comparison: you saved 891 people with the disease at the cost of killing 999 people who did not have the disease.

The cost of the tests and treatments is irrelevant: testing and treating *raises* the number of deaths (by over 10%).

This is known as the "Base Rate Fallacy" because when the phenomenon being sought (in this case people with the disease) has a low rate of occurrence relative to the overall population, even highly accurate tests produce counter-intuitive results. Here, of the 10,980 people who test positive, only 990 (about 9%) actually have the disease.
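The arithmetic above can be checked with a short Python sketch. The function name screen_and_treat and its parameter names are made up for this illustration; the input values are the ones given in the question. Exact fractions are used so the head counts come out as whole numbers rather than floating-point approximations.

```python
from fractions import Fraction


def screen_and_treat(population, prevalence, false_negative_rate,
                     false_positive_rate, treatment_mortality):
    """Head counts for a test-then-treat program (hypothetical helper)."""
    sick = population * prevalence
    healthy = population - sick

    detected = sick * (1 - false_negative_rate)    # sick and correctly flagged
    missed = sick * false_negative_rate            # sick, not flagged, die untreated
    false_alarms = healthy * false_positive_rate   # healthy but flagged and treated

    sick_killed = detected * treatment_mortality
    healthy_killed = false_alarms * treatment_mortality

    return {
        "detected": detected,
        "missed": missed,
        "false_alarms": false_alarms,
        "cured": detected - sick_killed,
        "deaths_do_nothing": sick,
        "deaths_test_and_treat": missed + sick_killed + healthy_killed,
        # Share of positive test results that are real cases.
        "positive_predictive_value": detected / (detected + false_alarms),
    }


result = screen_and_treat(
    population=1_000_000,
    prevalence=Fraction(1, 1000),
    false_negative_rate=Fraction(1, 100),
    false_positive_rate=Fraction(1, 100),
    treatment_mortality=Fraction(1, 10),
)

for name, value in result.items():
    print(f"{name}: {float(value):,.3f}")
```

Because the two error rates are separate parameters, the same sketch can be rerun with unequal false negative and false positive rates, or with a different prevalence, to see how quickly the balance of deaths shifts.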
Note that the false negative and false positive rates do not have to be the same (and rarely are) -- it just makes for a simpler example.

This is relevant to the current discussion of whether to administer the smallpox vaccine (and to whom), except that only some of the probabilities are known.

This fallacy pops up as an issue in many circumstances. Another example comes from current news: Total Information Awareness (TIA), a government program to monitor a massive number of databases and communications channels to try to spot terrorists. This example postulates a 99.9% accuracy rate.

>------ Forwarded Message
>From: Benjamin Kuipers
>Date: Sat, 14 Dec 2002 13:42:44 -0600
>To: dave@farber.net
>Cc: Benjamin Kuipers
>Subject: 'Total Information Awareness' as a diagnosis problem
>
>Dave,
>
>Consider the aims of the 'Total Information Awareness' program as
>a problem in diagnosis.
>
>Out of a large population, you want to diagnose the very few cases
>of a rare disease called "terrorism".  Your diagnostic tests are
>automated data-mining methods, supervised and checked by humans.
>(The analogy is sending blood or tissue samples to a laboratory.)
>
>This type of diagnostic problem, looking for a rare disease, has
>some very counter-intuitive properties.
>
>Suppose the tests are highly accurate and specific:
>  (a) 99.9% of the time, examining a terrorist, the test says "terrorist".
>  (b) 99.9% of the time, examining an innocent civilian, the test says
>      "innocent civilian".
>
>Terrorists are rare: let's say, 250 out of 250 million people in the USA.
>
>  (a) When the tests are applied to the terrorists, they will be
>      detected 99.9% of the time, which means there is about a 25%
>      chance of missing one of them, and the other 249 will definitely
>      be detected.  Great!
>
>  (b) However, out of the remaining 249,999,750 innocent civilians,
>      99.9% accuracy means 0.1% error, which means that 250,000 of
>      them will be incorrectly labeled "terrorist".  Uh, oh!
>
>The law enforcement problem is now that we have 250,250 people who
>have been labeled as "terrorist" by our diagnostic tests.  Only
>about 1 in 1,000 of them is actually a terrorist.
>
>If we were mining for gold, we would say that the ore has been
>considerably enriched, since 1 in 1,000 is better than 1 in 1,000,000
>by quite a lot.  There's still a long way to go, though, before
>finding a nugget.
>
>But we are talking about people's lives, freedom, and livelihoods here.
>The consequences to an innocent civilian of being incorrectly labeled
>a "terrorist" (or even "suspected terrorist") can be very large.
>
>Suppose, out of the innocent people incorrectly labeled "terrorist",
>1 in 1,000 is sufficiently traumatized by the experience so that
>they, or a relative, actually *becomes* a terrorist.  (This is
>analogous to catching polio from the polio vaccine: extremely rare,
>and impossible with killed-virus vaccine, but a real phenomenon.)
>
>In this case, even after catching all 250 original terrorists,
>250 new ones have been created by the screening process!
>
>The numbers I've used give a break-even scenario.  But 99.9%
>accuracy and specificity is unrealistically high.  More realistic
>numbers make the problem worse.  Nobody knows what fraction of people
>traumatized as innocent victims of a government process are seriously
>radicalized.  1 in 1,000 is an uninformed guess, but the number could
>be significantly higher.
>
>A mass screening process like this is very likely to have costs that
>are much higher than the benefits, even restricting the costs to
>"number of free terrorists" as I have done here.  Adding costs in
>dollars and the suffering of innocents just makes it harder to
>reach the break-even level.
>
>Ask your neighborhood epidemiologist to confirm this analysis.
>It is applied routinely to public health policy, and applies
>no less to seeking out terrorists.
>
>There are alternative ways to detect and defend against terrorists.
>Mass screening approaches like TIA are very questionable in terms of
>costs and benefits.
>
>Benjamin Kuipers, Professor       email:  kuipers@cs.utexas.edu
>Computer Sciences Department      tel:    1-512-471-9561
>University of Texas at Austin     fax:    1-512-471-8885
>Austin, Texas 78712 USA           http://www.cs.utexas.edu/users/kuipers
>
>------ End of Forwarded Message
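
The figures in the forwarded message can be reproduced the same way. This is only a sketch under the message's own assumptions (250 terrorists in 250 million people, 99.9% accuracy in both directions, and the "uninformed guess" of 1 in 1,000 for radicalization); the variable names are invented for the illustration. The message rounds its totals, so the exact values differ slightly. Note also that the "about 25%" figure corresponds to the expected number of missed terrorists (0.25); the probability of missing at least one, assuming independent tests, works out to roughly 22%.

```python
from fractions import Fraction

# Assumptions taken from the forwarded message.
population = 250_000_000
terrorists = 250
accuracy = Fraction(999, 1000)            # same rate for both kinds of subject
radicalization_rate = Fraction(1, 1000)   # the message's "uninformed guess"

innocents = population - terrorists
error_rate = 1 - accuracy

expected_missed = terrorists * error_rate        # expected undetected terrorists
true_labels = terrorists * accuracy              # terrorists labeled "terrorist"
false_labels = innocents * error_rate            # innocents labeled "terrorist"
total_labeled = true_labels + false_labels

# Fraction of everyone labeled "terrorist" who really is one.
share_real = true_labels / total_labeled

# New terrorists created if 1 in 1,000 of the falsely labeled is radicalized.
new_terrorists = false_labels * radicalization_rate

# Probability that at least one terrorist slips through, assuming
# each test result is independent of the others.
p_miss_at_least_one = 1 - float(accuracy) ** terrorists

print(f"expected missed:       {float(expected_missed):.2f}")
print(f"falsely labeled:       {float(false_labels):,.2f}")
print(f"total labeled:         {float(total_labeled):,.2f}")
print(f"share really guilty:   {float(share_real):.6f}")
print(f"new terrorists:        {float(new_terrorists):,.2f}")
print(f"P(miss at least one):  {p_miss_at_least_one:.3f}")
```

Lowering accuracy or raising radicalization_rate in this sketch shows the point made in the message: with more realistic numbers the screening process falls below break-even.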