One of the arguments being provided as support for the Assault Weapons Ban is the change in the number of firearms traces that were traces of assault weapons. This is an oft-cited number because it appears in one of the official government studies about the effectiveness of the ban, and it's one of the very few numbers in that report that suggests any benefit at all. However, anyone reading the study with a basic knowledge of statistics will understand that the number is actually meaningless.

This particular study result is usually cited as a "2/3rds reduction in assault weapons used in crime", and sometimes as a "65% drop". In actual fact, it's a drop from 3% of firearms traces to 1% of firearms traces. The studies cite the *change* rather than the absolute percentage because the absolute percentage is so low. That's understandable, since it sounds more impressive and is technically accurate. To understand what is misleading about that number, you have to dig even deeper.
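
To make the difference between the two ways of quoting the same drop concrete, here is a minimal sketch of the arithmetic. The function name `describe_change` is made up for this illustration; the only inputs are the 3% and 1% shares quoted above.

```python
def describe_change(before: float, after: float) -> tuple[float, float]:
    """Return (change in percentage points, relative change as a fraction)."""
    absolute_points = (after - before) * 100
    relative = (after - before) / before
    return absolute_points, relative


# Shares of firearms traces quoted above: 3% before, 1% after.
absolute_points, relative = describe_change(0.03, 0.01)

print(f"absolute change: {absolute_points:+.1f} percentage points")
print(f"relative change: {relative:+.0%}")
```

The same drop is two percentage points in absolute terms and about two-thirds in relative terms. Both figures are accurate; only the second one sounds large.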

For people who don't have the training in statistics to understand this, here's a very short course in statistics to show you what's going on. Most of this information can be applied to just about any public-policy poll or study, since the analysis of such issues depends on the application of statistical analysis. Without such knowledge, it's easy for the author of a study to mislead by implication about his results.

Statistics is a science that depends on extrapolating the qualities of a small sample selected from the overall population to predict or explain the behavior of an entire population. This technique is used because, in public policy questions, it's a lot cheaper to measure the sample than the entire population. In order for this extrapolation to be valid, the sample must be taken from the population *at random*.

This point cannot be overemphasized. A sample which is biased in a way related to the measurements being made will not produce valid results, and there is no simple way to detect the bias by examining only the sample. You may think that getting a random sample is easy, but in fact, it's an extraordinarily difficult thing to do.

Consider a simple telephone survey -- suppose you want to employ someone to call people on the phone, selected at random from a phone book using a computer algorithm. Every day at 9am, this person starts making phone calls to people on the list, and giving surveys to the people who answer. Every day at 5pm, he goes home. Even though you started with a computer-generated random sample drawn from a fairly-complete list of people, you've just eliminated from your survey:

- Anyone with a 9-5 job
- Anyone who can't afford a telephone
- Anyone who has moved since the last phone book was published
- Anyone whose privacy concerns are such that their number is unlisted
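
The effect of a calling schedule like this can be sketched with a small simulation. Every number here is invented for illustration: it assumes a population in which 60% of people hold a 9-5 job, and in which the answer to the survey question happens to depend on that fact.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population; all rates below are assumptions, not survey data.
population = []
for _ in range(100_000):
    has_day_job = random.random() < 0.60
    # Assumption: day-job holders say "yes" 30% of the time, everyone else 70%.
    says_yes = random.random() < (0.30 if has_day_job else 0.70)
    population.append((has_day_job, says_yes))


def yes_share(people):
    """Fraction of the given people who answered yes."""
    return sum(yes for _, yes in people) / len(people)


true_share = yes_share(population)

# A genuinely random sample of 1,000 people.
random_sample = random.sample(population, 1000)

# The 9-to-5 caller only ever reaches people without a day job.
reachable = [person for person in population if not person[0]]
phone_sample = random.sample(reachable, 1000)

print(f"whole population: {true_share:.1%}")
print(f"random sample:    {yes_share(random_sample):.1%}")
print(f"phone sample:     {yes_share(phone_sample):.1%}")
```

Under these assumptions the random sample lands close to the population figure, while the phone sample overstates it badly. Nothing inside the phone sample itself reveals the problem.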

So how does this affect the result? Simple: *firearms traces are not a random sample*. Police departments do not trace every firearm used in a crime, because they don't *have* every firearm used in a crime. They also trace some firearms that were not used in a crime (for example, when recovering stolen property). Different police departments have different policies on when to trace a firearm, so some firearms used in crime will never be traced, while firearm types which police departments find "interesting" will be traced more often. Some states also have local databases which they can check before submitting a trace request to the BATFE.

So on the basis of sample bias alone, the number of "assault weapons" traced by the BATFE is useless. It tells us nothing about the general population of assault weapons. Here's what the study's authors have to say about trace data:

> Therefore, tracing data are a biased sample of guns recovered by police. Prior studies suggest that assault weapons are more likely to be submitted for tracing than are other confiscated firearms.

One of the most common mistakes laymen make with statistics is confusing *correlation*, which many statistical tools are designed to analyze, with *causation*. The difference is vital. Correlation means simply that two factors -- for example, drug use and petty crime -- tend to occur together. Causation means that one factor *causes* the other, or in our example, that drug use *causes* petty crime. Statistical analysis can only determine correlation; a carefully designed experiment (one that excludes all other confounding variables) is needed to determine causation.
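
A short simulation shows how two factors can be strongly correlated without either causing the other. This is only a sketch with invented data: a hidden factor `z` drives both `x` and `y`, and the helper `pearson` is written out by hand for the illustration.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


# Hidden factor z influences both x and y.
# x and y never influence each other.
z = [random.gauss(0, 1) for _ in range(5000)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

r = pearson(x, y)
print(f"correlation between x and y: {r:.2f}")
```

The correlation comes out near 0.5 even though, by construction, changing `x` would do nothing to `y`. An analyst who never measured `z` would have no way to see that from the numbers alone.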

Where that is not possible, studies can try to account for as many significant factors as possible. When this is done, the correlation for individual variables can be determined. This analysis looks a lot like causation; you typically end up with a large number of variables, each with a correlation coefficient relative to a result. So, in our study, we might have variables for the passage of the assault weapons ban, variables for the state each gun trace came from, a variable to indicate whether the state had a pre-existing assault weapons ban in place, and so on. This is done in an attempt to account for variables in the sample that can be known, but not eliminated.

Sometimes that kind of analysis is the best you can do, for cost or ethical reasons. Because it looks a lot like causation, it's often mistaken for such. But it has fundamental flaws when used in that manner. For example, even if the drop in assault weapons traces is highly correlated with the assault weapons ban and all other variables are accounted for, it's impossible to tell (with statistical tools) which direction the causation arrow points. Does the ban cause the reduction, or does the reduction cause the ban? It's easy to form logical assumptions about that arrow, but there is no statistical evidence to back them up.

In this case, people are mistaking the correlation between the passage of the assault weapons ban and the drop in assault weapon traces for the proposition that the assault weapons ban *caused* the drop. It might have, or it might not have. We just don't know. We don't even have the kind of information and analysis that would help us to eliminate other related variables, which might let us make logical guesses. We just have two raw numbers -- before, and after. That's not enough information to even suggest a *causal* connection.

To normal people, the concept of "significance" refers to importance. In statistics, it's subtly different: a correlation is significant if it is sufficiently unlikely to have occurred by chance.

Chance matters because of the randomness (at least, the *hoped for* randomness) of the sample. Even using a completely random selection process, it's possible to pick a biased sample. If a population of one hundred people has 5 people who like peanuts, and your random sample of 5 people happens to be the 5 people who like peanuts, you're going to think that the entire population likes peanuts -- and you'll be completely wrong. But picking those 5 people for your sample is unlikely.
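
Just how unlikely can be worked out by counting samples. This sketch uses the numbers from the peanut example above; the variable names are made up for the illustration.

```python
from math import comb

population_size = 100
peanut_lovers = 5
sample_size = 5

# Number of distinct 5-person samples that can be drawn from 100 people.
possible_samples = comb(population_size, sample_size)

# Exactly one of those samples consists of the 5 peanut lovers.
probability_all_lovers = comb(peanut_lovers, sample_size) / possible_samples

# Chance that the sample contains no peanut lovers at all.
probability_none = (
    comb(population_size - peanut_lovers, sample_size) / possible_samples
)

print(f"possible samples: {possible_samples:,}")
print(f"all five like peanuts: {probability_all_lovers:.2e}")
print(f"nobody likes peanuts:  {probability_none:.1%}")
```

There are 75,287,520 possible samples, so drawing exactly the five peanut lovers is about a 1-in-75-million event. A sample this small has its own problem, though: roughly three times out of four it contains no peanut lovers at all.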

A large part of statistical analysis is understanding just how unlikely that is, and quantifying it for your specific sample. That way the analyst can report how likely it is that his results are due to chance rather than a true correlation. Sometimes this is reported as a confidence interval (a range of values constructed so that it contains the true value some stated percentage of the time, usually 95%), and at other times as a simple test for significance. Exactly how wide the confidence interval is depends on how much variance is in the sample.

Variance is a term that describes how widely spread the values of the variable being measured are. If you have a sample of 3 people, one 6 feet tall, one 3 feet tall, and one 10 feet tall, you could describe that sample as having a high variance -- because most people are more tightly clustered in the 5-6 foot range. The variance of a variable influences how broad the confidence interval will be.

Another influence on the size of the confidence interval is the sample size. The larger the sample, the less likely random selection bias will be to influence the results. Using a larger sample won't reduce the variance of the population, but it will reduce the distortion of randomly selecting a non-typical member of the population as part of the sample. In other words, it's harder to pick 500 non-typical examples than it is to pick 5 non-typical examples.
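
The relationship between sample size and interval width can be sketched with the usual normal-approximation formula for a proportion. The function name `margin_of_error` and the 50% example share are assumptions chosen for illustration. The sample sizes start at 50 because the approximation is poor for samples as tiny as 5.

```python
from math import sqrt

Z_95 = 1.96  # standard normal critical value for a two-sided 95% interval


def margin_of_error(p_hat: float, n: int, z: float = Z_95) -> float:
    """Half-width of the normal-approximation interval for a proportion."""
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Same observed share (50%) at three different sample sizes.
for n in (50, 500, 5000):
    moe = margin_of_error(0.5, n)
    print(f"n={n:>5}: 50% +/- {moe:.1%}")
```

The margin shrinks with the square root of the sample size, so a sample 100 times larger gives an interval only 10 times narrower.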

So what does this mean if you are reading a study, rather than writing one? Simple: look for the author's tests for statistical significance. If there are no tests for statistical significance, the study is bunk; toss it. If there are tests, read them carefully. Usually the level of significance will be specified, with .95 being the most commonly used standard (that is, a 5% chance that the results of the study with respect to that specific variable are due solely to chance).

If the study reports that results *did not attain significance*, it's an indication that they are probably not reliable. The results could mean nothing more than the effect of randomness in the sample selection. If the results are reported as significant, check at what level the significance test was conducted. Anything less than 95% significance should be considered questionable, although 90% is sometimes used.
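
To show what a significance test does with sample size, here is a sketch of one common test, the pooled two-proportion z-test. The counts are invented for illustration and are NOT taken from any study; they simply express the same 3%-to-1% drop at two sample sizes. The function name `two_proportion_p_value` is also made up here.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Invented counts: 3 of 100 versus 1 of 100, then 300 of 10,000 versus 100 of 10,000.
small = two_proportion_p_value(3, 100, 1, 100)
large = two_proportion_p_value(300, 10_000, 100, 10_000)

print(f"small samples: p = {small:.3f}")
print(f"large samples: p = {large:.3g}")
```

With the small invented samples the p-value is roughly 0.3, nowhere near the 5% standard. The identical percentage drop in the large samples is overwhelmingly significant. For counts this small an exact test would be a better choice than the normal approximation, but the direction of the result is the same.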

One of the most common tactics for obfuscating lack of results is to report results in the summary that don't pass the significance test in the detailed paper. There's no real way to catch this without reading the whole study, but when the whole thing is available, make sure to check the results reported in the summary against the significance tests for those results.

In the case of the assault weapons ban study, the change from 3% to 1% was tested and found *not to be significant*. In other words, it presents absolutely no evidence for the effectiveness of the ban. Assault weapons, it turns out, are so rarely traced in absolute numbers that there simply isn't enough data to show any results. The drop in traces, which has been widely reported, is *statistically insignificant* at the standard 5% level. The authors had to reduce their standards to the 10% level to attain significance, and (as already noted) they were working with a biased and invalid sample to start with.

It's easy to lie with statistics. But it's also easy to see through those lies, if you know the basics.

This entry was published 2005-09-24 10:43:35.0 by TriggerFinger and last updated 2005-09-24 10:43:35.0.
