Serious Flaws in Kellermann, et al. (1993) NEJM

(December, 1993)

by Henry E. Schaffer, Ph. D.

Summary and Overview

The Kellermann, et al. (1993) study in the NEJM attempts to use the case-control method (CCM) to show that gun ownership increases homicide in the home. The limitations of the CCM, together with serious flaws in the study methodology, invalidate the study's conclusions.

The CCM has a number of limitations in what it can accomplish, and has a number of conditions (assumptions) which must be satisfied for it to be able to satisfactorily accomplish even the limited goals for which it is suitable. The biggest limitation is that the CCM can't demonstrate causation. The CCM finds 'associations' between studied factors and the 'outcome' which defines the 'cases'. These 'associations' may suggest that there is a causal relationship, and may then be used to justify a study of causal relationships, but it is incorrect to jump from the discovery of an association to a conclusion of causation. Other weak points in the CCM have to do with susceptibility to biases in the selection of the cases, and with confounding factors which can affect the choice of the controls. These can easily lead to spurious associations when there actually are none, or to associations which are reversed in direction from what actually exists.

The Kellermann, et al. (1993) study has been widely quoted as demonstrating that there is a causal relationship between handguns in the home and homicides. The paper itself doesn't go that far, but its language implies that there is more than merely an 'association'. The flaws in the paper are such as to make the reader suspicious of the association found. Showing flaws in the methods does not prove that the paper is wrong, but it causes a loss of confidence in the results. Conclusions which are not properly supported must be considered invalid until proper support becomes available, if ever. It is the responsibility of the authors to support their conclusions. It isn't the responsibility of the readers to go out to collect data to prove that the flaws in the paper lead to incorrect conclusions.

The detailed treatment of these flaws, with supporting data, examples and methods, is necessarily quite long, but it does illustrate that the Kellermann, et al. paper is based on unsupported assumptions and that the conclusions must be viewed with suspicion or rejected as being unsupported.


I was helped in this project by the advice, criticism and encouragement of Dan Day, Fran Haga, Steve Holland and Paul Stoufflet. Many other people on the net also helped. I have full responsibility for any defects.

Detailed Examination

Subgroups and confounding factors

The methods used in Kellermann, et al. do not take into account subgrouping or stratification in our society, and it can be shown that this alone can cause a spurious association comparable to the one found.

The case-control method (described in an Appendix) has an assumption of homogeneity for all relevant variables which are not taken into account in the study. If this is violated, it is possible to have an 'apparent association' result when there is actually no true association. Technically this would probably be considered to be "confounding" due to whatever factors were heterogeneous. Unlike other types of studies, in which randomization is used to protect against unaccounted-for variation, the case-control method remains susceptible.

Here are some simple examples of how a spurious association can appear in the overall data when the population is made up of subgroups.

The computation of the association (the odds ratio) is shown in an Appendix.

I) No association in subgroups - spurious harmful association overall

Consider that the population is composed of a minority subgroup which has a high risk of homicide, and a relatively high gun ownership rate. This subgroup is composed of 'career criminals', gang members and others who have a repeated history of criminal activity. The majority subgroup has a low risk of homicide and a lower gun ownership rate. This majority is the general law-abiding public. This type of subgrouping does occur in the US, and is discussed in an Appendix. Subgroup sizes of 10%/90% are used in this example to be in the range of numbers found in the studies cited in the Appendix. There is no causal relationship between homicide and gun ownership in either subgroup.

No causal effect in subgroups - spurious harmful association overall:

Gun                High Risk                  Low Risk
Ownership          dead   alive            dead    alive
Own gun            165    665,000          27.5  2,992,500
No gun             165    665,000          82.5  8,977,500
                   ---   --------         ----- ----------
     Totals        330  1,330,000         110   11,970,000

(Population Total 13,300,000, total dead-in-home 440)

The 'odds ratio' measure of association is 1.0 in each case indicating a lack of association of gun ownership with homicide. However, when we put these two groups together into the single population which they compose we get the data:

Gun               Total Population
Ownership         dead     alive
Own gun           192.5    3,657,500
No gun            247.5    9,642,500

The 'odds ratio' now is 2.0 which indicates an association of gun ownership with homicide. This is not due to gun ownership having a causal effect, but rather there is a 'confounding' variable of subgroup membership and gun ownership is associated with subgroup. So the association of gun ownership with homicide would be called an 'apparent association' in the literature.
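The arithmetic in example (I) can be checked with a short script. This is only a sketch: the helper names (odds_ratio, table_or, pool) are made up for illustration, and the counts are simply the ones from the tables above.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio ad/bc for a 2x2 table.

    a = gun owners dead,  b = gun owners alive,
    c = non-owners dead,  d = non-owners alive.
    """
    return (a * d) / (b * c)


# Example (I): (dead, alive) counts for owners and non-owners in each subgroup.
high_risk = {"own": (165, 665_000), "none": (165, 665_000)}
low_risk = {"own": (27.5, 2_992_500), "none": (82.5, 8_977_500)}


def table_or(group):
    """Odds ratio for one table stored as {'own': (dead, alive), 'none': (dead, alive)}."""
    (a, b), (c, d) = group["own"], group["none"]
    return odds_ratio(a, b, c, d)


def pool(g1, g2):
    """Add two subgroup tables cell by cell to get the whole-population table."""
    return {k: (g1[k][0] + g2[k][0], g1[k][1] + g2[k][1]) for k in g1}


pooled = pool(high_risk, low_risk)

print("high risk:", table_or(high_risk))
print("low risk: ", table_or(low_risk))
print("pooled:   ", round(table_or(pooled), 2))
```

Run as written, the two subgroup ratios come out at 1.0 and the pooled ratio at a little over 2, even though nothing in either subgroup links ownership to death.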

II) Protective effect in subgroups - spurious harmful association overall

Since the 2.0 odds ratio in (I) above is fairly large (it is larger than the 1.6 odds ratio found in the paper), it is clear that the same type of apparent harmful association can arise even when there is a protective effect of ownership within each of the subgroups. Arbitrarily modifying the example numbers above to introduce a similar protective effect in each subgroup produces:

Gun                High Risk                    Low Risk
Ownership          dead   alive                dead    alive
Own gun            151    665,000               24    2,992,500
No gun             179    665,000               86    8,977,500

These show an odds ratio of .84 for each of these subgroups. Note that odds ratios <1 represent protective associations. However, when we put these two groups together into the single population which they compose we get the data:

Gun               Total Population
Ownership         dead     alive
Own gun           175      3,657,500
No gun            265      9,642,500

The odds ratio is now 1.74 which is a (spurious) harmful association. This must be considered to be an "apparent association" of gun ownership with homicide because it has resulted from data in which there was a clear protective effect, and yet it resulted in a spurious indication of harm comparable to the 1.6 value given in the paper.

Note that all the above has used the entire population in the calculation - but since the odds ratio is unaffected by dividing a column by a constant, the exact same odds ratios would be produced if a sample were taken from the "alive" column (corresponding to the choice of 'alive' controls). In this case the table immediately above would be:

Gun               Total Population
Ownership         dead     alive
Own gun           175      121
No gun            265      319

which produces the identical 1.74 odds ratio.
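Example (II) and the sampling remark can be checked the same way. Again this is a sketch with made-up helper names; the subgroup counts are the ones given for example (II), and sample_controls simply rescales the "alive" column to a chosen number of controls (440 here, one per case, which is my assumption about how the sampled table was built).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio ad/bc for a 2x2 table."""
    return (a * d) / (b * c)


# Example (II) subgroup tables:
# ((owners dead, owners alive), (non-owners dead, non-owners alive))
high_risk = ((151, 665_000), (179, 665_000))
low_risk = ((24, 2_992_500), (86, 8_977_500))


def or_of(table):
    (a, b), (c, d) = table
    return odds_ratio(a, b, c, d)


def pooled(t1, t2):
    """Add two subgroup tables cell by cell."""
    return tuple((r1[0] + r2[0], r1[1] + r2[1]) for r1, r2 in zip(t1, t2))


def sample_controls(table, n_controls):
    """Shrink the 'alive' column to n_controls, keeping its proportions."""
    (a, b), (c, d) = table
    scale = n_controls / (b + d)
    return ((a, b * scale), (c, d * scale))


whole = pooled(high_risk, low_risk)
sampled = sample_controls(whole, 440)

print("high risk:", round(or_of(high_risk), 2))
print("low risk: ", round(or_of(low_risk), 2))
print("pooled:   ", round(or_of(whole), 2))
print("sampled:  ", round(or_of(sampled), 2))
```

Both subgroup ratios are below 1 (protective), the pooled ratio is well above 1, and rescaling the "alive" column leaves the pooled ratio unchanged.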

The Kellermann, et al. (1993) study in the NEJM didn't use the same calculation that is shown above. They used the "Mantel-Haenszel chi-square analysis for matched pairs" but didn't show the analysis itself. This analysis is able to adjust for differences in stratified data *if* the stratification (subdivision of the overall population into the two subgroups) is known and is taken into account when matching.

Matching each case with a control is an attempt to have the case and its matched control come from the same subgroup when the population is divided into subgroups. If this is done, then it appears that the Mantel-Haenszel analysis will produce an association calculation which is free of the confounding demonstrated above. However, it is not clear that the Kellermann, et al. matching does select controls from the same subgroups as the cases. The control selection was done using a random selection starting outside a "one-block avoidance zone" away from the case homicide, and the matching criteria did not include any life-style or related indicators.

If the population is composed of subgroups which differ in homicide rates, then the matching procedure would be hoped to select the matching control from the same subgroup as the case it is supposed to match. This could happen with the matching method used if the subgroups were settled in distinct, large geographic areas. Because of the avoidance method used these areas would have to be larger than one block in size (how much larger is hard to tell, since the paper doesn't say how far outside the zone it was necessary to travel to find a matching control who would agree to cooperate). But it doesn't appear that risk sub-groups are distributed in such a coarse-grain manner. I discussed this with a colleague who is a sociologist/criminologist who pointed out that risk subgroup factors (drug dealing, violent criminal events, violently abusive family relationships, etc.) often are fine-grained. They vary between different families in one apartment building, and certainly vary between different families in a block. Therefore choosing a control who lives 1 or more blocks away will not assure matching with respect to the sub-group. For a minority sub-group (e.g. the 10% "High Risk" group in the examples above) the chances are good that homicides in the high risk group will be matched with low risk group controls.

The Kellermann, et al. paper presented all of its data in terms of the overall group numbers, similar to the total population information presented in the examples above. Therefore there is no way to rework the analyses and check on the Mantel-Haenszel analysis results. Without proper within-sub-group matching the Mantel-Haenszel result would be affected by confounding and would therefore produce incorrect results, just as the odds ratio analysis did in the above examples.

This can be shown by using two situations based on (I) above. We take, as before, the High Risk subgroup as 10% of the population and the Low Risk group as the other 90%. We have the same 440 cases as in (I) above, and they will be matched in two ways.

The 440 case individuals are

   with gun        165
   without gun     275

The population as a whole is

   with gun        27.5%
   without gun     72.5%

The population figures result from 10% High-Risk of which 50% are with gun and 50% without gun. The 90% Low-Risk is 25% with gun, and 75% without gun.

A) Working out the expected numbers of the four types of matched case-control pairs when the controls are selected *without* consideration of subgroup membership:

   case with gun   case with gun    case without gun  case without gun
   control with    control without  control with      control without
     45.375          119.625           75.625            199.375

The odds ratio is 119.625/75.625 = 1.58

Remember that there is no association within each of the subgroups, and therefore this is a spurious association comparable to the 1.6 found.

B) Working out the expected numbers of the four types of matched case-control pairs when the controls are selected from within the same subgroup as the case:

   case with gun   case with gun    case without gun  case without gun
   control with    control without  control with      control without 
     68.75           96.25            96.25              178.75

The odds ratio is 96.25/96.25 = 1.

This is the same (no association) result which is found within each subgroup. This indicates that the Mantel-Haenszel method correctly compensates for stratification only when the stratification is recognized.
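The expected pair counts in (A) and (B) can be reproduced with the sketch below. One input is an assumption on my part and is flagged in the code: the split of the 440 cases by subgroup (110 with gun and 110 without in the High Risk group, 55 with gun and 165 without in the Low Risk group) is back-calculated from the pair counts shown in (A) and (B), not quoted from a table. The function and variable names are illustrative only.

```python
# Gun-ownership fraction among potential controls, and subgroup shares of the
# population, as stated in the text (10% High Risk at 50%, 90% Low Risk at 25%).
P_GUN = {"high": 0.50, "low": 0.25}
SHARE = {"high": 0.10, "low": 0.90}

# ASSUMPTION: cases per subgroup as (with gun, without gun). Back-calculated
# from the pair counts in (A) and (B); not taken directly from the source.
CASES = {"high": (110, 110), "low": (55, 165)}


def expected_pairs(match_within_subgroup):
    """Expected counts of the four pair types, keyed (case_has_gun, control_has_gun)."""
    overall = sum(SHARE[g] * P_GUN[g] for g in SHARE)  # 27.5% of the population
    pairs = {(cg, kg): 0.0 for cg in (True, False) for kg in (True, False)}
    for group, (with_gun, without_gun) in CASES.items():
        # Chance that the matched control owns a gun.
        p = P_GUN[group] if match_within_subgroup else overall
        for case_gun, n in ((True, with_gun), (False, without_gun)):
            pairs[(case_gun, True)] += n * p
            pairs[(case_gun, False)] += n * (1 - p)
    return pairs


def matched_odds_ratio(pairs):
    """Ratio of the two kinds of discordant pairs (B/C)."""
    return pairs[(True, False)] / pairs[(False, True)]


for label, within in (("A", False), ("B", True)):
    result = expected_pairs(within)
    print(label, result, round(matched_odds_ratio(result), 2))
```

With these inputs, expected_pairs(False) gives the four counts listed under (A) and expected_pairs(True) gives those under (B), so the only thing that changes between the two odds ratios is whether the control is drawn from the case's own subgroup.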

Therefore it can be seen that this type of subgrouping could, by itself, account for the results of the study.

Bias due to failure to respond honestly

The cases and the controls were asked about gun ownership in the home. The raw results were that 174 of the cases (45.4%) said that there was ownership and 139 of the controls (35.8%) said the same, for a crude odds ratio of 1.6. Might there be a bias in these responses? Considering that each of the cases was a homicide reported to the police, we can expect that there was a police investigation and not only was a gun found if there was one in the home, but that there would be little reluctance to admit the fact. What about the controls?

The authors refer to "a pilot study of homes listed as the addresses of owners of registered handguns confirmed that respondents' answers to questions about gun ownership were generally valid." (This study by Kellermann, et al., 1990 is cited below.) This sounds impressive until one considers what "generally" means. In the study referred to, the authors found that 97.1% of the families (34 of 35) which were listed as being the location of a registered handgun admitted to having guns in the home, either at the time or recently. These numbers need to be placed in perspective. 75 homes were chosen, but due to difficulties in address records, only 55 could be found, and of these only 35 consented to the interview. Therefore we can only conclude that 31 of the 55 homes contacted (56.4%) and 31 of the total of 75 homes (41.3%) admitted to gun ownership. This is considering only *registered* owners. One might plausibly think the difficulties in finding 20 (= 26.7%) of the registered owners might be related to their unwillingness to be connected with ownership. The refusal to be interviewed might have the same cause, and owners of unregistered guns would be even more reluctant to admit to ownership. Criminals and owners of illicit guns are likely to refuse to be interviewed, let alone admit to ownership. Therefore it appears that Kellermann is quoting his own previous work in a way which overstates its conclusion.

The reason this % is important can be seen by looking at the amount by which gun ownership is stated to be lower in the controls than in the cases. This is the root of the 'association' which is claimed to exist between gun ownership and homicide. It would take only 37 controls who possessed guns, but denied possession, to make the control ownership exactly equal to the cases (and produce a crude odds ratio of 1.0). Note that the chance of lying in denial is raised by the fact that most of the time (51.7% of the time) the control, instead of a proxy, was interviewed, and therefore there could be maximum personal interest in denying gun ownership. If 45% of the controls actually owned guns, this 'deficit' of 37 would represent a 21.1% 'false-denial-rate.' Such a rate is quite consistent with the results of the pilot study, even though the authors do not admit to it.
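The 'deficit' and 'false-denial-rate' figures follow from a few lines of arithmetic, sketched here. I am assuming that the 35.8% control figure is taken over all 388 matched controls (139/388) and that the true control ownership rate equals the 45.4% reported for the cases; the variable names are illustrative only.

```python
n_controls = 388          # matched controls (assumed denominator for 35.8%)
case_rate = 0.454         # reported gun ownership among the cases
controls_reporting = 139  # controls who reported a gun in the home

# Number of controls who would own guns if their rate equalled the case rate.
owners_if_equal = case_rate * n_controls

# Owners who would have had to deny ownership to produce the observed 139.
deficit = owners_if_equal - controls_reporting

# Fraction of true owners among the controls who denied ownership.
false_denial_rate = deficit / owners_if_equal

print(round(deficit), f"{100 * false_denial_rate:.1f}%")
```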

Therefore this bias could, by itself, account for the results of the study.

(The study is Kellermann, A. L., F. P. Rivara, J. Banton, D. Reay, and C. L. Feigner. Validating survey responses about gun ownership among owners of registered handguns. Am J Epidemiol 1990; 131:1080-4.)

Selection Bias and Response Bias

A major point is made in this study that *all* of the homicides meeting the study's 'in the home' criterion were included. This is a benefit to a CCM study since it eliminates the possibility of case 'selection bias' affecting the results. However, upon closer inspection, it appears that there is far from total inclusion and that there is room for selection bias to act. The authors try to give the impression that there was a very high response - they do this by giving 'partial' percentages several times, rather than stating the end result. There were 444 homicides meeting the 'home' criterion. 24 were excluded for "various reasons" leaving 94.6%. But then 7% were dropped because of failure to interview the proxy, and an additional 1% due to failure to find a control, leaving 388 matched pairs. This is down to 87.4%. The authors state, "Although case-control studies offer many advantages over ecologic studies, they are prone to several sources of bias. To minimize selection bias, we included *all* cases of homicide in the home and rigorously followed an explicit procedure for randomly selecting neighborhood control subjects. High response rates among case proxies (92.6 percent) and matching controls (80.6 percent) minimized nonresponse bias." (emphasis added)

Are the authors overstating their case? Perhaps just a little, but many would be willing to allow 87.4% to be described as "all". However, this is not the end - even though there were 388 matched pairs, it appears that the study did not obtain complete data on all of them, and the multivariate analyses used require complete data, and so there were only 316 matched pairs used in the final analyses. This represents 71.2% of the 444 homicides. It is very difficult to accept that "all" fairly describes this 71.2%.
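The successive percentages quoted above can be laid out in one place with a few lines of code. This is just a sketch of the arithmetic using the counts given in the text; the stage labels and the helper name retained_pct are mine.

```python
TOTAL_HOMICIDES = 444


def retained_pct(n, total=TOTAL_HOMICIDES):
    """Percentage of the original homicides still in the study at a given stage."""
    return 100 * n / total


stages = [
    ("after 24 exclusions", TOTAL_HOMICIDES - 24),
    ("matched pairs", 388),
    ("pairs with complete data", 316),
]

for label, n in stages:
    print(f"{label}: {n} ({retained_pct(n):.1f}%)")
```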

This does not prove that there was any selection or response bias in this study; it just shows that there was room for such biases to act. It also shows that the authors avoided coming to grips with this issue and misled the readers into thinking that there could be little or no such bias.


Appendix on the Case Control method.

The case control technique is described in:

Designing Clinical Research, Stephen B. Hulley & Steven R. Cummings, editors, Williams & Wilkins, 1988

Here are some quotes from relevant sections, with some notes of mine on how they apply to the current topic.

Chapter 8 Designing a New Study: II. Cross-sectional and Case-control Studies by Thomas B. Newman, et al.

Case-control Studies are covered on pages 78 - 86. Emphasis marked with _ _ is in the original. [My comments are in square brackets.]

Strengths of Case-Control Studies - two are discussed:

   Efficiency for rare outcomes
   Usefulness for generating hypotheses

Weaknesses of case-control studies

Measures of association (Appendix 8A)

Predictor           Outcome variable
variable          present   absent
present              a         b
absent               c         d

Relative risk ~~ Odds ratio = ad/bc [~~ is used for wavy =]

[Appendix 8B is "Why the odds ratio can be used as an estimate for relative risk in a case-control study"]

[Appendix 10A is "Hypothetical example of confounding" in which an apparent association between coffee drinking and MI is shown to result not from any actual effect of coffee drinking, but from a high association of smoking with coffee drinking.]

[This is a very relevant example, because overlooking confounding variables (such as membership in a high-risk group) can easily produce significant but spurious associations in the results. This is easy to demonstrate.]

Appendix - calculation of odds ratio

Gun                 Outcomes
Ownership         dead     alive
Own gun             a        b
No gun              c        d

The odds ratio is ad/bc.

Appendix - The Mantel-Haenszel Chi-square analysis for matched pairs

This analysis is a special case of the Mantel-Haenszel analysis for a stratified sample in case-control studies.

The odds ratio is

B/C where B is the number of pairs where the case has gun ownership and the control doesn't and C is the opposite (these are the discordant pairs)
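As a small illustration of the B/C calculation, the sketch below counts the two kinds of discordant pairs in a list of matched pairs. The function name and the toy data are hypothetical and are not taken from the study.

```python
def matched_pair_odds_ratio(pairs):
    """Matched-pair odds ratio B/C.

    pairs: iterable of (case_has_gun, control_has_gun) booleans.
    B counts pairs where only the case has a gun, C pairs where only
    the control has one. Pairs that agree do not enter the ratio.
    """
    pairs = list(pairs)
    b = sum(1 for case, control in pairs if case and not control)
    c = sum(1 for case, control in pairs if control and not case)
    if c == 0:
        raise ValueError("no pairs in which only the control has a gun")
    return b / c


# Hypothetical toy data: 6 + 4 discordant pairs, 3 + 7 concordant pairs.
toy_pairs = (
    [(True, False)] * 6
    + [(False, True)] * 4
    + [(True, True)] * 3
    + [(False, False)] * 7
)
print(matched_pair_odds_ratio(toy_pairs))
```

Note that the concordant pairs (both with a gun, or both without) contribute nothing to the result, which is why the choice of control matters so much.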

Appendix - Justification for existence of sub-group/stratification.

A sociologist colleague lent me a copy of a textbook, Criminology, 2nd ed. by John E. Conklin, 1986, Macmillan Pub. Co. It has a chapter on "Criminal Careers" about people who commit crimes repeatedly. In this chapter, a section on 'Delinquent Careers' (starting on pg. 308) gives some direct data on subgrouping. Two research studies in different cities are discussed.

A study of a birth cohort (Delinquency in a Birth Cohort, Wolfgang, Figlio and Sellin, 1972) covering males 10 - 18 years old in Philadelphia showed the following results:

Type of Offender   % of Cohort  % of all of Cohort's
                                Police Contacts
Nondelinquents         65.1          0
One-time offenders     16.2         15.8
Nonchronic offenders   12.4         32.3
Chronic offenders       6.3         51.9

Fewer than 5 contacts counts as nonchronic. They point out that the one-time offenders were usually involved in relatively trivial offenses. Note that 10% of the cohort would account for roughly 2/3 of all police contacts.

Another cohort study in Racine, Wisc. of juveniles and young adults (Shannon, 1982) showed similar concentration with

  5 - 7%    accounted for over 1/2 of all non-traffic police contacts
  ~20%      accounted for 80% of all non-traffic police contacts
  5 - 14%   accounted for ALL of the felony arrests.

These are cohort studies, and are therefore not susceptible to the sampling bias and similar problems that affect many of the other (easier to run) studies. We have the inescapable conclusion that there is subgrouping in the population, with a small fraction of the population accounting for a large portion of serious criminal behavior.

Appendix - population figures

County      population     dates of study     duration     pop-years
            (1990 census)
 Cuyahoga   1,412,140      1/1/90-10/23/92    2.81 years   3,970,000
 Shelby       826,330      10/23/87-92        4 years      3,305,000
 King       1,507,319      10/23/87-92        4 years      6,030,000

Total pop-years: ~13,300,000
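The pop-years figures are simply population times study duration, as the sketch below shows. The populations and durations are the ones in the table; the dictionary layout and names are mine.

```python
# County: (1990 census population, study duration in years), from the table above.
counties = {
    "Cuyahoga": (1_412_140, 2.81),
    "Shelby": (826_330, 4),
    "King": (1_507_319, 4),
}

pop_years = {name: pop * years for name, (pop, years) in counties.items()}
total_pop_years = sum(pop_years.values())

for name, value in pop_years.items():
    print(f"{name:10s}{value:14,.0f}")
print(f"{'Total':10s}{total_pop_years:14,.0f}")
```

The total comes to roughly 13,300,000, which is the population total used in the examples.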

World-Wide-Web html format by Scott Ostrander