Critically Reading the Literature
and Understanding Clinical Research

A primer on how to determine what medical literature is valuable and what can be filed in the garbage can.

By J. Robert Mannino, DO, PhD, FACOFP

Literature is the bulletin board of scientific endeavor. It is through the literature that the rest of the world is informed of discoveries and techniques that have the ability to alter or save lives. Yet, not all of the literature is of equal quality. In the course of one month’s time, at least 100 articles of varying worth will cross the average practitioner’s desk. In this day and age of publish or perish, less-than-stellar work appears in print. It has been facetiously said that 75 percent of the literature is not worth the paper on which it is written.

How does one separate the wheat from the chaff? Is the article worth reading? Can I reasonably place any confidence in the conclusions drawn? Is the conclusion based on fact or is it merely the opinion or supposition of the author? How does one critically read the literature?

To look critically at the literature, it is incumbent upon the physician to understand the basic differences in the literature published. Not all literature has the altruistic purpose of disseminating new information. At the zenith of excellence is the category of journal that is peer reviewed and refereed; the nadir is the journal that is a captive publication, neither peer reviewed nor refereed, with the authors paid to produce articles on a theme. There are all manner of gradations from top to bottom; obviously, the best journals are those that are both peer reviewed and refereed. The easiest way to ascertain this is to look in the Index Medicus: if the journal is abstracted in that publication, then it is both peer reviewed and refereed.

What about articles in an unrefereed journal? Should you automatically discount them? No, but one must be wary of anything that is in an unrefereed journal. Usually there is a basic flaw in the article that prevents publication in a refereed journal. This flaw may be something simple like inadequate numbers in a study (a study of one) or major like drawing improper or premature conclusions from limited data.

Usually, articles in unrefereed journals are anecdotal opinion, unsubstantiated ‘research,’ or written by staff on the basis of interviews. The unrefereed journal can be a good source of practice and/or patient tips (how to build a practice or how to remove a fishhook, respectively). In general, anything that remotely smacks of research that appears in an unrefereed journal should be approached with extreme caution.

Even within the category of peer reviewed and refereed journal articles, one must be critical of what was done and how it was done. In general, the obvious foibles will be attended to by the review process. These include, but are not limited to, quality and quantity of literature cited, use of textbooks as references, self-reference, faulty thought processes, faulty experimental design, and unsubstantiated claims. What will not be addressed are conclusions based on the data presented, incomplete studies (a portion of an ongoing study is presented, but the total study result is years away), or extrapolation based on presented data sets.

For this evaluation, a rudimentary knowledge of statistics and its application to medicine is necessary. There is a vocabulary that must be understood. These specialized terms include: bias, clinically significant result, contributory cause, control(s), crossover study, dependent variable(s), double blind, experimental design, extraneous variable(s), free extrapolation, independent variable(s), p value, paired study, prospective study, reliability, retrospective study, sample, sensitivity, sham, significance, single blind, specificity, standard deviation, statistically significant result, trend(s), variable(s), variance, and validity. In addition to this vocabulary, the statistical techniques commonly encountered in the literature include: chi-square, Student’s t-test, analysis of variance, specificity and sensitivity, linear coefficients, linear regression, multiple regression, matrix analysis, and sequential analysis.

Vocabulary
While it is not the intention of this discourse either to turn the reader off or to make a statistician out of the reader, it is the intent to enable the reader to intelligently evaluate what is read. The following is not designed to be all-inclusive or to take into account ‘what if’ scenarios; rather, it is intended to address the commonly used terms and methods of clinically relevant statistics and their use in the literature. First, let us define the vocabulary in terms of working definitions:

Bias: Judgment or opinion formed before fact. The exclusion of women in a hypertension study has the potential of biasing the results.
Clinically significant result: The results obtained have an important impact on patient care.
Contributory cause: The cause precedes the effect and altering the cause alters the effect.
Control(s): Part of an experimental design that serves as the frame of reference and does not participate in the active or experimental treatment.
Crossover study: A type of experimental design in which the active or experimental treatment is switched for the control and vice versa.
Dependent variable(s): The variable of interest.
Double blind: A type of experimental design in which neither the investigator nor the subject knows who is receiving the active or experimental treatment.
Experimental design: The study is structured in such a way that it does not contain bias, and the results obtained are statistically significant and have validity and reliability.
Extraneous variable(s): The item or items that can affect the outcome of a study, but are not being studied.
Free extrapolation: Drawing conclusions for one population based on the results of another non-linked population.
Independent variable(s): The item or items that affect the dependent variable and are varied by the experimenter.
Meta analysis: A systematic method that uses statistical analysis to integrate the data from a number of independent studies.
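P value: The probability of obtaining results at least as extreme as those observed if chance alone (sampling variation) were at work. The smaller the p value, the less plausible it is that chance explains the results.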
Paired study: An experimental design in which all study groups are paired, as closely as possible, for known variables.
Prospective study: A study in which treatment and outcome begin after the start of, and because of, the study.
Reliability: The results obtained are consistent and repeatable.
Retrospective study: A study in which treatment and outcome have occurred, or have begun, prior to the onset of the study.
Sample: The tested population. The statistically valid small sample is 30.
Sensitivity: In diagnostic tests, the proportion of diseased subjects who have a positive test.
Sham: The process of mimicking a procedure without performing the actual definitive procedure.
Significance: How the results support the hypothesis.
Single blind: A type of experimental design in which only the investigator knows who is receiving the active or experimental treatment.
Specificity: In diagnostic tests, the proportion of disease-free subjects who have a negative test.
Standard deviation: The square root of variance.
Statistically significant result: The results obtained support the hypothesis and occur in a fashion that is more probable than random chance.
Trend(s): The general direction of data, although not meeting all the rigid criteria of statistical significance.
Variable(s): The item or items in a study that are measured.
Variance: The spread of individual values around the center (the mean), calculated as the average of the squared deviations from the mean.
Validity: The experimental design measures what it purports to measure.
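
Several of the working definitions above reduce to simple arithmetic. The following is a minimal sketch, not a clinical tool: the measurements and the test counts are hypothetical, invented purely for illustration, and the function names are this sketch's own.

```python
from statistics import mean, stdev, variance

# Hypothetical measurements, invented for illustration only.
values = [4, 6, 8, 10, 12]

center = mean(values)      # the "value of the center"
spread = variance(values)  # sample variance: squared deviations / (n - 1)
sd = stdev(values)         # standard deviation: square root of the variance


def sensitivity(true_positives, false_negatives):
    """Proportion of diseased subjects who have a positive test."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of disease-free subjects who have a negative test."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical diagnostic test: 100 diseased subjects (90 test positive)
# and 200 disease-free subjects (180 test negative).
sens = sensitivity(true_positives=90, false_negatives=10)
spec = specificity(true_negatives=180, false_positives=20)

print(f"mean = {center}, variance = {spread}, standard deviation = {sd:.2f}")
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Note that the standard deviation is in the same units as the measurements, which is why it is usually the figure reported alongside a mean.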
Now let us define statistical techniques and when they are to be used. Again, this is not all-inclusive and does not address the exceptions; rather, it is aimed at the majority.

Used with one variable:
Chi-square: This test examines the association between a single independent variable and a dependent variable.
Student’s t-test: This test is used for measured variables, in comparing two means.
Used with two or more variables:
Analysis of variance: This test allows comparison between more than two sample means.
Linear coefficients: The slope of the straight line produced by a linear regression.
Linear regression: A statistical treatment of data by which two continuous variables are fitted to a straight line.
Matrix analysis: A mathematical treatment of large quantities of variables in an attempt to ascertain which are independent and which are dependent variables.
Multiple regression: A statistical treatment of data by which several independent variables can be used to predict a dependent variable.
Sequential analysis: A form of integrated experimental design and statistical analysis that allows for adjustment to the effect of repeated significance testing.
Specificity and sensitivity: This type of analysis, used primarily with diagnostic testing, allows adjustment for false negatives versus false positives.
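
To make the two single-variable tests listed above concrete, here is a hedged sketch of the statistics they compute. It assumes the classic equal-variance (pooled) form of Student's t-test and the uncorrected chi-square for a two-by-two table; the function names and all data are hypothetical. Converting either statistic to a p value requires the corresponding reference distribution (tables or a statistics package), which is deliberately left out.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student's t statistic for comparing two means (equal variances assumed).

    Returns the t statistic and its degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    degrees_of_freedom = n_a + n_b - 2
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / degrees_of_freedom
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, degrees_of_freedom


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (no continuity correction) for the table

                 outcome yes   outcome no
    exposed           a             b
    unexposed         c             d
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical measured variable in two groups of five subjects each.
t_value, df = pooled_t_statistic([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])

# Hypothetical counts: 20 of 30 exposed and 10 of 30 unexposed had the outcome.
chi2 = chi_square_2x2(20, 10, 10, 20)

print(f"t = {t_value:.2f} with {df} degrees of freedom")
print(f"chi-square = {chi2:.2f} with 1 degree of freedom")
```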

Clinical Research
Having given working definitions for the terms and techniques commonly used, what are the pitfalls to look for in clinical research? They basically fall into three categories:

  1. The number of subjects in a study
  2. The experimental design
  3. The statistical method used to ascertain the worth of the results

Perhaps the most flagrant abuse committed under the rubric of research is the lack of sufficient numbers in a study. In general, the greater the number in a study, the greater the likelihood of usable results. Obviously, it would be great if all clinical research were conducted with a minimum of 10,000 subjects. This is neither feasible nor desirable.

The Declaration of Helsinki requires that a research protocol be terminated if a part of the protocol proves to be detrimental to the subjects. Moreover, if a part of a research treatment is obviously superior in terms of patient safety, comfort, or morbidity, the same Declaration requires that the project be terminated. Due to the difficulty in amassing large numbers for clinical trials, most studies are done with the statistically significant small population, which is 30. This means that in a three-compartment study, there must be a minimum of 90 participants (30 in each compartment). Clearly, this is not always the case and there are mathematical adjustments that can be made.

In general, the more variation in the results, the greater the number necessary for significance, reliability, and validity. Bottom line: If there are fewer than 30 subjects in a study, be wary of the results and how they are interpreted.
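
The link between variation and required numbers can be sketched with a textbook approximation that is not part of this article: for comparing two group means, the subjects needed per group are roughly 2(z_alpha + z_beta)^2 (SD / difference)^2. Everything in the block below is an assumption made for illustration, including the function name, the 80 percent power, and the choice of 0.025 as the two-sided significance level.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_group(sd, difference, alpha=0.025, power=0.80):
    """Approximate subjects needed per group to detect `difference`
    between two means, given a common standard deviation `sd`.

    Normal-approximation formula; alpha is two-sided. All defaults are
    assumptions of this sketch, not values prescribed by any standard.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sd / difference) ** 2)


# Same clinically meaningful difference (10 units), two levels of variation.
n_low_variation = subjects_per_group(sd=10, difference=10)
n_high_variation = subjects_per_group(sd=20, difference=10)

print("subjects per group, SD = 10:", n_low_variation)
print("subjects per group, SD = 20:", n_high_variation)
```

Doubling the standard deviation roughly quadruples the number of subjects required, which is the point made above: the more variation in the results, the larger the study must be.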

Experimental design revolves around how the study was conducted. Ideally, this is an a priori function. The researcher makes every attempt possible to design a study that has only one variable. In basic research, this is relatively simple, because genetically homogeneous subjects that can be completely controlled are available. In general, clinical research cannot be reduced to one variable. There are many reasons for this. Suffice it to say that since the genetic and environmental makeup of individual patients is different, the impact of treatment is different and cannot necessarily be isolated to the treatment given. This mandates the need for multivariable experimental design in most clinical research.

One of the most difficult types of clinical research to evaluate is the retrospective study. The reason for this difficulty is that the data is collected after the fact and the usual safeguards to ensure random sampling are absent. Not only is the data in this type of study potentially biased, but the conclusions drawn may well suffer from free extrapolation.

Even under the best of circumstances and design, the results of a retrospective study should be used only to identify trends and areas for definitive research. Prospective studies, on the other hand, are conducted with all proper research safeguards in place.

EXEMPLI GRATIA: [Royal College of General Practitioners’ Oral Contraception Study, J. Coll. Gen. Pract. 13(5): 267. 1967]. The thrust of this study was a finding that oral contraception use was associated with a sixfold to tenfold increase in the incidence of thrombophlebitis in women. Furthermore, there was an increased mortality associated with oral contraceptive usage.

FLAWS: This was a retrospective study, i.e., records were tabulated post facto and conclusions drawn on the basis of those results.

Basically, the study looked at all cases of thromboembolic phenomena and death in women in the United Kingdom over a ten-year period and whether or not the individual had ever taken oral contraceptives. If one looks at the data carefully, it becomes readily apparent that oral contraceptive use was not the study focus; rather, the focus was the incidence of oral contraceptive use among those with a malady versus those without that malady. Moreover, free extrapolation was used in the conclusions. This immediately brings into play the concept of cause and effect relationships and the introduction of bias.

To check on the possibility of an actual association of oral contraception use with an increased mortality and morbidity, a prospective, multifaceted study was begun. This study was designed to accurately test all of the questions raised by the above-mentioned British study. The study is known as the Walnut Creek Study and the results have been published. [Walnut Creek Contraceptive Drug Study, J. Reprod. Med. 25(6 Suppl): 345-72 1980 and Long-term Follow-up of Women in the Walnut Creek Study, Obstet Gynecol 70(3 Pt 1): 289-93 1987]. The results of the Walnut Creek Study refute all of the blanket condemnations of oral contraceptive usage. In fact, in the non-smoker, the use of oral contraceptives has a salutary effect. The only verification of the British findings is in a subset of users who are over the age of 35 and also smoke.

The blinding of a protocol is a little understood and often overlooked element of clinical research, but it is integral to a good experimental design. The goal of blinding is to eliminate bias and the so-called placebo effect. Blinding may take many forms. The simplest form is a single blind study in which the subject does not know whether or not he is receiving the active or the control treatment.

Next comes the double blind study in which neither the subject nor the operator knows who is receiving the active or the control treatment. To these may be added a sham study and/or a crossover study. Blinding is one of the more difficult elements of experimental design to implement and is the principal reason that there have been no long-term studies on the efficacy of osteopathic manipulation.

In its simplest terms, experimental design is aimed at using a particular statistical treatment. The choice of analysis is dependent, among other things, on the number of subjects, the number of variables, the length of the study, and the distribution of results. Ordinarily, only one method of analysis will be used; however, as clinical research becomes more sophisticated or involved, especially in multicenter studies, multiple methods may have to be used. The following are guidelines for the statistical methods used.

If chi-square or Student’s t-test is the statistical method of a clinical research article, reject the work. These statistical methods are used primarily in basic, not clinical, research, where the experimental design can be limited to one variable. This type of analysis can ascertain a probable trend, but not significance, in multivariable trials.

A p of 0.025 or less is considered significant for biological systems; however, this level of significance does not address the inherent variability of living systems. The ‘gold standard’ for statistical evaluation of good clinical data is either analysis of variance or sequential analysis. A p of 0.01 or less assures that the results obtained are the product of the treatment tested and not due to sampling variation (chance). Caveat: The clinical implication of a treatment is a clinical and not a statistical decision; therefore, if the difference in outcome of a treatment is imperceptible, no level of p can alter that fact.
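
What a p value measures can be made tangible with a small exact permutation test. This is a stand-in chosen because it needs nothing beyond the Python standard library; it is not the analysis of variance or sequential analysis named above, and the two data sets are hypothetical. The idea: if treatment made no difference, the group labels are arbitrary, so one counts how many relabelings of the same subjects produce a difference in means at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(treated, control):
    """Two-sided exact permutation p value for a difference in means.

    Returns the fraction of all possible relabelings of the pooled subjects
    whose absolute difference in means is at least the observed one.
    """
    pooled = list(treated) + list(control)
    observed = abs(mean(treated) - mean(control))
    total = 0
    at_least_as_extreme = 0
    for chosen in combinations(range(len(pooled)), len(treated)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point round-off.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical outcome scores: the groups do not overlap at all.
p_clear = exact_permutation_p([12, 14, 15, 16, 18], [7, 8, 9, 10, 11])

# Hypothetical outcome scores: the groups overlap heavily.
p_overlap = exact_permutation_p([10, 12, 9, 14, 11], [9, 11, 10, 13, 8])

print(f"clear separation:  p = {p_clear:.4f}")
print(f"heavy overlap:     p = {p_overlap:.4f}")
```

With five subjects per group there are only 252 possible relabelings, so even complete separation cannot give a p value below 2/252. A small study is therefore limited in the level of significance it can ever reach.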

If the data does not meet the criteria for these gold standard tests, then lesser statistical methods may be used which may result in significant results. This significance, however, does not assure that the association observed is the result of the treatment and not sampling variation.

Among these statistical modalities are linear coefficients, linear regressions, multiple regressions, etc. Be wary of coefficients of correlation less than 0.75 (ideal is 1.0; if 0.85 to 1.0 then a gold standard test would have been used). As the correlation becomes lower than 0.85, p must be less than 0.001 to ensure that, based on the sample data, the observations are the result of the treatment and not sample variation.
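
For readers who want to see where a coefficient of correlation and a linear coefficient come from, the following is a minimal sketch using the standard least-squares formulas. The paired observations and the function name are hypothetical, invented only for illustration.

```python
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Least-squares straight line through paired observations.

    Returns (slope, intercept, r), where r is the Pearson coefficient
    of correlation.
    """
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    slope = s_xy / s_xx  # the "linear coefficient"
    intercept = mean_y - slope * mean_x
    r = s_xy / sqrt(s_xx * s_yy)
    return slope, intercept, r


# Hypothetical paired data, e.g. dose (x) and response (y).
dose = [1, 2, 3, 4, 5]
response = [2, 4, 5, 4, 5]

slope, intercept, r = fit_line(dose, response)
print(f"response = {intercept:.2f} + {slope:.2f} * dose, r = {r:.3f}")
```

Here r is roughly 0.77, only just above the 0.75 threshold mentioned above, even though the fitted line looks plausible. With five data points, that is thin support for any conclusion.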

When multiple factors are involved, matrix analysis is the statistical treatment of choice. This method demonstrates trends only.

It does not take any great intellect to ascertain that an article depicting a clinical study of 10 male patients with disease ‘A’, subjected to treatment ‘B’, analyzed by Student’s t-test, and touted as proper for the general population is not good research and should not cause the reader to depart from accepted norms of treatment. Remember: just because it is in print does not necessarily make it true!

EXEMPLI GRATIA: [A Difference in Hypothalamic Structure Between Heterosexual and Homosexual Men. Simon LeVay Science 253(35): 1034-1037. 1991]. This study purports to have found an area in the hypothalamus that is different in the homosexual versus that of the heterosexual. On the basis of this finding, it is proposed that there is a genetic/anatomic reason for homosexuality.

FLAWS: The evidence was garnered on the basis of autopsy. There are very few subjects in the study, well below the statistically significant small number of 30. All of the homosexual men (19) in the study died of acquired immunodeficiency syndrome (AIDS), which is caused by a retrovirus that has the capability of altering neuronal structure. Six of the sixteen presumed heterosexual men died of AIDS.

The findings in healthy individuals may be significantly different. The heterosexuals in the study were deemed heterosexual by lack of sexual preference data in the chart. There is significant variation in both groups, leading one to wonder whether or not the results obtained were due to inherent variation or actual difference. Women, due to the way the clinical material was obtained, were excluded from the study. Although there are problems with this study, it may have identified trends that deserve further, definitive investigation.

Armed with this knowledge, one may now critically look at the literature and make informed decisions as to whether or not what is read is worth incorporating into patient care. Moreover, when encountering ‘This was in the literature’ one can now ask the pertinent questions about the worth of the information.


J. Robert Mannino, DO, PhD, FACOFP, practices in Coral Springs, Florida and can be contacted via e-mail at jrmannino@worldnet.att.net.

    Selected Readings

    • Intuitive Biostatistics, H. Motulsky, Oxford University Press, UK, 1995.
    • Statistics Applied to Clinical Trials, T. F. Cleophas, et al., Kluwer Academic Publishers, New York, NY, 2002.
    • Practical Statistics for Medical Research, D. G. Altman, CRC Press, Boca Raton, FL, 1990.
    • Clinical Trials and Human Research, F. A. Rozovsky and R. K. Adams, Jossey-Bass Publisher, San Francisco, CA, 2003.
    • Design and Analysis of Experiments, 5th Edition, D. C. Montgomery, John Wiley & Sons, Hoboken, NJ, 2000.
    • Using Multivariate Statistics, 4th Edition, B. G. Tabachnick, et al., Pearson Allyn & Bacon, Upper Saddle River, NJ, 2000.
    • Handbook of Parametric and Nonparametric Statistical Procedures, 2nd Edition, D. J. Sheskin, CRC Press, Boca Raton, FL, 2000.
    • Handbook of Statistical Analyses Using SAS, 2nd Edition, G. Der and B. S. Everitt, CRC Press, Boca Raton, FL, 2001.