The Concept of Statistical Significance Testing

A frequently asked question:
As a postsecondary student who is studying inferential statistics, I need documentation to help me to understand the controversy that surrounds statistical significance testing.

What is the proper role of tests of statistical significance in social science research?

What are complementary methods to statistical significance testing for determining the replicability of results in research experiments?

Introduction to the Logic and Process of Significance Testing:

1. Set up a null hypothesis and an alternative about the population or populations.

2. Choose an alpha level. The alpha level is the probability you regard as low enough to constitute evidence of a contradiction between the data and the assumption that the null hypothesis is true in the population (alpha is often set at .05 in the behavioral sciences).

3. Gather data from a sample.

4. Compute the value of a test statistic based on the sample data.

5. Compute the probability of obtaining a test statistic at least as extreme as the value from Step 4 under the assumption that the null hypothesis is true (this probability, the p-value, is usually given in a table or as part of a printout).

6. If the probability in Step 5 is less than the alpha selected in Step 2, then conclude that there is an inconsistency between the null hypothesis and the data. You can then reject the null hypothesis in favor of the alternative hypothesis and state that the results are statistically significant.

If the probability is greater than or equal to the alpha level, then conclude that the sample data are consistent with the null hypothesis. You must then fail to reject the null hypothesis and state that the results are not statistically significant. Failing to reject the null hypothesis does not show that it is true.
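The six steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are made up, the function name two_sample_z_test is hypothetical, and a large-sample two-sided z-test stands in for whatever test a real study would call for.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def two_sample_z_test(sample_1, sample_2, alpha=0.05):
    """Large-sample, two-sided z-test of H0: population_mean_1 = population_mean_2.

    Hypothetical helper for illustration; assumes independent samples that are
    large enough for the normal approximation to be reasonable.
    """
    n1, n2 = len(sample_1), len(sample_2)
    # Step 4: compute the test statistic from the sample data.
    standard_error = sqrt(stdev(sample_1) ** 2 / n1 + stdev(sample_2) ** 2 / n2)
    z = (mean(sample_1) - mean(sample_2)) / standard_error
    # Step 5: probability of a statistic at least this extreme if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 6: compare the p-value with alpha.
    reject_null = p_value < alpha
    return z, p_value, reject_null


# Step 1: H0 is population_mean_1 = population_mean_2; H1 is that they differ.
# Step 2: alpha is set at .05.
ALPHA = 0.05
# Step 3: gather data (made-up scores for two groups of 40).
group_1 = [50 + (i % 7) for i in range(40)]
group_2 = [53 + (i % 7) for i in range(40)]

z, p_value, reject_null = two_sample_z_test(group_1, group_2, ALPHA)
if reject_null:
    print(f"z = {z:.2f}, p = {p_value:.4g}: statistically significant")
else:
    print(f"z = {z:.2f}, p = {p_value:.4g}: not statistically significant")
```

With small samples, a t-test would normally replace the normal approximation used here.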

Note that statistical significance is not the same as practical significance. For example, the null hypothesis is often something like

population_mean_1 = population_mean_2

Rejecting this null hypothesis indicates only that the sample data imply some difference between the population means; that difference may be small and unimportant.
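A short numerical sketch may make this distinction concrete. All of the numbers below are assumptions chosen for illustration: a fixed difference of half a point between two group means, a common standard deviation of 15, and a hypothetical helper z_test_from_summary. The standardized effect size (Cohen's d) stays the same while the p-value shrinks as the samples grow.

```python
from math import sqrt
from statistics import NormalDist


def z_test_from_summary(mean_diff, sd, n_per_group):
    """Two-sided z-test from summary statistics (hypothetical helper).

    Assumes two independent groups of equal size with a common standard deviation.
    """
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


MEAN_DIFF = 0.5  # assumed difference between the two sample means
SD = 15.0        # assumed common standard deviation
ALPHA = 0.05

# Cohen's d does not depend on sample size; here it is very small.
cohens_d = MEAN_DIFF / SD

for n in (100, 10_000, 100_000):
    z, p_value = z_test_from_summary(MEAN_DIFF, SD, n)
    verdict = "significant" if p_value < ALPHA else "not significant"
    print(f"n per group = {n:>7}: d = {cohens_d:.3f}, z = {z:.2f}, "
          f"p = {p_value:.4g} ({verdict})")
```

The same trivial difference fails to reach significance with 100 cases per group but does with 10,000, which is why an effect size is worth reporting alongside the p-value.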

The Concept of Statistical Significance Testing [1994] - Bruce Thompson

Inappropriate Statistical Practices in Counseling Research: Three Pointers for Readers of Research Literature. [1995] - Bruce Thompson

Pitfalls of Data Analysis [1996] - Clay Helberg


Statistical Methods in Psychology Journals: Guidelines and Explanations [1999] - Leland Wilkinson and Task Force on Statistical Inference, American Psychological Association Board of Scientific Affairs

Ways to Explore the Replicability of Multivariate Results (Since Statistical Significance Testing Does Not) [1997] - Frederick J. Kier, Texas A&M University

Commentaries on Significance Testing [1997] - David F. Parkhurst (Ed.), School of Public and Environmental Affairs, Indiana University

Statistical Analysis [Collection] - from the Full Text Library of the ERIC Clearinghouse on Assessment & Evaluation

Dynamic (i.e., Live) Searches of the ERIC Documents Database

The following search options employ the electronic Thesaurus of ERIC Descriptors as the search interface for the ERIC documents database. If you would like to ensure a current bibliography of ERIC documents for any of the given sub-topics, then we highly recommend that you pursue this option. Also, once you have selected a search option, you may edit the given strategy to introduce concepts into the search to accommodate your specific needs; or, you can build an entirely new search strategy in the ERIC Search Wizard. For a short, selective bibliography for an introduction to the use of statistical significance testing in educational research, please see the Selected ERIC Documents Citations below.

  • Dynamic Search of the ERIC Database for an Overview of the Controversy Pertaining to the Use & Misuse of Statistical Significance Testing

  • Dynamic Search of the ERIC Database for Methods for the Determination of Statistical Significance

  • Dynamic Search of the ERIC Database for Techniques to Complement Statistical Significance in the Determination of the Likelihood of Research Replication

Selected ERIC Documents Citations for an Overview of the Controversy Pertaining to the Use & Misuse of Statistical Significance Testing

Although less current and less plentiful than the results of the Dynamic Search options (above), this selective bibliography presents ERIC document citations that were preselected for their thoroughness, authority, or uniqueness.

Chow, S.L. (1996). Statistical significance: rationale, validity and utility. Thousand Oaks, CA: Sage.

Harlow, L.L., Mulaik, S.A., & Steiger, J.H. (Eds.). (1997). What if there were no significance tests? (Multivariate Applications Book Series). Mahwah, NJ: Lawrence Erlbaum Associates.

McLean, J.E. & Kaufman, A.S. (Eds.). (1998). Statistical significance testing [Special Issue]. Research in the Schools, 5(2). Birmingham, AL: Mid-South Educational Research Association.

Mohr, L.B. (1990). Understanding significance testing. (Quantitative Applications in the Social Sciences No. 07-073). Newbury Park: Sage.

New ways in statistical methodology: from significance tests to Bayesian inference. (1998). (European University Studies Series VI, Psychology). New York: P. Lang.

American Educational Research Association - Division D: Measurement and Research Methodology [AERA-D]

American Statistical Association - Social Statistics Section

