ERIC Documents Database Citations & Abstracts for The Concept of Statistical Significance Testing

Instructions for ERIC Documents Access
Search Strategy: Statistical Significance [ERIC Descriptor, with heavily weighted status] AND Research Methodology OR Educational Research OR Educational History OR Statistical Inference OR Statistical Analysis OR Hypothesis Testing OR Null Hypothesis [ERIC Descriptors/Identifiers]
ED419023 TM028329
Five Methodology Errors in Educational Research: The Pantheon of
Statistical Significance and Other Faux Pas.
Thompson, Bruce
1998
102p.; Paper presented at the Annual Meeting of the American
Educational Research Association (San Diego, CA, April 13-17, 1998).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
After presenting a general linear model as a framework for
discussion, this paper reviews five methodology errors that occur in
educational research: (1) the use of stepwise methods; (2) the
failure to consider in result interpretation the context specificity
of analytic weights (e.g., regression beta weights, factor pattern
coefficients, discriminant function coefficients, canonical function
coefficients) that are part of all parametric quantitative analyses;
(3) the failure to interpret both weights and structure coefficients
as part of result interpretation; (4) the failure to recognize that
reliability is a characteristic of scores, and not of tests; and (5)
the incorrect interpretation of statistical significance and the
related failure to report and interpret the effect sizes present in
all quantitative analysis. In several cases small heuristic
discriminant analysis data sets are presented to make the discussion
of each of these five methodology errors more concrete and accessible.
Four appendixes contain computer programs for some of the analyses.
(Contains 19 tables, 1 figure, and 143 references.) (SLD)
Descriptors: *Educational Research; *Effect Size; *Research
Methodology; Scores; *Statistical Significance; Tables (Data); *Test
Reliability
Identifiers: Stepwise Regression; *Weighting (Statistical)
ED416214 TM028066
Why "Encouraging" Effect Size Reporting Isn't Working: The Etiology
of Researcher Resistance to Changing Practices.
Thompson, Bruce
1998
18p.; Paper presented at the Annual Meeting of the Southwest
Educational Research Association (Houston, TX, January 1998).
Document Type: PROJECT DESCRIPTION (141); CONFERENCE PAPER (150)
Given decades of lucid, blunt admonitions that statistical
significance tests are often misused, and that the tests are somewhat
limited in utility, what is needed is less repeated bashing of
statistical tests, and some honest reflection regarding the etiology
of researchers' denial and psychological resistance (sometimes
unconscious) to improved practice. Three etiologies are briefly
explored here: (1) atavism; (2) "is/ought" logic fallacies; and (3)
confusion/desperation. Understanding the etiology of psychological
resistance may ultimately lead to improved interventions to assist in
overcoming researcher resistance to reporting effect sizes and using
non-nil nulls and other analytic improvements. (Contains 45
references.) (Author)
Descriptors: Attitudes; Change; Denial (Psychology); *Educational
Research; *Effect Size; *Etiology; *Research Methodology;
*Researchers; *Statistical Significance
ED408302 TM026504
Use of Tests of Statistical Significance and Other Analytic Choices
in a School Psychology Journal: Review of Practices and Suggested
Alternatives.
Snyder, Patricia A.; Thompson, Bruce
24 Jan 1997
25p.; Paper presented at the Annual Meeting of the Southwest
Educational Research Association (Austin, TX, January 24, 1997).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
The use of tests of statistical significance was explored, first by
reviewing some criticisms of contemporary practice in the use of
statistical tests as reflected in a series of articles in the
"American Psychologist" and in the appointment of a "Task Force on
Statistical Inference" by the American Psychological Association
(APA) to consider recommendations leading to improved practice.
Related practices were reviewed in seven volumes of the "School
Psychology Quarterly," an APA journal. This review found that some
contemporary authors continue to use and interpret statistical
significance tests inappropriately. The 35 articles reviewed
reported a total of 321 statistical tests for which sufficient
information was provided for effect sizes to be computed, but the
authors of only 19 articles reported magnitude of effect indices.
Suggestions for improved practice are explored, beginning with the
need to interpret statistical significance tests correctly, using
more accurate language, and the need to report and interpret
magnitude of effect indices. Editorial policies must continue to
evolve to require authors to meet these expectations. (Contains 50
references.) (SLD)
Descriptors: Educational Psychology; *Educational Research; *Effect
Size; Elementary Secondary Education; Research Methodology; Research
Reports; *Scholarly Journals; *School Psychologists; Statistical
Inference; *Statistical Significance; Test Interpretation; *Test Use
Identifiers: American Psychological Association
EJ565847 TM520973
Statistical Significance Testing Practices in "The Journal of
Experimental Education."
Thompson, Bruce; Snyder, Patricia A.
Journal of Experimental Education, v66 n1 p75-83 Fall
1997
ISSN: 0022-0973
Document Type: JOURNAL ARTICLE (080); EVALUATIVE REPORT (142)
The use of three aspects of recommended practice (language use,
replicability analyses, and reporting effect sizes) was studied in
quantitative reports in "The Journal of Experimental Education" (JXE)
for the academic years 1994-95 and 1995-96. Examples of both errors
and desirable practices in the use and reporting of statistical
significance tests in JXE are noted. (SLD)
Descriptors: *Effect Size; *Language Usage; *Research Methodology;
Research Reports; Scholarly Journals; *Statistical Significance
Identifiers: *Research Replication
EJ564700 TM520936
Statistical Significance: Rationale, Validity and Utility [Book
Review].
Simon, Marilyn K.
Canadian Journal of Program Evaluation/La Revue canadienne
d'evaluation de programme, v12 n2 p189-90 Aut 1997
ISSN: 0834-1516
Document Type: BOOK-PRODUCT REVIEW (072); JOURNAL ARTICLE (080)
The review states that the book examines the null-hypothesis
significance test procedure as an integral component of data analysis
in quantitative research studies in the social sciences. The book is
designed for the nonmathematics student who will be conducting
empirical studies involving the testing of substantive hypotheses. (SLD)
Descriptors: *Hypothesis Testing; *Research Methodology; Research
Utilization; *Social Science Research; *Statistical Significance;
*Validity
Identifiers: *Null Hypothesis
EJ559584 EC618689
Debunking the Myth of the "Highly Significant" Result: Effect Sizes
in Gifted Education Research.
Plucker, Jonathan A.
Roeper Review, v20 n2 p122-26 Dec 1997
ISSN: 0278-3193
Document Type: JOURNAL ARTICLE (080); RESEARCH REPORT (143)
Describes the utility of effect size reporting and reports on a
study that reviewed articles in three quarterly gifted journals and
40 articles in journals not directly associated with gifted
education, published over the last five years. Effect sizes were
generally not included in research articles, with results consistent
across journals. (Author/CR)
Descriptors: Educational Research; *Effect Size; *Gifted; *Research
Methodology; *Scholarly Journals; *Statistical Significance;
Technical Writing
EJ551464 UD520180
Rejoinder: Editorial Policies Regarding Statistical Significance
Tests: Further Comments.
Thompson, Bruce
Educational Researcher, v26 n5 p29-32 Jun-Jul 1997
Document Type: JOURNAL ARTICLE (080); POSITION PAPER (120)
Argues that describing results as "significant" rather than
"statistically significant" is confusing to the very people most apt
to misinterpret this telegraphic wording. The importance of
reporting the effect size and the value of both internal and external
replicability analyses are stressed. (SLD)
Descriptors: *Editing; *Educational Research; *Effect Size;
Scholarly Journals; *Statistical Significance; Test Use; *Writing for
Publication
Identifiers: *Research Replication
EJ551463 UD520179
Reflections on Statistical and Substantive Significance, with a
Slice of Replication.
Robinson, Daniel H.; Levin, Joel R.
Educational Researcher, v26 n5 p21-26 Jun-Jul 1997
Document Type: JOURNAL ARTICLE (080); EVALUATIVE REPORT (142)
Proposes modifications to the recent suggestions by B. Thompson
(1996) for an American Educational Research Association editorial
policy on statistical significance testing. Points out that,
although it is useful to include effect sizes, they can be
misinterpreted, and argues, as does Thompson, for greater attention
to replication in educational research. (SLD)
Descriptors: *Editing; *Educational Research; *Effect Size;
Research Methodology; Research Reports; Scholarly Journals;
*Statistical Significance; *Test Use; Writing for Publication
Identifiers: *Research Replication
EJ541829 SE557633
A Note on p-Values.
Evans, Gwyn
Teaching Statistics, v19 n1 p22-23 Spr 1997
ISSN: 0141-982X
Document Type: TEACHING GUIDE (052); JOURNAL ARTICLE (080)
Demonstrates the advantages of a p-value as compared with a
standard significance test procedure. Contains examples in the
discussion of testing the mean of a normal distribution and testing a
probability or proportion. (DDR)
Descriptors: British National Curriculum; Educational Strategies;
Foreign Countries; Higher Education; *Probability; *Ratios
(Mathematics); *Statistical Significance; *Statistics
Identifiers: Great Britain
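
A concrete illustration of the kind of p-value calculation Evans discusses
follows; the numbers are invented and the Python/SciPy code is a sketch,
not material from the article itself.

import math
from scipy.stats import norm

# One-sample z test for the mean of a normal distribution with known sigma.
# H0: mu = 100 vs. H1: mu != 100, with sigma = 15, n = 25, sample mean = 106.
xbar, mu0, sigma, n = 106.0, 100.0, 15.0, 25
z = (xbar - mu0) / (sigma / math.sqrt(n))      # z = 2.0
p_mean = 2 * norm.sf(abs(z))                   # two-sided p, about .046

# Approximate test for a proportion: H0: pi = 0.5, 36 successes in 50 trials.
successes, trials, pi0 = 36, 50, 0.5
se = math.sqrt(pi0 * (1 - pi0) / trials)
z_prop = (successes / trials - pi0) / se
p_prop = 2 * norm.sf(abs(z_prop))

print(f"mean test:       z = {z:.2f}, p = {p_mean:.3f}")
print(f"proportion test: z = {z_prop:.2f}, p = {p_prop:.4f}")

The p-value reports how surprising the data are under the null hypothesis;
whether .046 is "significant" still depends on the alpha chosen in advance.
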
ED415265 TM027966
Has Testing for Statistical Significance Outlived Its Usefulness?
McLean, James E.; Ernest, James M.
1997
21p.; Paper presented at the Annual Meeting of the Mid-South
Educational Research Association (26th, Memphis, TN, November 12-14,
1997).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
The research methodology literature in recent years has included a
full frontal assault on statistical significance testing. An entire
issue of "The Journal of Experimental Education" explored this
controversy. The
purpose of this paper is to promote the position that while
significance testing by itself may be flawed, it has not outlived its
usefulness. However, it must be considered in combination with other
criteria. Specifically, statistical significance is but one of three
criteria that must be demonstrated to establish a position
empirically. Statistical significance merely provides evidence that
an event did not happen by chance. However, it provides no
information about the meaningfulness (practical significance) of an
event or if the event is replicable. Consequently, statistical
significance testing must be accompanied by judgments of the event's
practical significance and replicability. However, the likelihood of
a chance occurrence of an event must not be ignored. It is
acknowledged that the importance of significance testing is reduced
as sample size increases. In large sample experiments, particularly
those involving multiple variables, the role of significance testing
is diminished because even small differences are often statistically
significant. In small sample studies where assumptions such as
random sampling are practical, significance testing can be quite
useful. It is important to remember that statistical significance is
but one criterion useful to inferential researchers. In addition to
statistical significance, practical significance, and replicability,
researchers must also consider Type II errors and sample size.
Furthermore, researchers should not ignore other techniques such as
confidence intervals. While all of these statistical concepts are
related, they provide different types of information that assist
researchers in making decisions. (Contains 30 references.)
(Author/SLD)
Descriptors: Criteria; Decision Making; *Research Methodology;
*Sample Size; *Statistical Significance; *Test Use
Identifiers: *Research Replication
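
The point about large samples can be made concrete with a small sketch.
The effect size and group sizes below are invented, and the Python/SciPy
code is illustrative rather than anything from the paper: for a fixed
standardized difference, the two-sample t statistic grows with the square
root of n, so even a trivial effect eventually becomes statistically
significant.

import math
from scipy.stats import t as t_dist

d = 0.10                          # fixed, very small standardized effect
for n in (20, 200, 2000, 20000):  # per-group sample sizes
    t_stat = d * math.sqrt(n / 2) # two-sample t with equal groups of size n
    p = 2 * t_dist.sf(t_stat, df=2 * n - 2)
    print(f"n per group = {n:>6}: t = {t_stat:5.2f}, p = {p:.4f}")
# The effect never changes, yet p falls below .05 once the groups are
# large enough.
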
ED413342 TM027613
If Statistical Significance Tests Are Broken/Misused, What
Practices Should Supplement or Replace Them?
Thompson, Bruce
1997
32p.; Paper presented at the Annual Meeting of the American
Psychological Association (105th, Chicago, IL, August 1997).
Document Type: POSITION PAPER (120); CONFERENCE PAPER (150)
Given some consensus that statistical significance tests are
broken, misused, or at least have somewhat limited utility, the focus
of discussion within the field ought to move beyond additional
bashing of statistical significance tests, and toward more
constructive suggestions for improved practice. Five suggestions for
improved practice are recommended: (1) required reporting of effect
sizes; (2) reporting of effect sizes in an interpretable manner; (3)
explicating the values that bear on results; (4) providing evidence
of result replicability; and (5) reporting confidence intervals.
Although the five recommendations can be followed even if statistical
significance tests are reported, social science will proceed most
rapidly when research becomes the search for replicable effects
noteworthy in magnitude in the context of both the inquiry and
personal or social values. (Contains 1 table and 74 references.)
(Author/SLD)
Descriptors: *Effect Size; *Research Methodology; *Statistical
Significance; *Test Use
Identifiers: *Confidence Intervals (Statistics); *Research
Replication
ED408336 TM026589
Statistical Significance Testing in "Educational and Psychological
Measurement" and Other Journals.
Daniel, Larry G.
Mar 1997
33p.; Paper presented at the Annual Meeting of the National Council
on Measurement in Education (Chicago, IL, March 25-27, 1997).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
Statistical significance tests (SSTs) have been the object of much
controversy among social scientists. Proponents have hailed SSTs as
an objective means for minimizing the likelihood that chance factors
have contributed to research results. Critics have both questioned
the logic underlying SSTs and bemoaned the widespread misapplication
and misinterpretation of the results of these tests. This paper
offers a framework for remedying some of the common problems
associated with SSTs via modification of journal editorial policies.
The controversy surrounding SSTs is reviewed, with attention given to
both historical and more contemporary criticisms of bad practices
associated with misuse of SSTs. Examples from the editorial policies
of "Educational and Psychological Measurement" and several other
journals that have established guidelines for reporting results of
SSTs are discussed, and suggestions are provided regarding additional
ways that educational journals may address the problem. These
guidelines focus on selecting qualified editors and reviewers,
defining policies about use of SSTs that are in line with those of
the American Psychological Association, and stressing effect size
reporting. An appendix presents a manuscript review form. (Contains
61 references.) (Author/SLD)
Descriptors: Editing; *Educational Assessment; Policy; Research
Problems; *Scholarly Journals; *Social Science Research; *Statistical
Significance; *Test Use
Identifiers: *Educational and Psychological Measurement
ED408303 TM026505
Use of Statistical Significance Tests and Reliability Analyses in
Published Counseling Research.
Thompson, Bruce; Snyder, Patricia A.
25 Mar 1997
24p.; Paper presented at the Annual Meeting of the American
Educational Research Association (Chicago, IL, March 1997).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
The mission of the "Journal of Counseling and Development" (JCD)
includes the attempt to serve as a "scholarly record of the
counseling profession" and as part of the "conscience of the
profession." This responsibility requires the willingness to engage
in self-study. This study investigated two aspects of research
practice in 25 quantitative studies reported in 1996 JCD issues, the
use and interpretation of statistical significance tests, and the
meaning of and ways of evaluating the score reliabilities of measures
used in substantive research inquiry. Too many researchers have
persisted in equating result improbability with result value, and too
many have persisted in believing that statistical significance
evaluates result replicability. In addition, too many researchers
have persisted in believing that result improbability equals the
magnitude of study effects. Authors must consistently begin to
report and interpret effect sizes to aid the interpretations they
make and those made by their readers. With respect to score
reliability evaluation, more authors need to recognize that
reliability inures to specific sets of scores and not to the test
itself. Thirteen of the JCD articles involved reports of score
reliability in previous studies and eight reported reliability
coefficients for both previous scores and those in hand. These
findings suggest some potential for improved practice in the
quantitative research reported in JCD and improved editorial policies
to support these changes. (Contains 39 references.) (SLD)
Descriptors: *Counseling; Educational Research; *Effect Size;
Evaluation Methods; Reliability; Research Methodology; *Research
Reports; *Scholarly Journals; Scores; *Statistical Significance;
*Test Use
Identifiers: *Journal of Counseling and Development; Research
Replication
ED407423 TM026445
Ways To Explore the Replicability of Multivariate Results (Since
Statistical Significance Testing Does Not).
Kier, Frederick J.
23 Jan 1997
17p.; Paper presented at the Annual Meeting of the Southwest
Educational Research Association (Austin, TX, January 23-25, 1997).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
It is a false, but common, belief that statistical significance
testing evaluates result replicability. In truth, statistical
significance testing reveals nothing about results replicability.
Since science is based on replication of results, methods that assess
replicability are important. This is particularly true when
multivariate methods, which capitalize on sampling error, are used.
This paper explores three methods that can give an idea of the
replicability of results in multivariate analysis without having to
repeat the study. The first method is cross validation, a
replication technique in which the entire sample is first run through
the planned analysis and then the sample is randomly split into two
subsamples so that separate analyses can be done on each. The
jackknife is a second method of replicability that relies on
partitioning out the impact or effect of a particular subset of the
data on an estimate derived from the total sample. The bootstrap, a
third method of studying replicability, involves copying the data set
into an infinitely large "mega" data set. Many different samples are
then drawn from the file and results are computed separately for each
sample and then averaged. The main drawback of all these internal
replicability procedures is that their results are all based on the
data from the one sample being analyzed. However, internal
replication techniques are better than not addressing the issue at
all. (Contains 18 references.) (SLD)
Descriptors: Evaluation Methods; *Multivariate Analysis; *Sampling;
*Statistical Significance
Identifiers: Bootstrap Methods; Cross Validation; Jackknifing
Technique; *Research Replication
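
A minimal sketch of the bootstrap idea described in this abstract, using
fabricated data and Python/NumPy rather than the paper's own procedures:
resample the observed cases with replacement many times and watch how
stable a statistic, here a Pearson correlation, is across the resamples.

import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 0.4 * x + rng.normal(size=n)          # fabricated sample, modest correlation

boot_rs = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)      # draw n cases with replacement
    boot_rs.append(np.corrcoef(x[idx], y[idx])[0, 1])

lo, hi = np.percentile(boot_rs, [2.5, 97.5])
print(f"full-sample r = {np.corrcoef(x, y)[0, 1]:.3f}")
print(f"bootstrap 95% interval: [{lo:.3f}, {hi:.3f}]")
# A wide interval warns that the result may not replicate in a new sample.

As the abstract notes, this remains an internal check: every resample comes
from the one sample in hand.
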
EJ535148 TM519805
What to Do with the Upward Bias in R Squared: A Comment on Huberty.
Snijders, Tom A. B.
Journal of Educational and Behavioral Statistics, v21 n3 p283-98
Fall 1996
These articles comment on a recent article by Carl J. Huberty
(1994), "A Note on Interpreting an R-squared Value," Journal of
Educational and Behavioral Statistics, v19, p351-56.
ISSN: 1076-9986
Document Type: BOOK-PRODUCT REVIEW (072); EVALUATIVE REPORT (142);
JOURNAL ARTICLE (080)
Two commentaries describe some shortcomings of a recent discussion
of the significance testing of R-squared by C. J. Huberty and upward
bias in the statistic. Both propose some modifications. A response
by Huberty acknowledges the importance of the exchange of ideas in
the field of data analysis. (SLD)
Descriptors: *Bias; *Correlation; *Effect Size; *Regression (
Statistics); *Statistical Significance
EJ533527 TM519729
Practical Significance: A Concept Whose Time Has Come.
Kirk, Roger E.
Educational and Psychological Measurement, v56 n5 p746-59 Oct
1996
Article based on the presidential address delivered to the
Southwestern Psychological Association meeting (Houston, TX, April 5,
1996).
ISSN: 0013-1644
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150);
JOURNAL ARTICLE (080)
Practical significance is concerned with whether a research result
is useful in the real world. The use of procedures to supplement the
null hypothesis significance test in four journals of the American
Psychological Association is examined, and an approach to assessing
practical significance is presented. (SLD)
Descriptors: *Educational Research; *Hypothesis Testing; *Research
Utilization; Sampling; *Scholarly Journals; *Statistical Significance
Identifiers: American Psychological Association; *Null Hypothesis;
*Practical Significance
EJ525478 UD519259
AERA Editorial Policies Regarding Statistical Significance Testing:
Three Suggested Reforms.
Thompson, Bruce
Educational Researcher, v25 n2 p26-30 Mar 1996
ISSN: 0013-189X
Document Type: EVALUATIVE REPORT (142); JOURNAL ARTICLE (080)
Reviews practices regarding tests of statistical significance and
policies of the American Educational Research Association (AERA).
Decades of misuse of statistical significance testing are described,
and revised editorial policies to improve practice are highlighted.
Correct interpretation of statistical tests, interpretation of effect
sizes, and exploration of research replicability are essential. (SLD)
Descriptors: *Editing; Educational Research; *Effect Size;
*Statistical Significance; Test Interpretation; *Test Use
Identifiers: American Educational Research Association; *Editorial
Policy; *Research Replication
EJ520936 TM519322
The Impact of Data-Analysis Methods on Cumulative Research
Knowledge: Statistical Significance Testing, Confidence Intervals,
and Meta-Analysis.
Schmidt, Frank; Hunter, John E.
Evaluation and the Health Professions, v18 n4 p408-27 Dec
1995
Special issue titled "The Meta-Analytic Revolution in Health
Research: Part II."
ISSN: 0163-2787
Document Type: EVALUATIVE REPORT (142); JOURNAL ARTICLE (080)
It is argued that point estimates of effect sizes and confidence
intervals around these point estimates are more appropriate
statistics for individual studies than reliance on statistical
significance testing and that meta-analysis is appropriate for
analysis of data from multiple studies. (SLD)
Descriptors: *Effect Size; Estimation (Mathematics); *Knowledge
Level; *Meta Analysis; *Research Methodology; *Statistical
Significance; Test Use
Identifiers: *Confidence Intervals (Statistics)
ED393939 TM024976
Understanding the Sampling Distribution and Its Use in Testing
Statistical Significance.
Breunig, Nancy A.
9 Nov 1995
25p.; Paper presented at the Annual Meeting of the Mid-South
Educational Research Association (Biloxi, MS, November 1995).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
Despite the increasing criticism of statistical significance
testing by researchers, particularly following the publication of the
1994 American Psychological Association style manual, statistical
significance test results are still popular in journal articles. For
this reason, it remains important to understand the logic of
inferential statistics. A fundamental concept in inferential
statistics is the sampling distribution. This paper explains the
sampling distribution and the Central Limit Theorem and their role in
statistical significance testing. Included in the discussion is a
demonstration of how computer applications can be used to teach
students about the sampling distribution. The paper concludes with
an example of hypothesis testing and an explanation of how the
standard deviation of the sampling distribution is either calculated
based on statistical assumptions or is empirically estimated using
logics such as the "bootstrap." These concepts are illustrated
through the use of hand generated and computer examples. An appendix
displays five computer screens designed to teach these topics.
(Contains 1 table, 4 figures, and 20 references.) (Author/SLD)
Descriptors: *Computer Uses in Education; *Educational Research;
*Hypothesis Testing; *Sampling; Statistical Distributions;
Statistical Inference; *Statistical Significance; Test Results
Identifiers: Bootstrap Methods; *Central Limit Theorem
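
A brief simulation sketch of the sampling-distribution and Central Limit
Theorem discussion above; the population and sample size are fabricated,
Python/NumPy is assumed, and this is not the paper's own computer
demonstration.

import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)   # skewed population
n = 30

# Draw many samples of size n and collect the sample means.
sample_means = np.array([
    rng.choice(population, size=n, replace=True).mean()
    for _ in range(5000)
])

print(f"population sd / sqrt(n):       {population.std() / np.sqrt(n):.3f}")
print(f"empirical sd of sample means:  {sample_means.std():.3f}")
# The two values agree closely, and the distribution of means is roughly
# normal despite the skewed population -- the Central Limit Theorem at work.
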
ED392819 TM024458
Editorial Policies Regarding Statistical Significance Testing:
Three Suggested Reforms.
Thompson, Bruce
8 Nov 1995
24p.; Paper presented at the Annual Meeting of the Mid-South
Educational Research Association (Biloxi, MS, November 1995).
Document Type: POSITION PAPER (120); CONFERENCE PAPER (150)
Editorial practices revolving around tests of statistical
significance are explored. The logic of statistical significance
testing is presented in an accessible manner--many people who use
statistical tests might not place such a premium on them if they knew
what the tests really do, and what they do not do. The etiology of
decades of misuse of statistical tests is explored, highlighting the
bad implicit logic of persons who misuse statistical tests. Finally,
three revised editorial policies that would improve conventional
practice are discussed. The first is the use of better language,
with insistence on universal use of the phrase "statistical
significance" to emphasize that the common meaning of "significant"
has nothing to do with results being important. A second improvement
would be emphasizing effect size interpretation, and a third would be
using and reporting strategies that evaluate the replicability of
results. Internal replicability analyses such as cross validation,
the jackknife, or the bootstrap would help determine whether results
are stable across sample variations. (Contains 51 references.)
(Author/SLD)
Descriptors: *Editing; *Educational Assessment; *Effect Size;
Quality Control; *Research Methodology; *Statistical Significance;
*Test Use
Identifiers: Bootstrap Methods; Cross Validation; Jackknifing
Technique; *Research Replication
ED382639 TM023069
Effect Size as an Alternative to Statistical Significance Testing.
McClain, Andrew L.
Apr 1995
18p.; Paper presented at the Annual Meeting of the American
Educational Research Association (San Francisco, CA, April 18-22,
1995).
Document Type: REVIEW LITERATURE (070); CONFERENCE PAPER (150)
The present paper discusses criticisms of statistical significance
testing from both historical and contemporary perspectives.
Statistical significance testing is greatly influenced by sample size
and often results in meaningless information being over-reported.
Variance-accounted-for effect sizes are presented as an alternative
to statistical significance testing. A review of the "Journal of
Clinical Psychology" (1993) reveals a continued reliance on
statistical significance testing on the part of researchers.
Finally, scatterplots and correlation coefficients are presented to
illustrate the lack of linear relationship between sample size and
effect size. Two figures are included. (Contains 24 references.)
(Author)
Descriptors: Correlation; *Effect Size; Research Methodology;
*Sample Size; *Statistical Significance; *Testing
Identifiers: Scattergrams; *Variance (Statistical)
EJ481563 EC608481
Interpretation of Statistical Significance Testing: A Matter of
Perspective.
McClure, John; Suen, Hoi K.
Topics in Early Childhood Special Education, v14 n1 p88-100 Spr
1994
Theme Issue: Methodological Issues and Advances.
ISSN: 0271-1214
Document Type: JOURNAL ARTICLE (080); POSITION PAPER (120)
Target Audience: Researchers
This article compares three models that have been the foundation
for approaches to the analysis of statistical significance in early
childhood research--the Fisherian and the Neyman-Pearson models (both
considered "classical" approaches), and the Bayesian model. The
article concludes that all three models have a place in the analysis
of research results. (JDD)
Descriptors: *Bayesian Statistics; Early Childhood Education;
Educational Research; *Hypothesis Testing; Models; *Research
Methodology; Statistical Analysis; *Statistical Significance
ED367678 TM021117
Historical Origins of Contemporary Statistical Testing Practices:
How in the World Did Significance Testing Assume Its Current Place in
Contemporary Analytic Practice?
Weigle, David C.
Jan 1994
18p.; Paper presented at the Annual Meeting of the Southwest
Educational Research Association (San Antonio, TX, January 27, 1994).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
The purposes of the present paper are to address the historical
development of statistical significance testing and to briefly
examine contemporary practices regarding such testing in the light of
these historical origins. Precursors leading to the advent of
statistical significance testing are examined as are more recent
controversies surrounding the issue. As the etiology of current
practice is explored, it will become more apparent whether current
practices evolved from deliberative judgment or merely developed from
happenstance that has become reified in routine. Examination of the
history of analysis suggests that the development of statistical
significance testing has indeed involved a degree of deliberative
judgment. It may be that the time for significance testing came and
went, but there is no doubt that significance testing served as an
important catalyst for the growth of science in the 20th century.
(Contains 39 references.) (Author/SLD)
Descriptors: *Data Analysis; Educational History; Etiology;
*Research Methodology; *Scientific Research; *Statistical
Significance; *Testing
EJ475203 TM517631
Statistical Significance Testing from Three Perspectives and
Interpreting Statistical Significance and Nonsignificance and the
Role of Statistics in Research.
Levin, Joel R.; And Others
Journal of Experimental Education, v61 n4 p378-93 Sum
1993
Theme issue title: "Statistical Significance Testing in
Contemporary Practice: Some Proposed Alternatives with Comments from
Journal Editors."
ISSN: 0022-0973
Document Type: COLLECTION (020); POSITION PAPER (120); JOURNAL
ARTICLE (080)
Journal editors respond to criticisms of reliance on statistical
significance in research reporting. Joel R. Levin ("Journal of
Educational Psychology") defends its use, whereas William D. Schafer
("Measurement and Evaluation in Counseling and Development")
emphasizes the distinction between statistically significant and
important. William Asher ("Journal of Experimental Education")
comments on preceding discussions. (SLD)
Descriptors: Editing; Editors; Educational Assessment; *Educational
Research; Elementary Secondary Education; Higher Education;
Hypothesis Testing; *Research Methodology; Research Reports;
Scholarly Journals; *Statistical Significance; Statistics
EJ475198 TM517626
What Statistical Significance Testing Is, and What It Is Not.
Shaver, James P.
Journal of Experimental Education, v61 n4 p293-316 Sum
1993
Theme issue title: "Statistical Significance Testing in
Contemporary Practice: Some Proposed Alternatives with Comments from
Journal Editors."
ISSN: 0022-0973
Document Type: EVALUATIVE REPORT (142); JOURNAL ARTICLE (080)
Reviews the role of statistical significance testing, and argues
that dominance of such testing is dysfunctional because significance
tests do not provide the information that many researchers assume
they do. Possible reasons for the persistence of statistical
significance testing are discussed briefly, and ways to moderate
negative effects are suggested. (SLD)
Descriptors: Educational Practices; *Educational Research;
Elementary Secondary Education; Higher Education; Research Design;
Research Methodology; *Research Problems; Scholarly Journals;
*Statistical Significance
EJ475197 TM517625
The Case against Statistical Significance Testing, Revisited.
Carver, Ronald P.
Journal of Experimental Education, v61 n4 p287-92 Sum
1993
Theme issue title: "Statistical Significance Testing in
Contemporary Practice: Some Proposed Alternatives with Comments from
Journal Editors."
ISSN: 0022-0973
Document Type: EVALUATIVE REPORT (142); JOURNAL ARTICLE (080)
Four things are recommended to minimize the influence or importance
of statistical significance testing. Researchers must not neglect to
add "statistical" to significant and could interpret results before
giving p-values. Effect sizes should be reported with measures of
sampling error, and replication can be built into the design. (SLD)
Descriptors: Educational Researchers; *Effect Size; Error of
Measurement; *Research Methodology; Research Problems; Sampling;
*Statistical Significance
Identifiers: *P Values; *Research Replication
ED364608 TM020880
Meaningfulness, Statistical Significance, Effect Size, and Power
Analysis: A General Discussion with Implications for MANOVA.
Huston, Holly L.
Nov 1993
29p.; Paper presented at the Annual Meeting of the Mid-South
Educational Research Association (22nd, New Orleans, LA, November 9-
12, 1993).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
This paper begins with a general discussion of statistical
significance, effect size, and power analysis; and concludes by
extending the discussion to the multivariate case (MANOVA).
Historically, traditional statistical significance testing has guided
researchers' thinking about the meaningfulness of their data. The
use of significance testing alone in making these decisions has
proved problematic. It is likely that less reliance on statistical
significance testing, and an increased use of power analysis and
effect size estimates in combination could contribute to an overall
improvement in the quality of new research produced. The more
informed researchers are about the benefits and limitations of
statistical significance, effect size, and power analysis, the more
likely it is that they will be able to make more sophisticated and
useful interpretations about the meaningfulness of research results.
One table illustrates the discussion. (Contains 37 references.)(SLD)
Descriptors: *Effect Size; *Estimation (Mathematics); *Multivariate
Analysis; *Research Methodology; Research Reports; *Statistical
Significance
Identifiers: *Meaningfulness; *Power (Statistics)
ED364593 TM020837
What Is the Probability of Rejecting the Null Hypothesis?:
Statistical Power in Research.
Galarza-Hernandez, Aitza
Nov 1993
30p.; Paper presented at the Annual Meeting of the Mid-South
Educational Research Association (22nd, New Orleans, LA, November 9-
12, 1993).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
Power refers to the probability that a statistical test will yield
statistically significant results. In spite of the close
relationship between power and statistical significance, there is a
consistent overemphasis in the literature on statistical significance.
This paper discusses statistical significance and its limitations and
also includes a discussion of statistical power in the behavioral
sciences. Finally, some recommendations to increase power are
provided, focusing on the necessity of paying more attention to power
issues. Changing editorial policies and practices so that editors
ask authors to estimate the power of their tests is a useful way to
improve the situation. Planning research to consider power is
another way to ensure that the question of the probability of
rejecting the null hypothesis is answered correctly. Four tables and
two figures illustrate the discussion. (Contains 28 references.)
(SLD)
Descriptors: *Behavioral Science Research; Editors; *Estimation
(Mathematics); Hypothesis Testing; Literature Reviews; *Probability;
Research Design; *Research Methodology; Scholarly Journals;
*Statistical Significance
Identifiers: *Null Hypothesis; *Power (Statistics)
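
Power, as defined in this abstract, can be estimated by straightforward
simulation. The sketch below assumes a two-sample t test, a true effect of
d = 0.5, and illustrative group sizes; none of these choices come from the
paper.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
d, alpha, reps = 0.5, 0.05, 2000

for n in (20, 64):                               # illustrative group sizes
    rejections = 0
    for _ in range(reps):
        control = rng.normal(0.0, 1.0, size=n)
        treated = rng.normal(d, 1.0, size=n)     # true difference of d SDs
        if ttest_ind(treated, control).pvalue < alpha:
            rejections += 1
    print(f"n per group = {n}: estimated power ~= {rejections / reps:.2f}")
# Roughly .34 for n = 20 and .80 for n = 64 -- the usual reason power
# analyses are run before, not after, the data are collected.
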
EJ458887 CG542330
Simultaneous Inference: Objections and Recommendations.
Schafer, William D.
Measurement and Evaluation in Counseling and Development, v25 n4
p146-48 Jan 1993
ISSN: 0748-1756
Document Type: JOURNAL ARTICLE (080); POSITION PAPER (120)
Considers objections to the comparisonwise position, which holds that,
when conducting simultaneous significance procedures, per-test Type I
error rate should be controlled and that it is unnecessary to
introduce adjustments designed to control familywise rate.
Objections collected by Saville in an attempt to refute them are
discussed along with Saville's conclusions. Recommendations are
introduced for reporting significance tests in journals. (NB)
Descriptors: *Statistical Inference; *Statistical Significance;
Statistics
Identifiers: *Simultaneous Inference
ED347169 TM018523
Statistical Significance Testing: Alternatives and Considerations.
Wilkinson, Rebecca L.
Jan 1992
28p.; Paper presented at the Annual Meeting of the Southwest
Educational Research Association (Houston, TX, January 31-February 2,
1992).
Document Type: POSITION PAPER (120); CONFERENCE PAPER (150)
Problems inherent in relying solely on statistical significance
testing as a means of data interpretation are reviewed. The biggest
problem with statistical significance testing is that researchers
have used the results of this testing to ascribe importance or
meaning to their studies where such meaning often does not exist.
Often researchers mistake statistically significant results for
important effects. Statistical procedures are too often used as
substitutes for thought, rather than as aids to researcher thinking.
Alternatives to statistical significance testing that are explored
are effect size, statistical power, and confidence intervals. Other
considerations for further data analysis that are explored are: (1)
measurement reliability; (2) data exploration; and (3) the
replicability of research results. It is suggested that statistical
significance testing be used only as a guide in interpreting one's
results. Two tables present illustrative information, and there is a
22-item list of references. (SLD)
Descriptors: *Data Interpretation; Effect Size; *Reliability;
Researchers; Research Methodology; *Research Problems; *Statistical
Significance
Identifiers: Confidence Intervals (Statistics); Power (Statistics);
Research Replication
ED344905 TM018225
What Statistical Significance Testing Is, and What It Is Not.
Shaver, James P.
Apr 1992
43p.; Paper presented at the Annual Meeting of the American
Educational Research Association (San Francisco, CA, April 20-24,
1992).
Document Type: CONFERENCE PAPER (150)
A test of statistical significance is a procedure for determining
how likely a result is, assuming the null hypothesis to be true, given
randomization and a sample of size n (the given size in the study).
Randomization, which refers to random sampling and random assignment,
is important because it ensures the independence of observations, but
it does not guarantee independence beyond the initial sample
selection. A test of statistical significance provides a statement
of probability of occurrence in the long run, with repeated random
sampling under the null hypothesis, but provides no basis for a
conclusion about the probability that a particular result is
attributable to chance. A test of statistical significance also does
not indicate the probability that the null hypothesis is true or
false and does not indicate whether a treatment being studied had an
effect. Statistical significance indicates neither the magnitude nor
the importance of a result, and is no indication of the probability
that a result would be obtained on study replication. Although tests
of statistical significance yield little valid information for
questions of interest in most educational research, use and misuse of
such tests remain common for a variety of reasons. Researchers
should be encouraged to minimize statistical significance tests and
to state expectations for quantitative results as critical effect
sizes. There is a 58-item list of references. (SLD)
Descriptors: Educational Research; Evaluation Problems; Hypothesis
Testing; Probability; Psychological Studies; *Research Design;
Research Problems; *Sample Size; *Statistical Significance; Test
Validity
Identifiers: *Null Hypothesis; *Randomization (Statistics);
Research Replication
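
One of Shaver's points, that a significance test gives no indication of
whether a result would recur on replication, can be illustrated with a
small simulation. The parameter values are invented and the Python/SciPy
code is a sketch, not material from the paper.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
n, d = 25, 0.4                                   # assumed group size, true effect
pvalues = []
for _ in range(1000):                            # 1000 identical replications
    a = rng.normal(0.0, 1.0, size=n)
    b = rng.normal(d, 1.0, size=n)
    pvalues.append(ttest_ind(b, a).pvalue)

pvalues = np.array(pvalues)
print(f"share of replications with p < .05: {np.mean(pvalues < 0.05):.2f}")
print(f"p-value quartiles: {np.percentile(pvalues, [25, 50, 75]).round(3)}")
# Even with a real, fixed effect, p bounces widely from replication to
# replication; one "significant" result predicts little about the next study.
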
ED333036 TM016545
The Place of Significance Testing in Contemporary Social Science.
Moore, Mary Ann
3 Apr 1991
23p.; Paper presented at the Annual Meeting of the American
Educational Research Association (Chicago, IL, April 3-7, 1991).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
This paper examines the problems caused by relying solely on
statistical significance tests to interpret results in contemporary
social science. The place of significance testing in educational
research has often been debated. Among the problems in reporting
statistical significance are questions of definition and terminology.
Problems are also found in the use, as well as the reporting, of
significance testing. One of the most important problems is the
effect of sample size on significance. An example with a fixed
effect size of 25% and samples containing 22, 23, and 24 people
illustrates these effects. The issues of validity and reliability in
significance testing with measurement studies are considered.
Although these problems are widely recognized, publishers show a
clear bias in favor of reports that claim statistical significance.
Researchers need to recognize the limitations of significance testing.
Effect size statistics aid in the interpretation of results and
provide a guide to the relative importance of the study. Two tables
illustrate the effects of sample size. A 22-item list of references
is included. (SLD)
Descriptors: *Data Interpretation; Educational Research; *Effect
Size; Research Methodology; *Research Problems; *Sample Size; *Social
Science Research; *Statistical Significance
ED325524 TM015782
Alternatives to Statistical Significance Testing.
Palomares, Ronald S.
8 Nov 1990
20p.; Paper presented at the Annual Meeting of the Mid-South
Educational Research Association (19th, New Orleans, LA, November 14-
16, 1990).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
Researchers increasingly recognize that significance tests are
limited in their ability to inform scientific practice. Common
errors in interpreting significance tests and three strategies for
augmenting the interpretation of significance test results are
illustrated. The first strategy for augmenting the interpretation of
significance tests involves evaluating significance test results in a
sample size context. A second strategy involves interpretation of
effect size estimates; several estimates and corrections are
discussed. A third strategy emphasizes interpretation based on
estimated likelihood that results will replicate. The bootstrap
method of B. Efron and others and cross-validation strategies are
illustrated. A 28-item list of references and four data tables are
included. (Author/SLD)
Descriptors: *Effect Size; Estimation (Mathematics); *Evaluation
Methods; *Research Design; Research Problems; *Sample Size;
*Statistical Significance
Identifiers: Bootstrap Methods; Cross Validation
ED320965 TM015274
Looking beyond Statistical Significance: Result Importance and
Result Generalizability.
Welge-Crow, Patricia A.; And Others
25 May 1990
23p.; Paper presented at the Annual Meeting of the American
Psychological Society (Dallas, TX, June 9, 1990).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
Three strategies for augmenting the interpretation of significance
test results are illustrated. Determining the most suitable indices
to use in evaluating empirical results is a matter of considerable
debate among researchers. Researchers increasingly recognize that
significance tests are very limited in their potential to inform the
interpretation of scientific results. The first strategy involves
evaluating significance test results in a sample size context. The
researcher is encouraged to determine at what smaller sample size a
statistically significant fixed effect size would no longer be
significant, or conversely, at what larger sample size a non-
significant result would become statistically significant. The
second strategy would involve interpreting effect size as an index of
result importance. The third strategy emphasizes interpretation
based on the estimated likelihood that results will replicate. These
applications are illustrated via small heuristic data sets to make
the discussion more concrete. A 37-item list of references, seven
data tables, and an appendix illustrating relevant computer commands
are provided. (TJH)
Descriptors: Educational Research; *Effect Size; Estimation
(Mathematics); *Generalizability Theory; Heuristics; Mathematical
Models; Maximum Likelihood Statistics; *Research Methodology; *Sample
Size; *Statistical Significance; *Test Interpretation; Test Results
Identifiers: Empirical Research; Research Replication
EJ404813 CG537044
Multiple Criteria for Evaluating the Magnitude of Experimental
Effects.
Haase, Richard F.; And Others
Journal of Counseling Psychology, v36 n4 p511-16 Oct
1989
Document Type: JOURNAL ARTICLE (080); POSITION PAPER (120)
Contends that tests of statistical significance and measures of
magnitude of effect in counseling psychology research do not provide
the same information. Argues that interpreting the magnitude of
experimental effects must be a two-stage decision process, with the
second stage conditioned on the results of a test of statistical
significance and entailing evaluation of the absolute magnitude of the
effect.
(Author/ABL)
Descriptors: *Research Methodology; *Research Needs; *Statistical
Significance; *Test Interpretation
Identifiers: *Counseling Psychology
ED314450 TM014265
Comments on Better Uses of and Alternatives to Significance
Testing.
Davidson, Betty M.; Giroir, Mary M.
9 Nov 1989
18p.; Paper presented at the Annual Meeting of the Mid-South
Educational Research Association (Little Rock, AR, November 8-10,
1989).
Document Type: REVIEW LITERATURE (070); EVALUATIVE REPORT (142);
CONFERENCE PAPER (150)
Controversy over the proper place of significance testing within
scientific methodology has continued for some time. The suggestion
that effect sizes are more important than whether results are
significant is presented. Effect size can be defined as an estimate
of how much of the dependent variable is accounted for by the
independent variables. Interpretations of statistical significance
can be seriously incorrect when the researcher underinterprets an
outcome with a large effect size that is nonsignificant or
overinterprets an outcome that involves a small effect size but which
is statistically significant. These problems can be avoided if the
researcher includes effect size in result interpretation. It has
been stated that statistical significance was never intended to take
the place of replication in research. Researchers must begin drawing
conclusions based on effect sizes and not statistical significance
alone; and the replicability and reliability of results must be
recognized, analyzed, and interpreted. Two tables illustrate effect
sizes. (SLD)
Descriptors: *Effect Size; *Reliability; Research Design;
Researchers; *Scientific Methodology; Statistical Analysis;
*Statistical Significance
Identifiers: *Significance Testing
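
The two interpretive traps described above can be illustrated with
fabricated data: a sizable correlation estimated in a handful of cases is
often nonsignificant, while a trivial correlation estimated in thousands of
cases is almost always significant. The effect sizes, sample sizes, and
Python/SciPy code below are illustrative assumptions, not the authors'
examples.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)

def r_squared_and_p(n, true_r):
    # Generate x and y with a specified population correlation, then report
    # the sample r-squared (variance accounted for) and the p-value.
    x = rng.normal(size=n)
    y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n)
    r, p = pearsonr(x, y)
    return r**2, p

big_effect_small_n = r_squared_and_p(n=12, true_r=0.60)
tiny_effect_big_n = r_squared_and_p(n=20_000, true_r=0.03)
print("large effect, n = 12:      r^2 = %.3f, p = %.3f" % big_effect_small_n)
print("trivial effect, n = 20000: r^2 = %.4f, p = %.4f" % tiny_effect_big_n)

Reporting the effect size alongside p keeps the two cases from being read
the same way.
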
ED314449 TM014264
Ways of Estimating the Probability That Results Will Replicate.
Giroir, Mary M.; Davidson, Betty M.
9 Nov 1989
17p.; Paper presented at the Annual Meeting of the Mid-South
Educational Research Association (Little Rock, AR, November 8-10,
1989).
Document Type: EVALUATIVE REPORT (142); CONFERENCE PAPER (150)
Replication is important to viable scientific inquiry; results that
will not replicate or generalize are of very limited value.
Statistical significance enables the researcher to reject or not
reject the null hypothesis according to the sample results obtained,
but statistical significance does not indicate the probability that
results will be replicated. Three techniques for evaluating the
sampling specificity of results are described: (1) the jackknife
technique of J. W. Tukey (1969); (2) the bootstrap technique of
Efron, described by P. Diaconis and B. Efron (1983); and (3) cross-
validation methods described by B. Thompson (1989). A small data set
developed by B. Thompson in 1979 is used to demonstrate the cross-
validation procedure in detail. These three procedures allow the
researcher to examine the replicability and generalizability of
results and should be used frequently. Two tables present the study
results, and an appendix gives examples of commands for the
Statistical Analysis System computer package used for the cross-
validation example. (SLD)
Descriptors: *Estimation (Mathematics); *Generalizability Theory;
Hypothesis Testing; Probability; Research Design; Sample Size;
*Sampling; Scientific Methodology; *Statistical Significance
Identifiers: Bootstrap Hypothesis; *Cross Validation; Jackknifing
Technique; *Research Replication; Research Results
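
A minimal sketch of the leave-one-out (jackknife) idea named in this
abstract, applied to a correlation coefficient with fabricated data; it is
an illustration in Python/NumPy, not the procedure from the paper.

import numpy as np

rng = np.random.default_rng(11)
n = 30
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)                 # fabricated data set

full_r = np.corrcoef(x, y)[0, 1]
# Drop one case at a time and recompute the statistic.
leave_one_out = np.array([
    np.corrcoef(np.delete(x, i), np.delete(y, i))[0, 1]
    for i in range(n)
])

print(f"full-sample r: {full_r:.3f}")
print(f"leave-one-out r ranges from {leave_one_out.min():.3f} "
      f"to {leave_one_out.max():.3f}")
# If dropping a single case changes r appreciably, the result is fragile
# and unlikely to generalize to new samples.
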
ED303514 TM012775
Statistical Significance Testing: From Routine to Ritual.
Keaster, Richard D.
Nov 1988
15p.; Paper presented at the Annual Meeting of the Mid-South
Educational Research Association (Louisville, KY, November 9-11,
1988).
Document Type: CONFERENCE PAPER (150); EVALUATIVE REPORT (142);
REVIEW LITERATURE (070)
Target Audience: Researchers
An explanation of the misuse of statistical significance testing
and the true meaning of "significance" is offered. Literature about
the criticism of current practices of researchers and publications is
reviewed in the context of tests of significance. The problem under
consideration occurs when researchers attempt to do more than just
establish that a relationship has been observed. Too often,
researchers assume that the difference, and even the size of the
difference, proves or at least confirms the research hypothesis.
Statistical significance is not a measure of "substantive"
significance or what might be called scientific importance.
Significance testing was designed to yield yes/no decisions. It is
suggested that authors of research projects should not try to
interpret the magnitudes of their significance findings.
Significance testing must be returned to its proper place in the
scientific process. (SLD)
Descriptors: Educational Assessment; Research Design; Research
Methodology; *Research Problems; Statistical Analysis; *Statistical
Significance; Statistics
EJ352091 CG531911
How Significant Is a Significant Difference? Problems With the
Measurement of Magnitude of Effect.
Murray, Leigh W.; Dosser, David A., Jr.
Journal of Counseling Psychology, v34 n1 p68-72 Jan
1987
Document Type: JOURNAL ARTICLE (080); GENERAL REPORT (140)
The use of measures of magnitude of effect has been advocated as a
way to go beyond statistical tests of significance and to identify
effects of a practical size. They have been used in meta-analysis to
combine results of different studies. Describes problems associated
with measures of magnitude of effect (particularly study size) and
implications for researchers. (Author/KS)
Descriptors: *Effect Size; *Meta Analysis; Research Design;
Research Methodology; *Sample Size; *Statistical Analysis;
*Statistical Inference; *Statistical Significance
ED285902 TM870488
The Use of Invariance and Bootstrap Procedures as a Method to
Establish the Reliability of Research Results.
Sandler, Andrew B.
30 Jan 1987
19p.; Paper presented at the Annual Meeting of the Southwest
Educational Research Association (Dallas, TX, January 29-31, 1987).
Document Type: CONFERENCE PAPER (150); RESEARCH REPORT (143)
Target Audience: Researchers
Statistical significance is misused in educational and
psychological research when it is applied as a method to establish
the reliability of research results. Other techniques have been
developed which can be correctly utilized to establish the
generalizability of findings. Methods that do provide such estimates
are known as invariance or cross-validation procedures and the
bootstrap method. Invariance procedures split the total sample into
two subgroups and apply techniques to analyze each subgroup and
compare results, often by using parameters obtained from one subgroup
to evaluate the other subgroup. A simulated data set is presented
and analyzed by invariance procedures for: (1) canonical correlation;
(2) regression and discriminant analysis; (3) analysis of variance
and covariance; and (4) bivariate correlation. Whereas invariance
procedures split a sample into two parts, the bootstrap method
creates multiple copies of the data set. The number of copies could
exceed millions with current computer capability. The copies are
shuffled and artificial samples of 20 cases each, called bootstrap
samples, are randomly selected. The value of the Pearson product-
moment correlation (or other statistics) is then calculated for each
bootstrap sample to assess the generalizability of the results. (LPG)
Descriptors: Analysis of Covariance; Analysis of Variance;
Correlation; Discriminant Analysis; *Generalizability Theory;
*Mathematical Models; Regression (Statistics); *Reliability; Research
Design; Research Problems; *Sample Size; Sampling; Simulation;
Statistical Inference; *Statistical Significance; Statistical Studies;
Validity
Identifiers: *Bootstrap Hypothesis; *Cross Validation; Invariance
Principle
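
A small sketch of the invariance (cross-validation) logic described above:
estimate regression weights in one random half of a fabricated sample,
apply them to the other half, and correlate predicted with observed scores.
The data and the Python/NumPy code are illustrative assumptions, not the
paper's simulated data set.

import numpy as np

rng = np.random.default_rng(5)
n = 100
x = rng.normal(size=n)
y = 2.0 + 0.6 * x + rng.normal(size=n)            # fabricated data

idx = rng.permutation(n)
a, b = idx[: n // 2], idx[n // 2:]                # two random subgroups

def fit(sample):
    # Ordinary least squares with one predictor: returns slope, intercept.
    slope, intercept = np.polyfit(x[sample], y[sample], deg=1)
    return slope, intercept

for train, test, label in [(a, b, "A->B"), (b, a, "B->A")]:
    slope, intercept = fit(train)
    predicted = intercept + slope * x[test]
    r_cv = np.corrcoef(predicted, y[test])[0, 1]
    print(f"{label}: cross-validated r = {r_cv:.3f}")
# Similar cross-validated correlations in both directions suggest the
# weights are stable enough to generalize; a sharp drop suggests they are not.
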
ED281852 TM870223
A Primer on MANOVA Omnibus and Post Hoc Tests.
Heausler, Nancy L.
30 Jan 1987
21p.; Paper presented at the Annual Meeting of the Southwest
Educational Research Association (Dallas, TX, January 30, 1987).
Document Type: CONFERENCE PAPER (150); RESEARCH REPORT (143)
Target Audience: Researchers
Each of the four classic multivariate analysis of variance (MANOVA)
tests of statistical significance may lead a researcher to different
decisions as to whether a null hypothesis should be rejected: (1)
Wilks' lambda; (2) Lawley-Hotelling trace criterion; (3) Roy's
greatest characteristic root criterion; and (4) Pillai's trace
criterion. These four omnibus test statistics are discussed and
their optimal uses illustrated using hypothetical data sets.
Discriminant analysis as a post hoc method to MANOVA is illustrated
in detail. Once a significant MANOVA has been found, the next step
is to interpret the non-chance association between dependent and
independent variables. (Author/GDC)
Descriptors: Analysis of Variance; Discriminant Analysis; Factor
Analysis; *Hypothesis Testing; *Multivariate Analysis; *Statistical
Significance; Statistical Studies
Identifiers: Omnibus Test; Post Hoc Methods
EJ327959 EA519388
Chance and Nonsense: A Conversation about Interpreting Tests of
Statistical Significance, Part 2.
Shaver, James P.
Phi Delta Kappan, v67 n2 p138-41 Oct 1985
For Part 1, see EJ 326 611 (September 1985 "Phi Delta Kappan").
Document Type: JOURNAL ARTICLE (080); RESEARCH REPORT (143);
POSITION PAPER (120)
Target Audience: Researchers; Practitioners
The second half of a dialogue between two fictional teachers
examines the significance of statistical significance in research and
considers the factors affecting the extent to which research results
provide important or useful information. (PGD)
Descriptors: Educational Research; *Research Methodology; Research
Problems; Sampling; Statistical Analysis; *Statistical Significance
EJ326611 EA519370
Chance and Nonsense: A Conversation about Interpreting Tests of
Statistical Significance, Part 1.
Shaver, James P.
Phi Delta Kappan, v67 n1 p57-60 Sep 1985
For Part 2, see EJ 327 959 (October 1985 "Phi Delta Kappan").
Document Type: JOURNAL ARTICLE (080); RESEARCH REPORT (143);
POSITION PAPER (120)
Target Audience: Researchers; Practitioners
A dialog between two fictional teachers provides some basic
examples of how research that uses approved methodology may provide
results that are significant statistically but not significant
practically. (PGD)
Descriptors: Educational Research; Research Methodology; Research
Problems; *Sampling; Statistical Analysis; *Statistical Significance
EJ326117 UD511911
Mind Your p's and Alphas.
Stallings, William M.
Educational Researcher, v14 n9 p19-20 Nov 1985
Document Type: JOURNAL ARTICLE (080); POSITION PAPER (120);
GENERAL REPORT (140)
In the educational research literature, alpha and p are often
conflated. Paradoxically, alpha retains a prominent place in
textbook discussions, but it is often supplanted by p in the results
sections of journal articles. Because alpha and p have unique uses,
researchers should continue to employ both conventions in summarizing
the outcomes of tests of significance. (KH)
Descriptors: *Educational Research; *Research Methodology;
Statistical Analysis; *Statistical Significance
Identifiers: *Alpha Coefficient; *p Coefficient
ED253566 TM850106
Mind Your p's and Alphas.
Stallings, William M.
1985
11p.; Paper presented at the Annual Meeting of the American
Educational Research Association (69th, Chicago, IL, March 31-April
4, 1985).
Document Type: CONFERENCE PAPER (150); POSITION PAPER (120);
REVIEW LITERATURE (070)
Target Audience: Researchers
In the educational research literature alpha, the a priori level of
significance, and p, the a posteriori probability of obtaining a test
statistic of at least a certain value when the null hypothesis is
true, are often confused. Explanations for this confusion are
offered. Paradoxically, alpha retains a prominent place in textbook
discussions of such topics as statistical hypothesis testing,
multivariate analysis, power, and multiple comparisons while it seems
to have been supplanted by p in current journal articles. The unique
contributions of both alpha and p are discussed and a plea is made
for using both conventions in reporting empirical studies. (Author)
Descriptors: Educational Research; *Hypothesis Testing;
Multivariate Analysis; *Probability; *Research Methodology; Research
Problems; *Statistical Significance; Statistical Studies
Identifiers: *Alpha Coefficient
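
The distinction the paper draws can be stated in a few lines of code: alpha
is fixed before the data are seen, while p is computed from the data
afterward. The data values are invented and Python/SciPy is assumed.

import numpy as np
from scipy.stats import ttest_ind

alpha = 0.05                                     # a priori criterion, set in advance

rng = np.random.default_rng(2024)
group1 = rng.normal(50, 10, size=40)             # fabricated scores
group2 = rng.normal(55, 10, size=40)

p = ttest_ind(group1, group2).pvalue             # a posteriori probability under H0
print(f"alpha (set in advance)  = {alpha}")
print(f"p (computed from data)  = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
# Reporting both conventions -- the criterion and the obtained probability --
# keeps alpha and p distinct, as the paper recommends.
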
EJ307832 TM510187
Policy Implications of Using Significance Tests in Evaluation
Research.
Schneider, Anne L.; Darcy, Robert E.
Evaluation Review, v8 n4 p573-82 Aug 1984
Document Type: JOURNAL ARTICLE (080); RESEARCH REPORT (143)
The normative implications of applying significance tests in
evaluation research are examined. The authors conclude that
evaluators often make normative decisions, based on the traditional
.05 significance level in studies with small samples. Additional
reporting of the magnitude of impact, the significance level, and the
power of the test is recommended. (Author/EGS)
Descriptors: *Evaluation Methods; *Hypothesis Testing; *Research
Methodology; Research Problems; Sample Size; *Statistical
Significance
Identifiers: Data Interpretation; *Evaluation Problems; Evaluation
Research
ED249266 TM840619
Power Differences among Tests of Combined Significance.
Becker, Betsy Jane
Apr 1984
21p.; Paper presented at the Annual Meeting of the American
Educational Research Association (68th, New Orleans, LA, April 23-27,
1984).
Document Type: CONFERENCE PAPER (150); RESEARCH REPORT (143)
Target Audience: Researchers
Power is an indicator of the ability of a statistical analysis to
detect a phenomenon that does in fact exist. The issue of power is
crucial for social science research because sample size, effects, and
relationships studied tend to be small and the power of a study
relates directly to the size of the effect of interest and the sample
size. Quantitative synthesis methods can provide ways to overcome
the problem of low power by combining the results of many studies.
In the study at hand, large-sample (approximate) normal distribution
theory for the non-null density of the individual p value is used to
obtain power functions for significance value summaries. Three p-
value summary methods are examined: Tippett's counting method,
Fisher's inverse chi-square summary, and the logit method. Results
for pairs of studies and for a set of five studies are reported.
They indicate that the choice of a "most-powerful" summary will
depend on the number of studies to be summarized, the sizes of the
effects in the populations studied, and the sizes of the samples
chosen from those populations. (BW)
Descriptors: Effect Size; Hypothesis Testing; *Meta Analysis;
Research Methodology; Sample Size; *Statistical Analysis;
*Statistical Significance
Identifiers: *Power (Statistics)
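
Two of the p-value summaries named in this abstract can be sketched
directly. The five per-study p-values below are invented, Python/SciPy is
assumed, and SciPy's scipy.stats.combine_pvalues implements the Fisher
summary as well.

import math
from scipy.stats import chi2

p_values = [0.08, 0.12, 0.04, 0.20, 0.06]        # illustrative per-study p-values
k = len(p_values)

# Fisher's inverse chi-square method: -2 * sum(ln p) ~ chi-square with 2k df.
fisher_stat = -2 * sum(math.log(p) for p in p_values)
fisher_p = chi2.sf(fisher_stat, df=2 * k)

# Tippett's counting (minimum-p) method: combined p = 1 - (1 - min p)^k.
tippett_p = 1 - (1 - min(p_values)) ** k

print(f"Fisher:  statistic = {fisher_stat:.2f}, combined p = {fisher_p:.4f}")
print(f"Tippett: combined p = {tippett_p:.4f}")
# The two summaries need not agree; which is most powerful depends on the
# number of studies and the sizes of the underlying effects, which is the
# paper's point.
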