The concept of statistical significance testing. Thompson, Bruce


A peer-reviewed electronic journal. ISSN 1531-7714

Permission is granted to distribute this article for nonprofit, educational purposes if it is copied in its entirety and the journal is credited. Please notify the editor if an article is to be used in a newsletter.


Thompson, Bruce (1994). The concept of statistical significance testing. Practical Assessment, Research & Evaluation, 4(5). Retrieved August 18, 2006 from http://edresearch.org/pare/getvn.asp?v=4&n=5 .

The Concept of Statistical Significance Testing

Bruce Thompson,
Texas A & M

Too few researchers understand what statistical significance testing does and doesn't do, and consequently their results are misinterpreted. Even more commonly, researchers understand elements of statistical significance testing, but the concept is not integrated into their research. For example, the influence of sample size on statistical significance may be acknowledged by a researcher, but this insight is not conveyed when interpreting results in a study with several thousand subjects.

This article will help you better understand the concept of significance testing. The meaning of probabilities, the concept of statistical significance, arguments against significance testing, misinterpretation, and alternatives are discussed.

WHAT ARE THOSE PROBABILITIES IN STATISTICAL SIGNIFICANCE TESTING?

Researchers may invoke statistical significance testing whenever they have a random sample from a population, or a sample that they believe approximates a random, representative sample. Statistical significance testing requires subjective judgment in setting a predetermined acceptable probability (ranging between 0 and 1.0) of making an inferential error caused by the sampling error (getting samples with varying amounts of "flukiness") inherent in sampling. Sampling error can only be eliminated by gathering data from the entire population.

One probability (p), the probability of deciding to reject a null hypothesis (e.g., a hypothesis specifying that Mean₁ = Mean₂ = Mean₃, or R² = 0) when the null hypothesis is actually true in the population, is called "alpha," and also p_(CRITICAL). When we pick an alpha level, we set an upper limit on the probability of making this erroneous decision, called a Type I error. Alpha is therefore typically set small (e.g., .05 or .01), so that the probability of this error will be low. The value of p_(CRITICAL) is selected based on subjective judgment regarding what the consequences of a Type I error would be in a given research situation, and on personal values regarding these consequences.
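To make the meaning of alpha concrete, the following Python sketch simulates many studies in which the null hypothesis is exactly true (both groups are drawn from the same population) and counts how often a test nonetheless rejects it. The population mean of 100, the known population SD of 15, the group size of 20, and the use of a two-tailed z test are all illustrative assumptions, not part of the article. Over many replications, the proportion of these erroneous rejections should land near whatever alpha was chosen.

```python
import random
from statistics import NormalDist, fmean

# Illustrative assumptions: both groups come from the SAME population (so the
# null hypothesis is exactly true), the population SD is known, and each
# simulated study is analyzed with a two-tailed z test.
POP_MEAN, POP_SD, N = 100.0, 15.0, 20


def z_test_p(mean1, mean2, sd, n1, n2):
    """Two-tailed p for a difference in means when the population SD is known."""
    se = sd * (1 / n1 + 1 / n2) ** 0.5
    z = (mean1 - mean2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(alpha, reps=4000, seed=1):
    """Proportion of simulated null-true studies in which the null is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        g1 = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
        g2 = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
        if z_test_p(fmean(g1), fmean(g2), POP_SD, N, N) < alpha:
            rejections += 1  # a Type I error: the null is true but was rejected
    return rejections / reps


print(type_i_error_rate(0.05))  # should be near .05
print(type_i_error_rate(0.01))  # should be near .01
```

Because both calls use the same seed, they analyze identical simulated samples, so every rejection at .01 is also a rejection at .05.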

A second probability, p_(CALCULATED) (which, like all p's, ranges between 0 and 1.0), is calculated. Probabilities can only be calculated in the context of assumptions sufficient to constrain the computations such that a given problem has only one answer.

What's the probability of getting mean IQ scores of 99 and 101 in two sample groups? It depends, first, on the actual statistical parameters (e.g., means) in the populations from which the samples were drawn. These two sample statistics (Mean₁ = 99 and Mean₂ = 101) would be most probable (yielding the highest p_(CALCULATED)) if the population means were respectively 99 and 101. These two sample statistics would be less likely (yielding a smaller p_(CALCULATED)) if the population means were both 100. Since the actual population parameters are not known, we must assume what the parameters are, and in statistical significance testing we assume the parameters to be correctly specified by the null hypothesis, i.e., we assume the null hypothesis to be exactly true for these calculations.
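The dependence on assumed parameters can be sketched in Python. Assuming, hypothetically, a population SD of 15 and 25 people per group (so each sample mean has a standard error of 3), the code below compares how well population means of 99 and 101, versus population means of 100 and 100, account for observed sample means of 99 and 101. It computes densities (relative likelihoods), which are only a rough stand-in for the "probability" of the sample statistics discussed here; the p_(CALCULATED) reported by an actual test is a tail probability computed under the null.

```python
from statistics import NormalDist

# Hypothetical setup: population SD of 15 and 25 people per group, so each
# sample mean has a standard error of 15 / sqrt(25) = 3.
SD, N = 15.0, 25
SE = SD / N ** 0.5


def likelihood_of_means(observed, population_means):
    """Joint density of the observed sample means, given assumed population means."""
    result = 1.0
    for m, mu in zip(observed, population_means):
        result *= NormalDist(mu, SE).pdf(m)  # sampling distribution of one mean
    return result


observed = (99, 101)
print(likelihood_of_means(observed, (99, 101)))   # population means 99 and 101
print(likelihood_of_means(observed, (100, 100)))  # null: both population means 100
```

The second value is smaller: the same sample results are less consistent with the parameters specified by the null hypothesis.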

A second factor that influences the calculation of p involves the sample sizes. Samples (and thus the statistics calculated for them) will potentially be less representative of populations ("flukier") as sample sizes are smaller. For example, drawing two samples of sizes 5 and 5 may yield "flukier" statistics (means, r's, etc.) than two samples of sizes 50 and 50. Thus, the p_(CALCULATED) computations also must (and do) take sample size influences into account. If the two samples both of size 5 had means of 100 and 90, and the two samples both of size 50 also had means of 100 and 90 (with the same score variability in both studies), the test of the null that the means are equal would yield a smaller p_(CALCULATED) for the larger samples, because, assuming the null is exactly true, unequal sample statistics are increasingly less likely as sample sizes increase. Summarizing, the p_(CALCULATED) probability addresses the question:

Assuming the sample data came from a population in which the null hypothesis is (exactly) true, what is the probability of obtaining the sample statistics one got for one's sample data (or statistics even more extreme) with the given sample size(s)?
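The sample-size example above can be checked numerically. This minimal sketch assumes a known population SD of 15 in every group so that a two-sample z test applies; with estimated SDs one would use a t test, which shows the same pattern. The helper name two_tailed_p is hypothetical.

```python
from statistics import NormalDist


def two_tailed_p(mean1, mean2, sd, n1, n2):
    """Two-tailed p for H0: population means are equal, assuming a known SD."""
    se = sd * (1 / n1 + 1 / n2) ** 0.5  # standard error of the mean difference
    z = (mean1 - mean2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same means (100 and 90) and same assumed SD; only the sample sizes differ.
p_small = two_tailed_p(100, 90, 15, 5, 5)    # about 0.29
p_large = two_tailed_p(100, 90, 15, 50, 50)  # about 0.0009
print(p_small, p_large)
```

The identical 10-point difference is unremarkable with 5 subjects per group but very unlikely under the null with 50 per group.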

Even without calculating this p, we can make logical judgments about p_(CALCULATED). In which one of each of the following pairs of studies will p_(CALCULATED) be smaller?

In two studies, each involving three groups of 30 subjects: in one study the means were 100, 100, and 90; in the second study the means were 100, 100, and 100.

In two studies, each comparing the standard deviations (SD) of scores on the dependent variable for two groups of subjects, SD₁ = 4 and SD₂ = 3 in both studies, but in study one the sample sizes were 100 and 100, while in study two the sample sizes were 50 and 50.

In two studies involving a multiple regression prediction of Y using predictors X₁, X₂, and X₃, each with a sample size of 75: in study one R² = .49, and in study two R² = .25.
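One way to reason through the three pairs is sketched below in Python, under stated assumptions. For the first pair, assume the within-group spread is the same in both studies, so only the between-groups sum of squares differs. For the second, the ratio of the two variances is the same in both studies and only the degrees of freedom differ. For the third, both studies share the same degrees of freedom, so the larger F implies the smaller p_(CALCULATED). The function names are illustrative, not from the article.

```python
def sos_between(group_means, n_per_group):
    """Between-groups sum of squares for equal-sized groups."""
    grand_mean = sum(group_means) / len(group_means)
    return n_per_group * sum((m - grand_mean) ** 2 for m in group_means)


def f_for_r_squared(r2, n, k):
    """F for testing H0: R^2 = 0 with k predictors and n cases, df = (k, n - k - 1)."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# First pair: 3 groups of 30 each. With equal means the between-groups SOS
# (and hence F) is 0, the least extreme value possible, so p_CALCULATED = 1.
print(sos_between([100, 100, 90], 30), sos_between([100, 100, 100], 30))

# Second pair: the variance ratio is 16/9 in both studies; only the df differ,
# (99, 99) versus (49, 49). The same ratio is less likely under the null when
# the df are larger, so the study with n = 100 per group has the smaller p.
print(4 ** 2 / 3 ** 2)

# Third pair: n = 75 and k = 3 in both studies, so df = (3, 71) for each, and
# the larger R^2 yields the larger F and therefore the smaller p.
print(f_for_r_squared(0.49, 75, 3), f_for_r_squared(0.25, 75, 3))
```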

WHAT DOES STATISTICAL SIGNIFICANCE REALLY TELL US?

Statistical significance addresses the question:

"Assuming the sample data came from a population in which the null hypothesis is (exactly) true, and given our sample statistics and sample size(s), is the calculated probability of our sample results less than the acceptable limit (p_(CRITICAL)) imposed regarding a Type I error?"

When p_(CALCULATED) is less than p_(CRITICAL), we use a decision rule that says we will "reject" the null hypothesis. The decision to reject the null hypothesis is called a "statistically significant" result. All the decision means is that we believe our sample results are relatively unlikely, given our assumptions, including our assumption that the null hypothesis is exactly true.

However, although p_(CRITICAL) is simply selected by the researcher, calculating p_(CALCULATED) can be tedious. Traditionally, test statistics (e.g., F, t, χ²) have been used as equivalent (but more convenient) reexpressions of p's, because calculated test statistics (TS_(CALCULATED)) are easier to derive. The TS_(CRITICAL) exactly equivalent to a given p_(CRITICAL) can be found in widely available tables; the tabled value is located given alpha and the sample size(s), usually expressed as degrees of freedom. Different TS_(CALCULATED) are computed depending on the hypothesis being tested. The only difference when test statistics are used in the decision rule is that we reject the null (called "statistically significant") when TS_(CALCULATED) is greater than TS_(CRITICAL) (for two-tailed t or z tests, when its absolute value is greater). Either way, comparing p's or TS's for a given data set will always yield the same decision.
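The equivalence of the two decision rules can be demonstrated for the simplest case, a two-tailed z test, where NormalDist.inv_cdf from Python's standard library plays the role of the printed table of critical values. This is a sketch under that assumption; for t, F, or χ² tests the critical value would also depend on degrees of freedom, but the logic is the same.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def decide_by_p(z, alpha):
    """Reject when p_CALCULATED < p_CRITICAL (alpha), two-tailed."""
    p_calculated = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return p_calculated < alpha


def decide_by_ts(z, alpha):
    """Reject when |TS_CALCULATED| > TS_CRITICAL, two-tailed."""
    ts_critical = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    return abs(z) > ts_critical


# Both rules reach the same decision for every z tried here.
for z in (0.5, 1.5, 1.95, 1.97, 2.5, -3.0):
    assert decide_by_p(z, 0.05) == decide_by_ts(z, 0.05)
```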

Remember, knowing that sample results are relatively unlikely, assuming the null is true, may not be helpful. An improbable result is not necessarily an important result, as Shaver (1985, p. 58) illustrates in his hypothetical dialogue between two teachers:

Chris: ...I set the level of significance at .05, as my thesis advisor suggested. So a difference that large would occur by chance less than five times in a hundred if the groups weren't really different. An unlikely occurrence like that surely must be important.
Jean: Wait a minute. Remember the other day when you went into the office to call home? Just as you completed dialing the number, your little boy picked up the phone to call someone. So you were connected and talking to one another without the phone ever ringing... Well, that must have been a truly important occurrence then?
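Jean's point, and the earlier warning about studies with several thousand subjects, can be illustrated with a hypothetical study. Assuming a known population SD of 15 and 10,000 people per group, a mean difference of a single point (a standardized difference of only one fifteenth of an SD) is still highly statistically significant. The numbers are invented for illustration.

```python
from statistics import NormalDist

# Hypothetical study: 10,000 people per group, a 1-point mean difference on a
# scale whose population SD is assumed known and equal to 15.
mean1, mean2, sd, n = 101.0, 100.0, 15.0, 10_000

se = sd * (2 / n) ** 0.5                          # SE of the difference in means
z = (mean1 - mean2) / se
p_calculated = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed z test
d = (mean1 - mean2) / sd                          # standardized mean difference

print(f"z = {z:.2f}, p = {p_calculated:.2g}, d = {d:.3f}")
```

The result is improbable under the null yet describes a difference most researchers would consider trivial.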

WHY NOT USE STATISTICAL SIGNIFICANCE TESTING?

Statistical significance testing may require an investment of effort that lacks a commensurate benefit. Science is the business of isolating relationships that (re)occur under stated conditions, so that knowledge is created and can be cumulated. But statistical significance does not adequately address whether the results in a given study will replicate (Carver, 1978). As scientists, we must ask (a) what the magnitudes of sample effects are and (b) whether these results will generalize; statistical significance testing does not respond to either question (Thompson, in press). Thus, statistical significance may distract attention from more important considerations.

MISINTERPRETING STATISTICAL SIGNIFICANCE TESTING

Many of the problems in contemporary uses of statistical significance testing originate in the language researchers use. Several names can refer to a single concept (e.g., for sums of squares, "SOS_(BETWEEN)" = "SOS_(EXPLAINED)" = "SOS_(MODEL)" = "SOS_(REGRESSION)"), and different meanings are given to terms in different contexts (e.g., "univariate" means having only one dependent variable but potentially many predictor variables, but may also refer to a statistic that can be computed with only a single variable).

Overcoming three habits of language will help avoid unconscious misinterpretations:

Say "statistically significant" rather than "significant." Referring to the concept as a phrase will help break the erroneous association between rejecting a null hypothesis and obtaining an important result.

Don't say things like "my results approached statistical significance." This language makes little sense in the context of the statistical significance testing logic. My favorite response to this comes from a fellow editor who asks, "How did you know your results were not trying to avoid being statistically significant?"

Don't say things like "the statistical significance testing evaluated whether the results were 'due to chance.'" This language gives the impression that replicability is evaluated by statistical significance testing.

WHAT ANALYSES ARE PREFERRED TO STATISTICAL SIGNIFICANCE TESTING?

Two analyses should be emphasized over statistical significance testing (Journal of Experimental Education, 1993). First, effect sizes should be calculated and interpreted in all analyses. These can be r²-type effect sizes (e.g., R², η², ω²) that evaluate the proportion of variance explained in the analysis, or standardized differences in statistics (e.g., standardized differences in means), or both. Second, the replicability of results must be empirically investigated, either through actual replication of the study, or by using methods such as cross-validation, the jackknife, or the bootstrap (see Thompson, in press).
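Both recommendations can be sketched in a few lines of Python. The example below uses hypothetical scores for two groups, computes a standardized difference in means using the pooled SD (one common choice, often called Cohen's d; the article does not prescribe a particular formula), and then uses a simple percentile bootstrap to see how much that effect size varies across resamples of the same data. This is a minimal illustration, not the specific replicability procedure described in Thompson (in press).

```python
import random
from statistics import fmean, stdev

# Hypothetical scores, invented for illustration.
group_a = [12, 15, 14, 10, 13, 16, 11, 14]
group_b = [9, 11, 10, 12, 8, 10, 11, 9]


def cohens_d(a, b):
    """Standardized mean difference using the pooled SD (one common choice)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / pooled_var ** 0.5


def bootstrap_d(a, b, reps=2000, seed=0):
    """Percentile bootstrap: resample each group with replacement, recompute d."""
    rng = random.Random(seed)
    ds = []
    for _ in range(reps):
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        ds.append(cohens_d(ra, rb))
    ds.sort()
    # The middle 95% of the bootstrap estimates.
    return ds[int(0.025 * reps)], ds[int(0.975 * reps) - 1]


print(cohens_d(group_a, group_b))     # effect size in SD units, about 1.83
print(bootstrap_d(group_a, group_b))  # how much d varies across resamples
```

A wide bootstrap interval warns that the observed effect size may not replicate closely, which is information a p value alone does not convey.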

RECOMMENDED READING

Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378-399.

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304-1312.

Journal of Experimental Education. (1993). Special issue: "The role of statistical significance testing in contemporary analytic practice: Alternatives with comments from journal editors." Washington, DC: Heldref Publications. (Available from ERIC/AE).

Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276-1284.

Shaver, J. (1985). Chance and nonsense. Phi Delta Kappan, 67(1), 57-60.

Thompson, B. (in press). The pivotal role of replication in psychological research: Empirically evaluating the replicability of sample results. Journal of Personality.

-----

Descriptors: Data Analysis; Data Interpretation; Decision Making; *Effect Size; Hypothesis Testing; Probability; Research Methodology; Research Problems; *Sampling; *Statistical Analysis; *Statistical Significance; Test Interpretation; *Test Use; *Testing
