<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Emilee Rader &#187; statistics</title>
	<atom:link href="http://bierdoctor.com/category/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://bierdoctor.com</link>
	<description>Assistant Professor, Technology &#38; Social Behavior @ Northwestern University</description>
	<lastBuildDate>Thu, 02 Sep 2010 04:50:39 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>statistics. sigh.</title>
		<link>http://bierdoctor.com/2010/09/01/statistics-sigh/</link>
		<comments>http://bierdoctor.com/2010/09/01/statistics-sigh/#comments</comments>
		<pubDate>Thu, 02 Sep 2010 04:50:39 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[analysis]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[research design]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=595</guid>
		<description><![CDATA[I find myself once again this week reading stats papers that range from &#8220;slightly over my head&#8221; to &#8220;I have no idea what you people are talking about,&#8221; in an attempt to figure out the right thing to do with a dataset involving observations that are not independent. The dataset consists of conversations between dyads [...]]]></description>
			<content:encoded><![CDATA[<p>I find myself once again this week reading stats papers that range from &#8220;slightly over my head&#8221; to &#8220;I have no idea what you people are talking about,&#8221; in an attempt to figure out the right thing to do with a dataset involving observations that are not independent.</p>
<p>The dataset consists of conversations between dyads that took place while they completed two different interactive tasks. The conversations were recorded, transcribed, and segmented into utterances according to some criteria. This means that there are repeated utterances from each participant, and from each dyad. Different research areas use different terms to refer to this kind of setup: repeated measures, panel data, clustered data, etc. The analysis is further complicated by the fact that the predictors and response variables are all categorical. Some are binary, indicating the presence or absence of something. The more interesting variables have more than two categories (in some cases, MANY more).</p>
<p>I am trying to estimate the strength with which each of a set of 15+ utterance goals is associated with one of three roles participants assumed as part of the study. To do this, I need to specify a mixed-effects multinomial logit model, with a set of fixed-effects categorical predictors and a hierarchical random effects control for participant within dyad. This involves choosing a reference category of the response variable, and then running a series of binomial logit models that compare all the other levels of the response variable in turn with the reference category.</p>
<p>Here is where I am running into a situation, again, where I am pushing up against what mainstream statistical software packages are reliably capable of, and even R does not seem to be able to do what I want without more programming than my meager statistical background has prepared me for. The problem as I understand it is, each one of the binomial logit models that makes up the multinomial results uses a different subset of the data, excluding those observations that are related to the levels of the response variable not included in the model. This means that the random effects are estimated differently for each binomial logit model, depending on which observations are included in the subset. The upshot of all of this is the overall multinomial model estimates come out differently, depending substantially on which category is chosen as the reference category.</p>
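<p>To make the subsetting issue concrete, here is a minimal Python sketch on made-up data. The dyad IDs, the goal labels A/B/C, and the helper names <code>binary_subsets</code> and <code>rows_per_dyad</code> are all hypothetical, and no model is actually fitted. It only shows which observations each binomial sub-model would see for a given reference category, which is why the dyad-level random effects end up being estimated from different data each time.</p>
<pre>
```python
from collections import Counter

# Hypothetical toy data: (dyad id, utterance goal). Not from the real study.
utterances = [
    ("d1", "A"), ("d1", "A"), ("d1", "B"), ("d1", "C"),
    ("d2", "A"), ("d2", "B"), ("d2", "B"), ("d2", "B"),
    ("d3", "C"), ("d3", "C"), ("d3", "A"), ("d3", "B"),
]

def binary_subsets(rows, reference):
    """For each non-reference level, keep only rows whose response is
    that level or the reference level (the data one binomial logit sees)."""
    levels = sorted({goal for _, goal in rows} - {reference})
    return {
        level: [r for r in rows if r[1] in (level, reference)]
        for level in levels
    }

def rows_per_dyad(rows):
    """Count how many observations each dyad contributes to a subset."""
    return dict(Counter(dyad for dyad, _ in rows))

for reference in ("A", "B", "C"):
    for level, subset in binary_subsets(utterances, reference).items():
        print(f"{level} vs {reference}:", rows_per_dyad(subset))
```
</pre>
<p>With A as the reference in this toy example, dyad d2 contributes four rows to the B-vs-A subset but only one to C-vs-A, so whatever gets estimated for d2 rests on very different evidence in the two sub-models.</p>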
<p>So that&#8217;s the problem. However, I did not write this to whine about how I am stuck. I&#8217;ve been trying to figure out a solution that I can live with&#8230; do I bail completely? Hire a real statistician? How can I figure out how biased the results would be if I were to do a purely fixed-effects model? (Without random effects controls, any results produced might in fact be due to some unique aspect of the conversation within a particular dyad in a particular role, rather than indicative of something that shows up across all of the dyads.)</p>
<p>Researchers in many fields work with categorical data, and at least some of them over the years must have encountered this problem, whether they knew it or not, and were faced with the same tradeoffs. In order to get the paper out the door they had to just pick a compromise and go with it. But, any results reached due to a compromise are biased in some way. Models like this are just now becoming possible for people like me, with just enough stats knowledge to be dangerous, to run using fairly standard statistical software packages. But what about all the research that has come before &#8212; how accurate are those models, and the results they produced? How much do people allow what is statistically feasible to determine their research design, vs. compromising on the analysis after the fact? We all stand on the shoulders of giants, but how often were the giants using naive or incorrect statistics?</p>
Copyright &copy; 2010 <strong><a href="http://bierdoctor.com/">Emilee Rader</a></strong>]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/09/01/statistics-sigh/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>an infrastructure of social information</title>
		<link>http://bierdoctor.com/2010/04/09/an-infrastructure-of-social-information/</link>
		<comments>http://bierdoctor.com/2010/04/09/an-infrastructure-of-social-information/#comments</comments>
		<pubDate>Fri, 09 Apr 2010 17:14:21 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[research]]></category>
		<category><![CDATA[social filtering]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=518</guid>
		<description><![CDATA[In my last post, I wrote about my reaction to statistical arguments presented in a paper titled &#8220;Of Beauty, Sex and Power&#8221; by Andrew Gelman and David Weakliem (American Scientist, 97(4), 310-316). My second reaction to the paper has to do with the distortion of the effect size as it moved from journal paper to [...]]]></description>
			<content:encoded><![CDATA[<p>In my last post, I wrote about my reaction to statistical arguments presented in a paper titled &#8220;<a href="http://www.americanscientist.org/issues/page2/2009/4/of-beauty-sex-and-power">Of Beauty, Sex and Power</a>&#8221; by Andrew Gelman and David Weakliem (American Scientist, 97(4), 310-316).</p>
<p>My second reaction to the paper has to do with the distortion of the effect size as it moved from journal paper to book to popular press to blog entry. It is easy to dismiss this as just an instance of the &#8220;<a href="http://en.wikipedia.org/wiki/Telephone_game">telephone game</a>&#8221; &#8212;the message is not replicated exactly as it moves from one venue to the next, and the distortions are biased in the direction of making the results sound stronger and more controversial and therefore more interesting and worthy of attention. Nobody wants to read a blog entry that says, &#8220;Yep, in line with previous research, beautiful people are maybe 4.7% more likely to have female babies, but that result isn&#8217;t statistically significant so maybe there&#8217;s no noticeable effect at all.&#8221; Well, possibly nobody except people who are really interested in this type of research, I guess.</p>
<p>An interesting thing to me about this example is that the results were published and re-published not by some random blogger (like me!), but in reputable venues: the Journal of Theoretical Biology http://www.elsevier.com/locate/yjtbi, the Freakonomics blog on the New York Times website http://freakonomics.blogs.nytimes.com/, and Psychology Today http://www.psychologytoday.com/magazine. The author of the original paper, Satoshi Kanazawa (http://personal.lse.ac.uk/KANAZAWA/), even ended up writing a book about it, with Alan S. Miller, titled &#8220;Why Beautiful People Have More Daughters&#8221;. In each case, the quality filter was broken, but the reputation of the venue vouched for the results just the same.</p>
<p>It is easy to argue that these venues should just be more careful in the future about &#8220;quality control&#8221;; they should insist that reviewers consider statistical power and effect size, or do a better job of tracking down related work so they would be more likely to spot an effect size that was out of whack. However, it is more difficult to think about the role of the social structures in enhancing the validity of the information, and in communicating to readers that they don&#8217;t need to do any of the legwork themselves.</p>
<p>Thinking about the social structures that enabled and even encouraged this distortion to happen, it seems like similar reputation effects and influences exist in social media systems. If I&#8217;d posted a link to the Freakonomics story, for example, either to delicious.com or Facebook or Twitter, would anybody have thought twice about the excessively large effect size that was reported? How about the people who found it interesting enough to re-post? Would the people who found it interesting enough to re-post maybe have something in common, perhaps that gave them a more credible source for this kind of information than a random user? These contributions make up the information infrastructure of the so-called &#8220;social search&#8221; phenomenon, and yet we understand very little about the social forces that shape this infrastructure.</p>
<p>Another implication of this example just occurred to me while writing this post: the problem with playing around analyzing available datasets without knowing the related literature. I think the recent emphasis on more data as the answer to everything could mean that people who don&#8217;t know much about an area are suddenly immersed in a dataset fishing for results they aren&#8217;t necessarily equipped to interpret accurately. If Kanazawa truly didn&#8217;t know the sex differences literature, how might he be expected to know that an 8% effect is really incredibly large (aside from the multiple comparisons problem)?</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/04/09/an-infrastructure-of-social-information/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>statistical power and effect size</title>
		<link>http://bierdoctor.com/2010/04/05/statistical-power-and-effect-size/</link>
		<comments>http://bierdoctor.com/2010/04/05/statistical-power-and-effect-size/#comments</comments>
		<pubDate>Mon, 05 Apr 2010 21:07:48 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=515</guid>
		<description><![CDATA[I really love reading about statistics. When I&#8217;m having one of those days where I think about what else I might have chosen to do with my life, &#8220;become a statistician&#8221; is close to the top of the list. (&#8220;Become a meteorologist&#8221; is usually #1; they also seem to have a lot of really cool [...]]]></description>
			<content:encoded><![CDATA[<p>I really love reading about statistics. When I&#8217;m having one of those days where I think about what else I might have chosen to do with my life, &#8220;become a statistician&#8221; is close to the top of the list. (&#8220;Become a meteorologist&#8221; is usually #1; they also seem to have a lot of really cool data to play with!)</p>
<p>I recently read this short paper about statistical power, and how to interpret non-significant results:</p>
<p>Gelman, A., &amp; Weakliem, D. (2009). <a href="http://www.americanscientist.org/issues/page2/2009/4/of-beauty-sex-and-power">Of Beauty, Sex and Power</a>. American Scientist, 97(4), 310-316. [ <a href="http://www.stat.columbia.edu/~gelman/research/published/power4r.pdf">PDF</a>, on <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/of_beauty_sex_a.html">Gelman's blog</a> ]</p>
<p>I&#8217;m having two very different reactions to the paper:</p>
<p>1. Why didn&#8217;t anyone talk much about statistical power and effect size in my intro stats classes?</p>
<p>2. There are implications for social media here, and I don&#8217;t just mean the &#8220;<a href="http://en.wikipedia.org/wiki/Telephone_game">telephone game</a>&#8221;.</p>
<p>The paper is only 7 pages long, so if you ever do any kind of statistical analysis, or even if you just read news stories about science results, you really should read this paper. It presents an example of what I think is a common problem in the reporting of statistical results in scientific journals&#8212;when analyzing a dataset, not enough consideration is given to what effect size one might expect for a given phenomenon. It is very important to think about (and explicitly report) statistical power, because as the authors write, &#8220;&#8216;underpowered&#8217; studies are unlikely to reach statistical significance and, perhaps more importantly, they drastically overestimate effect size estimates. Simply put, the noise is stronger than the signal.&#8221;</p>
<p>As an example of this, the paper critiques the interpretation of statistical results published by <a href="http://personal.lse.ac.uk/KANAZAWA/">Satoshi Kanazawa</a>, from which he drew the provocative conclusion that &#8220;beautiful people have more daughters.&#8221; Incidentally, according to his website he also has a new book coming out tentatively titled, &#8220;Escaping Biology: Why Intelligent People Are the Ultimate Losers in Life&#8221;.</p>
<p>Kanazawa took a publicly available dataset, the National Longitudinal Study of Adolescent Health, and did an analysis that consisted of multiple pairwise comparisons between 4 subjective &#8220;attractiveness&#8221; categories (ratings were made by the interviewers who collected the data). The results he reported came from a comparison of the group with the highest attractiveness vs. everyone else; this comparison yielded an 8% difference, or &#8220;&#8230;a 52 percent chance of girl births for the parents in the highest attractiveness category, compared to a 44 percent chance for the average of the four lower categories&#8221; (p311). However, Gelman &amp; Weakliem did their OWN analysis of the SAME data&#8212;this time a more appropriate linear regression&#8212;and found a 4.7%, non-significant effect.</p>
<p>Gelman &amp; Weakliem also did a review of other published research investigating factors that affect whether a baby will be born male or female, and found that effect sizes tend to be exceedingly small, &#8220;typically less than one percent&#8221; (p311). So they were surprised to see the following report of Kanazawa&#8217;s results in the <a href="http://freakonomics.blogs.nytimes.com/2006/08/02/why-do-beautiful-women-sometimes-marry-unattractive-men/">Freakonomics blog</a>: &#8220;&#8230;good-looking parents are 36% more likely to have a baby daughter as their first child than a baby son&#8230;&#8221;.</p>
<p>They argue that anytime you see an effect size that big, you should be skeptical, for two reasons. The first is that &#8220;&#8230;studies with insufficient statistical power will spit out random results that will occasionally be statistically significant and, even more often, be suggestive&#8230;&#8221; (p315). The second is that &#8220;&#8230;most of the low-hanging fruit in social science research has presumably been plucked, leaving researchers to study small effects&#8221; (also p315). So, in other words, even underpowered studies will produce statistically significant results from time to time, but these results should not be trusted&#8212;especially social science results!</p>
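<p>The &#8220;statistical-significance filter&#8221; idea can be sketched with a small simulation. This is only an illustration under assumptions I made up for the purpose: a true difference of 1 percentage point in a binary outcome, 500 observations per group, 2000 simulated studies, and a plain two-sided z-test. None of these numbers come from the paper, and the function name <code>simulate_significant_estimates</code> is hypothetical.</p>
<pre>
```python
import math
import random

def simulate_significant_estimates(true_diff=0.01, n=500, sims=2000, seed=1):
    """Simulate many two-group studies of a binary outcome and return the
    estimated differences from the runs that are significant at the 5% level."""
    rng = random.Random(seed)
    p_low, p_high = 0.5 - true_diff / 2, 0.5 + true_diff / 2
    significant = []
    for _ in range(sims):
        # Count "successes" in each group of n simulated observations.
        a = sum(p_high > rng.random() for _ in range(n))
        b = sum(p_low > rng.random() for _ in range(n))
        diff = a / n - b / n
        pooled = (a + b) / (2 * n)
        se = math.sqrt(2 * pooled * (1 - pooled) / n)
        if abs(diff) / se > 1.96:  # two-sided z-test at the 5% level
            significant.append(diff)
    return significant

estimates = simulate_significant_estimates()
mean_abs = sum(abs(d) for d in estimates) / len(estimates)
print(f"{len(estimates)} of 2000 simulated studies were significant")
print(f"true difference: 0.010, mean |estimate| among those: {mean_abs:.3f}")
```
</pre>
<p>With groups this small, the standard error is about 0.03, so an estimate has to be roughly six times the true difference before it can clear the significance threshold at all. Any result that gets through the filter is exaggerated by construction, and some can even have the wrong sign.</p>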
<p>Gelman &amp; Weakliem also make an important point about the scholarly publishing process that allowed these results to be published, repeatedly and with increasing visibility and distortion: &#8220;&#8230;the papers managed to survive the review process because reviewers did not recognize that the power of the studies was such that only very large estimated effects could make it through the statistical-significance filter. The result is essentially a machine for producing exaggerated claims&#8230;&#8221;</p>
<p>So what we have here is a paper that presents unbelievably large effect sizes that got through peer review. For all the attention I pay to the statistics in papers I review, I can&#8217;t say that I&#8217;ve ever paid that much attention to statistical power. I definitely will be looking more closely at it in the future, and being more explicit about it in my own work. But I also continue to be disappointed in most intro stats education; from what I remember, the parts about probability and the logic of significance testing and practical vs. statistical significance take a back seat to the mechanics of it&#8212;which formula should be used where, etc. Maybe this is unavoidable, and students can&#8217;t understand the application and interpretation of statistics until they master the arithmetic. But I think the application and interpretation parts are the most fun!</p>
<p>In my next post, I&#8217;ll write more about my second reaction to the paper.</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/04/05/statistical-power-and-effect-size/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>thinking about bibliometrics</title>
		<link>http://bierdoctor.com/2010/03/20/thinking-about-bibliometrics/</link>
		<comments>http://bierdoctor.com/2010/03/20/thinking-about-bibliometrics/#comments</comments>
		<pubDate>Sat, 20 Mar 2010 16:35:57 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[measures]]></category>
		<category><![CDATA[operationalization]]></category>
		<category><![CDATA[publications]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=486</guid>
		<description><![CDATA[This is a post that I&#8217;ve had in my list of drafts since last year&#8217;s CHI conference, which included a special session for a paper titled &#8220;Scientometric Analysis Of The CHI Proceedings&#8221;. The special session was held because the paper was apparently controversial&#8212;it criticized the best paper nomination and selection process: Judging quality remains a [...]]]></description>
			<content:encoded><![CDATA[<p>This is a post that I&#8217;ve had in my list of drafts since last year&#8217;s CHI conference, which included a special session for a paper titled &#8220;<a href="http://portal.acm.org/citation.cfm?id=1518701.1518810">Scientometric Analysis Of The CHI Proceedings</a>&#8221;. The special session was held because the paper was apparently controversial&#8212;it criticized the best paper nomination and selection process:</p>
<blockquote><p>Judging quality remains a difficult task for the initial reviewers, but also for the best paper award committee. Despite its honest efforts, the best paper award committee has not selected papers that are cited more often than other papers. In other words, the best paper award committee did not perform better than random chance&#8230;</p>
<p>We may speculate whether the task of selecting papers that will be highly cited in the future is simply too difficult, if not impossible. In any case, we do have to ask ourselves what the purpose of the award is if it does not correlate with the views of the HCI community. We speculate if it might be worthwhile considering whether the conference attendees should be allowed to vote for the best paper.</p></blockquote>
<p>I&#8217;m not actually sure whether the controversy was due more to disagreement with the speculation that the best paper award might be worthless as a signal of quality (i.e., &#8220;did not perform better than random chance&#8221;), or because the statistical analysis was flawed. I remember both points being raised during the discussion of the paper at the conference, although at the time I was more focused on the statistical arguments. But I&#8217;ve since come across an interesting paper comparing various scientific impact measures, and so I&#8217;m revisiting both criticisms of the paper.</p>
<p>1. The stats are used incorrectly</p>
<p>The conclusion that the CHI best paper nominees and recipients are no better than a randomly selected set of papers was based on the results of an ANOVA that compared three samples: the nominees, those nominees selected to receive the award, and a random sample of papers, all from 2004-2007. The dependent variable was number of citations each paper had received.</p>
<p>There&#8217;s little chance these data are normally distributed. In fact, this would be extremely bizarre if it were true. The paper includes evidence supporting this argument: &#8220;Figure 6 shows that there are not only one or two outliers, but a considerable number of them.&#8221; This is because these are count data, and the distribution is likely power-law shaped: a couple of papers get a comparatively large number of citations, and most are only cited a couple of times. The &#8220;considerable number&#8221; of outliers signals that ANOVA is probably not the best statistical choice. It just doesn&#8217;t make sense to use statistics based on a comparison of means and variances, when the mean does not represent the &#8220;central tendency&#8221; of the distribution.</p>
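<p>As a rough illustration of why the mean is a poor summary here, the sketch below draws made-up &#8220;citation counts&#8221; from a heavy-tailed distribution and compares the mean with the median. The Pareto shape of 1.2, the scaling factor of 3, and the sample size of 500 are assumptions chosen only to produce a skewed sample; they are not estimated from the CHI data.</p>
<pre>
```python
import random
import statistics

rng = random.Random(42)
# Hypothetical citation counts drawn from a heavy-tailed (Pareto) distribution;
# the shape parameter 1.2 and the scaling are illustrative assumptions only.
citations = [int(3 * rng.paretovariate(1.2)) for _ in range(500)]

mean = statistics.mean(citations)
median = statistics.median(citations)
# Fraction of simulated papers whose count falls below the sample mean.
share_below_mean = sum(mean > c for c in citations) / len(citations)

print(f"mean = {mean:.1f}, median = {median}")
print(f"share of papers below the mean: {share_below_mean:.0%}")
```
</pre>
<p>In a sample like this a handful of very large values pull the mean well above what a typical paper receives, so most papers sit below the &#8220;average&#8221;. Methods built around medians, ranks, or count distributions are less sensitive to that handful.</p>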
<p>Also, the sample sizes of the three groups in their ANOVA are extremely unequal (normal=76, nominee=64, winner=12). Their choice of n=76 for the &#8220;normal&#8221; paper group seems like they tried to create equal groups&#8212;but this would only have been true if they lumped together the nominee and winner groups (64+12=76) and compared those with the random set. This is another problem that causes ANOVA results to be biased. So it isn&#8217;t clear how accurately these results represent what&#8217;s going on in the data.</p>
<p>My favorite moment of the panel discussion was when one of the panelists suggested that statistics based on the Poisson distribution would be more appropriate. My memory (almost a year later) is that the presenter (the first author of the paper) responded with something like &#8220;the majority of the CHI audience probably doesn&#8217;t know what that is&#8221; and then asked the audience &#8220;how many of you have heard of the Poisson distribution?&#8221; From where I was sitting, it looked like maybe half the audience raised their hands. It is unlikely that the attendees of this session were a representative sample of the CHI community. But still, it gave me hope that CHI is becoming more statistically sophisticated.</p>
<p>2. The &#8220;quality&#8221; measure doesn&#8217;t measure quality</p>
<p>Most of the analyses reported in the paper use the H-Index, but this is an author- or organization-based indicator, not a paper-based indicator. For the best paper analysis, Bartneck &amp; Hu used the simplest possible measure: &#8220;number of citations&#8221;. However, the paper does not include an argument for how this measure might be related to quality, or in what ways it might be biased.</p>
<p>At the session last year, I wasn&#8217;t sure how I felt about using &#8220;number of citations&#8221; as a proxy for quality. It certainly is an operationalization that makes the data easy to collect. However, after reading Bollen et al. (2009), &#8220;<a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0006022">A Principal Component Analysis of 39 Scientific Impact Measures</a>&#8221; I really wish the Bartneck &amp; Hu paper had given this operationalization more attention. Bollen et al. point out:</p>
<blockquote><p>In fact, we do not even have a workable definition of the notion of &#8220;scientific impact&#8221; itself, unless we revert to the tautology of defining it as the number of citations received by a publication. As most abstract concepts &#8220;scientific impact&#8221; may be understood and measured in many different ways. The issue thus becomes which impact measures best express its various aspects and interpretations.</p></blockquote>
<p>Just as the title says, Bollen et al. presented a PCA of a bunch of scientific impact measures, essentially illustrating how related several types of measures are to each other (see the paper if you want to read more about the different types of measures&#8212;it&#8217;s really interesting). They plotted the measures according to a solution that included just the top two components. Their qualitative interpretation of this graphical representation is that one component represents <em>rapid vs. delayed</em> impact, and the other represents <em>popularity vs. prestige</em>. Citation-based measures fall in the <em>delayed popularity</em> quadrant of the graph; &#8220;number of citations&#8221; is the simplest citation-based measure.</p>
<p>It makes intuitive sense to me that number of citations is a proxy for delayed popularity. As such, it seems unlikely that the best paper committee is making selections based on their projection of future popularity. Because nomination is part of the review process, it seems like more of a merit-based award to me.</p>
<p>What the Bartneck &amp; Hu best paper critique may be indicating, if you accept the argument that the people who are making the nominations and selections are experts qualified to evaluate merit, is that popularity (measured by number of citations) and merit (as recognized by experts) are unrelated. Which, to me, is actually a much more interesting interpretation of these results! Now I really want to redo the analysis using different statistics&#8230; anybody know someone who has already done this?</p>
<p>Finally, there are a couple of things I should point out. First, in the spirit of full disclosure, I have received a best paper nomination&#8212;at last year&#8217;s CHI in fact, for a Note. Regardless, I have no personal discomfort with criticism toward the nomination and selection process. My issue is with the operationalization of &#8220;quality&#8221; in the Bartneck &amp; Hu paper, and with the statistics used to support the criticism. Second, the &#8216;best paper&#8217; section was actually a small part of the paper; most of the analysis focused on presenting a historical account of participation in the conference (represented by accepted papers), broken down by organization and country (of the organization, not the authors).</p>
<p>The papers:</p>
<p>1. Bartneck, C., &amp; Hu, J. (2009). <a href="http://portal.acm.org/citation.cfm?id=1518701.1518810">Scientometric Analysis Of The CHI Proceedings</a>. In CHI &#8217;09 (pp. 699-708).</p>
<p>2. Bollen, J., Sompel, H. V., Hagberg, A., &amp; Chute, R. (2009). <a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0006022">A Principal Component Analysis of 39 Scientific Impact Measures</a>. PLoS ONE, 4(6).</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/03/20/thinking-about-bibliometrics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>supplemental statistics</title>
		<link>http://bierdoctor.com/2010/02/27/supplemental-statistics/</link>
		<comments>http://bierdoctor.com/2010/02/27/supplemental-statistics/#comments</comments>
		<pubDate>Sat, 27 Feb 2010 07:51:47 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[analysis]]></category>
		<category><![CDATA[in the news]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=472</guid>
		<description><![CDATA[I came across a really interesting paper recently after seeing it referred to in a news story: Female teachers&#8217; math anxiety affects girls&#8217; math achievement (Beilock et al., 2010, PNAS, with Supplemental information) The researchers recruited 17 first- and second-grade teachers (all female) and assessed the math achievement of the students in their classrooms at the [...]]]></description>
			<content:encoded><![CDATA[<p>I came across a really interesting paper recently after seeing it referred to in a news story: <a href="http://www.pnas.org/content/early/2010/01/14/0910967107.abstract">Female teachers&#8217; math anxiety affects girls&#8217; math achievement </a> (Beilock et al., 2010, PNAS, with <a href="http://www.pnas.org/content/early/2010/01/14/0910967107/suppl/DCSupplemental">Supplemental information</a>)</p>
<p>The researchers recruited 17 first- and second-grade teachers (all female) and assessed the math achievement of the students in their classrooms at the beginning and end of the year, as well as the teachers&#8217; anxiety level. They also measured &#8220;students’ beliefs about gender and academic success in domains like math&#8221;. They found that higher teacher math anxiety was associated with an increase in girls&#8217; tendency to adhere to &#8220;boys are good at math, girls are good at reading&#8221; gender stereotypes. They also found that girls who were more likely to hold &#8220;boys are good at math, girls are good at reading&#8221; stereotypes had lower end-of-year math achievement scores. Interestingly, when they put these predictors together in one regression (teacher anxiety and gender stereotypes predicting math achievement), teacher anxiety was &#8220;no longer a significant predictor&#8221; (and the coefficient decreased from -3.33 to -2.48). The paper presents a &#8220;mediation analysis&#8221; called &#8220;bias-corrected bootstrapping&#8221; that suggests math anxiety in female teachers affects girls&#8217; gender stereotypes, which affects math achievement scores. I don&#8217;t know much about this analysis method, so I dug up a couple of papers so I can learn more about it. Yay, stats!</p>
<p>I have two issues with the way the results are presented in this paper. First, it took me way too long to figure out what they actually did. I didn&#8217;t notice the supplemental material initially, which is where all of the analyses are described, and the text of the actual article is too vague about the statistics for me to believe the results from just that part of the text. I realize that <a href="http://www.pnas.org/site/misc/iforc.shtml#length">PNAS limits submissions to 6 pages</a>, but I feel that for this particular paper the supplemental material is not supplemental at all&#8212;it is essential. After reading the supplement, it is pretty clear that the analysis was adequate.</p>
<p>But my second issue is that the interpretation of the result seems more concerned with sign and significance than with effect size. The paper doesn&#8217;t ground the numbers in real-world implications, nor does it present descriptive statistics on the instruments used (these are relegated to the online appendix as well). For example, it is impossible to interpret the coefficient in this statement without having some idea what the units mean for both stereotype endorsement and math achievement: &#8220;In addition, the more girls at the end of the year endorsed the notion that boys are good at math and girls are good at reading, the lower was their math achievement (r = −0.28, P = 0.025).&#8221; This oversight surprises me. So what if gender stereotype belief is a significant predictor of math achievement for girls, if this is only associated with very small differences in test scores? Yes, it is still interesting that the effect was present in girls and not in boys, but if the magnitude of the effect is small, in my mind the implications of this particular study are more about gender stereotypes and behavior modeling, and less about figuring out how to help girls do better in math. (There&#8217;s a brief acknowledgement of effect size in the third-to-last paragraph: &#8220;It is important to note that the effects reported in the current work, although significant, are small.&#8221;)</p>
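<p>Even without the units of either instrument, the quoted correlation can at least be read in standardized terms: as shared variance, and as the expected change in one measure (in standard deviations) per standard deviation of the other. The function name below is made up for illustration, and the only number taken from the paper is r = −0.28.</p>

```python
def describe_correlation(r):
    """Return (shared variance, change in y in SD units per 1 SD change in x)."""
    # In a simple regression on standardized variables, the slope equals r.
    return r ** 2, r


shared_variance, sd_change = describe_correlation(-0.28)
print(f"shared variance: {shared_variance:.1%}")
print(f"math achievement change per 1 SD of stereotype endorsement: {sd_change:+.2f} SD")
```

<p>That works out to roughly 8% of the variance in girls&#8217; end-of-year math scores, which fits the paper&#8217;s own description of the effects as small.</p>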
<p>Finally, it&#8217;s interesting to me, and a bit depressing, that in a paper about math anxiety and achievement, the complicated statistics are relegated to an appendix. There is no way to know if the authors expected the statistical analyses to be transparent/obvious enough they didn&#8217;t need to include the details in the paper, or if they felt the paper would be more understandable for readers without the stats. This is something I struggle with&#8212;how to appropriately describe complicated quantitative analyses for a multi-disciplinary audience that may or may not understand what I&#8217;m talking about, or even want to learn. I&#8217;m not sure I like the stats appendix solution, but I like it a lot more than two other alternatives I&#8217;ve seen: 1) the &#8220;sink or swim&#8221; approach&#8212;describing the analyses as if to an expert, and less experienced readers are left to flounder; and 2) only using stats one believes most members of the community should be familiar with.</p>
Copyright &copy; 2010 <strong><a href="http://bierdoctor.com/">Emilee Rader</a></strong>]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/02/27/supplemental-statistics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>beyond significance testing</title>
		<link>http://bierdoctor.com/2009/12/12/beyond-significance-testing/</link>
		<comments>http://bierdoctor.com/2009/12/12/beyond-significance-testing/#comments</comments>
		<pubDate>Sat, 12 Dec 2009 06:33:42 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[analysis]]></category>
		<category><![CDATA[methods]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=405</guid>
		<description><![CDATA[I&#8217;ve been reading a book lately before bed, a little bit at a time: Beyond Significance Testing, by Rex B. Kline. It isn&#8217;t exactly a suspenseful page-turner; maybe if I tried reading it some other time of day than when I am already sleepy I might be able to get through it faster. The purpose [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been reading a book lately before bed, a little bit at a time: <a href="http://www.amazon.com/Beyond-Significance-Testing-Reforming-Behavioral/dp/1591471184/">Beyond Significance Testing</a>, by Rex B. Kline. It isn&#8217;t exactly a suspenseful page-turner; maybe if I tried reading it some other time of day than when I am already sleepy I might be able to get through it faster.</p>
<p>The purpose of the book is to convince readers that Null Hypothesis Significance Testing (NHST) should no longer be practiced, and to suggest alternatives like using confidence intervals and always reporting effect sizes. I think my favorite quote from the book so far is this one, in a paragraph devoted to the suggestion that one way to fix the NHST problem is just to use more careful, less overreaching language in talking about p-values and significance tests (like phasing out the word &#8220;significant&#8221;):</p>
<blockquote><p>You can put candles in a cow pie, but that does not<br />
make it a birthday cake.</p></blockquote>
<p>You can tell what the author thinks of that idea. Ouch.</p>
<p>Part I of the book also makes an interesting argument that NHST is not only bad social science, it is bad FOR social science. The idea is that because p-values are colloquially understood to mean something they actually do not, researchers believe the findings of a single study are more robust and reliable than they actually are. For example, a p-value represents the conditional probability of data at least as extreme as those observed, given the null hypothesis, NOT the probability that the null hypothesis is true given the data. According to the book, this and other misinterpretations of the logic of significance testing cause the literature to be biased toward research results &#8220;about fad topics that clutter the research literature but have little scientific value&#8221;, that are never replicated:</p>
<blockquote><p>&#8230;if one believes that <em>p</em> &lt; .01 implies that the result is likely to be repeated more than 99 times out of 100, why bother to replicate? A related cognitive error is the belief that statistically significant findings should be replicated, but not ones for which [the null hypothesis] was not rejected (F. Schmidt &amp; Hunter, 1997).</p></blockquote>
<p>This bias perpetuates research for which the practical, meaningful significance of the results is not clear.</p>
<p>A lot of these arguments make sense to me&#8212;it seems like the process of NHST hides a lot of the error and uncertainty that is part of doing science, making it seem like the results of individual studies are more definitive and certain than they actually are. I&#8217;m looking forward to making it through the rest of the book and starting to practice the alternatives it suggests.</p>
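<p>A toy calculation makes the gap between those two conditional probabilities concrete. It assumes a simplified world in which a study either rejects the null at a fixed alpha or does not, and it needs two inputs that a p-value never supplies: the prior probability that the null is true, and the power of the study. All of the numbers and the function name are invented for illustration; none come from the book.</p>

```python
def prob_null_given_significant(prior_null, alpha, power):
    """P(H0 is true | result is significant), by Bayes' rule."""
    significant_and_null = alpha * prior_null        # false positives
    significant_and_alt = power * (1.0 - prior_null)  # true positives
    return significant_and_null / (significant_and_null + significant_and_alt)


for prior_null in (0.5, 0.8, 0.95):
    posterior = prob_null_given_significant(prior_null, alpha=0.05, power=0.5)
    print(f"P(H0) = {prior_null:.2f} -> P(H0 | significant) = {posterior:.3f}")
```

<p>With alpha = .05 and power = .5, the probability that the null is true after a significant result runs from roughly 9% to 66% across these priors. It is never simply 5%.</p>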
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2009/12/12/beyond-significance-testing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>model selection book</title>
		<link>http://bierdoctor.com/2009/07/09/model-selection-book/</link>
		<comments>http://bierdoctor.com/2009/07/09/model-selection-book/#comments</comments>
		<pubDate>Thu, 09 Jul 2009 21:40:59 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://madmission.bierdoctor.com/2009/07/09/model-selection-book/</guid>
		<description><![CDATA[In working on the experiment chapter of my dissertation, I found that I was yet again stretching the limits of my statistical knowledge. Fortunately, the Internets and Amazon.com came to my rescue, as they have so many times before. It is amazing how just-in-time access to web pages and online journals (and second-day delivery!) has helped [...]]]></description>
			<content:encoded><![CDATA[<p>In working on the experiment chapter of my dissertation, I found that I was yet again stretching the limits of my statistical knowledge. Fortunately, the Internets and Amazon.com came to my rescue, as they have so many times before. It is amazing how just-in-time access to web pages and online journals (and second-day delivery!) has helped me to improve my research, just in the past few months.</p>
<p>The latest topic of interest is model selection. I won&#8217;t go into the details here &#8212; I need to spend my writing energy getting a draft out the door. But I&#8217;m reading a great little book right now that not only covers the statistical concepts I wanted to learn about, but also has an interesting &#8220;philosophy of science&#8221;-type discussion. I have a feeling the author of this book might have a few problems with <a href="http://www.wired.com/science/discoveries/magazine/16-07/pb_theory">The End of Theory</a>&#8230;</p>
<p>Here&#8217;s the book:  <a href="http://www.amazon.com/Model-Based-Inference-Life-Sciences/dp/0387740732">Model Based Inference in the Life Sciences: A Primer on Evidence</a>, by <a href="http://welcome.warnercnr.colostate.edu/~anderson/">David R. Anderson</a></p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2009/07/09/model-selection-book/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>results!</title>
		<link>http://bierdoctor.com/2009/05/17/results/</link>
		<comments>http://bierdoctor.com/2009/05/17/results/#comments</comments>
		<pubDate>Sun, 17 May 2009 06:13:59 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[dissertation]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://madmission.bierdoctor.com/2009/05/17/results/</guid>
		<description><![CDATA[after about a month of on-and-off panic about my dissertation experiment analysis (and too many marathon sessions with crappy R documentation), i now feel confident in saying that YES, i do have some pretty interesting results! i&#8217;ll be working on writing everything up for a rapidly approaching paper deadline; i&#8217;m guessing i&#8217;ll be posting bits [...]]]></description>
			<content:encoded><![CDATA[<p>after about a month of on-and-off panic about my dissertation experiment analysis (and too many marathon sessions with crappy R documentation), i now feel confident in saying that YES, i do have some pretty interesting results! i&#8217;ll be working on writing everything up for a rapidly approaching paper deadline; i&#8217;m guessing i&#8217;ll be posting bits as i work on figuring out how to say what needs to be said in the results section, and how best to describe what i think it all means.</p>
<p>i also have to say, what the heck did people do before the internet? i mean, here i am using open-source statistical software, doing fairly nonstandard analyses for my field. and yet, it seems like no matter what problem i encountered either with the tool or in trying to specify the model correctly, somebody else had already figured it out or written a paper on it. for example, i will definitely be citing this really fabulous journal article on generalized linear mixed models <a href="http://www.cell.com/trends/ecology-evolution/abstract/S0169-5347(09)00019-6">(Bolker et al., 2009)</a> with not one, but TWO online supplements that are equally fabulous. who knows what i would have done without having all this information at my fingertips!!</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2009/05/17/results/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R in the NY Times</title>
		<link>http://bierdoctor.com/2009/01/07/r-in-the-ny-times/</link>
		<comments>http://bierdoctor.com/2009/01/07/r-in-the-ny-times/#comments</comments>
		<pubDate>Thu, 08 Jan 2009 00:21:01 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[in the news]]></category>
		<category><![CDATA[software tools]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://madmission.bierdoctor.com/2009/01/07/r-in-the-ny-times/</guid>
		<description><![CDATA[if any of you haven&#8217;t seen this yet, the NY Times published an article about R! for some strange reason, it didn&#8217;t make the &#8220;most emailed&#8221; feed. go figure. Data Analysts Captured By R&#8217;s Power the first thing i thought when i saw it was, way to go R! i had never heard of R [...]]]></description>
			<content:encoded><![CDATA[<p>if any of you haven&#8217;t seen this yet, the NY Times published an article about R! for some strange reason, it didn&#8217;t make the &#8220;most emailed&#8221; feed. go figure.</p>
<blockquote><p><a href="http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html">Data Analysts Captured By R&#8217;s Power</a></p></blockquote>
<p>the first thing i thought when i saw it was, way to go R! i had never heard of R when i first started my graduate program &#8212; i was still using SPSS. but as a poor graduate student, i didn&#8217;t want to pay yearly renewal fees for expensive licenses, and i started learning R because it is free. i&#8217;ve been hooked ever since.</p>
<p>as for the article in the nytimes, i think this statement isn&#8217;t quite accurate:</p>
<blockquote><p>But R has also quickly found a following because statisticians, engineers and scientists without computer programming skills find it easy to use.</p></blockquote>
<p>i think you definitely need some programming experience to get the command-line interface of R, which is where the power is. i never had much luck with the GUI point-and-click add-on. but i do agree with Daryl Pregibon, who is apparently a research scientist at Google and a fan of R:</p>
<blockquote><p>R is really important to the point that it’s hard to overvalue it.</p></blockquote>
<p>and, i love the part in the article about the reaction of SAS Institute to R:</p>
<blockquote><p>“R has really become the second language for people coming out of grad school now, and there’s an amazing amount of code being written for it,” said Max Kuhn, associate director of nonclinical statistics at Pfizer. “You can look on the SAS message boards and see there is a proportional downturn in traffic.”</p>
<p>SAS says it has noticed R’s rising popularity at universities, despite educational discounts on its own software, but it dismisses the technology as being of interest to a limited set of people working on very hard tasks.</p>
<p>“I think it addresses a niche market for high-end data analysts that want free, readily available code,&#8221; said Anne H. Milley, director of technology product marketing at SAS. She adds, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”</p></blockquote>
<p>sounds a little defensive, eh?</p>
<p>for me, one of the biggest benefits of using R is the really nice looking, professional quality, complex <a href="http://addictedtor.free.fr/graphiques/">graphs and charts R produces</a> &#8212; they kick the ass of anything Excel or SPSS can generate. and, there are times when R reminds me of <a href="http://en.wikipedia.org/wiki/Lisp_programming_language">Lisp</a>, which makes me happy. using R confidently also requires that one be more knowledgeable about the statistics one is performing (and why) than the point-and-click stats packages do &#8212; and i believe this is a good thing! i feel like i&#8217;ve become a much better researcher and user of statistics because of R, and i&#8217;ve been able to do data munging and analysis that i think would be difficult-to-impossible in SPSS.</p>
<p>however, the R documentation can be a bit maddening at times. either it is too terse, or doesn&#8217;t quite cover your exact situation&#8230; but R is open source after all, so my expectations are lower than they might be if i had actually paid for the software. a little persistence usually does the trick.</p>
<p>if you&#8217;re interested in learning more about R, you can find the software at <a href="http://www.r-project.org/">r-project.org</a>, and my favorite introduction to doing basic stuff with R is hosted by the <a href="http://www.ats.ucla.edu/stat/r/notes/">UCLA Statistical Computing service</a>.</p>
<p>UPDATE: the NY Times article hit the &#8220;most emailed&#8221; feed finally, only after it was <a href="http://developers.slashdot.org/article.pl?sid=09/01/07/2316227">posted to Slashdot</a> yesterday evening.</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2009/01/07/r-in-the-ny-times/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>even more on NHST</title>
		<link>http://bierdoctor.com/2008/07/25/even-more-on-nhst/</link>
		<comments>http://bierdoctor.com/2008/07/25/even-more-on-nhst/#comments</comments>
		<pubDate>Fri, 25 Jul 2008 04:11:51 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[analysis]]></category>
		<category><![CDATA[reflection]]></category>
		<category><![CDATA[research design]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://madmission.bierdoctor.com/2008/07/25/even-more-on-nhst/</guid>
		<description><![CDATA[Good news on the dvt front: I walked a little over 2 miles yesterday for the first time, and did it again today. Walking still hurts a bit, especially going uphill. But I made it &#8212; and I am definitely feeling better than I was a week ago. Yeah!! So, this is my final post [...]]]></description>
			<content:encoded><![CDATA[<p>Good news on the dvt front: I walked a little over 2 miles yesterday for the first time, and did it again today. Walking still hurts a bit, especially going uphill. But I made it &#8212; and I am definitely feeling better than I was a week ago. Yeah!!</p>
<p>So, this is my final post on NHST, or Null Hypothesis Significance Testing. Not, as Cohen (1994) &#8212; see last post for citation &#8212; pseudo-jokingly wrote: Statistical Hypothesis Inference Testing. I am working my way through a really comprehensive paper on the subject:</p>
<p>Nickerson R. (2000) Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy. <em>Psychological Methods</em>, 5(2), 241-301. [ <a href="http://cliff.uconn.edu/STAT379/Cohen%20(1994).pdf">pdf</a> ]</p>
<p>I am finding that the authors of papers damning NHST and all those who practice it seem to take an extreme stance in making arguments about the damage being done to science by researchers who have not yet seen the light about Bayesian statistics. Nickerson (2000) seems to agree with me. He wrote:</p>
<blockquote><p>Although many of the participants have stated their positions objectively and gracefully, I have been struck with the stridency of the attacks by some on views that oppose their own. NHST has been described as &#8220;thoroughly discredited,&#8221; a &#8220;perversion of the scientific method,&#8221; a &#8220;bone-headedly misguided procedure,&#8221; &#8220;grotesquely fallacious,&#8221; a &#8220;disaster,&#8221; and &#8220;mindless,&#8221; among other things. Positions, pro or con, have been labeled &#8220;absurd,&#8221; &#8220;senseless,&#8221; &#8220;nonsensical,&#8221; &#8220;ridiculous,&#8221; and &#8220;silly.&#8221; The surety of the pronouncements of some participants on both sides of the debate is remarkable. (p289)</p></blockquote>
<p>Nickerson (2000) is a fairly balanced treatment of the issue. If you&#8217;re only going to read one thing on the subject, his is the paper I recommend. For example, he makes this point about an implicit assumption behind NHST experiments, and it is one I haven&#8217;t seen other authors make:</p>
<blockquote><p>If one is willing to make the assumption that <em>p(D | H<sub>A</sub>)</em> [the probability of the data, assuming the alternative hypothesis is true] is large relative to <em>p(D | H<sub>0</sub>)</em> [the probability of the data, assuming the null hypothesis is true], then one has a legitimate basis for interpreting a small <em>p</em> as evidence for increasing the likelihood of <em>H<sub>A</sub></em> relative to that of <em>H<sub>0</sub></em>. Perhaps this assumption underlines many applications of NHST, but seldom does one see an explicit acknowledgment of it. (p263)</p></blockquote>
<p>What I think he is saying is that researchers conducting an experiment have implicit assumptions and expectations, based on previous work and their own intuition, guiding both the design of the experiment and their interpretation of patterns in the data they&#8217;ve collected. If they&#8217;ve done a thorough job and designed a good experiment, it seems reasonable to interpret small <em>p</em> values as support for the alternative hypothesis (rather than just a sample that is unlikely to occur under the null hypothesis). However, you rarely see an explicit discussion of this type of assumption in the results section of a paper. This also seems like an inherently Bayesian assumption, interpreting <em>p</em> in light of expectations at the beginning of the experiment (or prior probability?).</p>
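<p>In odds form, the assumption Nickerson describes is about the likelihood ratio: the posterior odds of H<sub>A</sub> over H<sub>0</sub> equal the prior odds times p(D | H<sub>A</sub>) / p(D | H<sub>0</sub>). The small sketch below uses made-up probabilities and a made-up function name, and treats each p(D | H) loosely as a single number. It shows that a small p(D | H<sub>0</sub>) shifts the odds only when the data are more probable under the alternative.</p>

```python
def posterior_odds(prior_odds, p_data_given_alt, p_data_given_null):
    """Posterior odds of H_A over H_0 = prior odds * likelihood ratio."""
    return prior_odds * (p_data_given_alt / p_data_given_null)


# Same small p(D | H0) in both cases; only p(D | HA) differs.
favorable = posterior_odds(1.0, p_data_given_alt=0.50, p_data_given_null=0.01)
uninformative = posterior_odds(1.0, p_data_given_alt=0.01, p_data_given_null=0.01)
print(f"data likely under HA:   odds = {favorable:.1f}")
print(f"data unlikely under HA: odds = {uninformative:.1f}")
```

<p>In the second case the data are just as improbable under the alternative as under the null, so the odds do not move at all, however small <em>p</em> looks.</p>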
<p>There are two things I have certainly learned from my foray into NHST &#8220;controversy&#8221;. The first is that the history of science is really interesting and I should take advantage of opportunities to learn more about it. The second is that I really don&#8217;t know enough about probability &#8212; this is something I absolutely should work on.</p>
<p>So, what am I going to do with all of this new knowledge? I am definitely going to be thinking more about prior probabilities and implicit assumptions when I design experiments. And, I will be sure to include a discussion of practical significance whenever I write up research results, including placing more emphasis on power analysis and effect sizes. It seems like the Bayesian approach on a conceptual level (i.e. ignoring the math) involves being more explicit about assumptions and predictions in an experiment, and assigning specific, numeric probabilities to potential outcomes. This level of specificity then allows one to be more precise about where the experiment data fit in the bigger picture. In general, this seems like good advice for becoming a better scientist; however, I am not ready to make the leap and become a Bayesian statistician just yet. Nickerson (2000) has convinced me that it is possible to do real, valid science with NHST as long as one is aware of its weaknesses and the mistaken assumptions that researchers sometimes make:</p>
<blockquote><p>What is the great harm if many people who use NHST believe that <em>p</em> is the probability that the null hypothesis is true, or that a small <em>p</em> is evidence of replicability, or that <em>α</em> is the probability that if one has rejected the null hypothesis one has made a Type I error? Claims to the contrary notwithstanding, there is room for doubt as to whether acquisition of psychological knowledge through experimentation has been greatly impeded by the prevalence of such beliefs or by any of the many other shortcomings of NHST that have been ably identified by its critics. (p289)</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2008/07/25/even-more-on-nhst/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
