<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Emilee Rader</title>
	<atom:link href="http://bierdoctor.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://bierdoctor.com</link>
	<description>Assistant Professor, Technology &#38; Social Behavior @ Northwestern University</description>
	<lastBuildDate>Thu, 02 Sep 2010 04:50:39 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>statistics. sigh.</title>
		<link>http://bierdoctor.com/2010/09/01/statistics-sigh/</link>
		<comments>http://bierdoctor.com/2010/09/01/statistics-sigh/#comments</comments>
		<pubDate>Thu, 02 Sep 2010 04:50:39 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[analysis]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[research design]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=595</guid>
		<description><![CDATA[I find myself once again this week reading stats papers that range from &#8220;slightly over my head&#8221; to &#8220;I have no idea what you people are talking about,&#8221; in an attempt to figure out the right thing to do with a dataset involving observations that are not independent. The dataset consists of conversations between dyads [...]]]></description>
			<content:encoded><![CDATA[<p>I find myself once again this week reading stats papers that range from &#8220;slightly over my head&#8221; to &#8220;I have no idea what you people are talking about,&#8221; in an attempt to figure out the right thing to do with a dataset involving observations that are not independent.</p>
<p>The dataset consists of conversations between dyads that took place while they completed two different interactive tasks. The conversations were recorded, transcribed, and segmented into utterances according to some criteria. This means that there are repeated utterances from each participant, and from each dyad. Different research areas use different terms to refer to this kind of setup: repeated measures, panel data, clustered data, etc. The analysis is further complicated by the fact that the predictors and response variables are all categorical. Some are binary, the presence or absence of something. The more interesting variables have more than two categories (in some cases, MANY more).</p>
<p>I am trying to estimate the strength with which each of a set of 15+ utterance goals is associated with one of three roles participants assumed as part of the study. To do this, I need to specify a mixed-effects multinomial logit model, with a set of fixed-effects categorical predictors and a hierarchical random effects control for participant within dyad. This involves choosing a reference category of the response variable, and then running a series of binomial logit models that compare all the other levels of the response variable in turn with the reference category.</p>
<p>Here is where I am running into a situation, again, where I am pushing up against what mainstream statistical software packages are reliably capable of, and even R does not seem to be able to do what I want without more programming than my meager statistical background has prepared me for. The problem as I understand it is, each one of the binomial logit models that makes up the multinomial results uses a different subset of the data, excluding those observations that are related to the levels of the response variable not included in the model. This means that the random effects are estimated differently for each binomial logit model, depending on which observations are included in the subset. The upshot of all of this is the overall multinomial model estimates come out differently, depending substantially on which category is chosen as the reference category.</p>
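<p>To make the subsetting problem concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the toy rows, the response levels A, B and C, and the helper names <code>pairwise_subsets</code> and <code>rows_per_dyad</code> are hypothetical and are not the study's data or analysis code. It fits no model; it only shows which rows each binary logit would see for a given reference category, and how many of those rows each dyad contributes.</p>

```python
from collections import Counter

# Hypothetical toy data: one (dyad, participant, response) row per utterance.
UTTERANCES = [
    ("d1", "p1", "A"), ("d1", "p1", "A"), ("d1", "p2", "B"), ("d1", "p2", "C"),
    ("d2", "p3", "A"), ("d2", "p3", "B"), ("d2", "p4", "B"), ("d2", "p4", "B"),
    ("d3", "p5", "C"), ("d3", "p5", "C"), ("d3", "p6", "A"), ("d3", "p6", "B"),
]


def pairwise_subsets(rows, reference):
    """For each non-reference level, keep only the rows whose response is
    that level or the reference level (the data one binary logit would use)."""
    levels = sorted({response for _, _, response in rows})
    subsets = {}
    for level in levels:
        if level == reference:
            continue
        subsets[level] = [row for row in rows if row[2] in (reference, level)]
    return subsets


def rows_per_dyad(rows):
    """Count how many rows each dyad contributes to a subset."""
    return dict(Counter(dyad for dyad, _, _ in rows))


# Changing the reference category changes which rows every comparison sees.
for reference in ("A", "B", "C"):
    for level, subset in pairwise_subsets(UTTERANCES, reference).items():
        print(reference, "vs", level, rows_per_dyad(subset))
```

<p>With A as the reference, dyad d2 contributes four rows to the A-vs-B comparison but only one to A-vs-C, so a dyad-level random effect would be estimated from very different amounts of data in the two fits.</p>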
<p>So that&#8217;s the problem. However, I did not write this to whine about how I am stuck. I&#8217;ve been trying to figure out a solution that I can live with&#8230; do I bail completely? Hire a real statistician? How can I figure out how biased the results would be if I were to do a purely fixed-effects model? (Without random effects controls, any results produced might in fact be due to some unique aspect of the conversation within a particular dyad in a particular role, rather than indicative of something that shows up across all of the dyads.)</p>
<p>Researchers in many fields work with categorical data, and at least some of them over the years must have encountered this problem, whether they knew it or not, and were faced with the same tradeoffs. In order to get the paper out the door they had to just pick a compromise and go with it. But, any results reached due to a compromise are biased in some way. Models like this are just now becoming possible for people like me, with just enough stats knowledge to be dangerous, to run using fairly standard statistical software packages. But what about all the research that has come before &#8212; how accurate are those models, and the results they produced? How much do people allow what is statistically feasible to determine their research design, vs. compromising on the analysis after the fact? We all stand on the shoulders of giants, but how often were the giants using naive or incorrect statistics?</p>
Copyright &copy; 2010 <strong><a href="http://bierdoctor.com/">Emilee Rader</a></strong>]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/09/01/statistics-sigh/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>digg censorship</title>
		<link>http://bierdoctor.com/2010/08/15/digg-censorship/</link>
		<comments>http://bierdoctor.com/2010/08/15/digg-censorship/#comments</comments>
		<pubDate>Sun, 15 Aug 2010 19:46:04 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[in the news]]></category>
		<category><![CDATA[social filtering]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=581</guid>
		<description><![CDATA[In a recent post, I mentioned Facebook&#8217;s &#8220;Like&#8221; button for the web, and wrote about how using the information contributed through all those &#8220;Like&#8221; button presses is more complicated than just inferring that a &#8220;Like&#8221; means that someone likes the web page. I recently came across mention of an alleged &#8220;censorship&#8221; controversy related to Digg.com [see [...]]]></description>
			<content:encoded><![CDATA[<p>In a <a href="http://bierdoctor.com/2010/08/03/traffic-accidents-and-social-media-part-iii/">recent post</a>, I mentioned Facebook&#8217;s &#8220;Like&#8221; button for the web, and wrote about how using the information contributed through all those &#8220;Like&#8221; button presses is more complicated than just inferring that a &#8220;Like&#8221; means that someone likes the web page.</p>
<p>I recently came across mention of an alleged <a href="http://blogs.alternet.org/oleoleolson/2010/08/05/massive-censorship-of-digg-uncovered/">&#8220;censorship&#8221; controversy</a> related to <a href="http://digg.com/">Digg.com</a> [see <a href="http://www.guardian.co.uk/technology/2010/aug/06/digg-investigates-claims-conservative-censorship">here</a>, and <a href="http://www.fastcompany.com/1678342/digg-censorship-wikileaks-conservatives">here</a> for mentions in mainstream media], in which a group of coordinated users apparently succeeded in preventing certain stories they found politically objectionable from reaching the front page of Digg, so these stories would not receive wide exposure. The users achieved this end through what is essentially a thumbs up / thumbs down mechanism fundamental to the way Digg works, by which users vote on whether stories should be promoted or buried. As <a href="http://www.examiner.com/conservative-in-national/the-digg-censorship-controversy-an-alternative-view">one blogger</a> points out, opinions about censorship aside, these users were operating within Digg&#8217;s available functionality and did not necessarily violate any rules. People who are upset about this use of Digg&#8217;s voting mechanism claim the group of users were gaming the system &#8212; coordinating which stories to target via another social media application (Yahoo! Groups).</p>
<p>The &#8220;gaming the system&#8221; and &#8220;censorship&#8221; aspects of this controversy are less interesting to me personally, than the flexibility of such a simple voting mechanism, used to express an entire political agenda rather than individual, personal preferences. This is an instance of the point I was trying to make (badly?) a few days ago &#8212; that even simple mechanisms can be tools for expressing a wide variety of meaning, but that meaning is not obvious from single contributions. In this case, the coordinated intentions of the group only became apparent in aggregate, and only after people who were pissed off about having their stories consistently targeted for &#8220;burial&#8221; were motivated enough to figure out what was going on. In other words, the meaning behind these actions was not present in the aggregate voting data; it was only visible if you already knew where to look.</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/08/15/digg-censorship/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>to share or not to share?</title>
		<link>http://bierdoctor.com/2010/08/10/to-share-or-not-to-share/</link>
		<comments>http://bierdoctor.com/2010/08/10/to-share-or-not-to-share/#comments</comments>
		<pubDate>Tue, 10 Aug 2010 20:04:12 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[advice]]></category>
		<category><![CDATA[reflection]]></category>
		<category><![CDATA[writing]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=576</guid>
		<description><![CDATA[With the CSCW 2011 deadline looming (by the time this post appears it will already have passed), I&#8217;ve been thinking about how it wasn&#8217;t until I had experienced a bunch of rejections in the first couple years of graduate school that I started having any successes at all. There weren&#8217;t a lot of opportunities for [...]]]></description>
			<content:encoded><![CDATA[<p>With the <a href="http://cscw2011.org/">CSCW 2011</a> deadline looming (by the time this post appears it will already have passed), I&#8217;ve been thinking about how it wasn&#8217;t until I had experienced a bunch of rejections in the first couple years of graduate school that I started having any successes at all. There weren&#8217;t a lot of opportunities for me to collaborate with senior people on papers, so I did most of my learning the hard way, by trial and error. I wonder whether it might have helped me get up to speed faster if I had asked around for permission to read rejected papers and the accompanying reviews. I also wonder how people would have felt about those requests.</p>
<p>In the last year or so of grad school, several of my fellow students at a similar stage in the program started doing &#8220;paper swaps&#8221; before a big deadline. This was an awesome idea brought to us by <a href="http://www.jennthom.com/">@jennthom</a>. Each person who was submitting a paper agreed to review at least one other paper, in exchange for feedback on their own paper. This brilliant plan had many benefits: it encouraged each of us to finish things a *little* bit earlier than we would have otherwise, we got to learn more about what our colleagues were working on, and of course we both received feedback on our own papers and got to practice giving feedback to others. The main drawback was that it created more work at an already busy time.</p>
<p>An added benefit not obvious at first was that when it came time to write rebuttals to reviews for submitted papers, we had a group of people who were familiar enough with the papers in question that we could read each others&#8217; reviews and make suggestions for the rebuttals. The great thing about this group of people was that it seemed like nobody was overly sensitive about sharing their reviews &#8212; and I think that this was a great learning tool for all of us.</p>
<p>I have two questions based on this reflection about paper swaps and sharing reviews, and I&#8217;d love feedback if anybody happens to notice this post and wants to share:</p>
<p>1. How do I get something like this started at a new institution? I think what we did in grad school worked because we were a fairly small group who both trusted each other to be helpful, and were in serious need of feedback. I certainly learned a LOT from the experience, and think it would be super valuable for other students to participate in something similar. But how do I convince people the extra work is worth it, and that there is nothing to fear from sharing reviews? To that end, I am perfectly willing to share my own reviews on both accepted and rejected papers, which brings me to my next question&#8230;</p>
<p>2. Is it appropriate to share publicly, like on the Internets, reviews for one&#8217;s own papers? Would it just be too confusing for people if there were multiple versions of a paper, or even papers that never ended up being published, available on an author&#8217;s website along with the accepted papers (even if there were a separate page for them or something)? Would anyone even be interested in seeing these things? Also, do reviewers expect that what they write will be held in confidence? Personally, I always write reviews (and everything else for that matter) as if I am writing for an unknown, public audience &#8212; it is so easy to share these things, you never know who might see them. And I don&#8217;t want to say anything in a review that I would be unwilling to say to someone in person. I just have no idea how others feel about this.</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/08/10/to-share-or-not-to-share/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Q&amp;A</title>
		<link>http://bierdoctor.com/2010/08/07/qa/</link>
		<comments>http://bierdoctor.com/2010/08/07/qa/#comments</comments>
		<pubDate>Sat, 07 Aug 2010 22:18:07 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[in the news]]></category>
		<category><![CDATA[reflection]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=572</guid>
		<description><![CDATA[Facebook announced last week that they are introducing a new feature, called &#8220;Facebook Questions&#8221;. From the description on the Facebook blog, it seems like this new feature is intended to be similar to Yahoo! Answers. I have to admit, I don&#8217;t really &#8220;get&#8221; Q&#38;A sites. Who are these people that ask questions like, &#8220;what is [...]]]></description>
			<content:encoded><![CDATA[<p>Facebook <a href="http://blog.facebook.com/blog.php?post=411795942130">announced</a> last week that they are introducing a new feature, called &#8220;Facebook Questions&#8221;. From the description on the Facebook blog, it seems like this new feature is intended to be similar to <a href="http://answers.yahoo.com/">Yahoo! Answers</a>.</p>
<p>I have to admit, I don&#8217;t really &#8220;get&#8221; Q&amp;A sites. Who are these people that ask questions like, &#8220;<a href="http://answers.yahoo.com/question/index;_ylt=AtClLQImqbT0zH7T9q3LIADj1KIX;_ylv=3?qid=20100804152022AAPIB59">what is $16 and $8.50 american become in canadian?</a>&#8221;, &#8220;<a href="http://answers.yahoo.com/question/index;_ylt=AnLAp20YaICrlkZwuK3UaZjj1KIX;_ylv=3?qid=20100804152017AA75vrD">why do people believe biggie is better than tupac?</a>&#8221; and &#8220;<a href="http://answers.yahoo.com/question/index;_ylt=ApdY8QNnpiBqmWeZ8xArPBXj1KIX;_ylv=3?qid=20100804152013AAuikg2">My turtle has broken his hand? Please Help!!!?</a>&#8221; &#8212; all from the front page of Yahoo! Answers. Why do people seem to believe they will get informative, useful answers from random folks on the Internet? Do very many people receive satisfactory answers this way? I know that when Yahoo! Answers appear in my search results, they are never helpful for me.</p>
<p>One might argue that Facebook users are already asking and answering questions, via the status updates and comments that are already supported. So what&#8217;s the point of &#8220;Facebook Questions&#8221;? I think there are two:</p>
<p>- By choosing to post a question in &#8220;Facebook Questions&#8221; rather than as a status update, users are essentially adding metadata to what otherwise would be a status update post, informing Facebook that the contents of this post are a question or an answer. If the question had been asked as a normal status update post, it would be very hard for Facebook to automatically determine whether a status update was in fact a question or an answer. Marking something as a question or an answer makes the information that much more useable for data mining and search.</p>
<p>- Because posts to &#8220;Facebook Questions&#8221; are public by default (unlike status update posts which can be protected), Facebook has invented a way to circumvent privacy controls for a certain class of posts, allowing them to build up a corpus that could generate more ad revenue, and might even be data others would pay to use.</p>
<p>The question I have, then, is this. It seems pretty clear why <em>Facebook</em> would want people to use &#8220;Facebook Questions&#8221;. But why would Facebook&#8217;s <em>users</em> choose to post their questions to a bunch of strangers this way, rather than doing what they are already doing &#8212; posing questions to their friends via their status updates? I guess &#8220;if you build it they will come&#8221; has pretty much been true for Facebook so far&#8230; but it is hard for me to imagine what would motivate people to change their behavior in this way. What&#8217;s in it for them?</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/08/07/qa/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>traffic accidents and social media? Part III</title>
		<link>http://bierdoctor.com/2010/08/03/traffic-accidents-and-social-media-part-iii/</link>
		<comments>http://bierdoctor.com/2010/08/03/traffic-accidents-and-social-media-part-iii/#comments</comments>
		<pubDate>Wed, 04 Aug 2010 00:18:01 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[infrastructure]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=541</guid>
		<description><![CDATA[In the previous post, I wrote about how adapting to the invisible pattern of behavior at a traffic intersection requires repeated visits with shared context and visibility of cause and effect, and how social media systems don&#8217;t necessarily provide this information. For example, consider Facebook&#8217;s &#8220;Like&#8221; button for the web. Users can contribute metadata to some [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://bierdoctor.com/2010/07/31/traffic-accidents-and-social-media-part-ii/">previous post</a>, I wrote about how adapting to the invisible pattern of behavior at a traffic intersection requires repeated visits with shared context and visibility of cause and effect, and how social media systems don&#8217;t necessarily provide this information.</p>
<p>For example, consider <a href="http://techcrunch.com/2010/04/21/facebook-like-button/">Facebook&#8217;s &#8220;Like&#8221; button for the web</a>. Users can contribute metadata to some web pages by clicking a &#8220;Like&#8221; button that appears on the page. But what did all these users really *mean* by the contribution? Does clicking the &#8220;Like&#8221; button represent an endorsement of the content? Support for the author of the content or the content provider? Is it a straightforward or sarcastic &#8220;Like&#8221;? Is it intended to express a sincere endorsement, or was the contribution motivated by some external incentive? How do other users interpret the fact that one page has 300 &#8220;Likes&#8221; and another only has 3? How should I interpret it? What does that mean for the next web page I visit with a &#8220;Like&#8221; button?</p>
<p>There is a cumulative aspect to social media systems that makes them very difficult to design and to study. Right now, one has to actually BUILD the &#8220;Like&#8221; button to find out how it will be used in practice, because today&#8217;s utility or benefit derived from participating depends upon contributions by yesterday&#8217;s users, and tomorrow&#8217;s contributions are shaped by today&#8217;s experiences.</p>
<p>This is a big challenge to the traditional HCI development cycle. Think about designing a spreadsheet or word processor application &#8212; the functionality is the same every time. It doesn&#8217;t change based on who uses it. Not so with social media systems. The contributions of others are an essential component of the system &#8212; PEOPLE, and the data they generate, are a part of the infrastructure of these systems in fundamentally different ways from most other kinds of computing systems. Think about designing a new traffic flow paradigm, vs. a new heads-up display for a car. Driving is at once an isolating and inherently social activity. If an individual driver doesn&#8217;t obey the rules, both those codified into law and the norms that have developed, consequences spread well beyond him or her.</p>
<p>The people, and their choices, and the traces left behind by their choices (ever been stuck in a <a href="http://en.wikipedia.org/wiki/Gapers_block">gapers block</a> or delayed by <a href="http://en.wikipedia.org/wiki/Rubbernecking">rubbernecking</a> on the highway?) are part of the infrastructure. I argue that users&#8217; contributions are as important to the social media infrastructure as the application features and internet protocols and mobile devices and wireless spectrum are. So design requirements for social media systems are really &#8220;enabling technologies&#8221; for experiences; the Facebook status update is both a feature and an enabler of the future.</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/08/03/traffic-accidents-and-social-media-part-iii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>traffic accidents and social media? Part II</title>
		<link>http://bierdoctor.com/2010/07/31/traffic-accidents-and-social-media-part-ii/</link>
		<comments>http://bierdoctor.com/2010/07/31/traffic-accidents-and-social-media-part-ii/#comments</comments>
		<pubDate>Sat, 31 Jul 2010 23:00:59 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[infrastructure]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=540</guid>
		<description><![CDATA[In the previous post, I described a traffic accident near-miss where a pedestrian was nearly hit by a car that he didn&#8217;t see coming because it was someplace it shouldn&#8217;t have been according to the physical design of the intersection, but that was exhibiting a very common practice at that particular intersection. I believe the [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://bierdoctor.com/2010/07/28/traffic-accidents-and-social-media-part-i/">previous post</a>, I described a traffic accident near-miss where a pedestrian was nearly hit by a car that he didn&#8217;t see coming because it was someplace it shouldn&#8217;t have been according to the physical design of the intersection, but that was exhibiting a very common practice at that particular intersection.</p>
<p>I believe the near miss I witnessed in traffic resulted from the existence of invisible infrastructure, developed over time from social patterns. The &#8220;zoom around stopped cars to make the left turn arrow&#8221; behavior happens all the time at this intersection &#8212; so often that when I am driving from the opposite direction, I now leave room for the idiots who think it is ok to suddenly swerve into the oncoming lane. Which of course makes it easier for them to behave this way. This is an informal social practice at this particular intersection &#8212; enabled by the fact that the street is wide enough to accommodate an extra lane of traffic, but perpetuated by the choices made by both the drivers who choose to zoom ahead and the drivers who choose to get out of the way and allow them room to do so. When I first moved here, I didn&#8217;t know about this practice and was quite surprised to see oncoming traffic driving straight at me in what I thought was my lane. I was obeying the physical signals &#8212; the lines painted on the street to divide the lanes; I did not yet have the same context as the other drivers, which they had built up based on repeated experiences over time.</p>
<p>I am not a <a href="http://en.wikipedia.org/wiki/Traffic_psychology">traffic psychologist</a> or a <a href="http://en.wikipedia.org/wiki/Transport_engineering">transportation engineer</a>. I am sure there are ways to redesign the intersection, or patrol it more carefully, or in some other way make it safer for both motorists and pedestrians. However, I *am* somebody who has a background in Human Computer Interaction and who has spent quite a lot of time wondering what the invisible patterns of behavior look like in social media, and how we might detect, learn from and use these patterns to help users make better choices about what to share, and find the information they need.</p>
<p>For example, I did not adapt to the invisible pattern of behavior at the intersection until I had experienced repeated visits in which I observed and was surprised by the behavior of others. Social media does not provide the information necessary for this kind of adaptation very well:</p>
<p>1. People do visit social media sites repeatedly. But rarely do two different people have exactly the same experience, and it usually isn&#8217;t possible for one person to know how much their experience coincides with another person&#8217;s experience. For example, Facebook doesn&#8217;t look the same to everybody; in fact, because few if any people share the same set of FB friends, no two people experience the very same thing when they visit Facebook. My news feed is unique to me, just as yours is to you. In this way, social media is strangely isolating in ways that we don&#8217;t readily perceive.</p>
<p>2. While social media applications make SO MUCH information contributed by SO MANY people available ALL THE TIME, they do a pretty terrible job of organizing and distilling and presenting the information in such a way that it might be possible to see the hidden patterns. We cannot see and interpret signals that might help us theorize and understand what influenced the content or format or timing of a post or its consequences. Nor can we easily understand raw information about how the aggregate of posts changes over time. We see what other social media users choose to contribute or post; but we don&#8217;t always see the pattern of reactions or consequences that branch out from a post. It was easy for me to see the consequences of the pedestrian&#8217;s expectations about the traffic pattern at the intersection &#8212; he was nearly run over. But it requires a third party, like the news media, to convey information about the potential consequences of <a href="http://www.dailymail.co.uk/news/article-1244091/Man-arrested-Twitter-joke-bombing-airport-Terrorism-Act.html">posting what might be perceived as a threatening status update</a> to Twitter.</p>
<p>In Part III, I talk about the cumulative aspect of social media systems, and how the users are part of the infrastructure.</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/07/31/traffic-accidents-and-social-media-part-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>traffic accidents and social media? Part I</title>
		<link>http://bierdoctor.com/2010/07/28/traffic-accidents-and-social-media-part-i/</link>
		<comments>http://bierdoctor.com/2010/07/28/traffic-accidents-and-social-media-part-i/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 21:40:26 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[infrastructure]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=538</guid>
		<description><![CDATA[I am back from a hiatus that involved finding two apartments, two separate moves, vacation, and a new car purchase, and am finally having science-y thoughts again that I feel like sharing on this blog. Yay! In fact, I have just written a much longer essay than I started out to write about traffic (like, [...]]]></description>
			<content:encoded><![CDATA[<p>I am back from a hiatus that involved finding two apartments, two separate moves, <a href="http://www.flickr.com/photos/bierdoctor/sets/72157624495719834/">vacation</a>, and a <a href="http://www.flickr.com/photos/bierdoctor/4825010409/">new car</a> purchase, and am finally having science-y thoughts again that I feel like sharing on this blog. Yay! In fact, I have just written a much longer essay than I started out to write about traffic (like, physical road traffic, not Internet traffic), social media, and invisible infrastructure created by patterns of behavior. It is probably longer than anybody would want to read at once. So, I&#8217;ve decided to divide it into three parts, posted over the next several days. Part I appears below.</p>
<p>On my drive to campus last week, I witnessed a pedestrian nearly get run over in the street, complete with squealing, smoking tires and a frantic leap to get out of the way. It is hard to say whose fault it was. I was waiting at a red light, maybe 5 or 6 cars back from the intersection. Interestingly, there was no honking involved.</p>
<p>The street I was on is a two-lane street with a short left turn lane, only big enough for maybe 2 cars to wait for the light to change. The pedestrian was crossing the street, but not at the crosswalk &#8212; he decided to walk out into the stopped traffic right in front of the giant SUV I was driving, which I am sure he could not see around. He was wearing a bright orange shirt. He paused as he was passing my car to look to his right, making sure no traffic was approaching from the other direction, before entering the other lane of traffic.</p>
<p>Unfortunately for the pedestrian, unique and invisible traffic rules prevail at this particular intersection. Despite the fact that the left turn lane is quite short, the street is wide enough to potentially accommodate a longer left turn lane that stretches farther back from the intersection. So despite the fact that this is a two-lane street, people wanting to turn left often have enough room to create an informal, longer turn lane. I regularly see people pull out into oncoming traffic and speed toward the intersection, passing the waiting cars, in order to make the left turn arrow. This is what happened last week. The man looked to his right, saw no one coming, and proceeded ahead without noticing the car speeding up from his left that shouldn&#8217;t have been there in the first place. He didn&#8217;t even look to his left at all; I presume he was thinking he had already crossed the lane of traffic coming from that direction.</p>
<p>Fortunately, the car&#8217;s brakes worked, and the man was nimble enough to jump out of the way. The driver looked pretty freaked out &#8212; I know I was. The pedestrian got into his car which was parked on the other side of the street. Then the light changed and I continued on my way.</p>
<p>In the forthcoming Part II, I answer the question, what does this all have to do with social media?</p>
Copyright &copy; 2010 <strong><a href="http://bierdoctor.com/">Emilee Rader</a></strong>]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/07/28/traffic-accidents-and-social-media-part-i/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>now that&#8217;s a lot of link shorteners</title>
		<link>http://bierdoctor.com/2010/05/01/now-thats-a-lot-of-link-shorteners/</link>
		<comments>http://bierdoctor.com/2010/05/01/now-thats-a-lot-of-link-shorteners/#comments</comments>
		<pubDate>Sat, 01 May 2010 20:56:15 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[methods]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[sampling]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=530</guid>
		<description><![CDATA[I&#8217;m writing a script to parse links out of tweets on Twitter (for example, this tweet contains a link), and then look up the URLs in other social media applications like delicious.com and digg.com. One challenge I&#8217;m facing is the plethora of URL shorteners available to people who post to Twitter. A URL shortener is [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m writing a script to parse links out of tweets on Twitter (for example, <a href="http://twitter.com/tfinholt/status/13027426535">this tweet</a> contains a <a href="http://cli.gs/baYUq">link</a>), and then look up the URLs in other social media applications like <a href="http://delicious.com/">delicious.com</a> and <a href="http://digg.com/">digg.com</a>. One challenge I&#8217;m facing is the plethora of URL shorteners available to people who post to Twitter.</p>
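<p>To make the link-parsing step concrete, here is a minimal Python sketch. It is not the actual script, which is not shown in this post. The pattern and the <code>extract_links</code> name are illustrative assumptions, and the pattern is deliberately crude: it grabs anything that starts with http:// or https:// and runs to the next whitespace, then trims punctuation that probably belongs to the sentence rather than the link.</p>

```python
import re

# Deliberately simple: "http://" or "https://" followed by non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")

# Punctuation that often trails a link in running text but is rarely part of it.
TRAILING = ".,;:!?)\"'"


def extract_links(tweet_text):
    """Return the URLs found in a tweet, with trailing punctuation trimmed."""
    return [match.rstrip(TRAILING) for match in URL_PATTERN.findall(tweet_text)]
```

<p>A pattern this loose will miss links written without a scheme (for example, a bare bit.ly/abc), so it is a starting point for testing rather than a finished parser.</p>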
<p>A URL shortener is an online service that assigns a short URL to a web page&#8217;s original address, such that the web page can be accessed either by the original URL or by the new, shorter URL. People use these services in conjunction with Twitter, when they&#8217;re including a URL in a tweet. Because tweets can only be 140 characters long, a shorter URL means more characters left over for saying other stuff.</p>
<p>When I say there is a plethora of URL shorteners available, I truly do mean it in the &#8220;extreme excess&#8221; sense of the word. For example, there&#8217;s a Flickr set of screen captures that contains images of the home pages of <a href="http://www.flickr.com/photos/factoryjoe/sets/72157602178338004/">129 different link shorteners</a>, including <a href="http://www.shadyurl.com/">shadyurl.com</a> for when you want your &#8220;shortened&#8221; link to be suspicious and frightening. Bizarre.</p>
<p>Speaking of shady URLs, one issue I have with URL shorteners is that I am never sure what I am going to get when I click on a link from a service like <a href="http://bit.ly/">bit.ly</a>, which is the &#8220;default&#8221; URL shortener of Twitter. Phishing attacks in social network applications (like Facebook and Twitter) are becoming more common, and tricking people into clicking on links that execute code intended to steal passwords, etc., is often the goal. As a result, I rarely follow links that show up in my Twitter feed, or in my email. It just seems too risky to me to click on links from URL shorteners that disguise the ultimate destination.</p>
<p>Interestingly, the distribution of Twitter posts using different URL shorteners has shifted quite a bit over time, as described by a blogger on TechCrunch.com in &#8220;<a href="http://techcrunch.com/2010/01/06/bit-ly-market-share/">What happened to bit.ly&#8217;s market share?</a>&#8221; The article describes a new &#8220;pro&#8221; service offered by <a href="http://bit.ly/">bit.ly</a> that allows content providers to offer shortened links that resemble the actual domain name more closely (e.g., TechCrunch.com becomes <a href="http://twitter.com/#search?q=tcrn.ch">tcrn.ch</a>), and how this makes it appear on the surface that the proportion of tweets using bit.ly has decreased quite a bit.</p>
<p><a href="http://bit.ly/">bit.ly</a> is still the most-used link shortener on Twitter, according to this <a href="http://tweetmeme.com/about/statistics" class="broken_link">lovely pie-chart</a> created by tweetmeme.com and refreshed daily. Even so, bit.ly accounts for only 50% or so of shortened links on Twitter, and the next most popular service, <a href="http://tinyurl.com/">tinyurl.com</a>, for only 5%. So, getting back to my data collection script, I have a couple of options for resolving the URLs I am parsing out of tweets. I can stick with the most popular link shorteners as reported by tweetmeme.com, use the APIs provided by those services to look up the original URLs, and end up throwing out 30-50% of links. Or, I can resolve all of the links I find, to figure out whether or not they redirect to a different place. I&#8217;m leaning towards resolving all the links at this point, but will have to do some testing to make sure it will actually work the way I think it will work.</p>
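<p>For the second option (resolving every link, whatever service produced it), a minimal Python sketch might look like the following. It is an illustration under assumptions, not the actual data collection script: the function names are made up, it assumes a shortener answers a HEAD request with a redirect status and a Location header, and it does nothing about rate limits or services that only redirect on GET.</p>

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_location(url, timeout=10):
    """Send one HEAD request; return the redirect target, or None if there is none."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        if response.status in REDIRECT_CODES:
            location = response.getheader("Location")
            # Location may be relative, so resolve it against the current URL.
            return urljoin(url, location) if location else None
        return None
    finally:
        conn.close()


def resolve(url, fetch=fetch_location, max_hops=5):
    """Follow redirects one hop at a time; return (final_url, number_of_hops)."""
    seen = {url}
    hops = 0
    for _ in range(max_hops):
        target = fetch(url)
        # Stop when there is no redirect, or when the chain loops back on itself.
        if target is None or target in seen:
            break
        seen.add(target)
        url = target
        hops += 1
    return url, hops
```

<p>A link that comes back with zero hops did not redirect at all, which is one way to tell ordinary URLs apart from shortened ones without keeping a list of shortener domains. Passing a different <code>fetch</code> function makes it possible to test the loop without touching the network.</p>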
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/05/01/now-thats-a-lot-of-link-shorteners/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>datasets available online</title>
		<link>http://bierdoctor.com/2010/04/28/datasets-available-online/</link>
		<comments>http://bierdoctor.com/2010/04/28/datasets-available-online/#comments</comments>
		<pubDate>Thu, 29 Apr 2010 04:09:23 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[sampling]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=525</guid>
		<description><![CDATA[This is a mini-rant about datasets. Specifically, other people&#8217;s datasets that they&#8217;ve made available online. In the past few days I&#8217;ve taken a look at Twitter datasets made available on Infochimps.com, and a tagging dataset made available by Yahoo! Labs through its Sandbox website. The first part of my rant is about how the people [...]]]></description>
			<content:encoded><![CDATA[<p>This is a mini-rant about datasets. Specifically, other people&#8217;s datasets that they&#8217;ve made available online. In the past few days I&#8217;ve taken a look at <a href="http://infochimps.org/collections/twitter-census">Twitter datasets made available on Infochimps.com</a>, and a <a href="http://webscope.sandbox.yahoo.com/">tagging dataset made available by Yahoo! Labs</a> through its Sandbox website.</p>
<p>The first part of my rant is about how the people providing these datasets don&#8217;t give me enough information to decide whether or not the dataset will be useful to me before shelling out hundreds of dollars or filling out a lengthy form or pestering my supervisor for approval.</p>
<p>For example, take a look at the &#8220;<a href="http://infochimps.org/datasets/twitter-census-hashtags-urls-smileys-by-hour">Twitter Census: Hashtags, URLs, Smileys by Hour</a>&#8221; dataset on Infochimps. They want $300 for this dataset without providing a clear description of what it contains, or even how big it is. Nor is there much documentation about how these data were collected, and what, if any, sampling bias may exist.</p>
<p>And the free datasets aren&#8217;t really usable either&#8212;the &#8220;<a href="http://infochimps.org/datasets/twitter-census-tweets-by-hour-tweeted">Twitter Census: Tweets by Hour Tweeted</a>&#8221; dataset has the same problem as the &#8220;Hashtags&#8230;&#8221; one; in addition, when you download the file, you get a tab-delimited text file that contains four columns with no labels! Come on, people, what am I supposed to do with this? Guess?</p>
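<p>If guessing really is the only option, the guessing can at least be made systematic. Here is a minimal Python sketch that reads a tab-delimited file and reports which kinds of values show up in each column. The function names are illustrative, and nothing here assumes anything about what the four Infochimps columns actually mean.</p>

```python
import csv
import io


def guess_type(value):
    """Classify a single cell as 'int', 'float', or 'text'."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            continue
    return "text"


def profile_columns(handle, max_rows=1000):
    """Return, per column, the set of value types seen in the first max_rows rows."""
    seen = []
    for row_number, row in enumerate(csv.reader(handle, delimiter="\t")):
        if row_number == max_rows:
            break
        for index, cell in enumerate(row):
            # Grow the list the first time a new column position appears.
            if index == len(seen):
                seen.append(set())
            seen[index].add(guess_type(cell))
    return seen
```

<p>Calling <code>profile_columns(open("some_file.tsv", newline=""))</code> (the file name is a placeholder) gives one set per column, which narrows down which columns are counts, which are identifiers, and which are free text. It still cannot say what the numbers mean, which is the part only documentation can supply.</p>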
<p>Regarding the Yahoo! Labs data, one must fill out a lengthy form, including providing a phone number, and describe the &#8220;proposed research project&#8221;, before even being told what types of data might be available. To actually obtain access to a dataset, one must enter the name and email address of their department head, presumably to provide some kind of accountability or oversight of how the data will be used. I definitely don&#8217;t want to bug my boss before I have even seen a snippet of the data.</p>
<p>Places like <a href="http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp">ICPSR</a> at the University of Michigan have been <a href="http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/index.jsp">making data available</a> for a long time; there should be plenty of examples for organizations like these to follow regarding what kind of information researchers need to determine whether a dataset will be useful.</p>
<p>I also want to briefly consider the difference between &#8220;raw data&#8221;, &#8220;dataset&#8221;, and &#8220;sample&#8221;. I don&#8217;t know how others think about using these terms to represent chunks of data they happen to be working with. Take this description of the Yahoo! Labs delicious.com dataset:</p>
<blockquote><p>This dataset represents 100,000 URLs that were bookmarked on Delicious by users of the service. Each URL has been saved at least 100 times. For each URL, the date that it was first bookmarked by a Delicious user is indicated, along with the total number of saves. Also indicated are the ten most commonly used tags for each URL, along with the number of times each tag was used. This dataset provides a view into the nature of popular content in the Delicious social bookmarking system, including how users apply tags to individual items.</p></blockquote>
<p>To me, this describes a very specific, targeted sample taken from the entire delicious.com database (i.e., the &#8220;raw data&#8221;). I think of a &#8220;dataset&#8221; as something smaller than the whole database, but without the degree of aggregation seen in the &#8220;samples&#8221; I mentioned above (i.e., counts of tweets per hour rather than the actual tweets, and tag usage counts rather than the actual bookmarking history). Maybe I am the only one thinking about it in this way.</p>
<p>Ultimately, the Yahoo! Labs delicious.com sample is not very useful to me. For our <a href="http://bierdoctor.com/papers/delicious-cscw-logistic+simulations.pdf">CSCW 2008 tagging paper</a>, we also limited our sample of URLs to those that were bookmarked by 100+ people. But we scraped delicious.com to obtain the *complete* bookmark histories for each URL, including the timestamp and ALL tags applied by every single user who had bookmarked the web page. The temporal information and complete history were very important for our analysis. However, since we were scraping, we only obtained the data for 30 URLs; I&#8217;d love to be able to do this kind of analysis with a bigger sample, so that we might be able to find clusters of people who all bookmarked the same URLs. But alas, this dataset from Yahoo! Labs is not suitable for this purpose.</p>
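<p>The difference between a complete bookmark history and an aggregated sample can be sketched in a few lines of Python. This is an illustration only: the <code>Bookmark</code> fields and function names are assumptions, not the format of the scraped data or of the Yahoo! Labs file. The point is that the aggregate can always be computed from the history, but the history (who tagged what, and when) cannot be recovered from the aggregate.</p>

```python
from collections import Counter, defaultdict, namedtuple

# One row of a complete bookmark history. Field names are illustrative assumptions.
Bookmark = namedtuple("Bookmark", ["url", "user", "timestamp", "tags"])


def popular_urls(history, min_users=100):
    """Return the URLs bookmarked by at least min_users distinct users."""
    users_by_url = defaultdict(set)
    for bookmark in history:
        users_by_url[bookmark.url].add(bookmark.user)
    return {url for url, users in users_by_url.items() if len(users) >= min_users}


def aggregate(history, top_n=10):
    """Collapse full histories into per-URL summaries: first save, total saves, top tags."""
    summary = {}
    for bookmark in history:
        entry = summary.setdefault(
            bookmark.url,
            {"first_saved": bookmark.timestamp, "saves": 0, "tags": Counter()},
        )
        # Timestamps are assumed to sort correctly (e.g. ISO date strings).
        entry["first_saved"] = min(entry["first_saved"], bookmark.timestamp)
        entry["saves"] += 1
        entry["tags"].update(bookmark.tags)
    return {
        url: {
            "first_saved": entry["first_saved"],
            "saves": entry["saves"],
            "top_tags": entry["tags"].most_common(top_n),
        }
        for url, entry in summary.items()
    }
```

<p>The output of <code>aggregate</code> has roughly the shape described in the dataset blurb above (first bookmark date, number of saves, ten most common tags with counts), while the input to it is the kind of per-user, per-timestamp record that the temporal analysis depends on.</p>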
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/04/28/datasets-available-online/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>managing data analysis scripts</title>
		<link>http://bierdoctor.com/2010/04/13/managing-data-analysis-scripts/</link>
		<comments>http://bierdoctor.com/2010/04/13/managing-data-analysis-scripts/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 06:25:00 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[analysis]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[software tools]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=521</guid>
		<description><![CDATA[I&#8217;ve been revisiting the various scripts I wrote to analyze my thesis data, so I can use them again on a new dataset. The problem is, I&#8217;m finding it both easier and harder than I expected to reconstruct what I did. The &#8220;easy&#8221; part is due to the fact that I was apparently totally anal [...]]]></description>
			<content:encoded><![CDATA[<div>I&#8217;ve been revisiting the various scripts I wrote to analyze my thesis data, so I can use them again on a new dataset. The problem is, I&#8217;m finding it both easier and harder than I expected to reconstruct what I did. The &#8220;easy&#8221; part is due to the fact that I was apparently totally anal about writing down EVERYTHING I was doing, and sometimes even why I was doing it. The &#8220;hard&#8221; part is because I wasn&#8217;t always as consistent as I should have been, and I recorded a lot of useless stuff along with what I really needed to keep track of.</div>
<p><div>For this project, I ended up writing scripts in both Ruby and R, and lots of SQL both incorporated into the scripts and in standalone text files. The experiment application I used to collect the data has its own specific implementation details that exist only in the head of the developer (not me), the experiment itself has a structure that is important for the analysis and is incorporated into the structure of the backend database, and I used a bunch of different R packages for connecting to the database and for specific analyses that have their own requirements and constraints. This is all stuff I had to document and keep track of, in addition to the actual analysis scripts. I also kept a detailed record of ALL the data cleaning I did, so that if I ever had to re-create the final dataset, it would actually be possible.</div>
<p><div>As I worked through the analysis over a period of 4-5 months, I was apparently pretty obsessed with keeping a record of *everything* I tried&#8212;meaning every script, data file, graph, or other product of analysis, even if it didn&#8217;t work out very well&#8212;on the off chance I might want to use it later. I thought I was doing myself a favor, and indeed, it is WAY better to have gone a little overboard with this than not to have done it at all.</div>
<p><div>However, one problem I&#8217;m running into is that while I have documentation (of varying levels of detail) in nearly every script file, the intermediate data files are not themselves commented. So I have to make guesses based on which script file names go with what data file names (also an area where I was pretty consistent, but not 100%) and go crawling through various scripts to figure out which one produced a particular data file and which other one takes it as input. I kept a &#8220;lab notebook&#8221; of sorts&#8212;just a text file, stored in the Mac app <a href="http://www.barebones.com/products/Yojimbo/">Yojimbo</a> with the rest of my research-related notes and ideas&#8212;but this is yet ANOTHER separate file I have to look at, and it doesn&#8217;t have all the information I need about dependencies.</div>
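<div>One way around uncommented data files, sketched here in Python rather than the Ruby and R used for the actual project, is to write a small &#8220;sidecar&#8221; file next to every intermediate data file that records which script produced it and which files it was built from. The function names, the <code>.meta.json</code> suffix, and the fields are all assumptions made for illustration.</div>

```python
import csv
import json
import os
from datetime import datetime, timezone


def write_with_provenance(path, header, rows, script, inputs, note=""):
    """Write a CSV data file plus a JSON sidecar recording where it came from."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    sidecar = path + ".meta.json"
    record = {
        "data_file": os.path.basename(path),
        "produced_by": script,
        "inputs": list(inputs),
        "written_at": datetime.now(timezone.utc).isoformat(),
        "note": note,
    }
    with open(sidecar, "w") as handle:
        json.dump(record, handle, indent=2)
    return sidecar


def read_provenance(path):
    """Load the sidecar that belongs to a data file."""
    with open(path + ".meta.json") as handle:
        return json.load(handle)
```

<div>Because the sidecar is written by the same call that writes the data, the two cannot drift apart the way a separate notebook entry can. The cost is one extra file per data file, which may or may not be tolerable in a directory that is already crowded.</div>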
<p><div>Another problem I&#8217;m having is that I didn&#8217;t know exactly what might end up being garbage and what would actually be useful while I was doing the analysis; typically, I don&#8217;t figure this out until a non-trivial chunk of time has passed after I have written a paper that used a particular set of scripts and data files. But months after a paper has been submitted, it is really hard to go back and separate the useful from the useless bits of analysis; enough time has passed that I don&#8217;t remember off the top of my head what actually ended up being used, and both useful and useless code are mixed together in the same files, so it would require re-acquainting myself with everything before I&#8217;d be able to separate things out. The impetus to do this housekeeping work just doesn&#8217;t exist at any point in the research cycle for me, I guess.</div>
<p><div>What I&#8217;m looking for is a better way to manage all of this information, that isn&#8217;t too onerous when I&#8217;m in the throes of analysis, but also makes it relatively painless to reconstruct what I did at a later time. For example, after spending several hours poring over my &#8220;lab notebook&#8221; and files, I feel like I probably have all of the information I need to reconstruct my thesis analysis; but, that reconstruction is going to hurt.</div>
<p><div>I&#8217;m not even sure how to ask the Internets for a solution to my problem&#8230; version control might help with part of it, but to my knowledge that type of system won&#8217;t help me manage dependencies between three different file types and a bunch of intermediate data files. Maybe my analysis process is at fault&#8212;my scripts are too big and cumbersome (i.e., they try to do too many things), and my (irrational?) need to save out *every* data table to a file rather than re-computing it when I need it just confuses things. Anybody solve this problem for themselves, and want to give me some tips?</div>
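<div>One possible direction, offered as a sketch rather than a tested solution: keep a single manifest that says, for each intermediate file, which script writes it and which files that script reads. The dependency questions then become lookups, regardless of whether the script is Ruby, R, or SQL. The example below is in Python, every file name in it is made up, and it relies on <code>graphlib</code>, which ships with Python 3.9 and later.</div>

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical manifest: each output file maps to the script that writes it
# and the files that script reads. All file names here are made up.
MANIFEST = {
    "cleaned.csv": {"script": "clean_data.rb", "inputs": ["raw_export.sql"]},
    "model_fit.RData": {"script": "fit_model.R", "inputs": ["cleaned.csv"]},
    "figure1.pdf": {"script": "plot_results.R", "inputs": ["model_fit.RData", "cleaned.csv"]},
}


def producer_of(manifest, data_file):
    """Return the script that writes data_file, or None for a raw input."""
    entry = manifest.get(data_file)
    return entry["script"] if entry else None


def consumers_of(manifest, data_file):
    """Return the scripts that read data_file, sorted by name."""
    scripts = {entry["script"] for entry in manifest.values() if data_file in entry["inputs"]}
    return sorted(scripts)


def rebuild_order(manifest):
    """List the scripts in an order that respects the file dependencies."""
    # TopologicalSorter expects each node mapped to the nodes it depends on.
    graph = {output: set(entry["inputs"]) for output, entry in manifest.items()}
    ordered_files = TopologicalSorter(graph).static_order()
    # Raw inputs appear in the ordering too, but no script produces them.
    return [manifest[name]["script"] for name in ordered_files if name in manifest]
```

<div>This does nothing about scripts that try to do too many things, and the manifest only helps if it is updated whenever a script gains a new input or output. It is essentially what a build tool such as make records in its rules, so that may be another place to look.</div>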
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/04/13/managing-data-analysis-scripts/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
