Emilee Rader

statistics. sigh.

I find myself once again this week reading stats papers that range from “slightly over my head” to “I have no idea what you people are talking about,” in an attempt to figure out the right thing to do with a dataset involving observations that are not independent.

The dataset consists of conversations between dyads that took place while they completed two different interactive tasks. The conversations were recorded, transcribed, and segmented into utterances according to some criteria. This means that there are repeated utterances from each participant, and from each dyad. Different research areas use different terms to refer to this kind of setup: repeated measures, panel data, clustered data, etc. The analysis is further complicated by the fact that the predictors and response variables are all categorical. Some are binary, indicating the presence or absence of something. The more interesting variables have more than two categories (in some cases, MANY more).

I am trying to estimate the strength with which each of a set of 15+ utterance goals is associated with one of three roles participants assumed as part of the study. To do this, I need to specify a mixed-effects multinomial logit model, with a set of fixed-effects categorical predictors and a hierarchical random effects control for participant within dyad. This involves choosing a reference category of the response variable, and then running a series of binomial logit models that compare all the other levels of the response variable in turn with the reference category.
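For what it's worth, the model described above can be written down roughly as follows. The notation is my own shorthand and an assumption, not taken from any particular paper or package: Y is the response for utterance i from participant p in dyad d, C is the reference category, x holds the dummy-coded fixed-effects predictors, and u and v are the dyad-level and participant-within-dyad random intercepts, assumed normal with mean zero.

```latex
\[
\log \frac{\Pr(Y_{idp} = c)}{\Pr(Y_{idp} = C)}
  = \alpha_c + \mathbf{x}_{idp}^{\top} \boldsymbol{\beta}_c
    + u_{d}^{(c)} + v_{p(d)}^{(c)},
  \qquad c = 1, \dots, C - 1,
\]
\[
u_{d}^{(c)} \sim \mathcal{N}\bigl(0, \sigma_{u,c}^{2}\bigr), \qquad
v_{p(d)}^{(c)} \sim \mathcal{N}\bigl(0, \sigma_{v,c}^{2}\bigr).
\]
```

Each value of c gives one of the binomial logit comparisons against the reference category, with its own coefficients and its own random effects.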

Here is where I am running into a situation, again, where I am pushing up against what mainstream statistical software packages are reliably capable of, and even R does not seem to be able to do what I want without more programming than my meager statistical background has prepared me for. The problem, as I understand it, is that each of the binomial logit models that make up the multinomial results uses a different subset of the data, excluding those observations that belong to levels of the response variable not included in that model. This means that the random effects are estimated differently for each binomial logit model, depending on which observations are included in the subset. The upshot is that the overall multinomial model estimates differ substantially depending on which category is chosen as the reference category.
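To make the subsetting issue concrete, here is a toy sketch in plain Python. Everything in it is hypothetical: the data are randomly generated, the category labels "A", "B", and "C" stand in for levels of the response variable, and the function names are mine. It fits no models at all; it only shows which rows each binomial comparison would get to see under each choice of reference category.

```python
import random
from collections import Counter


def make_toy_utterances(n_dyads=4, per_participant=30,
                        categories=("A", "B", "C"), seed=0):
    """Generate made-up utterances: two participants per dyad, random responses."""
    rng = random.Random(seed)
    rows = []
    for dyad in range(n_dyads):
        for member in (0, 1):
            participant = f"d{dyad}p{member}"
            for _ in range(per_participant):
                rows.append({
                    "dyad": dyad,
                    "participant": participant,
                    "response": rng.choice(categories),
                })
    return rows


def binary_subsets(rows, reference):
    """For each non-reference category, keep only the rows whose response
    is either that category or the reference category."""
    categories = sorted({r["response"] for r in rows})
    subsets = {}
    for cat in categories:
        if cat == reference:
            continue
        subsets[cat] = [r for r in rows if r["response"] in (cat, reference)]
    return subsets


def rows_per_participant(subset):
    """Count how many rows each participant contributes to a subset."""
    return Counter(r["participant"] for r in subset)


rows = make_toy_utterances()
for reference in ("A", "B", "C"):
    for cat, subset in binary_subsets(rows, reference).items():
        counts = rows_per_participant(subset)
        print(f"reference={reference}, comparing {cat}: {len(subset)} of "
              f"{len(rows)} rows; participant d0p0 contributes {counts['d0p0']}")
```

Changing the reference category changes which pairs of categories get compared, and so changes how many utterances each participant and dyad contributes to each comparison. Any random effects estimated from those subsets are working from different data each time.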

So that’s the problem. However, I did not write this to whine about how I am stuck. I’ve been trying to figure out a solution that I can live with… do I bail completely? Hire a real statistician? How can I figure out how biased the results would be if I were to do a purely fixed-effects model? (Without random effects controls, any results produced might in fact be due to some unique aspect of the conversation within a particular dyad in a particular role, rather than indicative of something that shows up across all of the dyads.)

Researchers in many fields work with categorical data, and at least some of them over the years must have encountered this problem, whether they knew it or not, and were faced with the same tradeoffs. In order to get the paper out the door they had to just pick a compromise and go with it. But, any results reached due to a compromise are biased in some way. Models like this are just now becoming possible for people like me, with just enough stats knowledge to be dangerous, to run using fairly standard statistical software packages. But what about all the research that has come before — how accurate are those models, and the results they produced? How much do people allow what is statistically feasible to determine their research design, vs. compromising on the analysis after the fact? We all stand on the shoulders of giants, but how often were the giants using naive or incorrect statistics?

digg censorship

In a recent post, I mentioned Facebook’s “Like” button for the web, and wrote about how using the information contributed through all those “Like” button presses is more complicated than just inferring that a “Like” means that someone likes the web page.

I recently came across mention of an alleged “censorship” controversy related to Digg.com [see here, and here for mentions in mainstream media], in which a group of coordinated users apparently succeeded in preventing certain stories they found politically objectionable from reaching the front page of Digg, so these stories would not receive wide exposure. The users achieved this end through what is essentially a thumbs up / thumbs down mechanism fundamental to the way Digg works, by which users vote on whether stories should be promoted or buried. As one blogger points out, opinions about censorship aside, these users were operating within Digg’s available functionality and did not necessarily violate any rules. People who are upset about this use of Digg’s voting mechanism claim the group of users were gaming the system — coordinating which stories to target via another social media application (Yahoo! Groups).

The “gaming the system” and “censorship” aspects of this controversy are less interesting to me personally than the flexibility of such a simple voting mechanism, used to express an entire political agenda rather than individual, personal preferences. This is an instance of the point I was trying to make (badly?) a few days ago: even simple mechanisms can be tools for expressing a wide variety of meaning, but that meaning is not obvious from single contributions. In this case, the coordinated intentions of the group only became apparent in aggregate, and only after people who were pissed off about having their stories consistently targeted for “burial” were motivated enough to figure out what was going on. In other words, the meaning behind these actions was not evident from the aggregate voting data alone; it was only visible if you already knew where to look.

to share or not to share?

With the CSCW 2011 deadline looming (by the time this post appears it will already have passed), I’ve been thinking about how it wasn’t until I had experienced a bunch of rejections in the first couple years of graduate school that I started having any successes at all. There weren’t a lot of opportunities for me to collaborate with senior people on papers, so I did most of my learning the hard way, by trial and error. I wonder whether it might have helped me get up to speed faster if I had asked around for permission to read rejected papers and the accompanying reviews. I also wonder how people would have felt about those requests.

In the last year or so of grad school, several of my fellow students at a similar stage in the program started doing “paper swaps” before a big deadline. This was an awesome idea brought to us by @jennthom. Each person who was submitting a paper agreed to review at least one other paper, in exchange for feedback on their own paper. This brilliant plan had many benefits: it encouraged each of us to finish things a *little* bit earlier than we would have otherwise, we got to learn more about what our colleagues were working on, and of course we both received feedback on our own papers and got to practice giving feedback to others. The main drawback was that it created more work at an already busy time.

An added benefit not obvious at first was that when it came time to write rebuttals to reviews for submitted papers, we had a group of people who were familiar enough with the papers in question that we could read each other’s reviews and make suggestions for the rebuttals. The great thing about this group of people was that it seemed like nobody was overly sensitive about sharing their reviews, and I think that this was a great learning tool for all of us.

I have two questions based on this reflection about paper swaps and sharing reviews, and I’d love feedback if anybody happens to notice this post and wants to share:

1. How do I get something like this started at a new institution? I think what we did in grad school worked because we were a fairly small group who both trusted each other to be helpful, and were in serious need of feedback. I certainly learned a LOT from the experience, and think it would be super valuable for other students to participate in something similar. But how do I convince people the extra work is worth it, and that there is nothing to fear from sharing reviews? To that end, I am perfectly willing to share my own reviews on both accepted and rejected papers, which brings me to my next question…

2. Is it appropriate to share publicly, like on the Internets, reviews for one’s own papers? Would it just be too confusing for people if there were multiple versions of a paper, or even papers that never ended up being published, available on an author’s website along with the accepted papers (even if there were a separate page for them or something)? Would anyone even be interested in seeing these things? Also, do reviewers expect that what they write will be held in confidence? Personally, I always write reviews (and everything else for that matter) as if I am writing for an unknown, public audience — it is so easy to share these things, you never know who might see them. And I don’t want to say anything in a review that I would be unwilling to say to someone in person. I just have no idea how others feel about this.

Q&A

Facebook announced last week that they are introducing a new feature, called “Facebook Questions”. From the description on the Facebook blog, it seems like this new feature is intended to be similar to Yahoo! Answers.

I have to admit, I don’t really “get” Q&A sites. Who are these people who ask questions like “what is $16 and $8.50 american become in canadian?”, “why do people believe biggie is better than tupac?” and “My turtle has broken his hand? Please Help!!!?” (all from the front page of Yahoo! Answers)? Why do people seem to believe they will get informative, useful answers from random folks on the Internet? Do very many people receive satisfactory answers this way? I know that when Yahoo! Answers results appear in my search results, they are never helpful for me.

One might argue that Facebook users are already asking and answering questions, via the status updates and comments that are already supported. So what’s the point of “Facebook Questions”? I think there are two:

- By choosing to post a question in “Facebook Questions” rather than as a status update, users are essentially adding metadata to what otherwise would be a status update post, informing Facebook that the contents of this post are a question or an answer. If the question had been asked as a normal status update post, it would be very hard for Facebook to automatically determine whether a status update was in fact a question or an answer. Marking something as a question or an answer makes the information that much more useable for data mining and search.

- Because posts to “Facebook Questions” are public by default (unlike status update posts which can be protected), Facebook has invented a way to circumvent privacy controls for a certain class of posts, allowing them to build up a corpus that could generate more ad revenue, and might even be data others would pay to use.

The question I have, then, is this. It seems pretty clear why Facebook would want people to use “Facebook Questions”. But why would Facebook’s users choose to post their questions to a bunch of strangers this way, rather than doing what they are already doing — posing questions to their friends via their status updates? I guess “if you build it they will come” has pretty much been true for Facebook so far… but it is hard for me to imagine what would motivate people to change their behavior in this way. What’s in it for them?

traffic accidents and social media? Part III

In the previous post, I wrote about how adapting to the invisible pattern of behavior at a traffic intersection requires repeated visits with shared context and visibility of cause and effect, and how social media systems don’t necessarily provide this information.

For example, consider Facebook’s “Like” button for the web. Users can contribute metadata to some web pages by clicking a “Like” button that appears on the page. But what did all these users really *mean* by the contribution? Does clicking the “Like” button represent an endorsement of the content? Support for the author of the content or the content provider? Is it a straightforward or sarcastic “Like”? Is it intended to express a sincere endorsement, or was the contribution motivated by some external incentive? How do other users interpret the fact that one page has 300 “Likes” and another only has 3? How should I interpret it? What does that mean for the next web page I visit with a “Like” button?

There is a cumulative aspect to social media systems that makes them very difficult to design and to study. Right now, one has to actually BUILD the “Like” button to find out how it will be used in practice, because today’s utility or benefit derived from participating depends upon contributions by yesterday’s users, and tomorrow’s contributions are shaped by today’s experiences.

This is a big challenge to the traditional HCI development cycle. Think about designing a spreadsheet or word processor application — the functionality is the same every time. It doesn’t change based on who uses it. Not so with social media systems. The contributions of others are an essential component of the system — PEOPLE, and the data they generate, are a part of the infrastructure of these systems in fundamentally different ways from most other kinds of computing systems. Think about designing a new traffic flow paradigm, vs. a new heads-up display for a car. Driving is at once an isolating and inherently social activity. If an individual driver doesn’t obey the rules, both those codified into law and the norms that have developed, consequences spread well beyond him or her.

The people, and their choices, and the traces left behind by their choices (ever been stuck in a gapers block or delayed by rubbernecking on the highway?) are part of the infrastructure. I argue that users’ contributions are as important to the social media infrastructure as the application features and internet protocols and mobile devices and wireless spectrum are. So design requirements for social media systems are really “enabling technologies” for experiences; the Facebook status update is both a feature and an enabler of the future.