I find myself once again this week reading stats papers that range from “slightly over my head” to “I have no idea what you people are talking about,” in an attempt to figure out the right thing to do with a dataset involving observations that are not independent. The dataset consists of conversations between dyads [...]
Posts under ‘data’
digg censorship
In a recent post, I mentioned Facebook’s “Like” button for the web, and wrote about how using the information contributed through all those “Like” button presses is more complicated than just inferring that a “Like” means that someone likes the web page. I recently came across mention of an alleged “censorship” controversy related to Digg.com [see [...]
now that’s a lot of link shorteners
I’m writing a script to parse links out of tweets on Twitter (for example, this tweet contains a link), and then look up the URLs in other social media applications like delicious.com and digg.com. One challenge I’m facing is the plethora of URL shorteners available to people who post to Twitter. A URL shortener is [...]
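As a rough sketch of the link-parsing step described in that excerpt (not the script from the post itself): the regular expression, the function names, and the short list of shortener domains below are all my own assumptions for illustration, and the domain list is nowhere near complete.

```python
import re
from urllib.parse import urlparse

# Hypothetical, deliberately incomplete list of shortener domains (an assumption).
KNOWN_SHORTENERS = {"bit.ly", "tinyurl.com", "is.gd", "ow.ly", "t.co"}

# Simple pattern for http(s) URLs; real tweets may need something more forgiving.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links(tweet_text):
    """Return every http(s) URL found in the tweet text, minus trailing punctuation."""
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(tweet_text)]


def is_shortened(url):
    """Guess whether a URL points at one of the known shortener domains."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in KNOWN_SHORTENERS


if __name__ == "__main__":
    tweet = "Reading this http://bit.ly/abc123, and https://example.com/page."
    for link in extract_links(tweet):
        print(link, "shortened" if is_shortened(link) else "direct")
```

A domain list like this only flags links that are probably shortened. Finding the page a short link points to would still mean requesting it and following the redirect, which this sketch leaves out.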
datasets available online
This is a mini-rant about datasets. Specifically, other people’s datasets that they’ve made available online. In the past few days I’ve taken a look at Twitter datasets made available on Infochimps.com, and a tagging dataset made available by Yahoo! Labs through its Sandbox website. The first part of my rant is about how the people [...]
managing data analysis scripts
I’ve been revisiting the various scripts I wrote to analyze my thesis data, so I can use them again on a new dataset. The problem is, I’m finding it both easier and harder than I expected to reconstruct what I did. The “easy” part is due to the fact that I was apparently totally anal [...]