Emilee Rader

sampling del.icio.us

so it seems like there might be a way to get del.icio.us to return random URLs. it isn’t clear to me how “random” they might be, and i guess there’s truly no way of knowing for sure short of asking Y!… anyway, this link

http://del.icio.us/recent?random&min=10

brings up a random web page recently posted by someone to del.icio.us that has been posted by at least 10 other people. i tried it without the &min, and it seemed to work then too. what it is *actually* doing, i have no idea. plus, i’m not really sure how “recent” is defined.

it is then possible (i think), using the information on this page

http://del.icio.us/help/json/url

to obtain the del.icio.us ID for the URL and then get the entire history for that URL. so it seems like this is one way to get a random sample of web pages recently posted to del.icio.us — only, just because it was recently posted doesn’t mean it hasn’t been posted before. so i guess this is a way to get a reasonably random sample. we could do this at 5 min intervals for a couple of days, collecting web pages to analyze for interuser agreement, and from which to obtain a large random sample of users (from whom to download their entire posting history).
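a rough sketch of what that collection loop might look like, in python. the fetching itself is left as a function you pass in (`fetch_random_url` is a made-up name, not anything del.icio.us provides), since i still don't know exactly what the random link returns or how to parse it. the only thing this shows is the polling and the bookkeeping: the same URL can come up more than once, so it counts how many times each one was seen instead of assuming every draw is new.

```python
import time
from typing import Callable, Dict, Optional


def collect_sample(
    fetch_random_url: Callable[[], Optional[str]],
    n_polls: int,
    interval_seconds: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Poll a source of 'random' URLs at fixed intervals.

    fetch_random_url is a hypothetical callable: it should return one URL
    per call (or None if the request failed). How it talks to del.icio.us
    is deliberately not specified here.
    Returns a dict mapping each URL to the number of times it was drawn.
    """
    seen: Dict[str, int] = {}
    for i in range(n_polls):
        url = fetch_random_url()
        if url:
            # repeated draws are counted, not silently dropped
            seen[url] = seen.get(url, 0) + 1
        if i < n_polls - 1:
            # wait between polls, but not after the last one
            sleep(interval_seconds)
    return seen
```

at 5 min intervals, two days of polling would be `n_polls=576`. the counts are worth keeping around, because lots of repeats would be a hint that the "random" link is drawing from a pretty small pool.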

unfortunately, there’s no reason to expect that in that random sample we’ll be able to identify groups that might be expected to have enough common ground to tag in a similar fashion. rick is probably right in that we should do some ‘purposeful’ sampling to find people for whom we expect this to be the case, to stack the deck so to speak, in order to find out whether it is even POSSIBLE that common ground and tag convergence are correlated (we’d have to run an experiment to determine causality — maybe an idea for rick’s experimental methods class… IV = control vs. common ground vs. interface feedback, DV = interuser agreement. i like it!).

my problem with this is, so what if we find an isolated group of users who use the same tags. we haven’t proven anything except it is possible for that to occur. how do we know whether it is a common enough occurrence for it to be some kind of phenomenon? how big do these “groups” need to be before we can really start calling them “groups”? can we learn something by analyzing one “group” that will help us identify other “groups”, maybe in a larger more random sample?

one difficulty we face is that we have a limited amount of possible information about each del.icio.us user to work with, and most of what is available is stuff we’d want to use for outcome measures, rather than predictive variables. for example, one way of finding people who are interested in similar things is to look for people who use a lot of the same tags. but the hypothesis is that people who have common ground will also tag similarly! we need some other, orthogonal measure of common ground to use as a criterion for selecting people whose tags we want to examine for evidence of convergence, otherwise we have a circular argument.

so, the next step is to figure out what criteria to use to select users who we believe might be ‘related’ enough to tag similarly. by tag similarly i don’t just mean using the same words. i mean using the same words to mean the same thing, either in the same ‘sense’ of the word (e.g. bass the fish vs. bass the musical instrument), or maybe referring to the same URL. it will be a lot harder to distinguish different ‘senses’ in which a tag is used, and then figure out whether two people are using the tag in the same way. it might be easier to calculate what percentage of the time two users use the same tag for the same URL. so, this is the outcome measure (dependent variable) we’re looking for. now, how to identify groups of people for whom this might be true?
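here is one possible way to pin down that "same tag for the same URL" measure, as a small python sketch. the data layout (each user as a dict from URL to a set of tags) and the function name are my own assumptions, and "agreement" here is defined as sharing at least one tag on a URL both users bookmarked. other definitions (e.g. overlap as a fraction of all tags used) would be just as defensible.

```python
from typing import Dict, Optional, Set

# assumed layout: one dict per user, mapping URL -> set of tags used on it
Bookmarks = Dict[str, Set[str]]


def tag_agreement(a: Bookmarks, b: Bookmarks) -> Optional[float]:
    """Fraction of co-bookmarked URLs on which two users share a tag.

    Returns None when the users have no URLs in common, since the
    measure is undefined in that case (rather than being zero).
    """
    shared_urls = a.keys() & b.keys()
    if not shared_urls:
        return None
    # a URL counts as agreement if the two tag sets overlap at all
    agreeing = sum(1 for url in shared_urls if a[url] & b[url])
    return agreeing / len(shared_urls)


alice = {
    "http://example.com/fishing": {"bass", "fish"},
    "http://example.com/music": {"bass", "guitar"},
    "http://example.com/news": {"news"},
}
bob = {
    "http://example.com/fishing": {"bass", "outdoors"},
    "http://example.com/music": {"instrument"},
}
print(tag_agreement(alice, bob))  # 0.5
```

note that tying the comparison to the URL sidesteps the word-sense problem a little: alice uses "bass" on both pages, but she only counts as agreeing with bob on the page where he used it too.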

hmm. what about finding bloggers who link to each other, and include their del.icio.us ID in their blog? or people who subscribe to each other? would there be too much noise in these samples? what if we restrict it to people who link to each other AND have some number of bookmarks in common? this is like online snowball sampling, using public information. the hypothesis would be that people who link to each other have some amount of common ground, which might be positively correlated with tag convergence (or using tags similarly). i guess the next step is to start the snowball sampling.
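to make the snowball idea concrete, here is a minimal python sketch, assuming we have already gathered two things by hand or by crawling: who links to whom (`links`) and which URLs each person has bookmarked (`bookmarks`). both structures, the function name, and the thresholds are hypothetical. it starts from some seed users and only follows a link when it is mutual AND the two people have at least `min_shared` bookmarks in common.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def snowball(
    seeds: Iterable[str],
    links: Dict[str, Set[str]],      # user -> users they link to (assumed input)
    bookmarks: Dict[str, Set[str]],  # user -> URLs they bookmarked (assumed input)
    min_shared: int = 1,
    max_users: int = 100,
) -> List[str]:
    """Breadth-first snowball sample over mutual links."""
    queue = deque(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(queue)
    sampled: List[str] = []
    while queue and len(sampled) < max_users:
        user = queue.popleft()
        sampled.append(user)
        for other in sorted(links.get(user, set())):
            if other in seen:
                continue
            # the link has to go both ways
            mutual = user in links.get(other, set())
            # and the two users need enough bookmarks in common
            shared = bookmarks.get(user, set()) & bookmarks.get(other, set())
            if mutual and len(shared) >= min_shared:
                seen.add(other)
                queue.append(other)
    return sampled
```

one thing to watch: the bookmarks-in-common filter uses the same data the outcome measure is built from, so setting `min_shared` too high risks bringing the circularity problem right back in.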
