<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Emilee Rader &#187; software tools</title>
	<atom:link href="http://bierdoctor.com/category/software-tools/feed/" rel="self" type="application/rss+xml" />
	<link>http://bierdoctor.com</link>
	<description>Assistant Professor, Technology &#38; Social Behavior @ Northwestern University</description>
	<lastBuildDate>Thu, 02 Sep 2010 04:50:39 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>managing data analysis scripts</title>
		<link>http://bierdoctor.com/2010/04/13/managing-data-analysis-scripts/</link>
		<comments>http://bierdoctor.com/2010/04/13/managing-data-analysis-scripts/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 06:25:00 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[analysis]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[software tools]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=521</guid>
		<description><![CDATA[I&#8217;ve been revisiting the various scripts I wrote to analyze my thesis data, so I can use them again on a new dataset. The problem is, I&#8217;m finding it both easier and harder than I expected to reconstruct what I did. The &#8220;easy&#8221; part is due to the fact that I was apparently totally anal [...]]]></description>
			<content:encoded><![CDATA[<div>I&#8217;ve been revisiting the various scripts I wrote to analyze my thesis data, so I can use them again on a new dataset. The problem is, I&#8217;m finding it both easier and harder than I expected to reconstruct what I did. The &#8220;easy&#8221; part is due to the fact that I was apparently totally anal about writing down EVERYTHING I was doing, and sometimes even why I was doing it. The &#8220;hard&#8221; part is because I wasn&#8217;t always as consistent as I should have been, and I recorded a lot of useless stuff along with what I really needed to keep track of.</div>
<p><div>For this project, I ended up writing scripts in both Ruby and R, and lots of SQL both incorporated into the scripts and in standalone text files. The experiment application I used to collect the data has its own specific implementation details that exist only in the head of the developer (not me), the experiment itself has a structure that is important for the analysis and is incorporated into the structure of the backend database, and I used a bunch of different R packages for connecting to the database and for specific analyses that have their own requirements and constraints. This is all stuff I had to document and keep track of, in addition to the actual analysis scripts. I also kept a detailed record of ALL the data cleaning I did, so that if I ever had to re-create the final dataset, it would actually be possible.</div>
<p><div>As I worked through the analysis over a period of 4-5 months, I was apparently pretty obsessed with keeping a record of *everything* I tried&#8212;meaning every script, data file, graph, or other product of analysis, even if it didn&#8217;t work out very well&#8212;on the off chance I might want to use it later. I thought I was doing myself a favor, and indeed, it is WAY better to have gone a little overboard with this than not to have done it at all.</div>
<p><div>However, one problem I&#8217;m running into is that while I have documentation (of varying levels of detail) in nearly every script file, the intermediate data files are not themselves commented. So I have to make guesses based on which script file names go with what data file names (also an area where I was pretty consistent, but not 100%) and go crawling through various scripts to figure out which one produced a particular data file and which other one takes it as input. I kept a &#8220;lab notebook&#8221; of sorts&#8212;just a text file, stored in the Mac app <a href="http://www.barebones.com/products/Yojimbo/">Yojimbo</a> with the rest of my research-related notes and ideas&#8212;but this is yet ANOTHER separate file I have to look at, and it doesn&#8217;t have all the information I need about dependencies.</div>
<p><div>Another problem I&#8217;m having is that I didn&#8217;t know exactly what might end up being garbage and what would actually be useful while I was doing the analysis; typically, I don&#8217;t figure this out until a non-trivial chunk of time has passed after I have written a paper that used a particular set of scripts and data files. But months after a paper has been submitted, it is really hard to go back and separate the useful from the useless bits of analysis; enough time has passed that I don&#8217;t remember off the top of my head what actually ended up being used, and both useful and useless code is mixed together in the same files, so it would require re-acquainting myself with everything before I&#8217;d be able to separate things out. The impetus to do this housekeeping work just doesn&#8217;t exist at any point in the research cycle for me, I guess.</div>
<p><div>What I&#8217;m looking for is a better way to manage all of this information, that isn&#8217;t too onerous when I&#8217;m in the throes of analysis, but also makes it relatively painless to reconstruct what I did at a later time. For example, after spending several hours poring over my &#8220;lab notebook&#8221; and files, I feel like I probably have all of the information I need to reconstruct my thesis analysis; but, that reconstruction is going to hurt.</div>
<p><div>I&#8217;m not even sure how to ask the Internets for a solution to my problem&#8230; version control might help with part of it, but to my knowledge that type of system won&#8217;t help me manage dependencies between three different file types and a bunch of intermediate data files. Maybe my analysis process is at fault&#8212;my scripts are too big and cumbersome (i.e., they try to do too many things), and my (irrational?) need to save out *every* data table to a file rather than re-computing it when I need it just confuses things. Anybody solve this problem for themselves, and want to give me some tips?</div>
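<p><div>One direction that might help with the dependency part, sketched very roughly in Python: have every script write a small &#8220;sidecar&#8221; file next to each data file it saves, recording which script produced it and which inputs it read. Everything here is an assumption made up for illustration (the <code>.prov.json</code> suffix, the function names, and the example file names); it is not something I have used on my thesis data.</div>

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_with_provenance(data_path, text, script, inputs):
    """Write a data file plus a JSON sidecar describing where it came from."""
    data_path = Path(data_path)
    data_path.write_text(text)
    # Hypothetical naming convention: clean.csv gets clean.csv.prov.json
    sidecar = data_path.with_name(data_path.name + ".prov.json")
    record = {
        "output": data_path.name,
        "script": script,
        "inputs": sorted(inputs),
        "written_at": datetime.now(timezone.utc).isoformat(),
    }
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


def build_dependency_map(folder):
    """Collect every sidecar in a folder into {output: (script, inputs)}."""
    deps = {}
    for sidecar in sorted(Path(folder).glob("*.prov.json")):
        record = json.loads(sidecar.read_text())
        deps[record["output"]] = (record["script"], record["inputs"])
    return deps


def demo():
    """Write two made-up data files and return the recovered dependency map."""
    with tempfile.TemporaryDirectory() as folder:
        folder = Path(folder)
        write_with_provenance(folder / "clean.csv", "id,score\n1,0.5\n",
                              script="clean_data.rb", inputs=["raw_export.sql"])
        write_with_provenance(folder / "summary.csv", "mean\n0.5\n",
                              script="summarize.R", inputs=["clean.csv"])
        return build_dependency_map(folder)
```

<p><div>Because the sidecar is plain JSON, the Ruby and R scripts could write the same record in their own way, and one small script could then answer &#8220;which script produced this file, and from what?&#8221; without crawling through every file by hand.</div>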
Copyright &copy; 2010 <strong><a href="http://bierdoctor.com/">Emilee Rader</a></strong>]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/04/13/managing-data-analysis-scripts/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>web applications</title>
		<link>http://bierdoctor.com/2010/02/25/web-applications/</link>
		<comments>http://bierdoctor.com/2010/02/25/web-applications/#comments</comments>
		<pubDate>Thu, 25 Feb 2010 06:50:36 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[advice]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[software tools]]></category>

		<guid isPermaLink="false">http://bierdoctor.com/?p=467</guid>
		<description><![CDATA[Alina Lungeanu and I started collecting data last week on our experiment! I don&#8217;t want to say too much about the hypotheses, etc. in case potential participants google me and find this blog post, so instead today I&#8217;m writing about why I&#8217;m glad I&#8217;m not a web application developer. For the experiment we are using [...]]]></description>
			<content:encoded><![CDATA[<p>Alina Lungeanu and I started collecting data last week on our experiment! I don&#8217;t want to say too much about the hypotheses, etc. in case potential participants google me and find this blog post, so instead today I&#8217;m writing about why I&#8217;m glad I&#8217;m not a web application developer.</p>
<p>For the experiment we are using the same web application created for my dissertation research, with a few small tweaks, and a new set of materials. Whenever you&#8217;re doing a study that involves participants using a prototype or other system built specifically for the experiment, it is imperative to do a lot of testing. The last thing you want is for the results of the study to reflect bugs or usability problems and not the actual phenomena of interest. So, before using the experiment app for my dissertation research, I set aside plenty of time for testing and recruited people to bang on the system and try to break it.</p>
<p>This time around, the tweaks to the system were so minor that I basically tested use cases that involved the new features, and nothing else. I figured not much had changed, so I could assume what worked before would still be working. This, as it turns out, is an assumption that doesn&#8217;t hold true in the wonderful world of web application development. With a web application, it isn&#8217;t just the application code itself you have to worry about. About a year has gone by since my initial data collection, and in that time web browsers have gone through several rounds of updates and major releases. Also, we&#8217;re using a different web server this time around. And finally, there&#8217;s been an update to one of the toolkits the application uses for the file-and-folder interface. So in reality, a LOT has changed from a year ago.</p>
<p>Fortunately, in the first experiment session we uncovered a minor &#8220;<a href="http://en.wikipedia.org/wiki/Race_condition">race condition</a>&#8221; bug that hadn&#8217;t presented itself in either my dissertation data collection, or testing for this experiment (I say &#8220;fortunately&#8221; because we discovered the problem early). A race condition can occur when multiple related (but separate) requests are sent from the client to the web server and the result depends on the order in which they are handled. Because these are *separate* requests, there&#8217;s no explicit sequencing, and unpredictable or undesirable application behavior can result if/when these requests are processed in the wrong order. This was a simple bug to fix, and so far no other bugs have presented themselves.</p>
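<p>To make the idea concrete, here is a toy sketch in Python. The request names, the folder example, and the sequence-number remedy are all assumptions for illustration; they are not the actual bug or the actual fix in the experiment app.</p>

```python
def apply_requests(requests):
    """Apply requests in the order given; return (folders, errors)."""
    folders = {}
    errors = []
    for req in requests:
        if req["action"] == "create_folder":
            folders[req["folder"]] = []
        elif req["action"] == "add_file":
            if req["folder"] in folders:
                folders[req["folder"]].append(req["file"])
            else:
                # The file request arrived before its folder existed.
                errors.append("no such folder: " + req["folder"])
    return folders, errors


def apply_in_sequence(requests):
    """Restore the client's intended order using its sequence numbers."""
    return apply_requests(sorted(requests, key=lambda r: r["seq"]))


# Two related but separate requests, numbered by the (hypothetical) client.
create = {"seq": 1, "action": "create_folder", "folder": "photos"}
add = {"seq": 2, "action": "add_file", "folder": "photos", "file": "a.jpg"}

# The server happens to see them in the wrong order.
arrived_out_of_order = [add, create]

print(apply_requests(arrived_out_of_order))     # the file is lost, error recorded
print(apply_in_sequence(arrived_out_of_order))  # the file lands in the folder
```

<p>Sorting only works in this toy because both requests are already in hand; a real server would have to hold a request until the ones it depends on have arrived, or the client would have to wait for one response before sending the next.</p>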
<p>The reason I am glad I&#8217;m not a web application developer is that, with all these infrastructural components that can change (browsers, servers, toolkits&#8230;), keeping a web application working seems to be like hitting a moving target. Firefox 3.6 included optimizations to <a href="http://hacks.mozilla.org/2010/01/javascript-speedups-in-firefox-3-6/">speed up javascript</a>, for example, which may have contributed to the race condition bug in the experiment app. A new version of Internet Explorer was released, and the toolkit the experiment app uses also released a new version with changes based on the changes to IE. It amazes me that Gmail and all those other web apps I use on a daily basis continue to work at all!</p>
<p>So my advice to anyone considering using a home-grown web application in their research is, come up with a <a href="http://en.wikipedia.org/wiki/Test_suite">test suite</a>, document it, and run through all the test cases *every time* you intend to use the application in a new study. Even if the application itself hasn&#8217;t changed.</p>
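<p>As a minimal sketch of what &#8220;documented and re-runnable&#8221; could look like, here is a tiny suite using Python&#8217;s built-in <code>unittest</code> module. The <code>FakeExperimentApp</code> class is a made-up stand-in so the example runs on its own; a real suite would exercise the actual application, ideally through each browser participants might use.</p>

```python
import unittest


class FakeExperimentApp:
    """Hypothetical stand-in for the real experiment application."""

    def __init__(self):
        self.folders = {}

    def create_folder(self, name):
        if name in self.folders:
            raise ValueError("folder already exists: " + name)
        self.folders[name] = []

    def add_file(self, folder, filename):
        self.folders[folder].append(filename)


class ExperimentSmokeTests(unittest.TestCase):
    """Each test documents one use case to re-check before every study."""

    def setUp(self):
        self.app = FakeExperimentApp()

    def test_create_folder(self):
        self.app.create_folder("photos")
        self.assertIn("photos", self.app.folders)

    def test_add_file_to_folder(self):
        self.app.create_folder("photos")
        self.app.add_file("photos", "a.jpg")
        self.assertEqual(self.app.folders["photos"], ["a.jpg"])

    def test_duplicate_folder_is_rejected(self):
        self.app.create_folder("photos")
        with self.assertRaises(ValueError):
            self.app.create_folder("photos")


def run_suite():
    """Run every documented test case and return the result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(ExperimentSmokeTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

<p>The test names and docstrings double as the written record of what was checked, so running the whole suite before each study becomes one command rather than a judgment call about what might have changed.</p>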
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2010/02/25/web-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>productivity</title>
		<link>http://bierdoctor.com/2009/06/02/productivity/</link>
		<comments>http://bierdoctor.com/2009/06/02/productivity/#comments</comments>
		<pubDate>Tue, 02 Jun 2009 18:27:48 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[administrivia]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[reflection]]></category>
		<category><![CDATA[software tools]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://madmission.bierdoctor.com/2009/06/02/productivity/</guid>
		<description><![CDATA[well, the paper is submitted. but man, i NEVER want to do that again. and by &#8220;that&#8221; i mean write a single-author paper in about a week. i&#8217;d been working on analysis (along with all my other dissertation- and work-related stuff) for months, but when we returned from the holiday weekend &#8212; where i tried [...]]]></description>
			<content:encoded><![CDATA[<p>well, the paper is submitted. but man, i NEVER want to do that again. and by &#8220;that&#8221; i mean write a single-author paper in about a week. i&#8217;d been working on analysis (along with all my other dissertation- and work-related stuff) for months, but when we returned from the holiday weekend &#8212; where i tried and failed to write &#8212; all i had done was a bunch of statistics, graphs, and notes.</p>
<p>i&#8217;ve been using this service called <a href="http://www.rescuetime.com/">RescueTime</a> for the past several weeks as a way to track my hours for different projects i am working on, and as an indicator of my productivity in general. basically, you install a little app on your computer, and it sends data about what applications are active to the RescueTime server. you can log in and see reports of how much time you are spending looking at which apps and web pages (for $8/mo. you can get reports broken down by window title, not just application).</p>
<p>i have been happy to learn that i don&#8217;t &#8220;waste&#8221; as much time as i might have thought. but this past week isn&#8217;t a very accurate indication of my normal work habits. i went from notes and graphs to a 10-page ACM-format paper in a week:</p>
<p><a href="http://bierdoctor.com/images/gif/rescuetime.gif" target="_blank"><img src="http://bierdoctor.com/images/png/0526.png" border="0" height="481" width="391" /></a><br />
(<a href="http://bierdoctor.com/images/gif/rescuetime.gif" target="_blank">click for animated gif</a> showing May 26 &#8211; June 1)</p>
<p>it&#8217;s nice to see that i&#8217;ve still &#8220;got it&#8221;, i guess. but that was not a fun week.</p>
<p>i highly recommend RescueTime, if like me you want to be more meta about how you spend your time, and like looking at data.</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2009/06/02/productivity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>thinking about software</title>
		<link>http://bierdoctor.com/2009/02/17/thinking-about-software/</link>
		<comments>http://bierdoctor.com/2009/02/17/thinking-about-software/#comments</comments>
		<pubDate>Tue, 17 Feb 2009 22:13:39 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[analysis]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[software tools]]></category>

		<guid isPermaLink="false">http://madmission.bierdoctor.com/2009/02/17/thinking-about-software/</guid>
		<description><![CDATA[i&#8217;ve been writing some code lately. well, scripting really, mostly to get data into a useful format and then do something with it &#8212; nothing really complicated. i wish i was faster at it, but the path i chose as an undergrad did not give me a lot of programming practice. my basic problem is [...]]]></description>
			<content:encoded><![CDATA[<p>i&#8217;ve been writing some code lately. well, scripting really, mostly to get data into a useful format and then do something with it &#8212; nothing really complicated. i wish i was faster at it, but the path i chose as an undergrad did not give me a lot of programming practice. my basic problem is that it takes me a while to learn new tools and programming languages, and i am always feeling behind so i only learn how to do what i need to do *right now*, rather than trying to gain some real expertise.</p>
<p>for example, i needed to learn a bit of perl in order to download and parse log data from a learning management system, and data from del.icio.us. i understand what these perl scripts do, i can adapt and extend them fairly easily to other similar kinds of tasks, but i wouldn&#8217;t really say that i *know* perl. for my dissertation experiment, the data collection system was written using ruby on rails, and because of this there is a really useful API for getting information out of the database. so, i learned a little bit of ruby, to save myself some time writing scripts to process and analyze the data. but like perl, i wouldn&#8217;t say that i *know* ruby. i also know just enough about scripting in R to write simple functions to get data from a database, manipulate it, and do stats or plot things. i know just enough python to get data from a database (sensing a pattern here?), munge it, and spit out some SVG graphics. but all this dabbling has not made me a better (faster) programmer.</p>
<p>my inspiration for this post was a <a href="http://www.bioinformaticszen.com/software/why_write_good_software">post on bioinformatics zen</a>, a blog that i sometimes read. i find that a lot of the coding i do is fairly similar to what the author of that blog says bioinformaticians do:  &#8220;a set of flat files, Perl scripts to parse out required rows, with R scripts to plot the results&#8221;. i document my scripts like crazy, but not out of some desire to write good software &#8212; i am just not good enough to instantly remember what i was doing and why i did it that way when i revisit a particular script i wrote two years ago. i sometimes fret about how slow i am, and how crappy my software is. i also fret about bugs that might cause embarrassing errors and affect the outcome of the analysis &#8212; in a sense, your results are only as good as the software you write (a point raised in <a href="http://biotext.org.uk/on-the-importance-of-testing-in-research-software/">another blog, biotext.org.uk</a>).</p>
<p>i guess i wish i had learned more and become better at doing this kind of work earlier in life, so by now i would be more of an expert. but the fact remains, it is doing the analyses and uncovering the results that keep me excited and engaged until 4am &#8212; not the writing and debugging of the scripts. writing code is a means to an end &#8212; it&#8217;s not something i *love* enough to spend what little free time i have learning how to be better at it. so, the question is, is there a more systematic way for me to approach this so i can pick up programming skills that will help me be more efficient? or is my haphazard plunking away as the need arises the best course of action, because at least then i am truly motivated to do the work?</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2009/02/17/thinking-about-software/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>on backing up</title>
		<link>http://bierdoctor.com/2009/01/19/on-backing-up/</link>
		<comments>http://bierdoctor.com/2009/01/19/on-backing-up/#comments</comments>
		<pubDate>Mon, 19 Jan 2009 22:39:49 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[administrivia]]></category>
		<category><![CDATA[in the news]]></category>
		<category><![CDATA[software tools]]></category>
		<category><![CDATA[tangential]]></category>
		<category><![CDATA[travel]]></category>

		<guid isPermaLink="false">http://madmission.bierdoctor.com/2009/01/19/on-backing-up/</guid>
		<description><![CDATA[i&#8217;ve been thinking a lot about backups lately, and have been inspired to make sure my dissertation is backed up so that if anything were to happen to my laptop all the work i&#8217;ve done won&#8217;t just disappear. when i worked in industry, it was somebody else&#8217;s job to make sure my laptop backed up [...]]]></description>
			<content:encoded><![CDATA[<p>i&#8217;ve been thinking a lot about backups lately, and have been inspired to make sure my dissertation is backed up so that if anything were to happen to my laptop all the work i&#8217;ve done won&#8217;t just disappear. when i worked in industry, it was somebody else&#8217;s job to make sure my laptop backed up automatically every day at noon (at Moto i guess they expected everybody to go to lunch at that time, so the degraded system performance wouldn&#8217;t matter as much). but as a student who owns my own computing equipment, it is my responsibility to make sure i&#8217;m prepared in the event something catastrophic happens to my laptop. i have two external USB backup drives at home: one that is a backup of my laptop, and another that is a backup of the backup in case the first drive fails. i back up as often as i remember to, which ends up being every 1-2 weeks.</p>
<p>however, the <a href="http://www.nytimes.com/2009/01/16/nyregion/16crash.html?_r=1&amp;bl=&amp;ei=5087&amp;en=f26292a9e53ee37f&amp;ex=1232168400&amp;pagewanted=all">recent crash of the US Airways Airbus in the Hudson River</a> inspired me to find a solution for backing up my most critical data (i.e., my dissertation) more frequently. everyone survived that crash, but had to leave carryon baggage behind when they evacuated. my anecdotal impression is that 100% survival in a plane crash seems like the exception rather than the rule, and it is more likely that if i were to be in a crash, the last thing i would be worried about is whether my dissertation is safe (duh). but, nevertheless, at this point losing a couple weeks&#8217; worth of work feels like it would be a personal disaster. i know, i know, i am not normal.</p>
<p>so i spent some time over the weekend trying to figure out how to automatically back up my dissertation (about 1GB of data) on the university network. i&#8217;m sure i could subscribe to some kind of backup service, but honestly, i was looking for something that wouldn&#8217;t cost me anything. as it turns out, this was a nontrivial undertaking, and the solution is a bit of a kludge so i may end up going the subscription route anyway eventually.</p>
<p>i had already purchased backup software that i&#8217;ve been using for about a year, and i like it: <a href="http://decimus.net/synk_standard.php">Synk Standard</a>. it has a fairly usable interface for selecting which folders and files to back up to what locations, it supports having multiple backup scripts that can be scheduled separately or be triggered upon mounting of a particular drive, and it will even automatically connect to a network drive via <a href="http://www.samba.org/cifs/docs/what-is-smb.html">smb/cifs</a>. however, there&#8217;s the small matter of the VPN &#8212; one cannot connect to this particular university network from home (which is usually where i am in the middle of the night when i want this backup to take place) without first firing up the VPN client and connecting, something that Synk is unable to do successfully. and then there&#8217;s the matter of disconnecting the VPN client when finished, which Synk can&#8217;t do either. Synk does have the capability to open a file or program before starting the backup script and after finishing the script&#8230; i was able to get it to trigger the VPN client but it subsequently refused to start the backup. and Synk also was unable to execute the AppleScript i wrote to disconnect and close the client &#8212; it opened the file in the script editor instead.</p>
<p>i needed some way to automatically launch and then quit the VPN client. it turns out that iCal (on the Mac) can launch a program or run a script as part of an alarm for an event. who knew! so, in addition to scheduling the backup to take place at a certain time through Synk, i also created events in iCal for starting up the VPN client (which thankfully connects automatically once it is started) and shutting down the client. these events are triggered automatically by iCal at the appointed time, whether or not the application is running.</p>
<p>as if this weren&#8217;t complicated enough, through testing i found out that for some reason files copied over to the server have a modified date but no creation date. i didn&#8217;t know this was possible, and i&#8217;m guessing it might have something to do with Mac vs. Windows issues. but this meant that the backup program was copying EVERY SINGLE FILE every time it ran. this was totally unacceptable, and the only way i could figure out to get around the problem was really ugly. i set up a rule in the backup software such that only files modified AFTER the date of the first backup would be copied over. naturally, this is a stopgap, and i&#8217;ll have to change that date every so often or more and more files will be copied over every single time. but for the time being, it works ok. my dissertation will be backed up every night, and i don&#8217;t have to lose sleep over leaving my carryon behind in the event of a water landing.</p>
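<p>for the curious, the &#8220;only copy files modified after a cutoff date&#8221; rule can be sketched in a few lines of Python. this is not how Synk does it internally (i have no idea), and the function names, file names, and dates below are all made up for illustration.</p>

```python
import os
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def files_modified_after(source, cutoff):
    """Return files under source whose modification time is later than cutoff."""
    cutoff_ts = cutoff.timestamp()
    return sorted(p for p in Path(source).rglob("*")
                  if p.is_file() and p.stat().st_mtime > cutoff_ts)


def back_up(source, dest, cutoff):
    """Copy only the files changed since cutoff, keeping the folder layout."""
    copied = []
    for path in files_modified_after(source, cutoff):
        relative = path.relative_to(source)
        target = Path(dest) / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also carries over the modified time
        copied.append(str(relative))
    return copied


def demo():
    """Build a throwaway folder with one old and one new file, then back it up."""
    cutoff = datetime(2009, 1, 17)  # hypothetical date of the first full backup
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        old = Path(src) / "chapter1.txt"
        new = Path(src) / "chapter2.txt"
        old.write_text("written before the first backup")
        new.write_text("edited since the first backup")
        old_ts = datetime(2009, 1, 10).timestamp()
        new_ts = datetime(2009, 1, 18).timestamp()
        os.utime(old, (old_ts, old_ts))  # force the modification times
        os.utime(new, (new_ts, new_ts))
        return back_up(src, dst, cutoff)
```

<p>the sketch has the same weakness as my rule: with a fixed cutoff, every file touched since that date gets copied again on every run. saving the time of the last successful run and using that as the next cutoff would probably be the less ugly version.</p>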
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2009/01/19/on-backing-up/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R in the NY Times</title>
		<link>http://bierdoctor.com/2009/01/07/r-in-the-ny-times/</link>
		<comments>http://bierdoctor.com/2009/01/07/r-in-the-ny-times/#comments</comments>
		<pubDate>Thu, 08 Jan 2009 00:21:01 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[in the news]]></category>
		<category><![CDATA[software tools]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://madmission.bierdoctor.com/2009/01/07/r-in-the-ny-times/</guid>
		<description><![CDATA[if any of you haven&#8217;t seen this yet, the NY Times published an article about R! for some strange reason, it didn&#8217;t make the &#8220;most emailed&#8221; feed. go figure. Data Analysts Captured By R&#8217;s Power the first thing i thought when i saw it was, way to go R! i had never heard of R [...]]]></description>
			<content:encoded><![CDATA[<p>if any of you haven&#8217;t seen this yet, the NY Times published an article about R! for some strange reason, it didn&#8217;t make the &#8220;most emailed&#8221; feed. go figure.</p>
<blockquote><p><a href="http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html">Data Analysts Captured By R&#8217;s Power</a></p></blockquote>
<p>the first thing i thought when i saw it was, way to go R! i had never heard of R when i first started my graduate program &#8212; i was still using SPSS. but as a poor graduate student, i didn&#8217;t want to pay yearly renewal fees for expensive licenses, and i started learning R because it is free. i&#8217;ve been hooked ever since.</p>
<p>as for the article in the nytimes, i think this statement isn&#8217;t quite accurate:</p>
<blockquote><p>But R has also quickly found a following because statisticians, engineers and scientists without computer programming skills find it easy to use.</p></blockquote>
<p>i think you definitely need some programming experience to get the command-line interface of R, which is where the power is. i never had much luck with the GUI point-and-click add-on. but i do agree with Daryl Pregibon, who is apparently a research scientist at Google and a fan of R:</p>
<blockquote><p>R is really important to the point that it’s hard to overvalue it.</p></blockquote>
<p>and, i love the part in the article about the reaction of SAS Institute to R:</p>
<blockquote><p>“R has really become the second language for people coming out of grad school now, and there’s an amazing amount of code being written for it,” said Max Kuhn, associate director of nonclinical statistics at Pfizer. “You can look on the SAS message boards and see there is a proportional downturn in traffic.”</p>
<p>SAS says it has noticed R’s rising popularity at universities, despite educational discounts on its own software, but it dismisses the technology as being of interest to a limited set of people working on very hard tasks.</p>
<p>“I think it addresses a niche market for high-end data analysts that want free, readily available code,&#8221; said Anne H. Milley, director of technology product marketing at SAS. She adds, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”</p></blockquote>
<p>sounds a little defensive, eh?</p>
<p>for me, one of the biggest benefits of using R is the really nice looking, professional quality, complex <a href="http://addictedtor.free.fr/graphiques/">graphs and charts R produces</a> &#8212; they kick the ass of anything Excel or SPSS can generate. and, there are times when R reminds me of <a href="http://en.wikipedia.org/wiki/Lisp_programming_language">Lisp</a>, which makes me happy. using R confidently also requires that one be more knowledgeable about the statistics one is performing (and why) than the point-and-click stats packages do &#8212; and i believe this is a good thing! i feel like i&#8217;ve become a much better researcher and user of statistics because of R, and i&#8217;ve been able to do data munging and analysis that i think would be difficult-to-impossible in SPSS.</p>
<p>however, the R documentation can be a bit maddening at times. either it is too terse, or doesn&#8217;t quite cover your exact situation&#8230; but R is open source after all, so my expectations are lower than they might be if i had actually paid for the software. a little persistence usually does the trick.</p>
<p>if you&#8217;re interested in learning more about R, you can find the software at <a href="http://www.r-project.org/">r-project.org</a>, and my favorite introduction to doing basic stuff with R is hosted by the <a href="http://www.ats.ucla.edu/stat/r/notes/">UCLA Statistical Computing service</a>.</p>
<p>UPDATE: the NY Times article hit the &#8220;most emailed&#8221; feed finally, only after it was <a href="http://developers.slashdot.org/article.pl?sid=09/01/07/2316227">posted to Slashdot</a> yesterday evening.</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2009/01/07/r-in-the-ny-times/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>lattice graphics in R</title>
		<link>http://bierdoctor.com/2008/06/08/lattice-graphics-in-r/</link>
		<comments>http://bierdoctor.com/2008/06/08/lattice-graphics-in-r/#comments</comments>
		<pubDate>Mon, 09 Jun 2008 02:15:36 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[software tools]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://madmission.bierdoctor.com/2008/06/08/lattice-graphics-in-r/</guid>
		<description><![CDATA[if you&#8217;ve decided to switch over to doing stats in R because the graphs R generates look so nice, you&#8217;ve probably experienced frustration with the lattice graphics package. it is highly customizable, which also means it is crazy complicated. and, the documentation doesn&#8217;t seem to be written for non-expert users. while searching for documentation on [...]]]></description>
			<content:encoded><![CDATA[<p>if you&#8217;ve decided to switch over to doing stats in R because the graphs R generates look so nice, you&#8217;ve probably experienced frustration with the lattice graphics package. it is highly customizable, which also means it is crazy complicated. and, the documentation doesn&#8217;t seem to be written for non-expert users. while searching for documentation on the densityplot function earlier today, i came across <a href="http://osiris.sunderland.ac.uk/~cs0her/Statistics/UsingLatticeGraphicsInR.htm">this webpage</a>, which provides a fairly straightforward overview of the lattice package, and some nice examples of the types of graphs it is able to produce.</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2008/06/08/lattice-graphics-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>visualization</title>
		<link>http://bierdoctor.com/2008/03/20/visualization/</link>
		<comments>http://bierdoctor.com/2008/03/20/visualization/#comments</comments>
		<pubDate>Thu, 20 Mar 2008 06:35:52 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[software tools]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://madmission.bierdoctor.com/2008/03/20/visualization/</guid>
		<description><![CDATA[i&#8217;ve wanted to create some visualizations of event log data from my ctools project for quite a while now, but every time i thought about getting started, the idea of learning a new programming language or toolkit seemed like too much work for too little payoff. but when i decided to scale back my aspirations [...]]]></description>
			<content:encoded><![CDATA[<p>i&#8217;ve wanted to create some visualizations of event log data from my ctools project for quite a while now, but every time i thought about getting started, the idea of learning a new programming language or toolkit seemed like too much work for too little payoff. but when i decided to scale back my aspirations &#8212; i.e., admit to myself that i&#8217;m not the fastest programmer so producing a fully interactive looks-like-art visualization is beyond the scope of my dissertation project &#8212; i figured out an approach that just might turn out to be useful.</p>
<p>my objective is to make some pictures that will help me see patterns in the data, and help me communicate those patterns to others in presentations and papers. i&#8217;m NOT trying to create standalone interactive visualization applets to host on the web and allow others to explore the data.</p>
<p>what i really need is a way to produce some static visualizations that allow me to explore the data in more customizable ways than the usual graphing functions in statistical software allow. even the types of visualizations in <a href="http://services.alphaworks.ibm.com/manyeyes/page/Visualization_Options.html">manyeyes</a> are too restrictive (plus i can&#8217;t exactly just up and put all this data online). for example, i&#8217;d like to create a series of images that show how a site&#8217;s file and folder hierarchy grows and changes over time, or which users access which files on a site most frequently, or where specific files are located in the hierarchy at different points in time.</p>
<p>i looked into a couple of options before settling on my approach: using <a href="http://www.python.org/">Python</a> to connect to MySQL and obtain query results, munge them appropriately, and then generate static <a href="http://www.w3.org/Graphics/SVG/">SVG (scalable vector graphics)</a> images. one nice thing about this approach is that i&#8217;ve been wanting to play around with Python for a while (and there&#8217;s a lot of documentation on the web), and SVG images can be opened and edited in Adobe Illustrator to prep for use in presentations.</p>
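<p>a minimal sketch of the query-and-munge step, assuming a hypothetical events table with site_id and user_id columns (the real ctools schema isn&#8217;t shown here). it uses Python&#8217;s built-in sqlite3 module as a stand-in for MySQL so it runs without a database server; a MySQL driver would need its own connect call and an explicit cursor.</p>

```python
import sqlite3


def active_users_per_site(conn):
    """Return [(site_id, n_active_users), ...] ordered by site_id."""
    # hypothetical schema: one row per logged event
    query = """
        SELECT site_id, COUNT(DISTINCT user_id) AS n_users
        FROM events
        GROUP BY site_id
        ORDER BY site_id
    """
    # conn.execute is a sqlite3 shortcut; other drivers go through a cursor
    return [(site, n) for site, n in conn.execute(query)]


def make_demo_db():
    """Build a tiny in-memory database with made-up event rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (site_id TEXT, user_id TEXT, action TEXT)")
    rows = [
        ("site_a", "u1", "read"),
        ("site_a", "u2", "read"),
        ("site_a", "u1", "upload"),
        ("site_b", "u3", "read"),
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    print(active_users_per_site(make_demo_db()))
```

<p>the result is a plain list of (site, user count) pairs, which is an easy shape to hand to whatever draws the picture.</p>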
<p>some other approaches i considered:</p>
<p>- <a href="http://processing.org/">Processing</a>, which i ultimately ruled out because i don&#8217;t really need interactivity, and just spitting out some static pictures seems to require the same kinds of steps as python/svg.</p>
<p>- <a href="http://nodebox.net/code/index.php/Home">NodeBox</a>, which seems like a really cool project, but i ruled it out because it isn&#8217;t clear how much documentation is out there, and i&#8217;m not sure how comprehensive the libraries are.</p>
<p>my first attempt at an SVG file generated using python&#8230; not too fancy, but so far so good! each circle represents one ctools project site in use during 2007, and the size of the circle indicates the number of active users on that site. if you&#8217;re not using Firefox, you may not be able to see the <a href="http://bierdoctor.com/images/2008/03/scatterplot10.svg">image below</a> without an SVG viewer plugin.</p>
<p><embed src="http://bierdoctor.com/images/2008/03/scatterplot10.svg"/></p>
<p>here&#8217;s one with 50 sites rather than 10 [ <a href="http://bierdoctor.com/images/svg/scatterplot50.svg">link</a> ]</p>
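<p>for anyone curious what the SVG-writing step can look like, here is a rough sketch rather than the script that made the images above. it assumes each site arrives as a hypothetical (x, y, n_users) tuple with x and y given as fractions of the canvas, and it builds the file with Python&#8217;s standard xml.etree.ElementTree module. scaling the radius by the square root of the user count (so circle area tracks the count) is a choice made for this example, as are the function name, colors and canvas size.</p>

```python
import math
import xml.etree.ElementTree as ET


def sites_to_svg(sites, width=400, height=300, max_radius=30.0):
    """Build an SVG string with one circle per (x, y, n_users) tuple.

    x and y are fractions (0 to 1) of the canvas; n_users must be positive.
    """
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(width),
        "height": str(height),
    })
    biggest = max(n for _, _, n in sites)
    for x, y, n_users in sites:
        # square-root scaling: the largest site gets max_radius,
        # and circle area (not radius) is proportional to n_users
        radius = max_radius * math.sqrt(n_users / biggest)
        ET.SubElement(svg, "circle", {
            "cx": f"{x * width:.1f}",
            "cy": f"{y * height:.1f}",
            "r": f"{radius:.1f}",
            "fill": "steelblue",
            "fill-opacity": "0.5",
        })
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    # made-up data: two sites with 4 and 1 active users
    demo = [(0.5, 0.5, 4), (0.25, 0.75, 1)]
    with open("scatterplot_demo.svg", "w") as handle:
        handle.write(sites_to_svg(demo))
```

<p>the output is ordinary SVG text, so it can be opened in a browser or in Illustrator for touch-ups.</p>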
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2008/03/20/visualization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>i have a plan!</title>
		<link>http://bierdoctor.com/2008/01/10/i-have-a-plan/</link>
		<comments>http://bierdoctor.com/2008/01/10/i-have-a-plan/#comments</comments>
		<pubDate>Thu, 10 Jan 2008 19:48:38 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[administrivia]]></category>
		<category><![CDATA[future]]></category>
		<category><![CDATA[plans]]></category>
		<category><![CDATA[software tools]]></category>

		<guid isPermaLink="false">http://madmission.bierdoctor.com/2008/01/10/i-have-a-plan/</guid>
		<description><![CDATA[back in december, i posted about how i was looking for some software to help me create a time-based plan for the next year or so of my life. what i really wanted was a freeware tool that would allow me to make a gantt chart that&#8217;s not too ugly to look at, without getting [...]]]></description>
			<content:encoded><![CDATA[<p>back in december, i <a href="http://madmission.bierdoctor.com/2007/12/13/planning-the-next-6-months/">posted</a> about how i was looking for some software to help me create a time-based plan for the next year or so of my life. what i really wanted was a freeware tool that would allow me to make a gantt chart that&#8217;s not too ugly to look at, without getting frustrated by poor usability. i already use to-do lists pretty effectively for time management; what i needed was a way to estimate how long my tasks would take, see how they line up, move them around so i&#8217;m not overloading myself, and then figure out when my deadlines and milestones should be.</p>
<p>i tried a couple of freeware options for making gantt charts, but none of them made me happy. they were written in Java and too slow to load, or required too much clicking around to create and change tasks, or didn&#8217;t support grouping and color-coding of tasks&#8230; pretty much making them painful to use. and, planning is a painful enough task without the software causing pain, too.</p>
<p>so, i decided to splurge on <a href="http://www.omnigroup.com/applications/omniplan/">OmniPlan</a>, by the same company that makes <a href="http://www.omnigroup.com/applications/omnioutliner/">OmniOutliner</a>, my to-do list software of choice. the student price was $90, which is quite a bit more than free, but not as much as other project planning software packages. i just spent the past couple of days putting together a very nice looking gantt chart:</p>
<p><a href="http://bierdoctor.com/images/2008/01/theplan_image.png"><img class="alignnone" src="http://bierdoctor.com/images/2008/01/theplan_image.png" alt="" width="550" /></a></p>
<p>i have to say, i&#8217;m pretty happy with the software. i haven&#8217;t explored many of the advanced features yet, but it seems like it has all the essential stuff down pretty well. now all i have to do is stick to it (more or less)!</p>
<p>in other news, i found some super cheap <a href="http://www.newegg.com/Product/Product.aspx?Item=N82E16820134578">RAM</a> for my MacBook Pro at newegg.com. i was all ready to install it last night when, after watching this &#8220;how to&#8221; <a href="http://www.youtube.com/watch?v=Qozs6KZoarA">video on youtube</a>, i realized i don&#8217;t have the right kind of screwdriver. argh. so i&#8217;m off to find myself some teeny tiny phillips head screwdrivers.</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2008/01/10/i-have-a-plan/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>my foray into ruby on rails</title>
		<link>http://bierdoctor.com/2007/12/04/my-foray-into-ruby-on-rails/</link>
		<comments>http://bierdoctor.com/2007/12/04/my-foray-into-ruby-on-rails/#comments</comments>
		<pubDate>Tue, 04 Dec 2007 22:41:23 +0000</pubDate>
		<dc:creator>emilee</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[software tools]]></category>

		<guid isPermaLink="false">http://madmission.bierdoctor.com/2007/12/04/my-foray-into-ruby-on-rails/</guid>
		<description><![CDATA[today i started working on a programming project. i&#8217;m using Ruby on Rails to create a web application for my dissertation. i had this bright idea that it would make data collection and analysis much easier if my participants did the document and folder labeling and organization online rather than using hard copies. also, i [...]]]></description>
			<content:encoded><![CDATA[<p>today i started working on a programming project. i&#8217;m using Ruby on Rails to create a web application for my dissertation. i had this bright idea that it would make data collection and analysis much easier if my participants did the document and folder labeling and organization online rather than using hard copies. also, i might be able to argue that there&#8217;s a bit more external validity to doing these tasks using a computer rather than paper printouts.</p>
<p><a href="http://bierdoctor.com/images/2007/12/rails2.jpg"><img class="alignleft" style="margin-top: 5px; margin-bottom: 5px;" src="http://bierdoctor.com/images/2007/12/rails2.jpg" alt="" width="190" height="228" /></a>it has been a really long time since i&#8217;ve done anything more than munge data with simple perl scripts, or run statistical analyses in R. so this will be an adventure! i decided to go with Rails because i want to learn something new, and because building functional web applications quickly is the whole point of the Rails framework. it is actually pretty important that i get this thing up and running quickly, since i can&#8217;t submit my IRB application until i can show the review board the interface participants will be interacting with. i&#8217;m using a book called <a href="http://pragprog.com/titles/rails2/">Agile Web Development with Rails</a> to get me up to speed.</p>
<p>so far, i&#8217;ve been able to successfully install everything i need, and get a demo project up and running. yeah!! the installation was made much easier by this very helpful <a href="http://hivelogic.com/narrative/articles/ruby-rails-mongrel-mysql-osx">web page</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://bierdoctor.com/2007/12/04/my-foray-into-ruby-on-rails/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
