First two weeks this summer at RStudio

Just wrapped my first two weeks at RStudio, and I am totally awestruck. The culture that I’ve only known of via Twitter and the local chapter of R-Ladies is real. It’s not a company paying lip service to its mission, but a community that’s actually living up to its values. It’s authentic, it’s inclusive, and it’s empowering, and (while I’m going to fail at conveying this all in one post) I’m going to try to get at least one of these qualities across.

Outside of academia, I often get a lot of “oohs” and “ahhs” for being in a stats PhD program. But in all honesty, my computational skills are no different from those of many undergrads and many folks that didn’t pursue traditional degrees at all. In fact, when it comes to building software, I am a complete amateur. But pictures are worth more than words, so I think this conversation with one of my mentors, Davis, sums it up:

To set the stage further, I’ve been given the responsibility of building a package that would manage the size of modeling objects in R. This entails understanding how many of these model objects (e.g., stan, survreg, glm, etc.) are constructed and which of their components should be kept (or eliminated). [Or, for any statistical audience that might be reading this… what would you consider a “minimally sufficient” object? Would love any thoughts anyone has on this, please DM!] The hope is that this package might also be accepted by CRAN and become part of the tidymodels package family. But here is where I started:

  • The number of open-source contributions on GitHub: 0;
  • The number of modeling objects I know intimately in R: 0; and
  • The number of packages submitted to CRAN: 0.

By this point, you might be thinking… Wait, how? How did you land this internship? I’m pinching myself as well :) The point is, one of the greatest assets of R and RStudio is their thriving community, in which everyone is welcome. Moreover, my mentors, Davis and Max, have given me this opportunity, and their belief in me is enough to fuel my own, so let’s get started. These resources they’ve outlined have gotten me up to speed:

  • Advanced R: I was originally learning from the 1st edition, but this second edition is so, so good. To understand a lot of the modeling objects in R, I focused my first week on the Names and values, Environments, S3 and Metaprogramming chapters;
  • The Whole Game: Prior to this, building a package had always felt intimidating, but Hadley and Jenny’s outline here makes it so accessible;
  • Design Principles: As a grad student, I’m trained to simply implement math (a.k.a. do whatever it takes to reproduce the algorithmic results from a paper), so getting into the mindset of designing functions that are extensible and optimized for the user was entirely new. Skimming through this resource helped a lot, at least to get a high-level understanding of how good open source code should behave; and
  • Style Guide: While it helps a lot to stalk the tidyverse developers’ GitHub repos to emulate their coding style, this guide pretty much sums it up. As succinctly stated in the introduction: “Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.” The styler and lintr packages are particularly useful.

After just two weeks, 80-ish hours of devouring these resources and receiving guidance from my mentors along the way, this amateur was able to (1) build a package, (2) make and merge pull requests, and (3) familiarize herself with over 15 modeling objects. Anyway, piggybacking off the thoughts of Scott Fitzgerald…

For what it’s worth… it’s never too late to be whoever you want to be. There’s no time limit. Start whenever you want. You can change or stay the same. There are no rules to this thing. We can make the best or the worst of it. I hope you make the best of it. I hope you see things that startle you. I hope you feel things you’ve never felt before. I hope you meet people who have a different point of view. I hope you live a life you’re proud of, and if you’re not, I hope you have the courage to start over again.

Joyce Cahoon
PhD Student

My research interests include the foundations of statistical inference, Bayes and empirical Bayes, and their applications in high-dimensional problems.