A Summer of RStudio and ggplot2

For those of you wondering why I haven’t been tweeting and/or blogging about mud and lakes all summer, it’s because I had the incredible opportunity to spend the summer as an RStudio intern working with Hadley Wickham on ggplot2! It was a welcome change of pace from writing articles about mud in lakes, and I’m sad the internship is coming to a close. I had the opportunity to work alongside a lot of great interns at a fantastic company, prepare tons of issues for tidy-dev-day at UseR!

Purposeful Issue Organizing

One of my ongoing tasks this summer was to organize the open issues in ggplot2. Every issue was opened by a user who thought ggplot2 should behave differently than it did, and each remained open because there was no consensus about how (or whether) the current behavior should change. Before this internship I had always been the user, and had always been a little frustrated at the reluctance to change anything.

Summarising SQL Translation for multiple dbplyr backends

Inspired by @gshotwell, I decided to have a look into bulk translating a ton of functions to SQL. The dplyr system to translate R code to SQL is really cool, but I’ve had some trouble in the past using it to write backend-agnostic code because of slightly different implementations of functions in different database backends. “Is there a reference document somewhere of which dplyr commands work on various database backends? #rstats”
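To give a flavour of what bulk translation can look like, here is a minimal sketch that translates one expression against several simulated backends. It assumes a reasonably recent dbplyr and uses its `translate_sql()` and `simulate_*()` helpers; the column names `first_name` and `last_name` and the choice of three backends are made up for illustration, and this is not necessarily how the post itself does it.

```r
library(dbplyr)

# Simulated connections: no real database is needed to see the generated SQL
backends <- list(
  postgres = simulate_postgres(),
  sqlite = simulate_sqlite(),
  mssql = simulate_mssql()
)

# Translate the same R expression once per backend.
# first_name and last_name are hypothetical column names.
translations <- vapply(
  backends,
  function(con) {
    as.character(translate_sql(paste0(first_name, last_name), con = con))
  },
  character(1)
)

# A named character vector with one SQL string per backend
print(translations)
```

Printing the result side by side makes it easy to spot where backends disagree on the translation of a single function, which is the kind of difference that bites when writing backend-agnostic code.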

Visualizing Canadian Climate Normals

I’m an avid Twitter follower of Simon Kuestenmacher (@simongerman600), who is a prolific tweeter of maps (all sorts). The other day I saw this tweet, which links to a reddit thread that used the PRISM dataset to make an animated map of precipitation in the US. A few weeks ago I had a colleague email me asking for the Canadian climate normals raw data (which can be found here), and having made an animated map of Earth’s paleogeography, I decided to give it a go for Canada.

Public Data Dive: 2018 Boeing 737 MAX flights

The recent grounding of almost all Boeing 737 MAX-series aircraft in the world is, according to a recent CBC commentator, unprecedented. I’m not an aircraft expert (or even a hobbyist), but I do love data and mining publicly-available datasets. Inspired by the nycflights13 R package (a dataset of all the flights in and out of New York City in 2013) and the FlightRadar24 blog post regarding Lion Air flight JT610, I thought I would see what information is accessible to the public about flights that used the 737 MAX-series aircraft.

Pourbaix-ish diagrams using PHREEQC and R

A side project of mine recently has been to play with PHREEQC, which is a powerful geochemical modelling platform put out by the USGS. In order to make the R package for phreeqc more accessible, I’ve started to wrap a few common uses of PHREEQC in a new R package, tidyphreeqc. In particular, I’m interested in using PHREEQC to take a look at the classic Pourbaix diagram, which is almost always represented in pure solution at a particular concentration of the target element, at 25°C.

Stratigraphic diagrams with tidypaleo & ggplot2

This post covers creating stratigraphic diagrams using ggplot2, highlighting the helpers contained within the tidypaleo package, which I’ve been using for the past few months to create diagrams. I chose the ggplot2 framework because it is quite flexible and can be used to create almost any time-stratigraphic diagram except ones that involve multiple axes (we can have a fight about whether or not those are appropriate anyway, but if you absolutely need to create them I suggest you look elsewhere).
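As a rough idea of why ggplot2 suits this kind of diagram, the sketch below builds a bare-bones stratigraphic plot with ggplot2 alone: one facet per parameter, values on the x axis, and depth increasing downward. The data are made up purely for illustration, and none of the tidypaleo helpers are used here, so treat it as a starting point rather than the approach from the post.

```r
library(ggplot2)

# Made-up illustrative values, not real measurements
strat <- data.frame(
  depth_cm = rep(c(0, 5, 10, 15, 20), times = 2),
  param = rep(c("Taxon A (%)", "Taxon B (%)"), each = 5),
  value = c(10, 15, 30, 25, 5, 60, 50, 20, 35, 70)
)

strat_plot <- ggplot(strat, aes(x = value, y = depth_cm)) +
  geom_path() +       # connects observations in row order, i.e. down-core
  geom_point() +
  facet_grid(cols = vars(param), scales = "free_x") +  # one panel per parameter
  scale_y_reverse() + # depth increases downward
  labs(x = NULL, y = "Depth (cm)")

print(strat_plot)
```

Using `geom_path()` rather than `geom_line()` matters here: the points are joined in depth order instead of being sorted by the x value.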

The Circumpolar Diatom Database using R, the tidyverse, and mudata2

It is an exciting time for the integration of limnological and paleolimnological datasets. The National (US) Water Quality Monitoring Council Water Quality Portal has just made decades of state and federal water quality measurements available, the Pages2k project has collected hundreds of temperature proxy records for the last 2000 (ish) years, and the Neotoma database provides access to a large number of paleoecological datasets. For a final project in a course last fall, I chose to analyze the Circumpolar Diatom Database (CDD), which is a collection of water chemistry and diatom assemblage data hosted by the Aquatic Paleoecology Laboratory at ULaval.

Modifying facet scales in ggplot2

There is a very old issue in ggplot2 about the ability to modify particular scales when using facet_wrap() or facet_grid(). I often have this problem when using lots of facets, as sometimes the labels overlap with each other on some of the scales. Without a way to set the breaks on one particular scale, it’s hard to fix this without exporting an SVG and modifying the result (it’s usually possible to fix it by specifying an overall set of breaks, or by rotating the x labels using theme(axis.
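The two workarounds mentioned above can be sketched as follows, using the `mpg` dataset that ships with ggplot2. The excerpt is cut off before it names the theme element, so `axis.text.x` is an assumption here, as are the specific break values and the rotation angle.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(vars(drv), scales = "free_x")

# Workaround 1: one overall set of breaks, applied to every panel
p_breaks <- p + scale_x_continuous(breaks = c(2, 4, 6))

# Workaround 2: rotate the x labels so they need less horizontal room
# (axis.text.x is an assumption about which theme element is meant)
p_rotated <- p +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

print(p_breaks)
print(p_rotated)
```

Both changes apply to all panels at once, which is exactly the limitation: neither lets you adjust the scale of a single facet.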

Verbifying nouns and using the pipe in ggplot2

There is a lot of talk about the ggplot2 package and the pipe. Should it be used? Some approaches, like the ggpipe package, replace many ggplot2 functions, adding the plot as the first argument so they can be used with the pipe. This ignores the fact that ggplot2 functions construct objects that can (and should) be re-used. Verbifying these noun functions to perform the task of creating the object and updating the plot object is one approach, and recently I wrote an experimental R package that implements it in just under 50 lines of code.
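The verbifying idea can be sketched in a few lines. Note that `verbify()`, `add_geom_point()`, and `add_labs()` below are hypothetical names chosen for illustration and are not claimed to match the experimental package; the example also assumes R 4.1 or later for the native `|>` pipe.

```r
library(ggplot2)

# Hypothetical helper: wrap a "noun" constructor (e.g. geom_point) in a
# "verb" that takes the plot as its first argument and returns the
# updated plot, so it can sit in a pipeline.
verbify <- function(noun) {
  force(noun)
  function(.plot, ...) {
    .plot + noun(...)
  }
}

add_geom_point <- verbify(geom_point)
add_labs <- verbify(labs)

p <- ggplot(mpg, aes(displ, hwy)) |>
  add_geom_point(colour = "grey40") |>
  add_labs(x = "Engine displacement (L)", y = "Highway mpg")

print(p)
```

Because the original noun functions are left untouched, objects such as `geom_point(colour = "grey40")` can still be created once and re-used with `+` across several plots.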