Plotly is a flexible framework for producing interactive graphics; it has a variety of implementations, including one for R. We’ll take a look at a few common plot types, and then introduce flexdashboards as a way to collect plots (either static or interactive).

This is the first module in the Interactivity topic; the relevant Slack channel is here.

Example

To gear up for this topic, we’ll create a directory, start an R Project, initialize git, and push to GitHub. For reasons we’ll see soon, we want the repo to appear as a website, so I’ll copy my template files into the directory. I’m also going to add Julia as a collaborator.

library(tidyverse)
library(janitor)
library(stringr)
library(forcats)
library(viridis)
## Loading required package: viridisLite

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

We’re going to focus on the Airbnb data for this topic. The code below extracts what we need right now; specifically, we select only a few of the variables, filter to include a subset of the data, and down-sample for computational efficiency.

set.seed(1)

airbnb_data = read_csv("./data/nyc_airbnb.zip") %>%
  clean_names() %>%
  mutate(rating = review_scores_location / 2) %>%
  select(boro = neighbourhood_group, neighbourhood, rating, price, room_type,
         latitude, longitude) %>%
  filter(!is.na(rating), 
         boro == "Manhattan",
         room_type == "Entire home/apt",
         price %in% 100:500)  %>% 
  sample_n(5000)
## Parsed with column specification:
## cols(
##   id = col_integer(),
##   review_scores_location = col_integer(),
##   name = col_character(),
##   host_id = col_integer(),
##   host_name = col_character(),
##   neighbourhood_group = col_character(),
##   neighbourhood = col_character(),
##   latitude = col_double(),
##   longitude = col_double(),
##   room_type = col_character(),
##   price = col_integer(),
##   minimum_nights = col_integer(),
##   number_of_reviews = col_integer(),
##   last_review = col_date(format = ""),
##   reviews_per_month = col_double(),
##   calculated_host_listings_count = col_integer(),
##   availability_365 = col_integer()
## )

We’ll use this dataset as the basis for our plots.
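As a quick aside on the down-sampling step: calling set.seed before sample_n is what makes the random subset reproducible. The sketch below uses a made-up toy_rentals tibble (not the Airbnb data) to show that the same seed gives the same rows.

```r
library(dplyr)

# Toy stand-in for the full dataset; these values are hypothetical
toy_rentals = tibble(id = 1:100, price = seq(51, 150))

# Draw 5 rows after setting the seed
set.seed(1)
first_draw = sample_n(toy_rentals, 5)

# Reset the seed and draw again
set.seed(1)
second_draw = sample_n(toy_rentals, 5)

# Compare the two draws; with matching seeds they should be the same rows
identical(first_draw, second_draw)
```

Without the second set.seed call, the two draws would generally differ.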

Plotly scatterplot

There are several practical differences between ggplot and plot_ly, but the underlying conceptual framework is similar. We need to define a dataset, specify how variables map to plot elements, and pick a plot type.

Below we’re plotting the location (latitude and longitude) of the rentals in our dataset, and mapping price to color. We also define a new variable text_label and map that to text.

The type of plot is scatter, which has several “modes”: markers produces the same kind of plot as ggplot2::geom_point, lines produces the same kind of plot as ggplot2::geom_line.

airbnb_data %>%
  mutate(text_label = str_c("Price: $", price, '\nRating: ', rating)) %>% 
  plot_ly(x = ~longitude, y = ~latitude, type = "scatter", mode = "markers",
          alpha = 0.5, 
          color = ~price,
          text = ~text_label)

This can be a useful way to show the data – it gives additional information on hovering and allows you to zoom in or out, for example.
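To make the idea of “modes” a bit more concrete, here is a minimal sketch using a small made-up dataset (toy_points is hypothetical, not part of the Airbnb data). The same type = "scatter" call is drawn three ways. The rows are sorted by x first because, as far as I know, plotly joins points in row order rather than sorting them the way geom_line does.

```r
library(dplyr)
library(plotly)

# Toy data (hypothetical values), sorted by x so the line runs left to right
toy_points = tibble(x = c(3, 1, 2, 4), y = c(9, 1, 4, 16)) %>%
  arrange(x)

# mode = "markers": one point per row, like a scatterplot
marker_plot =
  plot_ly(toy_points, x = ~x, y = ~y, type = "scatter", mode = "markers")

# mode = "lines": the same rows joined by line segments
line_plot =
  plot_ly(toy_points, x = ~x, y = ~y, type = "scatter", mode = "lines")

# mode = "lines+markers": both at once
both_plot =
  plot_ly(toy_points, x = ~x, y = ~y, type = "scatter", mode = "lines+markers")

marker_plot
```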

Plotly boxplot

Next up is the boxplot. We’re going to do some pre-processing here to show only the neighborhoods with the most rentals.

After we’ve done that subsetting, the process for creating the boxplot is similar to above: define the dataset, specify the mappings, pick a plot type. Here the type is box, and there aren’t modes to choose from.

common_neighborhoods =
  airbnb_data %>% 
  count(neighbourhood, sort = TRUE) %>% 
  top_n(8) %>% 
  select(neighbourhood)
## Selecting by n

inner_join(airbnb_data, common_neighborhoods,
             by = "neighbourhood") %>% 
  mutate(neighbourhood = fct_reorder(neighbourhood, price)) %>% 
  plot_ly(y = ~price, color = ~neighbourhood, type = "box",
          colors = "Set2")

Again, this can be helpful – we have a five-number summary when we hover, and by clicking we can select groups we want to include or exclude.
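The fct_reorder step in the boxplot code is what puts the boxes in order of price. By default it orders the factor levels by the median of the second argument within each level. Here is a small sketch with made-up values (the toy data frame and its neighbourhood labels are hypothetical).

```r
library(forcats)

# Toy data: three neighbourhoods with clearly different prices (hypothetical)
toy = data.frame(
  neighbourhood = c("A", "A", "B", "B", "C", "C"),
  price = c(300, 320, 100, 120, 200, 220)
)

# Without reordering, levels are alphabetical
levels(factor(toy$neighbourhood))

# With fct_reorder, levels follow the median price in each neighbourhood
reordered = fct_reorder(toy$neighbourhood, toy$price)
levels(reordered)
# → "B" "C" "A"
```

Plots that map neighbourhood to an axis or to color then follow this level order instead of alphabetical order.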

Plotly barchart

Lastly, we’ll make a bar chart. Plotly expects data in a specific format for bar charts, so we use count to get the number of rentals in each neighborhood (i.e. to get the bar height). Otherwise, the process should seem pretty familiar …

airbnb_data %>% 
  count(neighbourhood) %>% 
  mutate(neighbourhood = fct_reorder(neighbourhood, n)) %>% 
  plot_ly(x = ~neighbourhood, y = ~n, color = ~neighbourhood, type = "bar")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors

Interactivity in bar charts is kinda neat, but needs a bit more justification – you can zoom, which helps in some cases, or you could build in some additional information in hover text.
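As one possible way to add hover text to a bar chart, the sketch below summarizes a made-up toy_rentals tibble (hypothetical values) to get both the bar height and a median price, then passes a label through the hovertext trace attribute. It uses group_by and summarize rather than count so that the median is available, and hovertext rather than text so that the label shows on hover only; both are choices made for this illustration.

```r
library(dplyr)
library(stringr)
library(forcats)
library(plotly)

# Toy data (hypothetical values)
toy_rentals = tibble(
  neighbourhood = c("Harlem", "Harlem", "Chelsea", "SoHo", "SoHo", "SoHo"),
  price = c(120, 150, 300, 250, 275, 400)
)

# One row per neighbourhood: bar height (n) plus extra info for the hover label
bar_data = toy_rentals %>%
  group_by(neighbourhood) %>%
  summarize(n = n(), median_price = median(price)) %>%
  mutate(
    neighbourhood = fct_reorder(neighbourhood, n),
    text_label = str_c("Rentals: ", n, "\nMedian price: $", median_price)
  )

# hoverinfo = "text" asks plotly to show only the custom label on hover
bar_plot =
  plot_ly(bar_data, x = ~neighbourhood, y = ~n, type = "bar",
          hovertext = ~text_label, hoverinfo = "text")

bar_plot
```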

ggplotly

You can convert a ggplot object straight to an interactive graphic using ggplotly.

For example, the code below recreates our scatterplot using ggplot followed by ggplotly.

scatter_ggplot = airbnb_data %>%
  ggplot(aes(x = longitude, y = latitude, color = price)) +
  geom_point(alpha = 0.25) +
  scale_color_viridis() +
  coord_cartesian() +
  theme_classic()

ggplotly(scatter_ggplot)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

We can recreate our boxplot in a similar way.

box_ggplot = 
  inner_join(airbnb_data, common_neighborhoods,
             by = "neighbourhood") %>% 
  mutate(neighbourhood = fct_reorder(neighbourhood, price)) %>% 
  ggplot(aes(x = neighbourhood, y = price, fill = neighbourhood)) +
  geom_boxplot() +
  theme_classic() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplotly(box_ggplot)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

If I really want an interactive plot to look good, I’ll use plot_ly to build it – ggplot was designed with static plots in mind, and the formatting and behavior of ggplotly output are less visually appealing (to me) than plot_ly.

I use ggplot for static plots, and I make static plots way, way more frequently than interactive plots. Sometimes I’ll use ggplotly on top of that for some quick interactivity; this can be handy to do some zooming or inspect outlying features.
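If you do go the ggplotly route, one common pattern for customizing what appears on hover is to map a label to a text aesthetic and then ask ggplotly to show only that aesthetic through its tooltip argument. The sketch below uses a made-up toy_rentals tibble (hypothetical coordinates and prices). Since ggplot2 itself doesn’t use a text aesthetic for points, it may print a warning about an unknown aesthetic.

```r
library(dplyr)
library(stringr)
library(ggplot2)
library(plotly)

# Toy data (hypothetical values)
toy_rentals = tibble(
  longitude = c(-73.99, -73.97, -73.95),
  latitude = c(40.72, 40.76, 40.80),
  price = c(150, 300, 450)
) %>%
  mutate(text_label = str_c("Price: $", price))

# Map the label to `text`; ggplot2 may warn, but ggplotly() can use it
toy_ggplot = toy_rentals %>%
  ggplot(aes(x = longitude, y = latitude, color = price, text = text_label)) +
  geom_point()

# tooltip = "text" restricts the hover box to the custom label
toy_ggplotly = ggplotly(toy_ggplot, tooltip = "text")

toy_ggplotly
```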

flexdashboard

Clearly you can embed interactive graphics in HTML files produced by R Markdown; this is a handy time to introduce dashboards. In short, dashboards are a collection of related graphics (or tables, or other outputs) that are displayed in a structured way that’s easy to navigate.

You can create dashboards using the flexdashboard package by specifying flex_dashboard as the output format in your R Markdown YAML. There are a variety of layout options, but we’ll focus on a pretty simple structure produced by the template below (note: this is the default dashboard template in RStudio).

---
title: "Untitled"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
---

```{r setup, include=FALSE}
library(flexdashboard)
```

Column {data-width=650}
-----------------------------------------------------------------------

### Chart A

```{r}

```

Column {data-width=350}
-----------------------------------------------------------------------

### Chart B

```{r}

```

### Chart C

```{r}

```

Conveniently, this dashboard has space for three plots! We’ll populate it using the plot_ly plots above; doing so produces a graphic like the one shown below.

Dashboard layouts are controlled by specifying columns and rows, and potentially subdividing these. We specified a two-column layout with set column widths, and then divided the second column into two panels. Using tabbed browsing and multiple pages can also be really useful – check out the gallery linked below for examples!

Hosting a flexdashboard

You can share the HTML files for dashboards directly (e.g. by email); you can also host these online to make the dashboard visible to others. That process is essentially the same as for any other website you’d make.

To illustrate, we’ll put the dashboard we just created on a website for this topic.
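One wrinkle worth knowing about: when a dashboard lives inside an R Markdown website, site-wide output settings may take precedence over the output format in the dashboard’s own YAML, in which case the page won’t render as a dashboard. A possible workaround, sketched below, is to render that one file directly and name the output format explicitly. The file name dashboard.Rmd is a placeholder, not a file from this topic.

```r
# Placeholder file name; swap in the path to your own dashboard file
dashboard_file = "dashboard.Rmd"

# Render only if the file exists, forcing the flexdashboard output format
if (file.exists(dashboard_file)) {
  rmarkdown::render(
    dashboard_file,
    output_format = "flexdashboard::flex_dashboard"
  )
}
```

The resulting HTML file can then be committed and pushed along with the rest of the site.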

Other materials

  • Plotly can take a while to get used to; starting with their library and reference can help. I also like the cheatsheet.
  • Dashboards are pretty well-supported. Check out the overview, layout discussion, and examples
  • There are cool dashboards all over. To get a sense of how these work in the real world, check out:

The code that I produced while working through examples in lecture is here.