If you write more than two functions, you need a package – this will remind you what functions do and how they interact with each other, help you keep track of inputs and outputs, and, if you want to share your code, allow you to do so in a standard format. The first module in this topic covered getting to a complete package from scratch; this module covers some important but more advanced issues in R package development.
This is the second module in the Writing R Packages topic; the relevant slack channel is here.
For today’s example, I’ll continue working on example.package, the R package we started in Writing R Packages I.
You can find the path to your package library using .libPaths() – opening this directory on your computer will show you what you’ve installed.
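The same information is available from inside R. Here is a minimal sketch using only base R functions; "stats" is just an example, and any installed package name would work:

```r
# Directories where R looks for installed packages; the first one is where
# install.packages() puts new packages by default
lib_paths <- .libPaths()
print(lib_paths)

# Folder names in the first library directory, one per installed package
installed_here <- list.files(lib_paths[1])
head(installed_here)

# Location on disk of one specific installed package
stats_path <- find.package("stats")
print(stats_path)
```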
Before jumping into new package development stuff, we’re going to take a closer look at R’s search path. You can see your current search path at any time using search().
search()
## [1] ".GlobalEnv" "package:stats" "package:graphics"
## [4] "package:grDevices" "package:utils" "package:datasets"
## [7] "package:methods" "Autoloads" "package:base"
There’s not much here yet, since we haven’t attached anything – mostly we have the global environment and the default packages.
When you call a function, R has to find it. We’ve often made the location of a function explicit using package::function(), which tells R specifically where to look; this loads the package’s namespace if needed, but doesn’t change the search path.
iris = janitor::clean_names(iris)
search()
## [1] ".GlobalEnv" "package:stats" "package:graphics"
## [4] "package:grDevices" "package:utils" "package:datasets"
## [7] "package:methods" "Autoloads" "package:base"
The iris dataset is included in the datasets package, which is in the search path. We can also use the clean_names() function since we’ve been very clear about where R should find it. We didn’t do anything to the search path, though!
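To see the difference between loading and attaching directly, here is a small sketch using the tools package. It ships with R but isn’t attached by default. The file name passed to file_ext() is arbitrary:

```r
# tools is installed with R but is not on the search path in a fresh session
"package:tools" %in% search()     # FALSE

# Calling a function with package::function() loads the tools namespace ...
tools::file_ext("analysis.R")     # "R"
"tools" %in% loadedNamespaces()   # TRUE

# ... but does not attach it, so the search path is unchanged
"package:tools" %in% search()     # FALSE
```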
If you don’t specify the package for a function, R looks for it in the global environment and then in each attached package, in order – that is, along the search path. The library() function attaches a package to the search path (right after the global environment), adding it to the collection of packages R searches when trying to find a function. For example, to call clean_names() without specifying the package, we can use library(janitor) to attach the package to the search path.
library(janitor)
search()
## [1] ".GlobalEnv" "package:janitor" "package:stats"
## [4] "package:graphics" "package:grDevices" "package:utils"
## [7] "package:datasets" "package:methods" "Autoloads"
## [10] "package:base"
iris = clean_names(iris)
Why not just attach everything? In part, at least, to avoid naming conflicts. Both MASS and dplyr have functions called select(), for example, and they do really different things. If you attach both packages, which version you get depends on the order in which they’re attached: the one attached last sits earlier in the search path and masks the other.
To use dplyr::select(), we attach dplyr second.
library(MASS)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
iris %>%
as_tibble() %>%
select(sepal_length)
## # A tibble: 150 x 1
## sepal_length
## <dbl>
## 1 5.1
## 2 4.9
## 3 4.7
## 4 4.6
## 5 5
## 6 5.4
## 7 4.6
## 8 5
## 9 4.4
## 10 4.9
## # ... with 140 more rows
Note the message that dplyr::select() masks MASS::select() – these messages are easy to overlook but really important!
I’ll detach both packages, then reverse the order in which I attach them and try again.
detach("package:dplyr", unload = TRUE)
## Warning: 'dplyr' namespace cannot be unloaded:
## namespace 'dplyr' is imported by 'janitor' so cannot be unloaded
detach("package:MASS", unload = TRUE)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
iris %>%
as_tibble() %>%
select(sepal_length)
## Error in select(., sepal_length): unused argument (sepal_length)
iris %>%
as_tibble() %>%
dplyr::select(sepal_length)
## # A tibble: 150 x 1
## sepal_length
## <dbl>
## 1 5.1
## 2 4.9
## 3 4.7
## 4 4.6
## 5 5
## 6 5.4
## 7 4.6
## 8 5
## 9 4.4
## 10 4.9
## # ... with 140 more rows
The command that just uses select() produces an error, because it’s (implicitly) using MASS::select(); the second is explicit about using dplyr::select() and works as desired.
As you work more in R you will run into search path issues (if you haven’t already), and understanding how attaching packages affects the search path will help you resolve them. This discussion also illustrates why it’s best to attach only the packages you need, and to use package::function() notation when a package’s functions aren’t used repeatedly.
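Masking isn’t limited to packages. Anything in the global environment comes first in the search path. Here is a small base-R sketch, in which the replacement mean() is a deliberately silly stand-in. It uses find() to list every search path entry holding an object with a given name, in the order R checks them:

```r
# A global-environment object with the same name as base::mean()
mean <- function(x, ...) "not base::mean"

# find() lists search path entries containing "mean", in search order
# (with only the default packages attached)
find("mean")            # ".GlobalEnv"   "package:base"

mean(c(1, 2, 3))        # "not base::mean": the global version is found first
base::mean(c(1, 2, 3))  # 2: package::function() skips the search path

# Removing the global copy un-masks the base version
rm(mean)
mean(c(1, 2, 3))        # 2
```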
NAMESPACE
The search path discussion is particularly relevant in the context of writing your own packages. In particular, the NAMESPACE file determines which functions from other packages your package’s code can find (imports) and which of your own functions are made available to users (exports). The NAMESPACE file for example.package is shown below.
# Generated by roxygen2: do not edit by hand
export(sim_bern_mean)
export(sim_regression)
export(simulate_nrep)
import(dplyr)
import(purrr)
importFrom(magrittr,"%>%")
importFrom(tibble,tibble)
We used @import dplyr and @import purrr in our roxygen comments, which adds the statements import(dplyr) and import(purrr) to the NAMESPACE. As a result, code in our package can use any function exported by dplyr or purrr without the package:: prefix.
We also used @importFrom tibble tibble and @importFrom magrittr "%>%" in our roxygen comments, which adds the statements importFrom(tibble,tibble) and importFrom(magrittr,"%>%") to the NAMESPACE. As a result, code in our package can use tibble() and %>% directly, without importing anything else from those packages.
There’s an important but confusing distinction between import directives in the NAMESPACE and the Imports field in the DESCRIPTION (shown below). Although they share a name, these mean different things: roughly, in the NAMESPACE we’re listing packages that need to be included in the search path, while in the DESCRIPTION we’re listing packages that have to be installed for our package to work.
Package: example.package
Title: Simulates Data and Summarizes
Version: 0.1.0
Authors@R: person("Jeff", "Goldsmith", email = "ajg2202@cumc.columbia.edu", role = c("aut", "cre"))
Description: What the package does (one paragraph).
Depends: R (>= 3.4.1)
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
Imports: broom,
    dplyr,
    magrittr,
    purrr,
    tibble
To illustrate this distinction, recall that we used broom::tidy() in our functions rather than including @import broom in the roxygen comments. This makes it very clear where tidy() comes from, and means we don’t need broom in the search path; thus, broom doesn’t appear in the NAMESPACE. We do still need the package installed, though, so it’s listed as a dependency in the DESCRIPTION.
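Here is a rough sketch of what the Imports field is for. The code parses an abbreviated copy of the DESCRIPTION above with base R’s read.dcf() and asks which of the listed packages are installed. requireNamespace() loads each package’s namespace if it can, but never attaches it, so the search path is untouched. This is only an illustration, not how R itself checks dependencies:

```r
# Abbreviated DESCRIPTION text; continuation lines must be indented
desc_text <- "Package: example.package
Imports: broom,
    dplyr,
    magrittr,
    purrr,
    tibble"

con <- textConnection(desc_text)
desc <- read.dcf(con, fields = "Imports")
close(con)

# Split the comma-separated field into clean package names
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])

# TRUE/FALSE per package: is it installed (and loadable) here?
is_installed <- vapply(imports, requireNamespace, logical(1), quietly = TRUE)
is_installed
```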
The NAMESPACE and roxygen comments also include exports, which identify the functions that are visible to users when your package is attached. In bigger, more complex packages you may have helper functions you don’t want users to call directly; for those, leave @export out of the roxygen comments, and they won’t be exported when devtools::document() rebuilds the NAMESPACE.
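Here is a sketch of what this might look like in a package’s R/ directory. The function names sim_bern_summary() and summarize_sample() are hypothetical illustrations, not part of example.package. Only the function tagged @export would get an export() line in the NAMESPACE. The helper stays internal, though users could still reach it with the ::: operator:

```r
#' Simulate Bernoulli draws and summarize them
#'
#' @param n Sample size.
#' @param p Success probability.
#' @return A one-row data frame with columns n and samp_avg.
#' @export
sim_bern_summary <- function(n, p) {
  samp <- stats::rbinom(n, 1, p)
  summarize_sample(samp)
}

# Internal helper: no roxygen block and no @export tag, so it is not listed
# in NAMESPACE and is hidden from users who attach the package
summarize_sample <- function(x) {
  data.frame(n = length(x), samp_avg = mean(x))
}
```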
Checking your package for common issues – things like the presence of all needed files, the completeness of documentation, whether the code and examples run – is critical to making sure your package is complete and self-contained. You can perform these checks using devtools::check() or the Check button in RStudio’s Build pane. This is going to be frustrating, at least until you start to recognize that this is a helpful process. The checks really get into the corners of your package and find things you wouldn’t expect, and the messages take some practice to understand. Correcting the issues will force you to complete all your documentation.
You don’t have to do this for packages written for yourself, although I do recommend it. You do have to do this for packages that go on CRAN, which is part of the reason that CRAN packages are a bit more trustworthy. Many packages on GitHub have passed checks; look for a happy build | passing badge at the top of the README!
Below is the (redacted) output of checking example.package.
Updating example.package documentation
Loading example.package
Warning: @examples [sim_bern_mean.R#12]: requires a value
Warning: @param [simulate_nrep.R#4]: requires name and description
Warning: @examples [simulate_nrep.R#13]: requires a value
...
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
...
* checking R code for possible problems ... NOTE
sim_bern_mean: no visible global function definition for ‘rbinom’
sim_regression: no visible global function definition for ‘rnorm’
sim_regression: no visible binding for global variable ‘x’
...
* checking examples ... ERROR
Running examples in ‘example.package-Ex.R’ failed
The error most likely occurred in:
> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: sim_regression
> ### Title: Simulate from an SLR
> ### Aliases: sim_regression
>
> ### ** Examples
>
> # simulate a dataset and return estimates
> sim_regression(30, 2, 3)
Error in lm(y ~ x, data = sim_data) %>% broom::tidy() :
could not find function "%>%"
Calls: sim_regression
Execution halted
* DONE
Status: 1 ERROR, 1 WARNING, 2 NOTEs
...
We did pretty well! There are some warnings about our documentation (incomplete @param and @examples roxygen comments), notes about being clear about where some functions and variables come from, and an error in one of our examples (sim_regression() uses %>%, which R couldn’t find when the example ran). This gives a sense of the kind of issues that checking your package will turn up.
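To make the fixes concrete, here is a hypothetical version of sim_regression(). The real function in example.package may differ. Calling stats::rnorm() and stats::lm() with explicit namespaces tells the check exactly where they come from; an @importFrom stats rnorm lm tag would also work. Building the data from local variables avoids an undeclared global variable x. For the %>% error, the usual fix is an @importFrom magrittr "%>%" roxygen tag followed by devtools::document(). This sketch sidesteps the pipe and broom entirely, so it runs with base R alone:

```r
# Hypothetical sketch of sim_regression(n, beta0, beta1); not the original code
sim_regression <- function(n, beta0 = 2, beta1 = 3) {
  # Explicit stats:: prefixes avoid "no visible global function definition"
  x <- stats::rnorm(n, mean = 1, sd = 1)
  y <- beta0 + beta1 * x + stats::rnorm(n, 0, 1)
  sim_data <- data.frame(x = x, y = y)

  fit <- stats::lm(y ~ x, data = sim_data)
  # Base-R stand-in for broom::tidy(): one row per coefficient
  data.frame(term = names(stats::coef(fit)),
             estimate = unname(stats::coef(fit)))
}

set.seed(1)
sim_regression(30, 2, 3)
```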
Checking a package is a process that doesn’t vary from one package to the next because this process isn’t concerned with whether a function or package works as intended – the built-in checks are for things like documentation, namespace, installation, etc.
Testing, in contrast, is package-specific because it is concerned with whether functions work as intended. This is important for at least two reasons:
Informal testing is common during development – you run your functions on code snippets and make sure they give the results you expect. Formal testing makes this process more rigorous by saving the informal tests and running them each time you check your package. Like other best practices for development, this takes some time to get used to but guards against future trouble and improves your software.
The testthat package does as much as possible to facilitate formal testing. To set this up for your package, run devtools::use_testthat() (newer versions of devtools have moved this function to the usethis package, as usethis::use_testthat()). Doing so will create a directory /tests/testthat/ to hold tests and a file /tests/testthat.R to run them. It’s your job to write the tests!
The file tests/testthat/test_sim_bern.R is shown below (note: test file names have to start with test, and the files have to be in the right directory):
context("sim_bern_mean")
test_that("simulation returns a 1x2 tibble", {
  # one row of output, with columns n and samp_avg
  output = sim_bern_mean(30, .5)
  expect_is(output, "tbl_df")
  expect_equal(ncol(output), 2)
  expect_equal(nrow(output), 1)
})
test_that("simulation gives anticipated results", {
  # with the same seed, sim_bern_mean() should match a direct rbinom() draw
  set.seed(1)
  output = sim_bern_mean(30, .5)
  set.seed(1)
  sample = rbinom(30, 1, .5)
  expect_equal(output$n, length(sample))
  expect_equal(output$samp_avg, mean(sample))
})
These are pretty contrived tests, but they give you an idea of how testing in general might work. Use devtools::test() to run your tests (these will also run when you check your package); output for my tests is shown below.
Loading example.package
Testing example.package
sim_bern_mean: .....
This output contains a . for each passed expectation (five here: three in the first test and two in the second), and will indicate when a test fails.
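Tests can also check that a function fails in the right way. The sketch below assumes a hypothetical version of sim_bern_mean() that validates p; the package’s real function may not do this. It uses testthat’s expect_error() alongside expect_equal(). In a package, the test_that() calls would live in a file under tests/testthat/, with the function itself in R/:

```r
library(testthat)

# Hypothetical sim_bern_mean() with input checking, defined here only so the
# tests below are self-contained
sim_bern_mean <- function(n, p) {
  if (p < 0 || p > 1) stop("`p` must be between 0 and 1")
  samp <- stats::rbinom(n, 1, p)
  data.frame(n = n, samp_avg = mean(samp))
}

test_that("sim_bern_mean rejects invalid probabilities", {
  expect_error(sim_bern_mean(30, 1.5), "between 0 and 1")
  expect_error(sim_bern_mean(30, -0.1), "between 0 and 1")
})

test_that("degenerate probabilities give degenerate averages", {
  # rbinom() with p = 1 (or p = 0) always returns 1 (or 0)
  expect_equal(sim_bern_mean(30, 1)$samp_avg, 1)
  expect_equal(sim_bern_mean(30, 0)$samp_avg, 0)
})
```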
Help pages for functions are great, but assume users know roughly how a package works and only need a reminder about some specifics. To give a more general introduction to a package – what functions do, how they interact, and why you wrote it – you need the longer-form documentation found in package vignettes. Fortunately, these can be written using R Markdown.
To build the infrastructure needed to include a vignette in example.package, I’ll run the line below.
devtools::use_vignette("sim_vignette")
This makes several changes in the package directory: it adds knitr and rmarkdown to Suggests in DESCRIPTION (these packages aren’t dependencies, but will be needed for someone else to knit your vignette); it creates /vignettes/sim_vignette.Rmd, with template vignette content; and it adds /inst/doc to .gitignore. You’ll need to edit /vignettes/sim_vignette.Rmd, and there are some things in the template you have to change.
Then edit the rest of the R Markdown document to give an overview of the package. This often consists of organizing some of the code you’ve used elsewhere – either in the examples or in the code you have that uses the package.
The vignette I wrote for example.package
can be downloaded here.
Disseminating your vignette gets complicated, unfortunately – the knitted vignette ends up in /inst/doc/, which git is ignoring. Building a package (going from source to bundle) using devtools::build() will compile the vignette and include it in the bundle, so packages installed from a bundle or binary will have vignettes available. That means you can check out vignettes for packages you’ve installed from CRAN; see what’s available with browseVignettes(), or go straight to a vignette using vignette("dplyr").
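From the console, calling vignette() with only a package argument lists that package’s vignettes rather than opening one. Here is a small sketch using utils, which is installed with R. Whether its vignettes are actually present can depend on how R was installed:

```r
# vignette() with only a package argument lists that package's vignettes
# instead of opening one; the result has class "packageIQR"
utils_vignettes <- vignette(package = "utils")
class(utils_vignettes)             # "packageIQR"

# The listing itself is stored as a matrix, one row per vignette
utils_vignettes$results

# Interactively, these show the listing or open a browser page instead:
# utils_vignettes
# browseVignettes("utils")
```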
Installing from GitHub first builds the package bundle and then installs that; by default, this won’t knit vignettes in case they are time-consuming or complex. You can force the inclusion of vignettes using devtools::install_github(build_vignettes = TRUE).
For packages I put on GitHub, I usually include a code chunk like the one below in my README to let users know how to include and access the vignette.
devtools::install_github("jeff-goldsmith/example.package",
build_vignettes = TRUE)
# vignette("sim_vignette")
Many of these topics are touched on in the other materials for writing R packages I; below we reiterate some of those and add some new resources.
The usethis package should automate a lot of the package writing process, although I haven’t used it myself.
The code that I used to produce working examples in lecture is here.