R + Spreadsheets
The package is somewhere between dev and beta.
— Testing and feedback would be helpful —
Overview:
Spreadsheets are a great tool for presenting 2-dimensional data in a reader-friendly format and are still the workhorse of the business community, but spreadsheets don’t have the data processing power of R, so its common for data professionals to work in both spreadsheets and R. However, working in both tools can be frustrating. It’s time expensive to read data back forth between the two, and typically, I want to do my data processing in R and then send out a series of decent-looking tables to a workbook. xlr is an effort to reduce the friction that comes from needing to work in both spreadsheets and R.
Functions:
-
xl()
– output a list of datasets as named spreadsheets in a workbook -
paste_from_xl()
– format for R your OS file/folder names or spreadsheet data that you copied into your clipboard memory -
enscript()
– produce the code to re-create an R object -
repeated()
– flag all repeated observations in a vector -
sys_open()
– request your OS to open files, folders, or URLs -
enlist()
– auto-name elements and catch lists without additional list nesting -
copy_for_xl()
– copy an R dataset to clipboard memory and format for pasting to a spreadsheet -
list_iron()
– flatten lists of embedded lists to a single-file list of named objects -
entibble()
– create tibbles with convenient default behavior for spreadsheet presentation
xl()
xl()
is the primary function of the package, and many of
the other functions are around to help it. The function is somewhat
similar to View()
from the utils
package in
that xl()
is for auto-naming and quickly viewing
datasets.
It’s easiest to see what xl()
does by passing it some
example data; try sending it a couple pre-loaded datasets, like:
xl(mtcars, iris, letters, AirPassengers, Titanic)
If you type the above code to the R console, you should get a workbook that pops open with those datasets presented in standardized, formatted spreadsheets. Once you’re done with the workbook, just close it; your spreadsheets are temporary. Close them, and they’re gone.
The video below is of xl()
creating a temporary workbook
where each data object becomes a labeled spreadsheet in a workbook.
To use xl()
, pass a dataframe or a list of dataframes.
It does not matter if you pass in lists with nested lists where your
structured data objects are buried somewhere. xl()
recursively flattens the list and auto-names your data objects to become
spreadsheets in a workbook. In fact, nested lists can be useful in the
naming of groups within spreadsheet tabs if you provide a sheet-naming
specification to xl()
like
.tabname_spec = list(name_spec = "{outer}|{inner}")
.
Try running the below examples in your console.
your_datasets <- enlist(flowers = iris, mtcars, datasets::USArrests)
xl(your_datasets)
xl(your_datasets, .tabname_spec = list(name_spec = "{outer}|{inner}"))
iris |> splitter(Species) |> xl()
iris |> splitter(Species) |> xl(your_datasets)
xl()
writes your workbooks in a temp directory and, to
keep workbooks from building up, uses later::later()
to
remove the temp savefile 5 minutes after write time. Temp directories
also delete themselves when you close R. If, however, you want to save a
workbook, specify a path or just a save name as the xl()
input parameter .path
. Specifying a savename signals to
xl()
not to mark the file for cleanup. For example,
xl(iris, .path = 'flower_data')
will save your data under
flower_data.xlsx
within your working directory.
(.path
uses here::here()
internally.)
xl()
creates tibbles of all datasets you pass in, and if
specified by the user parameter .return
, optionally returns
(always invisibly) the openxlsx
workbook object (default),
or the data objects in a list, or the workbook savepath as a string, or
all of them in a list.
paste_from_xl()
paste_from_xl()
pastes data from your clipboard memory
to a tibble object in R. You can pass spreadsheet data to R by copying
data from your spreadsheet, then type the following into your
console:
my_data <- paste_from_xl()
The function guesses whether you have field names present, but if the
function guessed wrong, you can specify with the parameter
has_fieldnames
, like below:
my_data <- paste_from_xl(has_fieldnames = TRUE)
The paste_from_xl()
key-chord:
However, a better method for pasting is to use quick keys
ctrl + alt + shift+ v
. To use this key-chord, run the
function set_xlr_key_chords()
. (To set up quick keys, you
only have to run this function once in RStudio, but you have to close
RStudio and re-open it for the new quick keys to be active.)
Then:
Copy some data from a spreadsheet.
Click over to the RStudio console or to an RStudio editor. (The console and editor have different paste_from_xl() functionality.)
Press
ctrl + alt + shift + v
all together to paste. (Just like pasting withctrl + v
, but using all 3 control keys, plus thev
.)
You should see a tibble of your spreadsheet data assigned to a
variable and the dataset echoed to the console. However, if you pressed
ctrl + alt + shift + v
, and you were over an editor window
instead of the console, you also get formatted code written into the
editor window that recreates the data object from script. The script is
a formatted version of R’s deparsed internal data representation.
The video below is an example of copying spreadsheet data and then
using key-chord ctrl + alt + shift + v
in RStudio.
Copy paths using paste_from_xl()
:
In addition to spreadsheet data, you can copy file paths with
paste_from_xl()
. Open a file navigation window in your
operating system, highlight a few files or folders, and press
ctrl + c
; then go back to RStudio, click on an editor
window, and press the key-chord ctrl + alt + shift + v
, you
get an enscripted vector of file paths pasted into your text editor.
The video below is an example of copying file paths from a file
window and then using the key-chord ctrl + alt + shift + v
in RStudio.
enscript()
I often manually process data down to particular information that’s too small to deal with saving and retrieving from a serialized file or database. Instead, I just want to save my processed dataset as text within the script, but I hate manually recreating the object.
If you have an object in a variable, and want that variable’s data to
be deparsed into a formatted script, pass that variable into
enscript()
, as in:
enscript()
copies formatted code to your clipboard that
will reproduce that variable’s internal R object.
Below is an example video of enscript()
:
Try running something like:
You can pass in an expression or a single variable by piping ‘enscript()’ to the end of anything, or by using the quick keys. Enscript(), and its key-chord, is extremely useful for creating unit tests for package test files.
The enscript()
key-chord:
It’s convenient to use quick keys with enscript()
, which
are ctrl + alt + shift + n
. (I remember the key-chord as
all three control keys, plus the n
, which sounds short for
‘enscript’.)
To use the enscript()
quick keys, Run the quick-key
setup with set_xlr_key_chords()
if you haven’t already.
Then:
Type a variable name like
mtcars
in your RStudio editor window (not in the console).Use your cursor to highlight the word
mtcars
.Press
ctrl + alt + shift + n
all together.
You should see the success message,
✔ Script copied to clipboard:
, written to your console with
the deparsed object echoed below it. You can now paste a formatted
script into an editor window or the console with
ctrl + v
.
Note that the enscript()
quick keys only work for a
highlighted variable or highlighted expression in the RStudio editor
window; the quick keys do not interpret anything highlighted in the
console.
repeated()
repeated()
helps filter a dataframe for identical
observations. It should probably be in dplyr
, but it’s in
xlr
because it’s not in dplyr
. I use it to
understand duplication errors. Use it like this:
mtcars |> dplyr::filter(repeated(disp))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
repeated()
is just x %in% x[duplicated(x)]
with assertion checks, but way easier to remember than
x %in% x[duplicated(x)]
. Use it in and
and
or
statements like this:
# AND statements
mtcars |> dplyr::filter(repeated(hp), repeated(disp))
# OR statements
mtcars |> dplyr::filter(repeated(hp) | repeated(disp))
sys_open()
sys_open()
opens the .xlsx files, but also opens any
file, folder, or URL since this function requests the operating system
to open the provided input locations via your operating system’s
command-line interface. This function passes a path string (or many, if
you like) to your OS, asking the OS to open up the location using its
default application. The reason I use it instead of other functions is
because I’ve had better luck getting files to open when requesting that
the operating system make the decisions about how to open up the
file.
Try running sys_open()
without any inputs. It should pop
open a navigation pane in your working directory.
sys_open()
If you aren’t able to open a file with sys_open()
, then,
using your operating system’s navigation window, right-click on the file
to make sure your operating system has a default application assigned
that file type; after that, double-click on the file to make sure it
opens in your specified default application.
The video below is an example of opening up some PDF files by piping
them from paste_from_xl()
to sys_open()
.
enlist()
enlist()
is for quick list-naming. This function
produces named lists, without list-nesting when passed a single bare
list. That is, unlike list()
, enlist()
removes
its own embedded list layer if the user passes in a single list.
For instance, compare the structure and naming created by enlist:
To the structure and naming created by list:
list(list(list('hello'))) |> str()
#> List of 1
#> $ :List of 1
#> ..$ :List of 1
#> .. ..$ : chr "hello"
The enlist()
example is one list deep, whereas the
`list() three lists deep. However, if you send two list objects into
enlist, as shown in the example below, enlist has no choice but to wrap
them in a new list.
enlist(list(letters), list(letters)) |> str()
#> List of 2
#> $ list(letters):List of 1
#> ..$ : chr [1:26] "a" "b" "c" "d" ...
#> $ list(letters):List of 1
#> ..$ : chr [1:26] "a" "b" "c" "d" ...
A few additional notes about enlist()
:
Naming is permissive, you can have duplicate names in a list, so be careful about indexing by name since R returns only the first equivalently named observation from the list.
enlist()
has input parameter.label
, that takes a naming function. Internally,.label
is passed tonm
inrlang::set_names()
and so takes the same inputs. For example:enlist('hi','hello', .label = ~paste0(.,'_',1:2))
.enlist()
supports non-standard evaluation, e.g.enlist(!!!letters)
.Ignores empty inputs, e.g.
enlist(,,'hi',,)
.
Some clipboard prep notes:
The next 3 xlr
functions, use the clipboard memory. To
use the clipboard, install the clipr
package and then check
that the package works by running clipr::dr_clipr().
Also,
if using Linux, like me, make sure to install a clipboard tool via your
terminal, e.g. sudo apt-get install xclip
.
copy_for_xl()
copy_for_xl()
copies data to your clipboard memory
that’s formatted for pasting to cells in a spreadsheet. You can pass a
data object to it, like:
copy_for_xl(mtcars)
The copy_for_xl()
key-chord:
However, a better copying method is to use the quick keys
ctrl + alt + shift + c
. If you haven’t already done so, run
the function, set_xlr_key_chords()
to use these quick
keys.
Then:
Type
mtcars
in an RStudio text editor window (not in the console window)Use your cursor to highlight the word
mtcars
, andPress
ctrl + alt + shift + c
all together. (It’s just like copying withctrl + c
, but instead use all 3 control keys, plus thec
.)
You should see ✔ Table copied to clipboard:
written in
the console. Open a spreadsheet and paste the copied mtcars
data to it.
The below video example uses copy_for_xl()
, via the
console, to copy some starwars data.
list_iron()
list_iron()
takes any nested list of lists and irons the
input out to get a single-file list of objects.
Because an ‘.xlsx’ workbook can only make spreadsheets out of a
simple, flat list of data objects, list_iron()
recursively
smooshes any additional list structure out of the input.
list_iron()
also has a few convenience features for naming
the list objects and applying a function to each. For example, consider
the ridiculously embedded list structure below and list_iron’s
simplification of it:
ridiculous <- list(list(car_data = mtcars, list(list(flower_data = iris))))
# Original list depth
ridiculous |> str()
#> List of 1
#> $ :List of 2
#> ..$ car_data:'data.frame': 32 obs. of 11 variables:
#> .. ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> .. ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> .. ..$ disp: num [1:32] 160 160 108 258 360 ...
#> .. ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
#> .. ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#> .. ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
#> .. ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
#> .. ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
#> .. ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
#> .. ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
#> .. ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
#> ..$ :List of 1
#> .. ..$ :List of 1
#> .. .. ..$ flower_data:'data.frame': 150 obs. of 5 variables:
#> .. .. .. ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> .. .. .. ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> .. .. .. ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> .. .. .. ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> .. .. .. ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# Ironed list depth
ridiculous |> list_iron() |> str()
#> List of 2
#> $ ridiculous|car_data :'data.frame': 32 obs. of 11 variables:
#> ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#> ..$ disp: num [1:32] 160 160 108 258 360 ...
#> ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
#> ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#> ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
#> ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
#> ..$ vs : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
#> ..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
#> ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
#> ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
#> $ ridiculous|flower_data:'data.frame': 150 obs. of 5 variables:
#> ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Notice the auto-naming of the list items above, i.e.
“ridiculous|car_data”, “ridiculous|flower_data”.
Control the naming with name_spec
and
name_repair
. You can also apply a function, with parameter
.f
, while flattening out all of the leaves in the lists ,
as shown below:
list_iron(ridiculous, .f = ~tail(.,2))
#> $`ridiculous|car_data`
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Maserati Bora 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
#> Volvo 142E 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
#>
#> $`ridiculous|flower_data`
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 149 6.2 3.4 5.4 2.3 virginica
#> 150 5.9 3.0 5.1 1.8 virginica
entibble()
entibble()
makes a tibble with defaults useful for
spreadsheets. Defaults include:
- automatically presenting rownames, if they exist, as the first column of the tibble
- not repeating input elements to obtain common lengths when producing a dataframe
- treating a list of data inputs equivalent to comma-separated data inputs
For instance, compare the tibbled and entibbled data below.
alpha <- rlang::set_names(letters, LETTERS)
# rownames are forced in so they show up in spreadsheets
entibble(alpha) |> head(2)
#> # A tibble: 2 × 2
#> rowname alpha
#> <chr> <chr>
#> 1 A a
#> 2 B b
tibble::tibble(alpha) |> head(2)
#> # A tibble: 2 × 1
#> alpha
#> <chr>
#> 1 a
#> 2 b
# lists passed to entibble are dumped and their objects joined like comma-separated
# inputs if the listed objects have compatible row dimensions
entibble(list(alpha, letters)) |> tail(2)
#> # A tibble: 2 × 3
#> rowname `1` `2`
#> <chr> <chr> <chr>
#> 1 Y y y
#> 2 Z z z
# compare the above to `tibble()`
tibble::tibble(list(alpha, letters)) |> tail(2)
#> # A tibble: 2 × 1
#> `list(alpha, letters)`
#> <list>
#> 1 <chr [26]>
#> 2 <chr [26]>