
[Experimental] This function is a work in progress, and should be used with caution.

In raw collected data, several types of information are often captured in one column. For example, the column name LMA_g.m2 contains both the measured trait (Leaf Mass per Area, LMA) and the unit of measurement (grams per metre squared, g/m2), while the column itself holds the measured values. In Darwin Core, these different types of information must be separated into multiple columns so that they can be ingested correctly and aggregated accurately with other sources of data.

This function converts the information held in a single measurement column into multiple columns (measurementValue, measurementID, measurementUnit, and measurementType), as per the Darwin Core standard.
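To make the separation concrete, here is a minimal sketch of the same idea done by hand with tidyr. It is not how set_measurements() is implemented; the object names `raw` and `long` and the helper column `source_column` are hypothetical and used only for illustration.

```r
library(tibble)
library(tidyr)

# Hypothetical raw data: trait and unit are both encoded in the column name
raw <- tibble(
  Species = c("Banksia aemula", "Acacia aneura"),
  LMA_g.m2 = c(180.07, 159.01)
)

# Move the measured values into their own column, one row per measurement
long <- pivot_longer(
  raw,
  cols = LMA_g.m2,
  names_to = "source_column",      # hypothetical helper column
  values_to = "measurementValue"
)

# State the trait and unit explicitly instead of leaving them in the name
long$measurementType <- "leaf mass per area"
long$measurementUnit <- "g/m2"

long
```

The point of the sketch is only that the values, the trait description and the unit each end up in a column of their own.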

Usage

set_measurements(.df, cols = NULL, unit = NULL, type = NULL, .keep = "unused")

Arguments

.df

a data.frame or tibble that the column should be appended to.

cols

vector of unquoted column names to be included as 'measurements'.

unit

vector of strings giving the unit for each column in cols, in the same order

type

vector of strings giving a description for each column in cols, in the same order

.keep

Control which columns from .df are retained in the output. Note that unlike dplyr::mutate(), which defaults to "all", this defaults to "unused"; i.e. it keeps only Darwin Core fields, and not the fields used to generate them.
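Because the .keep argument is described by comparison with dplyr::mutate(), a short sketch of how the two settings behave in dplyr itself may help. This only illustrates dplyr's semantics; that set_measurements() follows the same convention is taken from the description above, and the data frame `df_demo` is hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical data frame with an id column and two inputs
df_demo <- tibble(id = 1:2, x = c(1, 2), y = c(10, 20))

# Default in dplyr::mutate(): .keep = "all" retains every column
kept_all <- mutate(df_demo, total = x + y)

# .keep = "unused" drops the columns that were used to build `total`
kept_unused <- mutate(df_demo, total = x + y, .keep = "unused")

names(kept_all)     # "id" "x" "y" "total"
names(kept_unused)  # "id" "total"
```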

Value

A tibble with the requested fields added.

Details

Measurement columns are nested in a single list-column, measurementOrFact, that contains Darwin Core Standard measurement fields. Nesting keeps the original data frame organised as one row per occurrence, while the nested measurements can still be converted to long format (one row per measurement, per occurrence). Data can be unnested into long format using tidyr::unnest().
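The relationship between the nested and long layouts can be sketched with tidyr alone. This is an illustration of list-columns under assumed data, not the internals of set_measurements(); the objects `long_demo`, `nested` and `flat` are hypothetical.

```r
library(tibble)
library(tidyr)

# Hypothetical long-format data: one row per measurement, per occurrence
long_demo <- tibble(
  Species = c("Banksia aemula", "Banksia aemula",
              "Acacia aneura", "Acacia aneura"),
  measurementValue = c(180.07, 0.913, 159.01, 2.960),
  measurementUnit = "g/m2",
  measurementType = rep(
    c("leaf mass per area", "leaf nitrogen per area"), 2
  )
)

# Nest the measurement fields: one row per occurrence, with a list-column
nested <- nest(
  long_demo,
  measurementOrFact = c(measurementValue, measurementUnit, measurementType)
)

# Unnest to return to one row per measurement
flat <- unnest(nested, measurementOrFact)
```

Here `nested` has one row per species with a tibble of measurements in each cell, and `flat` has one row per measurement again.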

Examples

# \donttest{
library(corella) # assumed source package for set_measurements()
library(tidyr)

# Example data of plant species observations and measurements
df <- tibble::tibble(
  Site = c("Adelaide River", "Adelaide River", "AgnesBanks"),
  Species = c("Corymbia latifolia", "Banksia aemula", "Acacia aneura"),
  Latitude = c(-13.04, -13.04, -33.60),
  Longitude = c(131.07, 131.07, 150.72),
  LMA_g.m2 = c(NA, 180.07, 159.01),
  LeafN_area_g.m2 = c(1.100, 0.913, 2.960)
)

# Reformat columns to Darwin Core Standard
# Measurement columns are reformatted and nested in column `measurementOrFact`
df_dwc <- df |>
  set_measurements(
    cols = c(LMA_g.m2,
             LeafN_area_g.m2),
    unit = c("g/m2",
             "g/m2"),
    type = c("leaf mass per area",
             "leaf nitrogen per area")
  )
#> Adding measurement columns [21ms]
#> Converting measurements to Darwin Core [23ms]
#> Checking 4 columns: measurementValue, measurementID, measurementUnit, and mea
#> Successfully nested measurement columns in column measurementOrFact. [110ms]

df_dwc
#> # A tibble: 3 × 5
#>   Site           Species            Latitude Longitude measurementOrFact
#>   <chr>          <chr>                 <dbl>     <dbl> <list>           
#> 1 Adelaide River Corymbia latifolia    -13.0      131. <tibble [2 × 4]> 
#> 2 Adelaide River Banksia aemula        -13.0      131. <tibble [2 × 4]> 
#> 3 AgnesBanks     Acacia aneura         -33.6      151. <tibble [2 × 4]> 

# Unnest to view full long format data frame
df_dwc |>
  tidyr::unnest(measurementOrFact)
#> # A tibble: 6 × 8
#>   Site           Species       Latitude Longitude measurementValue measurementID
#>   <chr>          <chr>            <dbl>     <dbl>            <dbl> <chr>        
#> 1 Adelaide River Corymbia lat…    -13.0      131.           NA     LMA_g.m2|1   
#> 2 Adelaide River Corymbia lat…    -13.0      131.            1.1   LeafN_area_g…
#> 3 Adelaide River Banksia aemu…    -13.0      131.          180.    LMA_g.m2|2   
#> 4 Adelaide River Banksia aemu…    -13.0      131.            0.913 LeafN_area_g…
#> 5 AgnesBanks     Acacia aneura    -33.6      151.          159.    LMA_g.m2|3   
#> 6 AgnesBanks     Acacia aneura    -33.6      151.            2.96  LeafN_area_g…
#> # ℹ 2 more variables: measurementUnit <chr>, measurementType <chr>

# }