
Run a suite of checks to test whether a data.frame or tibble conforms to the Darwin Core Standard.

While most users will only need to call suggest_workflow(), the underlying check functions are exported for detailed work or debugging. check_dataset() is most useful for users experienced with the Darwin Core Standard, or as a final check of a dataset.

Usage

check_dataset(.df)

Arguments

.df

A data.frame or tibble against which checks should be run

Value

Invisibly returns the input data frame; the function is called primarily for the side effect of running check functions on that input.
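The invisible return value means a call can be captured or placed mid-pipe without printing the data. The sketch below illustrates that pattern with a hypothetical stand-in function, `check_stub()`, which is not part of the package and does not reproduce the real checks; it only mimics the "report, then return the input invisibly" behaviour described here.

```r
# Hypothetical stand-in for check_dataset(): reports on the data, then
# returns its input invisibly (an illustration, not the real implementation)
check_stub <- function(.df) {
  message("Checked ", ncol(.df), " column(s)")
  invisible(.df)
}

df <- data.frame(eventDate = as.Date("2010-10-14"))

# Nothing is auto-printed, but the value can still be captured...
result <- check_stub(df)

# ...or passed further along a pipe
n_rows <- df |> check_stub() |> nrow()
```

Assuming check_dataset() behaves as documented, it could be dropped into the same positions in a pipeline.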

Details

check_dataset() is modelled after devtools::test(). It runs a series of checks, then supplies a summary of passed/failed checks and error messages.

The checks run by check_dataset() are the same as those run automatically by the various set_ functions in a piped workflow. This function lets users who expect to make only minor updates check their entire dataset without calling the set_ functions.

Examples

# \donttest{
df <- tibble::tibble(
  scientificName = c("Crinia signifera", "Crinia signifera", "Litoria peronii"),
  latitude = c(-35.27, -35.24, -35.83),
  longitude = c(149.33, 149.34, 149.34),
  eventDate = c("2010-10-14", "2010-10-14", "2010-10-14"),
  status = c("present", "present", "present")
  )

# Run a test suite of checks for Darwin Core Standard conformance
# Checks are only run on columns with names that match Darwin Core terms
df |>
  check_dataset()
#>  Testing data
#>  | E P | Column        
#>  | 0  | scientificName  [12ms]
#> 
#>  | 1  | eventDate       [37ms]
#> 
#> 
#> ══ Results ═════════════════════════════════════════════════════════════════════
#> 
#> [ Errors: 1 | Pass: 1 ]
#> 
#>  Checking Darwin Core compliance
#>  Data does not meet minimum Darwin Core column requirements
#>  Use `suggest_workflow()` to see more information.
#> 
#> 
#> ── Error in term ───────────────────────────────────────────────────────────────
#> 
#> eventDate must be a Date vector, not a character.
#>  Specify date format with lubridate functions e.g. `ymd()`, `mdy()`, or
#> `dmy()`.
#> 
# }
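The error reported above arises because eventDate is stored as character. A minimal sketch of one way to address it follows, using base R's as.Date() on a cut-down version of the example data. The error message's suggested lubridate::ymd() would be an equivalent choice for this date format. The guarded call at the end assumes the package providing check_dataset() is already attached. No output is shown because it has not been verified here.

```r
# Cut-down version of the example data; eventDate is still character
df <- data.frame(
  scientificName = c("Crinia signifera", "Crinia signifera", "Litoria peronii"),
  eventDate = c("2010-10-14", "2010-10-14", "2010-10-14")
)

# Convert the ISO 8601 strings to Date
df_fixed <- df
df_fixed$eventDate <- as.Date(df_fixed$eventDate, format = "%Y-%m-%d")

# Re-run the checks only if check_dataset() is available in this session
if (exists("check_dataset", mode = "function")) {
  check_dataset(df_fixed)
}
```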