Create unique identifier columns — composite

A unique identifier is a pattern of words, letters and/or numbers that is unique to a single record within a dataset. Unique identifiers are useful because they identify individual observations, and make it possible to change, amend or delete observations over time. They also prevent accidental deletion when more than one record contains the same information (and would otherwise be considered a duplicate).

The identifier functions in corella make it easier to generate columns with unique identifiers in a dataset. These functions can be used within set_events(), set_occurrences(), or (equivalently) dplyr::mutate().
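As a minimal sketch of the dplyr::mutate() route mentioned above, the snippet below builds an identifier from two existing columns. The data frame and the object name with_ids are hypothetical, made up for illustration; only composite_id() and dplyr::mutate() come from the text.

```r
library(corella)

# Hypothetical example data: one row per site per year
df <- tibble::tibble(
  eventDate = paste0(rep(2020:2024, 3), "-01-01"),
  site = rep(c("A01", "A02", "A03"), each = 5)
)

# Build the identifier inside dplyr::mutate() instead of set_occurrences()
with_ids <- df |>
  dplyr::mutate(occurrenceID = composite_id(site, eventDate))

with_ids
```

The same composite_id() call should be usable unchanged inside set_events() or set_occurrences().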

Usage

composite_id(..., sep = "-")

sequential_id(width)

random_id()

Arguments

...: Zero or more variable names from the tibble being mutated (unquoted), and/or zero or more _id functions, separated by commas.
sep: Character used to separate field values. Defaults to "-".
width: (Integer) Number of characters in the resulting string. Defaults to one plus the order of magnitude of the largest number.
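The sketch below shows how sep and width might be set explicitly, assuming both arguments behave as described above. The data frame and the object name padded are hypothetical. If width pads the counter with leading zeros, as the default does in the Examples, the identifiers would consist of a five-character counter, an underscore and the site code.

```r
library(corella)

# Hypothetical example data: three sites with five records each
df <- tibble::tibble(
  basisOfRecord = "humanObservation",
  site = rep(c("A01", "A02", "A03"), each = 5)
)

# Request a five-character counter and join fields with "_" rather than "-"
padded <- df |>
  set_occurrences(
    occurrenceID = composite_id(sequential_id(width = 5),
                                site,
                                sep = "_")
  )

padded
```

A separator that does not occur in the field values themselves (eventDate already contains "-", for example) makes the identifier easier to split back into its parts later.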

Value

An amended tibble containing a column with identifiers in the requested format.

Details

Generally speaking, it is better to use existing information from a dataset to generate identifiers. For this reason we recommend using composite_id() to aggregate existing fields, if no such composite is already present within the dataset. Composite IDs are more meaningful and stable; they are easier to check and harder to overwrite.
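Before relying on a composite of existing fields, it can help to confirm that the combination really is unique. The base R sketch below does this for a hypothetical dataset; paste() stands in for the idea of joining fields and is not a claim about how composite_id() is implemented.

```r
# Hypothetical example data: three sites, each recorded once a year
df <- data.frame(
  eventDate = paste0(rep(2020:2024, 3), "-01-01"),
  site = rep(c("A01", "A02", "A03"), each = 5)
)

# Neither field identifies a record on its own
any(duplicated(df$site))
#> [1] TRUE
any(duplicated(df$eventDate))
#> [1] TRUE

# Joined together, the two fields give one distinct value per row
candidate <- paste(df$site, df$eventDate, sep = "-")
is_unique <- anyDuplicated(candidate) == 0
is_unique
#> [1] TRUE
```

If the check fails, adding another existing field, or a sequential_id() or random_id() component, is one way to restore uniqueness.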

It is possible to call sequential_id() or random_id() within composite_id() to combine existing and new columns.

Examples

library(corella)

# Example data: three sites, each recorded once a year from 2020 to 2024
df <- tibble::tibble(
  eventDate = paste0(rep(2020:2024, 3), "-01-01"),
  basisOfRecord = "humanObservation",
  site = rep(c("A01", "A02", "A03"), each = 5)
)

# Add composite ID using a random ID, site name and eventDate
df |>
  set_occurrences(
    occurrenceID = composite_id(random_id(),
                                site,
                                eventDate)
    )
#> ✔ Checking 2 columns: basisOfRecord and occurrenceID [627ms]
#> 
#> # A tibble: 15 × 4
#>    eventDate  basisOfRecord    site  occurrenceID                               
#>    <chr>      <chr>            <chr> <chr>                                      
#>  1 2020-01-01 humanObservation A01   f9d5416e-0854-11f0-b0eb-7c1e526f97f1-A01-2…
#>  2 2021-01-01 humanObservation A01   f9d54182-0854-11f0-b0eb-7c1e526f97f1-A01-2…
#>  3 2022-01-01 humanObservation A01   f9d5418c-0854-11f0-b0eb-7c1e526f97f1-A01-2…
#>  4 2023-01-01 humanObservation A01   f9d5418d-0854-11f0-b0eb-7c1e526f97f1-A01-2…
#>  5 2024-01-01 humanObservation A01   f9d54196-0854-11f0-b0eb-7c1e526f97f1-A01-2…
#>  6 2020-01-01 humanObservation A02   f9d54197-0854-11f0-b0eb-7c1e526f97f1-A02-2…
#>  7 2021-01-01 humanObservation A02   f9d54198-0854-11f0-b0eb-7c1e526f97f1-A02-2…
#>  8 2022-01-01 humanObservation A02   f9d541a0-0854-11f0-b0eb-7c1e526f97f1-A02-2…
#>  9 2023-01-01 humanObservation A02   f9d541a1-0854-11f0-b0eb-7c1e526f97f1-A02-2…
#> 10 2024-01-01 humanObservation A02   f9d541aa-0854-11f0-b0eb-7c1e526f97f1-A02-2…
#> 11 2020-01-01 humanObservation A03   f9d541ab-0854-11f0-b0eb-7c1e526f97f1-A03-2…
#> 12 2021-01-01 humanObservation A03   f9d541ac-0854-11f0-b0eb-7c1e526f97f1-A03-2…
#> 13 2022-01-01 humanObservation A03   f9d541b4-0854-11f0-b0eb-7c1e526f97f1-A03-2…
#> 14 2023-01-01 humanObservation A03   f9d541b5-0854-11f0-b0eb-7c1e526f97f1-A03-2…
#> 15 2024-01-01 humanObservation A03   f9d541b6-0854-11f0-b0eb-7c1e526f97f1-A03-2…

# Add composite ID using a sequential number, site name and eventDate
df |>
  set_occurrences(
    occurrenceID = composite_id(sequential_id(),
                                site,
                                eventDate)
    )
#> ✔ Checking 2 columns: basisOfRecord and occurrenceID [629ms]
#> 
#> # A tibble: 15 × 4
#>    eventDate  basisOfRecord    site  occurrenceID      
#>    <chr>      <chr>            <chr> <chr>             
#>  1 2020-01-01 humanObservation A01   001-A01-2020-01-01
#>  2 2021-01-01 humanObservation A01   002-A01-2021-01-01
#>  3 2022-01-01 humanObservation A01   003-A01-2022-01-01
#>  4 2023-01-01 humanObservation A01   004-A01-2023-01-01
#>  5 2024-01-01 humanObservation A01   005-A01-2024-01-01
#>  6 2020-01-01 humanObservation A02   006-A02-2020-01-01
#>  7 2021-01-01 humanObservation A02   007-A02-2021-01-01
#>  8 2022-01-01 humanObservation A02   008-A02-2022-01-01
#>  9 2023-01-01 humanObservation A02   009-A02-2023-01-01
#> 10 2024-01-01 humanObservation A02   010-A02-2024-01-01
#> 11 2020-01-01 humanObservation A03   011-A03-2020-01-01
#> 12 2021-01-01 humanObservation A03   012-A03-2021-01-01
#> 13 2022-01-01 humanObservation A03   013-A03-2022-01-01
#> 14 2023-01-01 humanObservation A03   014-A03-2023-01-01
#> 15 2024-01-01 humanObservation A03   015-A03-2024-01-01