A unique identifier is a pattern of words, letters and/or numbers that is unique to a single record within a dataset. Unique identifiers are useful because they identify individual observations, and make it possible to change, amend or delete observations over time. They also prevent accidental deletion when more than one record contains the same information (and would otherwise be considered a duplicate).
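A minimal base-R sketch of the duplicate problem described above, assuming two records that happen to share every field value. The column name record_id is hypothetical and used only for illustration; it is not something corella creates.

```r
# Two observations that happen to share every field value
obs <- data.frame(
  site = c("A01", "A01"),
  eventDate = c("2020-01-01", "2020-01-01")
)

# Without an identifier, the second row looks like a duplicate of the first
dup_without_id <- duplicated(obs)
dup_without_id
#> [1] FALSE  TRUE

# Adding a (hypothetical) record_id column makes every row distinct
obs$record_id <- c("001", "002")
dup_with_id <- duplicated(obs)
dup_with_id
#> [1] FALSE FALSE
```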
The identifier functions in corella make it easier to generate columns with unique identifiers in a dataset. These functions can be used within set_events(), set_occurrences(), or (equivalently) dplyr::mutate().
Arguments
- ...
Zero or more variable names from the tibble being mutated (unquoted), and/or zero or more _id functions, separated by commas.
- sep
Character used to separate field values. Defaults to "-".
- width
(Integer) Number of characters in each resulting string; shorter numbers are padded with leading zeros. Defaults to one more than the number of digits in the largest number, so 15 rows produce IDs from 001 to 015.
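The sketch below illustrates zero-padded sequential numbering in base R. The helper pad_sequence() is hypothetical, not corella's sequential_id(), and its default width (the number of digits in the row count, plus one) is an assumption chosen to reproduce the three-character IDs shown in the examples below.

```r
# Hypothetical helper: number rows 1..n as zero-padded strings.
# Assumed default width: digits in n (floor(log10(n)) + 1), plus one.
pad_sequence <- function(n, width = floor(log10(n)) + 2) {
  sprintf(paste0("%0", width, "d"), seq_len(n))
}

ids <- pad_sequence(15)
head(ids, 3)
#> [1] "001" "002" "003"

# An explicit width overrides the default
pad_sequence(5, width = 4)[5]
#> [1] "0005"
```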
Details
Generally speaking, it is better to use existing information from a dataset to generate identifiers. For this reason we recommend using composite_id() to aggregate existing fields, if no such composite is already present within the dataset. Composite IDs are more meaningful and stable; they are easier to check and harder to overwrite.
It is possible to call sequential_id() or random_id() within composite_id() to combine existing and new columns.
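To make the recommendation concrete, the sketch below builds a composite from existing fields with base R's paste(), using the same site and eventDate layout as the examples, then checks it with anyDuplicated(). It only approximates composite_id() with its default sep = "-"; it is not corella's implementation.

```r
# Same site/eventDate layout as the example data below
df <- data.frame(
  eventDate = paste0(rep(2020:2024, 3), "-01-01"),
  site = rep(c("A01", "A02", "A03"), each = 5)
)

# Join existing fields with "-" (the documented default for sep)
df$composite <- paste(df$site, df$eventDate, sep = "-")
head(df$composite, 2)
#> [1] "A01-2020-01-01" "A01-2021-01-01"

# 0 means no composite value is repeated
anyDuplicated(df$composite)
#> [1] 0
```

Because the composite is derived from the data itself, it can be rebuilt from the same fields and re-checked later.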
Examples
library(corella)

df <- tibble::tibble(
  eventDate = paste0(rep(2020:2024, 3), "-01-01"),
  basisOfRecord = "humanObservation",
  site = rep(c("A01", "A02", "A03"), each = 5)
)
# Add composite ID using a random ID, site name and eventDate
df |>
  set_occurrences(
    occurrenceID = composite_id(random_id(), site, eventDate)
  )
#> ✔ Checking 2 columns: basisOfRecord and occurrenceID [627ms]
#>
#> # A tibble: 15 × 4
#> eventDate basisOfRecord site occurrenceID
#> <chr> <chr> <chr> <chr>
#> 1 2020-01-01 humanObservation A01 f9d5416e-0854-11f0-b0eb-7c1e526f97f1-A01-2…
#> 2 2021-01-01 humanObservation A01 f9d54182-0854-11f0-b0eb-7c1e526f97f1-A01-2…
#> 3 2022-01-01 humanObservation A01 f9d5418c-0854-11f0-b0eb-7c1e526f97f1-A01-2…
#> 4 2023-01-01 humanObservation A01 f9d5418d-0854-11f0-b0eb-7c1e526f97f1-A01-2…
#> 5 2024-01-01 humanObservation A01 f9d54196-0854-11f0-b0eb-7c1e526f97f1-A01-2…
#> 6 2020-01-01 humanObservation A02 f9d54197-0854-11f0-b0eb-7c1e526f97f1-A02-2…
#> 7 2021-01-01 humanObservation A02 f9d54198-0854-11f0-b0eb-7c1e526f97f1-A02-2…
#> 8 2022-01-01 humanObservation A02 f9d541a0-0854-11f0-b0eb-7c1e526f97f1-A02-2…
#> 9 2023-01-01 humanObservation A02 f9d541a1-0854-11f0-b0eb-7c1e526f97f1-A02-2…
#> 10 2024-01-01 humanObservation A02 f9d541aa-0854-11f0-b0eb-7c1e526f97f1-A02-2…
#> 11 2020-01-01 humanObservation A03 f9d541ab-0854-11f0-b0eb-7c1e526f97f1-A03-2…
#> 12 2021-01-01 humanObservation A03 f9d541ac-0854-11f0-b0eb-7c1e526f97f1-A03-2…
#> 13 2022-01-01 humanObservation A03 f9d541b4-0854-11f0-b0eb-7c1e526f97f1-A03-2…
#> 14 2023-01-01 humanObservation A03 f9d541b5-0854-11f0-b0eb-7c1e526f97f1-A03-2…
#> 15 2024-01-01 humanObservation A03 f9d541b6-0854-11f0-b0eb-7c1e526f97f1-A03-2…
# Add composite ID using a sequential number, site name and eventDate
df |>
  set_occurrences(
    occurrenceID = composite_id(sequential_id(), site, eventDate)
  )
#> ✔ Checking 2 columns: basisOfRecord and occurrenceID [629ms]
#>
#> # A tibble: 15 × 4
#> eventDate basisOfRecord site occurrenceID
#> <chr> <chr> <chr> <chr>
#> 1 2020-01-01 humanObservation A01 001-A01-2020-01-01
#> 2 2021-01-01 humanObservation A01 002-A01-2021-01-01
#> 3 2022-01-01 humanObservation A01 003-A01-2022-01-01
#> 4 2023-01-01 humanObservation A01 004-A01-2023-01-01
#> 5 2024-01-01 humanObservation A01 005-A01-2024-01-01
#> 6 2020-01-01 humanObservation A02 006-A02-2020-01-01
#> 7 2021-01-01 humanObservation A02 007-A02-2021-01-01
#> 8 2022-01-01 humanObservation A02 008-A02-2022-01-01
#> 9 2023-01-01 humanObservation A02 009-A02-2023-01-01
#> 10 2024-01-01 humanObservation A02 010-A02-2024-01-01
#> 11 2020-01-01 humanObservation A03 011-A03-2020-01-01
#> 12 2021-01-01 humanObservation A03 012-A03-2021-01-01
#> 13 2022-01-01 humanObservation A03 013-A03-2022-01-01
#> 14 2023-01-01 humanObservation A03 014-A03-2023-01-01
#> 15 2024-01-01 humanObservation A03 015-A03-2024-01-01