API Docs

corella.basisOfRecord_values()

A pandas.Series of accepted (but not mandatory) values for basisOfRecord.

Parameters:

None

Return type:

A pandas.Series of accepted (but not mandatory) values for basisOfRecord.

Examples

>>> corella.basisOfRecord_values()
  basisOfRecord values
0     humanObservation
1   machineObservation
2       livingSpecimen
3    preservedSpecimen
4       fossilSpecimen
5     materialCitation
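
As a rough illustration of how this list might be used, the sketch below checks candidate values against the six values shown in the output above. It uses only the standard library. The helper name `invalid_basis_of_record` is hypothetical and is not part of corella.

```python
# Accepted values copied from the output of corella.basisOfRecord_values() above.
ACCEPTED_BASIS_OF_RECORD = {
    "humanObservation",
    "machineObservation",
    "livingSpecimen",
    "preservedSpecimen",
    "fossilSpecimen",
    "materialCitation",
}


def invalid_basis_of_record(values):
    """Return the values that are not in the accepted list (hypothetical helper)."""
    return [v for v in values if v not in ACCEPTED_BASIS_OF_RECORD]


# The comparison is case-sensitive, so "HumanObservation" is flagged.
candidates = ["humanObservation", "HumanObservation", "preservedSpecimen"]
print(invalid_basis_of_record(candidates))
```

The same membership test could be applied to a DataFrame column before passing it to corella.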

corella.check_dataset(occurrences=None, events=None, max_num_errors=5, print_report=True)

Checks whether or not the data in your occurrences complies with Darwin Core standards.

Parameters:
  • occurrences (pandas.DataFrame) – The pandas.DataFrame that contains your occurrences.

  • events (pandas.DataFrame) – The pandas.DataFrame that contains your events.

  • max_num_errors (int) – The maximum number of errors to display at once. Default is 5.

  • print_report (bool) – Specify whether you want to print the report or return a Boolean denoting whether or not the dataset passed. Default is True.

Return type:

Raises a ValueError if something is not valid.

Examples

Passing Dataset Occurrences using check_dataset

corella.countryCode_values()

A pandas.Series of accepted (but not mandatory) values for countryCode.

Parameters:

None

Return type:

A pandas.Series of accepted (but not mandatory) values for countryCode.

Examples

>>> corella.countryCode_values()
0      AD
1      AE
2      AF
3      AG
4      AI
       ..
244    YE
245    YT
246    ZA
247    ZM
248    ZW
Name: Code, Length: 249, dtype: object

corella.event_terms()

A pandas.Series of accepted (but not mandatory) values for event data.

Parameters:

None

Return type:

A pandas.Series of accepted (but not mandatory) values for event data.

Examples

>>> corella.event_terms()
0                     type
1                 modified
2                 language
3                  license
4             rightsHolder
              ...         
77         georeferencedBy
78       georeferencedDate
79    georeferenceProtocol
80     georeferenceSources
81     georeferenceRemarks
Name: term_localName, Length: 82, dtype: object

corella.occurrence_terms()

A pandas.Series of accepted (but not mandatory) values for occurrence data.

Parameters:

None

Return type:

A pandas.Series of accepted (but not mandatory) values for occurrence data.

Examples

>>> corella.occurrence_terms()
0                             type
1                         modified
2                         language
3                          license
4                     rightsHolder
                  ...             
201              relatedResourceID
202         relationshipOfResource
203        relationshipAccordingTo
204    relationshipEstablishedDate
205            relationshipRemarks
Name: term_localName, Length: 206, dtype: object

corella.set_abundance(dataframe=None, individualCount=None, organismQuantity=None, organismQuantityType=None)

Checks for abundance information, such as individual counts and organism quantities.

Parameters:
  • dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check

  • individualCount (str) – A column name that contains your individual counts (should be whole numbers).

  • organismQuantity (str) – A column name that contains a number or enumeration value for the quantity of organisms. Used together with organismQuantityType to provide context.

  • organismQuantityType (str) – A column name or phrase denoting the type of quantification system used for organismQuantity.

Return type:

pandas.DataFrame with the updated data.

Examples

set_abundance vignette
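
The requirement that individualCount holds whole numbers can be illustrated with a small standalone check. This is a sketch under assumptions, not corella's own validation: the helper `non_whole_counts` is hypothetical, and treating negative counts as invalid is a choice made here for illustration.

```python
def non_whole_counts(counts):
    """Return entries that are not non-negative whole numbers (hypothetical helper)."""
    bad = []
    for value in counts:
        # bool is a subclass of int, so exclude it explicitly.
        is_whole = isinstance(value, int) and not isinstance(value, bool)
        # Floats such as 2.0 are whole in value, so accept them as well.
        if isinstance(value, float) and value.is_integer():
            is_whole = True
        # Assumption: a count below zero is treated as invalid.
        if not is_whole or value < 0:
            bad.append(value)
    return bad


print(non_whole_counts([1, 4, 2.0, 3.5, -1]))
```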

corella.set_collection(dataframe=None, datasetID=None, datasetName=None, catalogNumber=None)

Checks for collection information, such as the dataset identifier, dataset name and catalog number.

Parameters:
  • dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check

  • datasetID (str) – A column name or other string denoting the identifier for the set of data. May be a global unique identifier or an identifier specific to a collection or institution.

  • datasetName (str) – A column name or other string identifying the data set from which the record was derived.

  • catalogNumber (str) – A column name or other string denoting a unique identifier for the record within the data set or collection.

Return type:

pandas.DataFrame with the updated data.

Examples

set_collection vignette

corella.set_coordinates(dataframe=None, decimalLatitude=None, decimalLongitude=None, geodeticDatum=None, coordinateUncertaintyInMeters=None, coordinatePrecision=None)

Checks for location information, as well as uncertainty and coordinate reference system. Also runs data checks on coordinate validity.

Parameters:
  • dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check

  • decimalLatitude (str) – A column name that contains your latitudes (units in degrees).

  • decimalLongitude (str) – A column name that contains your longitudes (units in degrees).

  • geodeticDatum (str) – A column name or a str with the datum or spatial reference system that coordinates are recorded against (usually “WGS84” or “EPSG:4326”). This is often known as the Coordinate Reference System (CRS). If your coordinates are from a GPS system, your data are already using WGS84.

  • coordinateUncertaintyInMeters (str, float or int) – A column name (str) or a float/int with the value of the coordinate uncertainty. coordinateUncertaintyInMeters will typically be around 30 (metres) if recorded with a GPS after 2000, or 100 before that year.

  • coordinatePrecision (str, float or int) – Either a column name (str) or a float/int with the value of the coordinate precision. coordinatePrecision should be no less than 0.00001 if data were collected using GPS.

Return type:

pandas.DataFrame with the updated data.

Examples

set_coordinates vignette
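
To make "coordinate validity" concrete, the sketch below applies the usual range limits for decimal degrees: latitude between -90 and 90, longitude between -180 and 180. It is an assumption that these are among the checks corella runs; the helper `invalid_coordinates` is hypothetical.

```python
def invalid_coordinates(rows):
    """Return (index, reason) pairs for rows with out-of-range coordinates."""
    problems = []
    for i, (lat, lon) in enumerate(rows):
        if not -90 <= lat <= 90:
            problems.append((i, "decimalLatitude outside [-90, 90]"))
        if not -180 <= lon <= 180:
            problems.append((i, "decimalLongitude outside [-180, 180]"))
    return problems


# Each tuple is (decimalLatitude, decimalLongitude) in degrees.
rows = [(-35.310, 149.125), (-95.0, 149.133), (10.0, 200.0)]
print(invalid_coordinates(rows))
```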

corella.set_datetime(dataframe=None, eventDate=None, year=None, month=None, day=None, eventTime=None, string_to_datetime=False, yearfirst=True, dayfirst=False, time_format='%H:%m:%S')

Checks for time information, such as the date an occurrence occurred. Also runs checks on the validity of the format of the date.

Parameters:
  • dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check

  • eventDate (str) – A column name or value with the date or date + time of the observation/event.

  • year (str or int) – A column name or value with the year of the observation/event.

  • month (str or int) – A column name or value with the month of the observation/event.

  • day (str or int) – A column name or value with the day of the observation/event.

  • eventTime (str) – A column name or value with the time of the observation/event. Date + time information for observations is accepted in eventDate.

  • string_to_datetime (bool) – An argument that tells corella to convert dates that are in a string format to a datetime format. Default is False.

  • yearfirst (bool) – An argument to specify whether or not the year is first when converting your string to datetime. Default is True.

  • dayfirst (bool) – An argument to specify whether or not the day is first when converting your string to datetime. Default is False.

  • time_format (str) – A str denoting the original format of the dates that are being converted from a str to a datetime object. Default is '%H:%m:%S'.

Return type:

pandas.DataFrame with the updated data.

Examples

set_datetime vignette
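
The difference between day-first and year-first strings can be seen with the standard library alone. The sketch below is illustrative only and makes no claim about how corella converts dates internally. It also shows the format directives for a time string.

```python
from datetime import datetime

raw_dates = ["14-01-2023", "15-01-2023"]

# These strings are day-first, so the format must start with %d.
# A year-first format such as "%Y-%m-%d" would raise a ValueError here.
parsed = [datetime.strptime(d, "%d-%m-%Y") for d in raw_dates]
print([p.date().isoformat() for p in parsed])

# In strptime directives, %M means minutes while %m means month.
t = datetime.strptime("09:05:30", "%H:%M:%S").time()
print(t.isoformat())
```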

corella.set_events(dataframe=None, eventID=None, parentEventID=None, eventType=None, Event=None, samplingProtocol=None, event_hierarchy=None, sequential_id=False, add_sequential_id='first', add_random_id='first', composite_id=None, sep='-', random_id=False)

Identify or format columns that contain information about an Event. An “Event” in Darwin Core Standard refers to an action that occurs at a place and time. Examples include:

  • A specimen collecting event

  • A survey or sampling event

  • A camera trap image capture

  • A marine trawl

  • A camera trap deployment event

  • A camera trap burst image event (with many images for one observation)

Parameters:
  • dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check

  • eventID (str, bool) – A column name (str) that contains a unique identifier for your event. Can also be set to True to generate values. Parameters for these values can be specified with the arguments sequential_id, add_sequential_id, add_random_id, composite_id, sep and random_id.

  • sequential_id (bool) – Create sequential IDs and/or add sequential IDs to a composite ID. Default is False.

  • add_sequential_id (str) – Determine where to add the sequential ID in a composite ID. Values are first and last. Default is first.

  • composite_id (str, list) – str or list containing columns to create composite IDs. Can be combined with sequential ID.

  • sep (str) – Separation character for composite IDs. Default is -.

  • random_id (bool) – Create a random ID using the uuid package. Default is False.

  • add_random_id (str) – Determine where to add the random ID in a composite ID. Values are first and last. Default is first.

  • parentEventID (str) – A column name (str) that contains a unique ID belonging to the event above it in the event hierarchy.

  • eventType (str) – A column name (str) or a str denoting what type of event you have.

  • Event (str) – A column name (str) or a str denoting the name of the event.

  • samplingProtocol (str) – Either a column name (str) or a str denoting how you collected the data, e.g. “Human Observation”.

  • event_hierarchy (dict) – A dictionary containing a hierarchy of all events so they can be linked. For example, if you have a set of observations that were taken at a particular site, you can use the dict {1: "Site Visit", 2: "Sample", 3: "Observation"}.

Return type:

pandas.DataFrame with the updated data.

Examples

set_events vignette
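
The ID options described above (composite_id, sep, sequential_id, add_sequential_id, random_id) can be pictured with a standalone sketch. The helper `make_ids` is hypothetical, and numbering from 0 is an arbitrary choice made here; the IDs corella actually generates may be formatted differently.

```python
import uuid


def make_ids(rows, composite_cols, sep="-", sequential=False, position="first"):
    """Build one ID per row from selected columns (hypothetical helper)."""
    ids = []
    for n, row in enumerate(rows):
        parts = [str(row[col]) for col in composite_cols]
        if sequential:
            # Place the running number before or after the composite parts.
            if position == "first":
                parts.insert(0, str(n))
            else:
                parts.append(str(n))
        ids.append(sep.join(parts))
    return ids


rows = [
    {"site": "A", "year": 2023},
    {"site": "B", "year": 2023},
]
print(make_ids(rows, ["site", "year"], sequential=True, position="last"))

# A random ID, by contrast, carries no information about the row.
random_id = str(uuid.uuid4())
print(len(random_id))
```

Composite IDs stay readable and traceable to their source columns, while random IDs avoid collisions across datasets at the cost of readability.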

corella.set_individual_traits(dataframe=None, individualID=None, lifeStage=None, sex=None, vitality=None, reproductiveCondition=None)

Checks for information about individual organisms, such as life stage, sex, vitality and reproductive condition.

Parameters:
  • dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check

  • individualID (str) – A column name containing an identifier for an individual or named group of individual organisms represented in the Occurrence. Meant to accommodate resampling of the same individual or group for monitoring purposes. May be a global unique identifier or an identifier specific to a data set.

  • lifeStage (str) – A column name containing the age, class or life stage of an organism at the time of occurrence.

  • sex (str) – A column name or value denoting the sex of the biological individual.

  • vitality (str) – A column name or value denoting whether an organism was alive or dead at the time of collection or observation.

  • reproductiveCondition (str) – A column name or value denoting the reproductive condition of the biological individual.

Return type:

pandas.DataFrame with the updated data.

Examples

set_individual_traits vignette

corella.set_license(dataframe=None, license=None, rightsHolder=None, accessRights=None)

Checks for licensing information, such as the license, rights holder and access rights.

Parameters:
  • dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check

  • license (str) – A column name or value denoting a legal document giving official permission to do something with the resource. Must be provided as a url to a valid license.

  • rightsHolder (str) – A column name or value denoting the person or organisation owning or managing rights to resource.

  • accessRights (str) – A column name or value denoting any access or restrictions based on privacy or security.

Return type:

pandas.DataFrame with the updated data.

Examples

set_license vignette
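
Since license must be provided as a URL, a quick shape check before calling the function may be useful. The sketch below only tests that a string looks like an http(s) URL. It does not verify that the URL points to a valid license, and it is not corella's own check. The helper `looks_like_url` is hypothetical.

```python
from urllib.parse import urlparse


def looks_like_url(value):
    """Return True if the value has an http(s) scheme and a host (hypothetical helper)."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(looks_like_url("https://creativecommons.org/licenses/by/4.0/"))
print(looks_like_url("CC-BY 4.0"))
```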

corella.set_locality(dataframe=None, continent=None, country=None, countryCode=None, stateProvince=None, locality=None)

Checks for additional location information, such as country and countryCode.

Parameters:
  • dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check

  • continent (str) – Either a column name (str) or a string denoting one of the seven continents. Valid values are: "Africa", "Antarctica", "Asia", "Europe", "North America", "Oceania", "South America"

  • country (str or pandas.Series) – Either a column name (str) or a string denoting a valid country name. See country_codes.

  • countryCode (str or pandas.Series) – Either a column name (str) or a string denoting a valid country code. See corella.countryCode_values().

  • stateProvince (str or pandas.Series) – Either a column name (str) or a string denoting a sub-national region.

  • locality (str or pandas.Series) – Either a column name (str) or a string containing a specific description of a location or place.

Return type:

pandas.DataFrame with the updated data.

Examples

set_locality vignette

corella.set_observer(dataframe=None, recordedBy=None, recordedByID=None)

Checks for information about who recorded the occurrence.

Parameters:
  • dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check

  • recordedBy (str) – A column name or name(s) of people, groups, or organizations responsible for recording the original occurrence. The primary collector or observer should be listed first.

  • recordedByID (str) – A column name or the globally unique identifier for the person, people, groups, or organizations responsible for recording the original occurrence.

Return type:

pandas.DataFrame with the updated data.

Examples

Either add here later or link to vignettes.

corella.set_occurrences(dataframe=None, occurrenceID=None, catalogNumber=None, recordNumber=None, basisOfRecord=None, sequential_id=False, add_sequential_id='first', composite_id=None, sep='-', random_id=False, add_random_id='first', occurrenceStatus=None, errors=[], add_eventID=False, events=None, eventType=None)

Checks for unique identifiers of each occurrence and how the occurrence was recorded.

Parameters:
  • dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check

  • occurrenceID (str or bool) – Either a column name (str) or True (bool). If a column name is provided, the column will be renamed. If True is provided, unique identifiers will be generated in the dataset. Note: Every occurrence should have an occurrenceID entry. Ideally, IDs should be persistent to avoid being lost in future updates. They should also be unique, both within the dataset, and (ideally) across all other datasets.

  • catalogNumber (str or bool) – Either a column name (str) or True (bool). If a column name is provided, the column will be renamed. If True is provided, unique identifiers will be generated in the dataset.

  • recordNumber (str or bool) – Either a column name (str) or True (bool). If a column name is provided, the column will be renamed. If True is provided, unique identifiers will be generated in the dataset.

  • sequential_id (bool) – Create sequential IDs and/or add sequential IDs to a composite ID. Default is False.

  • add_sequential_id (str) – Determine where to add the sequential ID in a composite ID. Values are first and last. Default is first.

  • composite_id (str, list) – str or list containing columns to create composite IDs. Can be combined with sequential ID.

  • sep (str) – Separation character for composite IDs. Default is -.

  • random_id (bool) – Create a random ID using the uuid package. Default is False.

  • add_random_id (str) – Determine where to add the random ID in a composite ID. Values are first and last. Default is first.

  • basisOfRecord (str) – Either a column name (str) or a valid value for basisOfRecord to add to the dataset. Values must be in camelCase, for consistency with field names. Valid values are "humanObservation", "machineObservation", "livingSpecimen", "preservedSpecimen", "fossilSpecimen", "materialCitation".

  • occurrenceStatus (str) – Either a column name (str) or a valid value for occurrenceStatus to add to the dataset. Valid values are "present" or "absent"

  • errors (list) – ONLY FOR DEBUGGING: existing list of errors.

  • add_eventID (bool) – Specify whether or not to add an eventID to your occurrences. Default is False.

  • events (pd.DataFrame) – Dataframe containing your events.

  • eventType (str) – Either a column name (str) or a valid value for eventType to add to the dataset.

Return type:

pandas.DataFrame with the updated data.

Examples

set_occurrences vignette
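
Because every occurrenceID should be unique within the dataset, it can help to look for duplicates before calling the function. The sketch below is a standalone illustration using the standard library; the helper `duplicated_ids` is hypothetical and is not how corella necessarily performs its check.

```python
from collections import Counter


def duplicated_ids(ids):
    """Return IDs that appear more than once, in first-seen order."""
    counts = Counter(ids)
    return [i for i in counts if counts[i] > 1]


occurrence_ids = ["occ-1", "occ-2", "occ-2", "occ-3", "occ-1"]
print(duplicated_ids(occurrence_ids))
```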

corella.set_scientific_name(dataframe=None, scientificName=None, taxonRank=None, scientificNameAuthorship=None)

Checks that the name of the taxon you identified is present.

Parameters:
  • dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check

  • scientificName (str) – A column name (str) denoting all full scientific names at the lowest taxonomic rank that can be determined.

  • taxonRank (str) – A column name (str) denoting the taxonomic rank of your scientific names (species, genus etc.)

  • scientificNameAuthorship (str) – A column name (str) denoting the authorship information for scientificName.

Return type:

pandas.DataFrame with the updated data.

Examples

set_scientific_name vignette

corella.set_taxonomy(dataframe=None, kingdom=None, phylum=None, taxon_class=None, order=None, family=None, genus=None, specificEpithet=None, vernacularName=None)

Adds extra taxonomic information. Also runs checks on whether or not the names are the correct data type.

Parameters:
  • dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check

  • kingdom (str, list) – A column name, kingdom name (str) or list of kingdom names (list).

  • phylum (str, list) – A column name, phylum name (str) or list of phylum names (list).

  • taxon_class (str, list) – A column name, class name (str) or list of class names (list).

  • order (str, list) – A column name, order name (str) or list of order names (list).

  • family (str, list) – A column name, family name (str) or list of family names (list).

  • genus (str, list) – A column name, genus name (str) or list of genus names (list).

  • specificEpithet (str, list) – A column name, specificEpithet name (str) or list of specificEpithet names (list). Note: If scientificName is Abies concolor, the specificEpithet is concolor.

  • vernacularName (str, list) – A column name, vernacularName name (str) or list of vernacularName names (list).

Return type:

pandas.DataFrame with the updated data.

Examples

set_taxonomy vignette

corella.suggest_workflow(occurrences=None, events=None)

Suggests a workflow to ensure your data conforms with the pre-defined Darwin Core standard.

Parameters:

  • occurrences (pandas.DataFrame) – The pandas.DataFrame that contains your occurrences.

  • events (pandas.DataFrame) – The pandas.DataFrame that contains your events.

Return type:

A printed report detailing presence or absence of required data.

Examples

Suggest a workflow for a small dataset

import pandas as pd
import corella
df = pd.DataFrame({'species': ['Callocephalon fimbriatum', 'Eolophus roseicapilla'], 'latitude': [-35.310, '-35.273'], 'longitude': [149.125, 149.133], 'eventDate': ['14-01-2023', '15-01-2023'], 'status': ['present', 'present']})
corella.suggest_workflow(occurrences=df)
── Darwin Core terms ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── All DwC terms ──

Matched 1 of 5 column names to DwC terms:

✓ Matched: eventDate
✗ Unmatched: species, latitude, status, longitude

── Minimum required DwC terms occurrences ──

Type                       Matched term(s)    Missing term(s)
-------------------------  -----------------  ------------------------------------------------
Identifier (at least one)  -                  occurrenceID OR catalogNumber OR recordNumber
Record type                -                  basisOfRecord
Scientific name            -                  scientificName
Location                   -                  decimalLatitude, decimalLongitude, geodeticDatum
Date/Time                  eventDate          -

── Suggested workflow ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Occurrences ──

To make your occurrences Darwin Core compliant, use the following workflow:

corella.set_occurrences()
corella.set_scientific_name()
corella.set_coordinates()

Additional functions: set_abundance(), set_collection(), set_individual_traits(), set_license(), set_locality(), set_taxonomy()
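
The "Matched 1 of 5" line in the report above comes down to comparing column names with Darwin Core term names. The sketch below reproduces that comparison with plain Python. The `DWC_TERMS` set is a small illustrative subset chosen here, not the full list returned by corella.occurrence_terms(), and the code is not corella's implementation.

```python
# A small, illustrative subset of Darwin Core terms (an assumption for this
# sketch); the full list comes from corella.occurrence_terms().
DWC_TERMS = {
    "eventDate",
    "scientificName",
    "decimalLatitude",
    "decimalLongitude",
    "basisOfRecord",
    "occurrenceID",
}

# Column names from the example DataFrame above.
columns = ["species", "latitude", "longitude", "eventDate", "status"]

matched = [c for c in columns if c in DWC_TERMS]
unmatched = [c for c in columns if c not in DWC_TERMS]

print(f"Matched {len(matched)} of {len(columns)} column names to DwC terms")
print("Matched:", ", ".join(matched))
print("Unmatched:", ", ".join(unmatched))
```

Renaming the unmatched columns with the suggested set_ functions is what moves them into the matched group.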