Check design

Goals

Pelican is focused on data quality. It supports interrogating the quality of a dataset, rather than exploring the data it contains. As such, it is not designed to support many features that are more appropriate to exploration.

It is also focused on intrinsic quality rather than extrinsic quality. That said, it can include intrinsic metrics that are easy to calculate (like the number of contracting processes) to support extrinsic metrics (like the proportion of all contracts covered by the dataset).

Levels

Quality checks are grouped into four levels, based on the subject they are measuring:

Field-level: a single field’s value
Compiled release-level: a single contracting process
Dataset-level: a collection of contracting processes
Time-based: two collections at different times

Monetary values

To compare and sum amounts, all amounts are converted to USD. This conversion does not take inflation into account: for example, an amount in 2010 EUR is converted to 2010 USD, not to present-day USD. As such, checks against datasets covering periods of high inflation might produce incorrect results. Relevant checks are:

coherent/value_realistic: Each monetary value is between -5 billion USD and +5 billion USD.
distribution/value: The sum of the top 1% of tender/award/contract values doesn’t exceed 50% of the sum of all such values.
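As a rough illustration, the two checks above can be sketched in Python as follows. This is not Pelican's implementation: the function names are hypothetical, the bounds are assumed to be inclusive, and the way "the top 1%" is rounded (down, with at least one value) is an assumption. All amounts are taken to be already converted to USD.

```python
from decimal import Decimal

# Bounds taken from the description of coherent/value_realistic (in USD).
LOWER_BOUND = Decimal("-5e9")
UPPER_BOUND = Decimal("5e9")


def value_realistic(amount_usd):
    """Return True if a USD amount is within the bounds (inclusive, by assumption)."""
    return LOWER_BOUND <= amount_usd <= UPPER_BOUND


def top_share(amounts_usd, top_fraction=Decimal("0.01")):
    """Return the share of the total held by the largest `top_fraction` of values.

    Returns None when the share is undefined (no values, or a total of zero).
    """
    ordered = sorted(amounts_usd, reverse=True)
    total = sum(ordered, Decimal(0))
    if not ordered or total == 0:
        return None
    # Assumption: round the group size down, but always include at least one value.
    count = max(1, int(len(ordered) * top_fraction))
    return sum(ordered[:count], Decimal(0)) / total


def distribution_value(amounts_usd, max_share=Decimal("0.5")):
    """Return True if the top 1% of values holds at most 50% of the total."""
    share = top_share(amounts_usd)
    if share is None:
        return None  # the check cannot be evaluated
    return share <= max_share
```

For example, a dataset in which one contract is worth 1,000 USD and 199 others are worth 1 USD each would fail the distribution sketch, because the two largest values hold most of the total.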

Compiled release-level consistency checks compare tender, award, contract and transaction amounts. These amounts are expected to cover a short period, such that incorrect results are unlikely, especially given the generous margins.
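The same-date conversion that all of these checks rely on can be illustrated with a minimal sketch. The rate table, its figures and the `to_usd` function are hypothetical and only show the principle: an amount is converted at the rate for its own date, with no adjustment for inflation.

```python
from datetime import date
from decimal import Decimal

# Hypothetical rate table: USD per one unit of a currency on a given date.
# The figures are invented for illustration and are not real exchange rates.
RATES_TO_USD = {
    ("EUR", date(2010, 6, 1)): Decimal("1.25"),
    ("EUR", date(2020, 6, 1)): Decimal("1.10"),
}


def to_usd(amount, currency, on):
    """Convert an amount to USD at the rate for the same date.

    No inflation adjustment is applied: a 2010 amount becomes 2010 USD.
    Returns None if no rate is known for that currency and date.
    """
    if currency == "USD":
        return amount
    rate = RATES_TO_USD.get((currency, on))
    if rate is None:
        return None
    return amount * rate
```

Under these made-up rates, 100 EUR dated 1 June 2010 and 100 EUR dated 1 June 2020 convert to different USD amounts, and neither result is restated in present-day dollars.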