Dataset-level

Determine the type of check (see Repository structure).
Find a check under the corresponding dataset sub-directory to copy as a starting point.

Add the check to the dataset/definitions.py file. For example:

    "distribution.main_procurement_category": main_procurement_category,
    "unique.tender_id": tender_id,

Each check is an object (usually a module) that has two attributes: add_item and get_result.
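As a rough sketch of what "an object with two attributes" means in practice, the example below builds a check from two plain functions and drives it with a small loop. The runner `run_checks`, the check name `example.item_count`, and the item format are hypothetical and only illustrate the calling convention; they are not the project's actual code.

```python
from types import SimpleNamespace


def add_item(scope, item, item_id):
    # Count every item seen (a deliberately trivial accumulator).
    scope["count"] = scope.get("count", 0) + 1
    return scope


def get_result(scope):
    return {
        "result": scope.get("count", 0) > 0,
        "value": None,
        "meta": dict(scope),
        "version": 1.0,
    }


# Any object exposing add_item and get_result can act as a check;
# a module with these two functions works the same way.
item_count = SimpleNamespace(add_item=add_item, get_result=get_result)

# Hypothetical registry, in the same shape as the definitions mapping.
definitions = {"example.item_count": item_count}


def run_checks(definitions, items):
    """Feed (item_id, item) pairs to every check, then collect results."""
    scopes = {name: {} for name in definitions}
    for item_id, item in items:
        for name, check in definitions.items():
            scopes[name] = check.add_item(scopes[name], item, item_id)
    return {
        name: check.get_result(scopes[name])
        for name, check in definitions.items()
    }
```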

Items are read in batches. Each item is passed to the add_item function, which:

Accepts three arguments: an accumulator (a dict), the item, and the item’s ID
Determines whether the check can be calculated against the item
If not, returns the unchanged accumulator
Updates the accumulator
Returns the updated accumulator
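The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the check (collecting `tender.id` values to test uniqueness), the accumulator keys `seen` and `total`, and the shape of the item are chosen for the example and are not taken from the project.

```python
def add_item(scope, item, item_id):
    """Record which items use each tender ID (hypothetical example check)."""
    # Determine whether the check can be calculated against the item.
    tender = item.get("tender")
    if not isinstance(tender, dict) or not isinstance(tender.get("id"), (str, int)):
        # Not applicable: return the accumulator unchanged.
        return scope

    # Update the accumulator (assumed structure: ID -> list of item IDs).
    scope.setdefault("seen", {})
    scope.setdefault("total", 0)
    scope["seen"].setdefault(tender["id"], []).append(item_id)
    scope["total"] += 1

    # Return the updated accumulator.
    return scope


# Usage: start from an empty dict and thread it through every item.
scope = {}
for item_id, item in enumerate([{"tender": {"id": "a"}}, {"ocid": "x"}], start=1):
    scope = add_item(scope, item, item_id)
```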

Once all items are read, the get_result function:

Accepts the accumulator
Creates an empty result dict
Determines whether the check can be calculated against the accumulator
If not, sets result["meta"] = {"reason": "..."} and returns the result dict
Determines whether the check passes

Note

Some ID fields allow both integer and string values. When resolving references by comparing IDs, the check should fail if the IDs are of different types. It should neither pass nor be reported as not applicable (N/A). An N/A result is likely if the IDs are not coerced to strings before they are compared.

Sets these keys on the result object:

result (boolean)
Whether the check passes

value (float)
A number from 0 to 100

meta
Any additional data to help interpret the result, like examples
Returns the result dict
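One possible reading of the note on ID types is sketched below: coerce both IDs to strings to find the referenced object, then report separately whether the original types agree, so that a mismatch can be counted as a failure. The helper name `resolve_reference` and the `{"id": ...}` target shape are hypothetical.

```python
def resolve_reference(reference_id, targets):
    """Return (target, same_type) for the first target whose ID matches as a string.

    same_type is False when, for example, the reference is "1" and the
    target's ID is 1; the caller should then treat the reference as failing.
    """
    for target in targets:
        if str(target["id"]) == str(reference_id):
            return target, type(target["id"]) is type(reference_id)
    # No target matches at all.
    return None, False
```

Without the string coercion, `"1"` and `1` would never match, the reference would look unresolvable, and the check would tend to report N/A instead of failing.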

An empty result dict looks like:

def get_empty_result_dataset(version):
    """
    Initialize a dataset-level check result.

    :param version: the check's version
    """
    return {
        "result": None,
        "value": None,
        "meta": None,
        "version": version,
    }
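Putting the `get_result` steps together, a sketch might look like the following. It assumes an accumulator of the form `{"seen": {id: [item_ids]}, "total": n}` for a hypothetical uniqueness check. The pass criterion, the `meta` contents, and the local helper `empty_result` (a stand-in for the empty result dict shown above) are assumptions for illustration.

```python
def empty_result(version=1.0):
    """Local stand-in for the empty result dict (name is hypothetical)."""
    return {"result": None, "value": None, "meta": None, "version": version}


def get_result(scope):
    # Create an empty result dict.
    result = empty_result()

    # Determine whether the check can be calculated against the accumulator.
    if not scope or not scope.get("total"):
        result["meta"] = {"reason": "no item has a tender.id"}
        return result

    # Determine whether the check passes: every ID must appear exactly once.
    duplicates = {key: ids for key, ids in scope["seen"].items() if len(ids) > 1}
    repeated = sum(len(ids) for ids in duplicates.values())

    result["result"] = not duplicates
    # Share of items whose ID is not repeated, as a number from 0 to 100.
    result["value"] = 100 * (scope["total"] - repeated) / scope["total"]
    result["meta"] = {"total": scope["total"], "duplicates": duplicates}
    return result
```

Leaving `result` and `value` as `None` in the early return is what distinguishes a check that could not be calculated from one that failed.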

Storage

The result of each check for a given dataset is stored in a single row in the dataset_level_check table.