Dataset-level

  1. Determine the type of check (see Repository structure).

  2. Find a check under the corresponding dataset sub-directory to copy as a starting point.

  3. Add the check to the dataset/definitions.py file. For example:

        "distribution.main_procurement_category": main_procurement_category,
        "unique.tender_id": tender_id,
    

Each check is an object (usually a module) that has two attributes: add_item and get_result.

Items are read in batches. Each item is passed to the add_item function, which:

  1. Accepts three arguments: an accumulator (a dict), the item, and the item’s ID

  2. Determines whether the check can be calculated against the item

  3. If not, returns the unchanged accumulator

  4. Updates the accumulator

  5. Returns the updated accumulator
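The add_item steps above can be sketched as follows. This is a minimal illustration, not code from the repository: it assumes a uniqueness check on tender.id, and the accumulator key "item_ids" is hypothetical.

```python
def add_item(scope, item, item_id):
    """Record which items use each tender ID (hypothetical uniqueness check)."""
    # Step 2: decide whether the check can be calculated against this item.
    tender = item.get("tender")
    tender_id = tender.get("id") if isinstance(tender, dict) else None

    # Step 3: if not, return the accumulator unchanged.
    # Assumption: only string and integer IDs (not booleans) are usable.
    if isinstance(tender_id, bool) or not isinstance(tender_id, (str, int)):
        return scope

    # Step 4: update the accumulator. "item_ids" is a hypothetical key that
    # maps each tender ID to the IDs of the items in which it appears.
    scope.setdefault("item_ids", {}).setdefault(tender_id, []).append(item_id)

    # Step 5: return the updated accumulator.
    return scope
```

Because the accumulator is passed in and returned on every call, the caller can start from an empty dict and thread it through each item in each batch.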

Once all items are read, the get_result function:

  1. Accepts the accumulator

  2. Creates an empty result dict

  3. Determines whether the check can be calculated against the accumulator

  4. If not, sets result["meta"] = {"reason": "..."} and returns the result dict

  5. Determines whether the check passes

    Note

    Some ID fields allow both integer and string values. When resolving references by comparing IDs, the check should fail if the IDs have different types. It should neither pass nor return N/A (it is likely to return N/A if the IDs are not coerced to strings before the lookup).

  6. Sets these keys on the result dict:

    result (boolean)

    Whether the check passes

    value (float)

    A number from 0 to 100

    meta

    Any additional data to help interpret the result, like examples

  7. Returns the result dict
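The note in step 5 about mixed ID types can be illustrated with a small helper. This is a sketch under assumptions: the function name resolve_reference and its three-way return value are hypothetical, and what a check does when no candidate is found is left to the check.

```python
def resolve_reference(reference_id, candidate_ids):
    """
    Compare a reference against candidate IDs (hypothetical helper).

    Returns True if a candidate has the same value and type, False if a
    candidate matches only after coercion to string, and None if no
    candidate matches at all.
    """
    # Coerce to string for the lookup, so that 1 and "1" find each other.
    # Without this step, a type mismatch looks like a missing target.
    matches = [c for c in candidate_ids if str(c) == str(reference_id)]
    if not matches:
        return None

    # A match whose type differs from the reference counts as a failure.
    return any(type(c) is type(reference_id) for c in matches)


print(resolve_reference("1", ["1", "2"]))  # same value and type
print(resolve_reference(1, ["1", "2"]))    # same value, different type
print(resolve_reference("3", [1, 2]))      # no candidate
```

The design point is that coercion is used only to find the candidate; the pass or fail decision still depends on the original types.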

An empty result dict is initialized by a function whose body looks like:

    """
    Initialize a dataset-level check result.

    :param version: the check's version
    """
    return {
        "result": None,
        "value": None,
        "meta": None,
        "version": version,
    }
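Starting from such an empty result, the get_result steps can be sketched as follows. This is an illustration only: the helper name empty_result, the accumulator key "item_ids" (mapping each tender ID to the IDs of the items that use it), the reason text, and the meaning given to value are all assumptions made for a hypothetical uniqueness check.

```python
def empty_result(version):
    # Hypothetical stand-in for the initializer shown above.
    return {"result": None, "value": None, "meta": None, "version": version}


def get_result(scope):
    # Step 2: create an empty result dict.
    result = empty_result(1.0)

    # Steps 3 and 4: if nothing was accumulated, explain why and return early.
    item_ids = scope.get("item_ids")
    if not item_ids:
        result["meta"] = {"reason": "no item has a tender ID"}
        return result

    # Step 5: the check passes if no tender ID is shared by several items.
    total = sum(len(ids) for ids in item_ids.values())
    duplicated = {k: ids for k, ids in item_ids.items() if len(ids) > 1}
    unique_count = total - sum(len(ids) for ids in duplicated.values())

    # Step 6: set the result, value and meta keys.
    result["result"] = not duplicated
    # Assumption: value is the percentage of items whose tender ID is unique.
    result["value"] = 100 * unique_count / total
    result["meta"] = {
        "total": total,
        "examples": [
            {"tender_id": k, "item_ids": ids[:5]} for k, ids in duplicated.items()
        ],
    }

    # Step 7: return the result dict.
    return result
```

On the early-return path, result and value stay None, which distinguishes a check that could not be calculated from one that failed.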

Storage

The result of each check for a given dataset is stored in a single row in the dataset_level_check table.