Time-based

  1. Find a check under the time_variance directory to copy as a starting point.

  2. Add the check to the time_variance/definitions.py file. For example:

        "ocid": ocid,
        "tender_title": tender_title,
        "phase_stable": phase_stable,
    

Each check is an object (usually a module) that has two attributes: filter and evaluate.

Pairs of items with the same ocid are read in batches from the dataset and its ancestor. Each pair is passed to the filter function, which:

  1. Accepts five arguments: an accumulator, an ancestor’s item and its ID, and a dataset’s item and its ID

  2. Returns whether the check can be calculated against the pair of items (for example, if both are present)

If filter returns True and the dataset's item (the new item) is present, then the evaluate function:

  1. Accepts the same five arguments as filter

  2. Determines whether the check passes

  3. Returns the accumulator, and whether the check passes
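
As a rough illustration of the filter and evaluate contract described above, here is a minimal sketch of a check module. The check itself (comparing tender titles), the `_title` helper, the parameter names and the `tender.title` field path are assumptions made for this example; they are not taken from an existing check.

```python
# Hypothetical time-based check: passes if the tender title is unchanged
# between the ancestor's item and the dataset's item.
# Parameter names and the "tender.title" path are assumptions for illustration.


def _title(item):
    """Return the tender title of an item, or None if it is absent."""
    tender = (item or {}).get("tender") or {}
    return tender.get("title")


def filter(scope, item, item_id, new_item, new_item_id):  # shadows the builtin on purpose
    # The check can be calculated only if the ancestor's item has a title.
    return _title(item) is not None


def evaluate(scope, item, item_id, new_item, new_item_id):
    # Return the accumulator unchanged, and whether the check passes.
    return scope, _title(item) == _title(new_item)
```

In this sketch, filter looks only at the ancestor's item, so the presence of the new item is left to the caller, as described above.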

The accumulator is initialized as:

    """
    Initialize a time-based check result accumulator.
    """
    return {
        "total_count": 0,
        "coverage_count": 0,
        "failed_count": 0,
        "ok_count": 0,
        "examples": ReservoirSampler(50),
    }
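
To make the role of the counters concrete, the following sketch shows one way a processing loop might update the accumulator for each pair. The `process_pair` helper, the meaning given to each counter, the shape of the stored examples and the plain list standing in for `ReservoirSampler(50)` are all assumptions for illustration; the actual logic lives in time_variance/processor.py and may differ.

```python
from types import SimpleNamespace


def new_accumulator():
    # Same keys as the accumulator above; a plain list stands in for ReservoirSampler(50).
    return {
        "total_count": 0,
        "coverage_count": 0,
        "failed_count": 0,
        "ok_count": 0,
        "examples": [],
    }


def process_pair(check, scope, item, item_id, new_item, new_item_id):
    """Apply one check to one pair of items (hypothetical helper, assumed counter semantics)."""
    if not check.filter(scope, item, item_id, new_item, new_item_id):
        return scope
    scope["total_count"] += 1  # assumed: pairs the check can be calculated against
    if new_item is None:
        return scope
    scope["coverage_count"] += 1  # assumed: pairs whose new item is present
    scope, passed = check.evaluate(scope, item, item_id, new_item, new_item_id)
    if passed:
        scope["ok_count"] += 1
    else:
        scope["failed_count"] += 1
        scope["examples"].append({"item_id": item_id, "new_item_id": new_item_id})
    return scope


# A toy check: calculable if the ancestor's item exists; passes if both items are equal.
toy_check = SimpleNamespace(
    filter=lambda scope, item, item_id, new_item, new_item_id: item is not None,
    evaluate=lambda scope, item, item_id, new_item, new_item_id: (scope, item == new_item),
)

scope = new_accumulator()
pairs = [
    ({"a": 1}, 1, {"a": 1}, 11),  # unchanged: passes
    ({"a": 1}, 2, {"a": 2}, 12),  # changed: fails
    ({"a": 1}, 3, None, None),  # new item missing: counted, not evaluated
]
for item, item_id, new_item, new_item_id in pairs:
    scope = process_pair(toy_check, scope, item, item_id, new_item, new_item_id)
```

Under these assumptions, the three pairs leave the accumulator with three calculable pairs, two covered pairs, one pass and one failure, and the failing pair is kept as an example.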

time_variance/processor.py then prepares the result dict. An empty result dict looks like:

    """
    Initialize a time-based check result.

    :param version: the check's version
    """
    return {
        "check_result": None,
        "check_value": None,
        "coverage_value": None,
        "coverage_result": None,
        "meta": None,
        "version": version,
    }

Storage

The result of each check for a given dataset is stored in a single row in the time_variance_level_check table.