Dataset-level
Determine the type of check (see Repository structure).
Find a check under the corresponding dataset sub-directory to copy as a starting point.
Add the check to the dataset/definitions.py file. For example:
    "distribution.main_procurement_category": main_procurement_category,
    "unique.tender_id": tender_id,
Each check is an object (usually a module) that has two attributes: add_item and get_result.
Items are read in batches. Each item is passed to the add_item function, which:
Accepts three arguments: an accumulator (a dict), the item, and the item’s ID
Determines whether the check can be calculated against the item
    If not, returns the unchanged accumulator
Updates the accumulator
Returns the updated accumulator
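The add_item steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: the accumulator layout (a dict mapping each tender.id to the IDs of the items that use it) and the sample items are assumptions made for this example.

```python
from typing import Any


def add_item(scope: dict[str, Any], item: dict[str, Any], item_id: int) -> dict[str, Any]:
    """Record which items use each tender.id (illustrative accumulator layout)."""
    tender = item.get("tender")

    # The check cannot be calculated against this item: return the accumulator unchanged.
    if not isinstance(tender, dict) or "id" not in tender:
        return scope

    # Update the accumulator, then return it.
    scope.setdefault(str(tender["id"]), []).append(item_id)
    return scope


# Hypothetical batch of (item ID, item) pairs.
batch = [
    (1, {"tender": {"id": "T-1"}}),
    (2, {"ocid": "ocds-x"}),  # no tender.id, so it is skipped
    (3, {"tender": {"id": "T-1"}}),
]

scope: dict[str, Any] = {}
for item_id, item in batch:
    scope = add_item(scope, item, item_id)

print(scope)  # {'T-1': [1, 3]}
```

Returning the accumulator in every branch lets the caller reassign it on each iteration without special-casing skipped items.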
Once all items are read, the get_result function:
Accepts the accumulator
Creates an empty result dict
Determines whether the check can be calculated against the accumulator
    If not, sets result["meta"] = {"reason": "..."} and returns the result dict
Determines whether the check passes
Note
Some ID fields allow both integer and string values. When resolving references by comparing IDs, the check should fail if the IDs are of different types. It should neither pass nor be N/A (it is likely to be N/A if the IDs are not coerced to strings).
Sets these keys on the result object:
    result (boolean): Whether the check passes
    value (float): A number from 0 to 100
    meta: Any additional data to help interpret the result, like examples
Returns the result dict
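The note on ID types in the steps above can be illustrated with a small sketch. The helper name compare_reference and its return convention (True, False, or None when nothing matches) are assumptions for this example only, not part of the project. The point is that IDs are coerced to strings to find the referenced object, and the original types are then compared to decide whether the check passes.

```python
from typing import Optional, Union

ID = Union[int, str]


def compare_reference(ref_id: ID, candidate_ids: list[ID]) -> Optional[bool]:
    """Resolve ref_id against candidate_ids, failing on a type mismatch."""
    # Coerce to string so that 1 and "1" resolve to the same object.
    matches = [c for c in candidate_ids if str(c) == str(ref_id)]

    # Nothing resolves at all. How that case is reported is outside this sketch.
    if not matches:
        return None

    # The reference resolves, but it only passes if the types also agree.
    return any(type(c) is type(ref_id) for c in matches)


print(compare_reference(1, [1, 2]))    # True: same value, same type
print(compare_reference("1", [1, 2]))  # False: resolves, but str vs int
print(compare_reference("3", [1, 2]))  # None: no candidate has this ID
```

Without the string coercion, the second call would find no match and would be indistinguishable from the third, which is the N/A outcome the note warns about.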
An empty result dict looks like:
"""
Initialize a dataset-level check result.

:param version: the check's version
"""
return {
    "result": None,
    "value": None,
    "meta": None,
    "version": version,
}
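Putting the get_result steps together, a check might look roughly like the sketch below. It is an illustration under stated assumptions: empty_result is a hypothetical stand-in for whatever helper produces the empty dict shown above, the accumulator maps each tender.id to the IDs of the items that use it, and the reason text and meta keys are invented for the example.

```python
from typing import Any


def empty_result(version: float = 1.0) -> dict[str, Any]:
    # Hypothetical stand-in for the helper that builds the empty result dict.
    return {"result": None, "value": None, "meta": None, "version": version}


def get_result(scope: dict[str, list[int]]) -> dict[str, Any]:
    result = empty_result()

    # The check cannot be calculated against an empty accumulator.
    if not scope:
        result["meta"] = {"reason": "no item has a tender.id"}
        return result

    total = sum(len(item_ids) for item_ids in scope.values())
    duplicates = {key: ids for key, ids in scope.items() if len(ids) > 1}
    unique = total - sum(len(ids) for ids in duplicates.values())

    result["result"] = not duplicates          # boolean: whether the check passes
    result["value"] = 100 * unique / total     # float from 0 to 100
    result["meta"] = {"total": total, "examples": dict(list(duplicates.items())[:5])}
    return result


passing = get_result({"T-1": [1], "T-2": [2]})
failing = get_result({"T-1": [1, 3], "T-2": [2], "T-3": [4]})
not_applicable = get_result({})

print(passing["result"], passing["value"])   # True 100.0
print(failing["result"], failing["value"])   # False 50.0
print(not_applicable["result"], not_applicable["meta"])
```

In the not-applicable case, result and value stay None and only meta carries the reason, which keeps "could not be calculated" distinct from "failed".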
Storage
The result of each check for a given dataset is stored in a single row in the dataset_level_check table.