Dataset-level
=============

#. Determine the type of check (see :ref:`repository-structure`).
#. Find a check under the corresponding ``dataset`` sub-directory to copy as a starting point.
#. Add the check to the ``dataset/definitions.py`` file. For example:

   .. literalinclude:: ../../../dataset/definitions.py
      :language: python
      :start-at: distribution.main_procurement_category
      :end-at: unique.tender_id

Each check is an object (usually a module) that has two attributes: ``add_item`` and ``get_result``.

Items are read in batches. Each item is passed to the ``add_item`` function, which:

#. Accepts three arguments: an accumulator (a dict), the item, and the item's ID
#. Determines whether the check can be calculated against the item
#. If not, returns the unchanged accumulator
#. Updates the accumulator
#. Returns the updated accumulator

Once all items are read, the ``get_result`` function:

#. Accepts the accumulator
#. Creates an empty ``result`` dict
#. Determines whether the check can be calculated against the accumulator
#. If not, sets ``result["meta"] = {"reason": "..."}`` and returns the ``result`` dict
#. Determines whether the check passes

   .. note::

      Some ID fields allow both integer and string values. When resolving references by comparing IDs, the check should fail if the IDs are different types. It should neither succeed nor N/A (it is likely to N/A if IDs are not coerced to string).

#. Sets these keys on the ``result`` object:

   ``result`` (boolean)
     Whether the check passes
   ``value`` (float)
     A number from 0 to 100
   ``meta``
     Any additional data to help interpret the result, like examples

#. Returns the ``result`` dict

An empty ``result`` dict looks like:

.. literalinclude:: ../../../pelican/util/checks.py
   :language: python
   :start-after: get_empty_result_dataset
   :end-at: }

Storage
-------

The result of each check for a given dataset is stored in a single row in the ``dataset_level_check`` table.