Null data

Presentation from the Research Group for Genomic Epidemiology – 26 September 2022

Null data in databases

Databases evolve (and in some cases de-evolve) over time as the needs and uses of the database, and of the data it contains, change. This can happen, for instance, when a new sampling round begins and the information gathered reflects experience from previous rounds, which may enrich the information collected in the new round. This is a typical workflow: the information collected, and thus later stored, changes over time, and the associated database should reflect those changes. While adding new information to a database is common practice, users should be aware that the process can unfortunately lead to inconsistency and difficulty for other and future users. Adding a new column to an existing database table requires a deliberate choice for all existing data entries; otherwise every existing row will simply be given a default value such as NULL. If a conscious choice about what “NULL” means is not made, or is not shared among the (direct and indirect) users of the database, NULL might not mean the same thing even within a single column.
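The effect of adding a column can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not taken from the presentation: the table name `samples`, the column `host`, and the example values are all hypothetical.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, species TEXT)")

# Sampling round 1: no host information is collected.
conn.executemany(
    "INSERT INTO samples (species) VALUES (?)",
    [("E. coli",), ("S. enterica",)],
)

# Sampling round 2: a new column is added. No default is given,
# so every existing row silently receives NULL.
conn.execute("ALTER TABLE samples ADD COLUMN host TEXT")
conn.executemany(
    "INSERT INTO samples (species, host) VALUES (?, ?)",
    [("E. coli", "pig"), ("K. pneumoniae", None)],  # None: host could not be determined
)

rows = conn.execute("SELECT id, species, host FROM samples ORDER BY id").fetchall()
null_ids = [row[0] for row in rows if row[2] is None]

# Rows 1 and 2 are NULL because host was never asked for;
# row 4 is NULL because it was asked for but is unknown.
print(null_ids)
```

The three NULLs are indistinguishable in the table itself, even though they arise for two different reasons. Only documentation, or an explicit value chosen when the column is added, can tell them apart.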

NULL is meant to mean “unknown” or “not applicable”, which already suggests multiple interpretations. When it is also used as a null value, the meaning of NULL becomes less and less transparent. Overall, the aim of using NULL is to provide information, and when the understanding of this information is inconsistent it may lead to confusion rather than insight. NULL should therefore be used meaningfully, not as a mere “filler” value: it should provide meaningful information about a given entry, not only by itself but also in comparison to other entries.
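Why the interpretation matters can be illustrated with how SQL itself treats NULL. The sketch below again uses sqlite3; the table `isolates` and the column `resistance_genes` are hypothetical, and reading "null value" as zero is an assumption made only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE isolates (id INTEGER PRIMARY KEY, resistance_genes INTEGER)"
)
# Two measured counts (0 and 4) and one entry with no measurement.
conn.executemany(
    "INSERT INTO isolates (resistance_genes) VALUES (?)",
    [(0,), (4,), (None,)],
)

def scalar(sql):
    """Run a query and return the single value it produces."""
    return conn.execute(sql).fetchone()[0]

# NULL compared with NULL is itself NULL (unknown), not true.
null_equals_null = scalar("SELECT NULL = NULL")

# "= NULL" therefore matches nothing; "IS NULL" is the correct test.
matched_by_equals = scalar(
    "SELECT COUNT(*) FROM isolates WHERE resistance_genes = NULL"
)
matched_by_is_null = scalar(
    "SELECT COUNT(*) FROM isolates WHERE resistance_genes IS NULL"
)

# Aggregates skip NULL: the average is taken over the two known values.
avg_unknown = scalar("SELECT AVG(resistance_genes) FROM isolates")

# Treating NULL as zero instead changes the result.
avg_as_zero = scalar("SELECT AVG(COALESCE(resistance_genes, 0)) FROM isolates")

print(null_equals_null, matched_by_equals, matched_by_is_null)
print(avg_unknown, avg_as_zero)
```

A user who reads NULL as "unknown" and one who reads it as "zero" would report different averages from the same column, which is the kind of confusion a shared definition is meant to prevent.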

Timmie Lagermann’s presentation