Informatics Hub uses Modular BI to support a complex case-based surveillance system for HIV

Kenya's HIV case-based surveillance system

From 2015, Kenya, through the Ministry of Health - NASCOP (National AIDS and STIs Control Programme) and assisted by UCSF, has implemented a case-based surveillance (CBS) system to track individuals living with HIV. This system enables health officials to monitor the progress of individual HIV cases, including testing and treatment. It also helps identify trends and patterns in disease occurrence. It focuses on specific life events related to the disease, such as the date of infection, date of ART (Antiretroviral Therapy) initiation, regimen at ART initiation, switch/change of regimen-line, CD4 and viral load results, and other indicators.

Modular BI simplifies ETL process with SQL

The data points for the CBS system are collected from various disparate systems, resulting in a large amount of encounter-level data (such as clinical encounters, prescriptions, and lab results) that need to be processed to extract the relevant events. The source data is subject to changes in its structure, and the requirements for the visuals and interpretation of key indicators may change over time. To address this challenge, the UCSF HIS team developed an ETL (Extract Transform Load) architecture called Modular BI. This architecture is used to create and manage procedures that extract, transform, and load data into analytics tables.

Modular BI aims to simplify the complexities of ETLs by dividing the process into four separate layers that can be managed independently. These layers include staging, base, derived, and final. Each layer has a unique role: the staging layer is largely used to collect and clean the data, the base layer to arrange the data, the derived layer to make complex indicator calculations (for example, TX_CURR), and the final one combines everything so that the data can be imported into a reporting tool such as PowerBI.

Modular BI Proces: staging, bases, derived, final
Modular BI divides the ETL process into four separate layers that can be managed independently. These layers include staging, base, derived, and final. Each layer has a unique role: the staging layer is largely used to collect and clean the data, the base layer to arrange the data, the derived layer to make complex indicator calculations, and the final stage combines everything so that the data can be imported into a reporting tool such as PowerBI.

Modular BI is entirely written in SQL, which is a powerful and flexible language for data manipulation. SQL is widely used, making it easy to find developers with SQL coding skills. By using plain SQL, and not a dedicated ETL tool, all of Modular BI’s code can be version controlled, kept in git, and undergo source code reviews by peers. This also means any change can be rolled back, if needed.

Modular BI code in Git for managing version control.
Version controlled SQL Modular BI code

Modular BI used widely beyond Kenya 

Modular BI is a significant step forward in reducing the time required to implement changes in source data, calculate indicators, and producing reports. UCSF implemented it first in Kenya, but it is now widely used in several other UCSF-supported countries as well, including Tanzania, Jamaica, Trinidad and Tobago, Namibia and Uganda.