Use DRH Tools Locally

The DRH Edge platform is designed for localized data handling and is ideal for the following scenarios:

Local Data Preview and Analysis

  • Allows users to preview study data, analyze metrics, and view charts directly on their local machines.
  • Ensures secure data exploration in a controlled, offline environment without the need for external data sharing.

Supported Operating Systems

The DRH Edge Platform runs on Linux and macOS. The table below clarifies what is and is not supported:

| Environment | Support Status |
| --- | --- |
| Native Linux (Ubuntu, Debian, and most distributions) | ✅ Fully Supported |
| Linux under WSL on Windows (Ubuntu, Debian, etc.) | ✅ Fully Supported |
| macOS | ✅ Fully Supported |
| Native Windows (PowerShell, CMD, without WSL) | ❌ Not Supported |

Note:

  • WSL (Windows Subsystem for Linux) is simply a way to run Linux on Windows — it is built into Windows and requires no extra licensing.
  • When DRH Edge runs inside WSL, it is running on Linux. “Not supported on Windows” specifically means native Windows shells such as PowerShell or CMD without WSL.

Using DRH Edge Software

If you prefer to use the DRH Edge software for local data processing and visualization, please follow these steps:

  1. Meet the Edge Usage Prerequisites

    • DRH recommends using the suggested file structure for organizing your files. Alternatively, provide the DRH technical team with detailed information about your file structure and column patterns in advance.
    • This information is essential for selecting or developing the appropriate Spry runbook for your dataset.
  2. Select or Request a Runbook

    • The platform uses executable Markdown files (Spry Runbooks) to orchestrate the full pipeline. Pre-built runbooks are available for common device types (e.g., Dexcom Clarity, Simplera).
    • If your dataset structure is unique, the DRH technical team can develop a custom runbook tailored to your organization’s file patterns.
  3. Verification and Upload

    • After running the pipeline and generating the database, verify the results using the built-in SQLPage dashboard.

DRH Edge UI Setup and Data Transformation

This guide explains how to set up the DRH Edge platform and run the full data pipeline — from raw CSV files through validation, ETL, and into an interactive local dashboard.

The DRH Edge platform converts your CSV study files into a structured SQLite database, performs validation and quality checks, and presents results through a SQLPage web interface — all within your own environment.


Supported Operating Systems

See the Supported Operating Systems section above.


Prerequisites

Ensure the following tools are installed before proceeding. Each tool must meet the minimum version requirement.

| Tool | Role | Documentation |
| --- | --- | --- |
| Python 3.8+ | Primary language for Singer Taps and the drh-target package. | Python.org |
| Spry | Orchestrates tasks (bash/SQL) defined in executable Markdown files. | Spry Docs |
| Surveilr (v3.10+) | The engine for data ingestion, OTel trace collection, and pipeline orchestration. | Surveilr Docs |

Installation & Version Verification

Install each tool using the instructions below, then run the corresponding verification command to confirm the correct version is active.

Python 3.8+

Ubuntu / Ubuntu under WSL:

sudo apt update && sudo apt install -y python3 python3-pip python3-venv

macOS:

brew install python

Verify:

python3 --version
# Expected output: Python 3.8.x or higher

Spry

Follow the installation steps at Spry Docs.

Verify:

spry --version

Surveilr (v3.10+)

Follow the installation steps at Surveilr Docs.

Verify:

surveilr --version
# Expected output: surveilr v3.10.x or higher


Getting Started

Step 1: Clone the Repository

git clone https://github.com/diabetes-research/spry-drh-edge-platform.git
cd spry-drh-edge-platform/drh-edge-core

Step 2: Organize Your Study Data

Before running the pipeline, verify your dataset is organized correctly.

Your dataset folder must contain the following files:

  • Study Metadata — general study information
  • Participant Data — subject demographics
  • CGM File Metadata — metadata for individual tracing files
  • CGM Tracing Files — raw glucose data (e.g., cgm_tracing_001, cgm_tracing_002, …)

The folder structure should look like this:

raw-data/
└── your-study/
    ├── study.csv
    ├── participant.csv
    ├── cgm_file_metadata.csv
    ├── cgm_tracing_001.csv
    ├── cgm_tracing_002.csv
    └── ...

Note: File structures and naming conventions must follow the DRH Data Organization Standard.

Copy your dataset into a new sub-directory under raw-data/:

cp -r /path/to/your/study raw-data/your-study/
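Before moving on, you can sanity-check the layout programmatically. The following Python sketch is illustrative and not part of the DRH tooling; it simply confirms that the required files from the structure above are present:

```python
# Sanity-check a study folder before running the pipeline. The required
# file names follow the layout shown above; the helper itself is
# illustrative and not part of the DRH platform.
from pathlib import Path

REQUIRED_FILES = ["study.csv", "participant.csv", "cgm_file_metadata.csv"]

def check_study_folder(folder):
    """Return a list of problems; an empty list means the layout looks correct."""
    root = Path(folder)
    problems = [f"missing {name}" for name in REQUIRED_FILES
                if not (root / name).is_file()]
    # At least one raw tracing file must be present.
    if not list(root.glob("cgm_tracing_*.csv")):
        problems.append("no cgm_tracing_*.csv files found")
    return problems
```

For example, `check_study_folder("raw-data/your-study/")` returns an empty list when the folder matches the expected layout.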

Step 3: Configure Your Environment

Select the executable Markdown runbook that matches your data type:

| Device / Data Type | Runbook File |
| --- | --- |
| Synthetic Simplera CGM | drh-simplera-cgm-systems.md |
| Synthetic Dexcom Clarity | drh-dexcom-clarity.md |
| PhysioNet CGMacros | drh-physio-cgmacros.md |
| Custom dataset | Contact the DRH technical team |

Open the selected .md file and update the prepare-env task block with your values:

STUDY_DATA_PATH="raw-data/your-study/"
TENANT_ID="YOUR_LAB_ID"
TENANT_NAME="Your Research Lab Name"

Then run:

spry rb task prepare-env [your-runbook].md

Example:

spry rb task prepare-env drh-simplera-cgm-systems.md

Step 4: Clean Your Environment

Before running the pipeline, reset the environment to avoid file permission errors or stale data from previous runs:

./reset.sh

This script:

  • Marks all Singer Tap Python files as executable
  • Removes any existing resource-surveillance.sqlite.db
  • Deletes temporary UI and artifact files
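For reference, the first two of those steps could be sketched in Python as follows. This is a hypothetical re-implementation for illustration only; the real `reset.sh` ships with the repository, and the exact temporary UI and artifact paths it removes are repo-specific:

```python
# Hypothetical sketch of the reset steps described above; the actual
# reset.sh in the repository is the authoritative version.
import stat
from pathlib import Path

def reset(repo_root="."):
    root = Path(repo_root)
    # 1. Mark all Singer Tap Python files as executable.
    for tap in root.glob("**/*.py"):
        mode = tap.stat().st_mode
        tap.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # 2. Remove any existing database from a previous run.
    (root / "resource-surveillance.sqlite.db").unlink(missing_ok=True)
    # 3. The temporary UI and artifact paths are repo-specific and omitted here.
```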

Step 5: Execute the Pipeline

Run the full orchestration — Pre-validation → Ingestion → ETL → UI Packaging:

spry rb task prepare-db-deploy-server [your-runbook].md

Example:

spry rb task prepare-db-deploy-server drh-simplera-cgm-systems.md

Step 6: Launch the Dashboard

surveilr web-ui

Open http://localhost:9227 in your browser to view demographics, CGM trends, validation reports, and de-identification logs.

If port 9227 is already in use:

sudo kill $(sudo lsof -t -i:9227)

Step 7: Verify the Diagnostics Results in the UI

Check the Diagnostics section of the dashboard. If the overall_status shows FAIL, the ETL will be blocked and row-level diagnostic details will be displayed to help identify the issue.

[Screenshot: Diagnostics view]


Standard Workflow Summary

| Step | Command |
| --- | --- |
| 1. Clean | ./reset.sh |
| 2. Configure | spry rb task prepare-env [your-runbook].md |
| 3. Execute | spry rb task prepare-db-deploy-server [your-runbook].md |
| 4. Launch Dashboard | surveilr web-ui |

Port & Database Management

  • Close External Viewers: Ensure resource-surveillance.sqlite.db is not open in any external tool (such as DB Browser for SQLite) while a spry task is running; an open viewer can lock the database and block the engine from writing to it.
  • Gated Execution: If a previous run failed at the Validation Gate, run ./reset.sh before retrying to start from a clean state.
  • Multiple Datasets: To run a different dataset, copy the base runbook, update the environment variables, and execute using the new file:
cp drh-simplera-cgm-systems.md study-x-etl.spry.md
spry rb task prepare-env study-x-etl.spry.md
spry rb task prepare-db-deploy-server study-x-etl.spry.md

Dataset Preparation Aides

For complex datasets such as PhysioNet CGMacros, the platform provides specialized Python aides that transform raw data into DRH-standard CSVs before ingestion. The table below links to each aide's technical specification and development prompts within the repository:

| Aide Script | Purpose | Documentation Link |
| --- | --- | --- |
| Supporting Files Generator | Creates relational CSVs (institutions, investigators, labs) from text specs. | Gen-Spec-Support.md |
| Required Files Generator | Automatically derives CGM file metadata, study, and participant CSVs based on the dataset. | Gen-Spec-Content.md |

Workflow Context

As highlighted in the platform documentation, these aides are positioned at the start of your data journey: Raw Data → [Preparation Aides] → Standard CSVs → [Runbook Pipeline] → Dashboard.

Key References for Implementation

  • Aide Logic: The documentation links above include the clinical logic used to derive ICD-10 codes and diabetes types automatically from raw glucose markers.
  • Runbook Orchestration: Once the CSVs are generated, use the drh-physio-cgmacros.md runbook to execute the ingestion and ETL tasks.
  • System Requirements: Ensure you are using Surveilr v3.34.0 or later for the most stable experience with these preparation tools.

Platform Visual Showcase

🏛️ Research Landing Page

The entry point provides a high-level summary of active studies, enrollment progress, and system health.

Research Landing Page
Overview of active studies, enrollment progress, and system health.


🔍 Validation Gate & Diagnostics

If incoming data violates clinical schemas or structural rules, the ETL is automatically blocked to prevent database corruption. Row-level diagnostic details are surfaced directly in the UI.

Diagnostics View
Schema validation and ETL gating with deep-link row-level diagnostics.


📊 Clinical Research Dashboard

Once data passes the validation gate, it is transformed into optimized relational tables for high-fidelity visualization and metric calculation.

Research Dashboard
Participant demographics, enrollment metrics, and CGM analytics.


🛠️ Participant Drill-Down

Detailed drill-downs allow researchers to inspect individual participant traces, meal logs, and device performance metrics in a centralized view.

Participant Metrics
Detailed participant-level time-series data and device performance metrics.


Data De-identification

As part of the DRH Edge Platform, the ETL pipeline includes built-in capabilities for de-identifying and anonymizing sensitive data, ensuring compliance with privacy regulations before data is shared externally. This protects Personally Identifiable Information (PII) and Protected Health Information (PHI) while maintaining the data’s analytical value.

Why De-identification Is Important

Organizations handling Continuous Glucose Monitoring (CGM) data must comply with global privacy standards such as HIPAA (USA), GDPR (EU), and other applicable data protection laws. De-identification ensures that sensitive information — such as participant demographics and device identifiers — is secured before external sharing or publication.

De-identification in DRH Edge

The platform focuses on removing or masking direct identifiers. Masking is applied automatically during pipeline execution and is logged in the dashboard under the de-identification log view.

Common transformations include:

  • Participant Name → Removed or replaced with a randomly generated identifier
  • Date of Birth → Replaced with an age range or general birth year
  • Email Addresses → Redacted or replaced with dummy values
  • Device IDs → Hashed or replaced with a pseudonymized ID
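The exact masking SQL is generated per dataset by the DRH technical team, but the transformations above can be illustrated with a self-contained SQLite sketch. The table, column names, and `pseudonym` helper here are hypothetical:

```python
# Illustrative sketch only: the real masking SQL is produced by the DRH
# technical team for each organization's file structure.
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participant (name TEXT, dob TEXT, email TEXT, device_id TEXT)")
conn.execute("INSERT INTO participant VALUES "
             "('Jane Doe', '1984-03-07', 'jane@example.org', 'DEV-12345')")

def pseudonym(value):
    # Stable pseudonymized ID derived from a one-way hash of the original value.
    return "ID-" + hashlib.sha256(value.encode()).hexdigest()[:10]

# Expose the Python helper to SQL so the UPDATE below can call it.
conn.create_function("pseudonym", 1, pseudonym)
conn.execute("""
    UPDATE participant SET
        name      = pseudonym(name),             -- name -> pseudonymized identifier
        dob       = substr(dob, 1, 4),           -- date of birth -> birth year only
        email     = 'redacted@example.invalid',  -- email -> dummy value
        device_id = pseudonym(device_id)         -- device ID -> hashed pseudonym
""")
masked = conn.execute("SELECT name, dob, email, device_id FROM participant").fetchone()
```

After the update, every direct identifier in the row has been removed or replaced while the record remains usable for analysis.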

Customization

Each customer’s file structure may vary. You can specify which columns require anonymization, and the DRH technical team will generate the necessary SQL scripts and incorporate them into a custom runbook for your organization.

Example: the following SQL performs de-identification:

[Screenshot: de-identification SQL]

Before masking, the investigator file:

[Screenshot: investigator file before masking]

After masking:

[Screenshot: investigator file after masking]

Tools for CSV File Validation

We recommend the following third-party open-source tools to validate whether your files adhere to the required structure before running the pipeline.

Data Curator

Data Curator is a lightweight desktop data editor designed to help describe, validate, and share open data. It offers a user-friendly graphical interface for editing tabular data while ensuring adherence to data standards.

Key Features:

  • Schema Editing: Modify Frictionless JSON schemas to suit your project’s requirements.
  • Load and Preview Data: Visualize and inspect CSV files before ingestion.
  • Validate Against Schema: Perform schema validation using the Frictionless JSON specification.
  • Edit and Save: Correct errors directly in the application and save validated data.
  • Export Options: Share validated datasets as clean CSV files or complete data packages.

Open Data Editor (ODE)

The Open Data Editor is an online tool for non-technical users to explore, validate, and detect errors in tabular datasets. No installation is required.

Key Features:

  • Online Access: Access directly through your browser.
  • Error Detection: Quickly identify errors in table formatting and data values.
  • Schema Validation: Validate datasets against predefined Frictionless JSON schemas.
  • Interactive Editing: Fix errors and inconsistencies in real-time.
  • Export Options: Save corrected files for further use or sharing.
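For a quick scripted pre-flight check that complements these GUI tools, a few lines of Python can confirm that a CSV's header matches an expected column list. The column names in the example are illustrative; consult the DRH Data Organization Standard for the authoritative schema:

```python
# Minimal header check: compare a CSV's first row against an expected
# column list. The expected columns used by callers are illustrative.
import csv

def header_matches(csv_path, expected):
    """Return True if the CSV's header row exactly matches `expected`."""
    with open(csv_path, newline="") as fh:
        header = next(csv.reader(fh), [])
    return header == expected
```

This is not a substitute for full Frictionless schema validation, but it catches the most common mismatch (renamed or reordered columns) before you run the pipeline.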