1. Data Organization Ontology

The following ontology is used to structure and organize data on disk for neuroimaging analyses. The system supports three hierarchical levels of analysis: subject-level, group-level, and longitudinal. Each level corresponds to a specific directory structure and naming convention to ensure consistency, traceability, and compatibility with processing pipelines.

1.1. Analysis Levels

  1. Subject-Level Analysis

  • Data is organized per individual subject.

  • Root directory: subjects/.

  • Each subject has a dedicated subdirectory named using a unique identifier (e.g., sub-001, sub-002).

  • Within each subject directory, session-specific data is stored in subdirectories following the BIDS convention (e.g., ses-01, ses-02).

  • Example path: subjects/sub-001/ses-01/.

  1. Group-Level Analysis

  • Data is aggregated across subjects for group-level processing.

  • Root directory: quality_check/, morphometry/, statistic/.

  • Each group-level analysis type has its own named directory:

    • quality_check/ for quality control metrics.

    • morphometry/ for structural measurements.

    • statistic/ for statistical outputs.

  • These directories contain results derived from multiple subjects and sessions, often in tabular or summary formats.

  1. Longitudinal Analysis

  • Data is structured to support longitudinal studies across multiple timepoints.

  • Root directory: longitudinal/.

  • Contains subject-wise data aggregated across sessions and timepoints.

  • May include harmonized metrics, trajectory models, or longitudinal statistical outputs.

  • Example path: longitudinal/subject/sub-001/ses-01/.

1.2. Metadata and Logs

A dataset_description.json file is placed at the root of the dataset. This file documents: - The tool or pipeline used for processing. - The version of the tool. - The container technology used (Docker image), if applicable.

A log/ directory is maintained at the root level to store logs from all processing steps. One key log file is report_<timestamp>.rst, which provides a comprehensive report of: - All processing steps executed. - Input and output file locations for each step. - Runtime information for each step.

1.3. Directory Overview

The overall structure can be visualized as:

data_root/
├── subjects/
│   ├── sub-001/
│   │   ├── ses-01/
│   │   └── ses-02/
│   └── sub-002/
│       └── ses-01/
├── quality_check/
├── morphometry/
├── statistic/
├── longitudinal/
│   ├── subjects
│   │   ├── sub-001/
│   │   └── sub-002/
│   └── quality_check
├── log/
│   └── report.rst
└── dataset_description.json

In a tool that relies on a non-typed ontology, multiple runs of the same modality may occur within a single session, which can lead to naming conflicts. In such cases, an additional run-<ID> subdirectory is created. When the run identifier is missing from the filename, or when several images of the same modality share the same run identifier, a deterministic five-digit identifier is generated from the filename using a UUID. This ensures stable and reproducible naming. To guarantee uniqueness of run identifiers within the session folder, we check for duplicates and issues a warning if a conflict is detected.