Analyzer → Importer Integration
================================

.. note::

   This integration is available from **BIOMERO v2.3.0** / **NL-BIOMERO
   v1.3.0** onward and requires ``IMPORTER_ENABLED=true`` in your deployment.

Overview
--------

Before this integration, analysis results (segmentation masks, measurements,
etc.) produced by SLURM workflows were imported back into OMERO exclusively
via the OMERO API. This meant the files landed on **OMERO server storage**
(the ``/OMERO`` volume), which grows without limit and bypasses the managed
remote-storage layer.

With the Analyzer → Importer integration the result files are instead:

1. Written to the **shared remote storage** that BIOMERO.importer already
   monitors.
2. Imported **in-place** by BIOMERO.importer — the same way manual data
   imports work — so they end up in the importer's storage, not on the OMERO
   server volume.
3. **Permanently archived** in a dedicated ``.analyzed/`` folder, alongside
   the original zip, metadata CSV, and SLURM log file, for full traceability.

This makes analysis results a first-class citizen of the same storage and
permission model as incoming research data, and avoids a split between
"imported data" and "analysis results".

.. figure:: ./flow_diagram_ADI_import.png
   :alt: Analyzer result import flow via BIOMERO.importer
   :align: center
   :width: 80%

   Analysis results are moved to remote (importer) storage and then imported
   in-place by BIOMERO.importer, using the same pipeline as manual imports.

How it works
------------

Script selection
~~~~~~~~~~~~~~~~

The BIOMERO worker selects the import script based on the
``IMPORTER_ENABLED`` environment variable:

* ``IMPORTER_ENABLED=true`` → runs ``SLURM_Import_Results.py``
  (importer-enabled path)
* ``IMPORTER_ENABLED=false`` (default) → runs ``SLURM_Get_Results.py``
  (classic OMERO API path)

Both scripts are bundled with BIOMERO.scripts and are present on the OMERO
server container.
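The selection logic above can be sketched in a few lines of Python (an
illustrative sketch only — the function name and environment handling are
assumptions, not the actual BIOMERO worker code):

```python
import os


def select_import_script(env=os.environ):
    """Pick the import script based on IMPORTER_ENABLED.

    Hypothetical helper: anything other than an explicit "true" is
    treated as disabled, matching the documented default.
    """
    if env.get("IMPORTER_ENABLED", "false").strip().lower() == "true":
        return "SLURM_Import_Results.py"  # importer-enabled path
    return "SLURM_Get_Results.py"         # classic OMERO API path
```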
The biomeroworker reads the env-var at runtime and passes the correct script
name to the workflow orchestrator.

Permanent storage layout
~~~~~~~~~~~~~~~~~~~~~~~~

Results are written to the shared drive under the group's subfolder:

.. code-block:: text

   <group_base_path>/
   └── .analyzed/
       └── <workflow>/
           └── <timestamp>/
               ├── <jobid>_out.zip      # original results archive
               ├── <jobid>_out/         # extracted result images + CSVs
               │   ├── data/out/
               │   │   ├── result_image.tiff
               │   │   └── measurements.csv
               │   └── metadata.csv     # workflow provenance metadata
               └── omero-<jobid>.log    # SLURM job log

The ``.analyzed`` hidden directory distinguishes archived analysis results
from raw incoming data. Each workflow run gets its own
``<workflow>/<timestamp>/`` directory, making multiple imports from the same
workflow safe.

Group base path resolution
~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``<group_base_path>`` is resolved in order:

1. **Explicit group mapping** in ``biomero-config.json`` (same mappings used
   by the OMERO.biomero UI). If the user's OMERO group has a configured
   folder mapping, that folder is used.
2. **Fallback**: ``<base_dir>/<group_name>/``, where ``base_dir`` comes from
   the importer's ``settings.yml``. For example, group ``system`` with no
   explicit mapping → ``/data/system/``.

The group is taken from the OMERO user's **active group at the time they
launched the analysis workflow**. This means results always land in the
group's designated storage area.

.. _permission-requirements:

Permission requirements
~~~~~~~~~~~~~~~~~~~~~~~

The BIOMERO worker process (running inside the ``biomeroworker`` container)
must have **write permission** on the group folder in the mounted storage:

* If explicit mappings are configured, write permission is required on the
  mapped path.
* If no mapping exists, write permission is required to **create the
  group-named subfolder** under ``base_dir``. For example, if
  ``base_dir=/data`` and a user in group ``team-a`` runs their first
  workflow, the worker needs to be able to create ``/data/team-a/``.

The default setup in NL-BIOMERO runs the biomeroworker as the
``omero-server`` user.
Ensure this user (or its UID, typically ``999``) has write access to the
relevant directories on the host/NAS.

Upload order and in-place import
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After copying result files to permanent storage,
``SLURM_Import_Results.py``:

1. Creates an **upload order** in the BIOMERO.importer tracking database
   (``INGEST_TRACKING_DB_URL``), pointing to the image files in
   ``.analyzed/…``.
2. Polls the database until the order reaches ``STAGE_IMPORTED`` or
   ``STAGE_INGEST_FAILED``.
3. Adds workflow metadata (key-value map annotations) to the imported images
   in OMERO once the import is confirmed.

BIOMERO.importer picks up the order, imports the files in-place, and updates
the stage in the tracking database. The biomeroworker sees the status change
via polling and reports back to the user.

Progress tracking
~~~~~~~~~~~~~~~~~

When ``track_workflows=True`` in ``slurm-config.ini``, task status updates
are visible in the OMERO.biomero web interface:

* ``IMPORTING`` — worker has written files to remote storage and submitted
  the upload order.
* ``IMPORTED`` — all upload orders confirmed as successfully imported by
  BIOMERO.importer.
* ``IMPORT_FAILED`` — one or more upload orders failed; check
  biomero-importer logs.

Infrastructure requirements
---------------------------

Shared storage mount
~~~~~~~~~~~~~~~~~~~~

The ``biomeroworker``, the ``biomero-importer``, and the ``omeroserver``
containers all need to mount the **same remote storage volume at the same
path**:

.. list-table::
   :header-rows: 1
   :widths: 30 40 30

   * - Container
     - Purpose
     - Mount
   * - ``biomeroworker``
     - Write analysis results to ``.analyzed/``
     - ``./web/L-Drive:/data``
   * - ``biomero-importer``
     - Read files, perform in-place import
     - ``./web/L-Drive:/data``
   * - ``omeroserver``
     - Resolve symlinks from in-place import
     - ``./web/L-Drive:/data``

The path must be identical in all three containers. If using a NAS/NFS mount
on the host, bind-mount the same host path into each container.
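The create-order-then-poll loop described under *Upload order and in-place
import* can be sketched as follows (illustrative only; the stage strings come
from this page, but the helper and its injected ``fetch_stage`` callback are
assumptions, not the real BIOMERO.scripts API):

```python
import time


def wait_for_import(fetch_stage, order_id, timeout=3600, interval=5,
                    sleep=time.sleep):
    """Poll the tracking DB until the upload order finishes.

    fetch_stage(order_id) is assumed to return the order's current stage
    string, e.g. "STAGE_IMPORTED" or "STAGE_INGEST_FAILED". Returns True
    on success, False on failure, raises on timeout (default 3600 s,
    matching the documented default).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        stage = fetch_stage(order_id)
        if stage == "STAGE_IMPORTED":
            return True
        if stage == "STAGE_INGEST_FAILED":
            return False
        sleep(interval)  # wait before polling the database again
    raise TimeoutError(f"upload order {order_id} not done in {timeout}s")
```

Injecting ``sleep`` and ``fetch_stage`` keeps the loop testable without a
real database connection.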
Environment variables for biomeroworker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following env-vars must be set on the biomeroworker (in addition to the
standard BIOMERO ones):

.. list-table::
   :header-rows: 1
   :widths: 35 55 10

   * - Variable
     - Purpose
     - Required
   * - ``IMPORTER_ENABLED``
     - Switches the import script to ``SLURM_Import_Results.py``
     - Yes
   * - ``INGEST_TRACKING_DB_URL``
     - PostgreSQL URL for the importer tracking database (same DB as
       BIOMERO.importer)
     - Yes
   * - ``SQLALCHEMY_URL``
     - PostgreSQL URL for BIOMERO event-sourcing (workflow tracking)
     - Yes (existing)

Volume mounts for biomeroworker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Two additional volume mounts are required so the worker can read the importer
configuration and the group-mapping JSON:

.. code-block:: yaml

   volumes:
     # ... existing mounts ...
     - "./config/biomero-importer:/opt/omero/server/config-importer:ro"
     - "./web/biomero-config.json:/opt/omero/server/biomero-config.json:ro"

* ``config/biomero-importer`` — contains ``settings.yml`` for
  BIOMERO.importer (same directory mounted into the ``biomero-importer``
  container). The worker reads ``base_dir`` from this file.
* ``web/biomero-config.json`` — contains group-folder mappings used by
  OMERO.biomero. The worker uses these to resolve the correct destination
  path for each group.

Docker Compose changes
----------------------

The ``biomeroworker`` service in ``docker-compose.yml`` now looks like:

.. code-block:: yaml

   biomeroworker:
     build:
       context: ./
       dockerfile: ./biomeroworker/Dockerfile
       args:
         BIOMERO_VERSION: ${BIOMERO_VERSION}
         BIOMERO_IMPORTER_VERSION: ${BIOMERO_IMPORTER_VERSION}  # new
     environment:
       # ... existing ...
       INGEST_TRACKING_DB_URL: postgresql+psycopg2://${BIOMERO_POSTGRES_USER}:${BIOMERO_POSTGRES_PASSWORD}@database-biomero:5432/${BIOMERO_POSTGRES_DB}  # new
       IMPORTER_ENABLED: ${IMPORTER_ENABLED}
     volumes:
       # ... existing ...
       - "./config/biomero-importer:/opt/omero/server/config-importer:ro"  # new
       - "./web/biomero-config.json:/opt/omero/server/biomero-config.json:ro"  # new

The Dockerfile itself installs ``biomero-importer`` (the Python library, not
the service) as a separate layer so that it is available to the worker's
script execution environment:

.. code-block:: dockerfile

   RUN $VIRTUAL_ENV/bin/python -m pip install --ignore-requires-python \
       biomero-importer==${BIOMERO_IMPORTER_VERSION}

Non-Docker deployments
----------------------

If you are not using the NL-BIOMERO Docker Compose stack, you need to
replicate the same setup manually on the BIOMERO worker host:

1. **Install the biomero-importer Python library** in the same virtual
   environment as ``biomero``:

   .. code-block:: bash

      pip install biomero-importer==<version>

2. **Set environment variables** before starting the OMERO processor:

   .. code-block:: bash

      export IMPORTER_ENABLED=true
      export INGEST_TRACKING_DB_URL=postgresql+psycopg2://user:pass@db-host:5432/biomero

3. **Mount / configure file-system access** so the worker process can write
   to the same storage location that BIOMERO.importer monitors.

4. **Place configuration files** at the paths the script expects:

   * ``settings.yml`` → ``/opt/omero/server/config-importer/settings.yml``
     (or set ``IMPORTER_CONFIG_PATH`` if you customise this)
   * ``biomero-config.json`` → ``/opt/omero/server/biomero-config.json``

5. **Permissions**: ensure the worker process UID has write access to the
   group folders as described in `Permission requirements`_ above.

Troubleshooting
---------------

Results end up on OMERO server storage instead of remote storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Check that ``IMPORTER_ENABLED=true`` is set on the biomeroworker
  container/process (not just on omeroweb or biomero-importer).
* Verify the biomero-importer Python library is installed in the worker's
  virtual environment:

  .. code-block:: bash

     docker-compose exec biomeroworker \
         /opt/omero/server/venv3/bin/python -c "import biomero_importer; print('ok')"

Upload order created but import never completes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Confirm the biomero-importer container is running and healthy.
* Check that ``INGEST_TRACKING_DB_URL`` on both the biomeroworker and the
  biomero-importer service points to the **same** database.
* Check biomero-importer logs for errors:

  .. code-block:: bash

     docker-compose logs biomero-importer

Files not found at expected path
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Confirm all three containers (biomeroworker, biomero-importer, omeroserver)
  mount the storage at the **same path** (``/data`` by default); a mismatch
  will cause "file not found" errors at import time.
* Check that the group-folder mapping in ``biomero-config.json`` points to a
  path that exists under the mounted storage root.

PermissionError writing to ``.analyzed`` folder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Verify the worker process UID has write access to the group base path on
  the host storage.
* For new groups (first workflow run), the worker also needs permission to
  **create** the group subfolder under ``base_dir`` if it does not exist yet.

Import polling times out
~~~~~~~~~~~~~~~~~~~~~~~~

* The default timeout is 3600 seconds (1 hour). For very large datasets the
  importer may take longer.
* This timeout can be adjusted; please open an issue or customise
  ``SLURM_Import_Results.py`` for your site.

Related documentation
---------------------

* :doc:`biomero-importer` — BIOMERO.importer container reference
* :doc:`biomeroworker` — BIOMERO worker container reference
* :doc:`database` — Shared PostgreSQL database
* BIOMERO.scripts repository