Cellprofiler tutorial

Cellprofiler is already known for excellent interoperability with OMERO. You can directly load images into the cellprofiler pipelines.

Cellprofiler also has options to run in batch mode and headless, for analyzing big data on compute clusters, which is what we want as well.

However, for our purposes, this is insufficient, as we want to run it from OMERO, and on a compute cluster that only has SSH access.

In this tutorial I will show you how to add a cellprofiler pipeline as a workflow to OMERO and Slurm, with this client library.

0. Prerequisite: OMERO, Slurm and biomero.

We assume you have these 3 components set up and connected. If not, follow the main README first.

1. Grab the data and pipeline

We want to try something ready-made, and we like spots here at the AMC.

So let’s grab this spot-counting example from the cellprofiler website: https://github.com/tischi/cellprofiler-practical-NeuBIAS-Lisbon-2017/blob/master/practical-handout.md

2. Try the pipeline locally

UI

It is always a good idea to test your algorithms locally before jumping to remote compute. You can walk through the readme, or open the PLA-dot-counting-with-speckle-enhancement.cpproj. It seems to be a bit older, so we have to fix the threshold (0.0 to 0.0) and change the input to our local file location.

Export pipeline file only

Cellprofiler works with both .cpproj and .cppipe. The project version hardcodes the filepaths in there, which we don’t want. So go to File > Export > Pipeline and save this as a .cppipe file.

Another bonus is that .cppipe is human-readable and editable. Important later on.

headless

After it works, let’s try it headless too:

./cellprofiler.exe -c -p '<path-to>\PLA-dot-counting-with-speckle-enhancement.cppipe' -o '<path-to>\cellprofiler_results' -i '<path-to>\PLA_data'

Here we provide the input images (-i), the output folder (-o), the pipeline (-p) and headless mode (-c).
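
The same headless call can also be scripted. The sketch below is a minimal, hypothetical helper (the name build_headless_command and the example arguments are mine, not part of CellProfiler); it only assembles the four flags explained above into an argument list.

```python
from typing import List


def build_headless_command(executable: str, pipeline: str,
                           input_dir: str, output_dir: str) -> List[str]:
    """Assemble a headless CellProfiler call as an argument list.

    Only the flags described above are used:
    -c (headless), -p (pipeline), -i (input folder), -o (output folder).
    """
    return [
        executable,
        "-c",
        "-p", pipeline,
        "-i", input_dir,
        "-o", output_dir,
    ]
```

The resulting list can be handed to subprocess.run(...) as-is; because no shell is involved, paths containing spaces do not need extra quoting.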

See this blog for more info on the commandline parameters.

3. Upload the data to OMERO

Let’s make a screen out of these 8 wells, for fun.

  • Open the importer.

  • Since there is no screen metadata in the files, first create a project and dataset in OMERO.

  • Import the PLA_data folder there.

  • Go to the Web UI.

  • Select the new dataset.

  • Activate script Dataset to Plate (under omero/util_scripts/).

    • Fill in 8 wells per row (optional)

    • Screen: PLA_data

  • Now we have a plate with 8 wells in a screen in OMERO.

4. Package the cellprofiler in a FAIR package

To create a FAIR workflow, let’s follow the steps from Biaflows for creating a new workflow, as they explained it quite well already: https://neubias-wg5.github.io/creating_bia_workflow_and_adding_to_biaflows_instance.html

We just ignore some parts specific to the BIAFLOWS server, like adding as a trusted source. We will add the workflow to OMERO and Slurm instead, as a final step.

0. Create a workflow Github repository

To kickstart, we can reuse some of the workflow setup for CellProfiler from Neubias github.

You can follow along, or just use my version at the end (https://github.com/TorecLuik/W_SpotCounting-CellProfiler)

  • Login/create an account on Github

  • Go to link above.

  • Go to Use this template

    • Create a new repository

      • Name it W_SpotCounting-CellProfiler

      • Keep it Public

  • Clone your new repository locally

    • Code > Clone > HTTPS > Copy

    • git clone https://github.com/<...>/W_SpotCounting-CellProfiler.git

  • Open the folder in your favorite editor

  • Copy the pipeline we want to this folder, e.g. the exported PLA-dot-counting-with-speckle-enhancement.cppipe

a. Create a Dockerfile for cellprofiler

The Dockerfile installs our whole environment.

We want:

  1. Cellprofiler

  2. Cytomine/Biaflows helper libraries (for Input/Output)

  3. Our workflow files:

  • wrapper.py (the logic to run our workflow)

  • descriptor.json (the metadata of our workflow)

  • *.cppipe (our cellprofiler pipeline)

Now it turns out that this Dockerfile uses an old version of CellProfiler (with Python 2). We want the newest one, so I rewrote the Dockerfile:

Our new/changed Dockerfile
FROM cellprofiler/cellprofiler

Instead of installing cellprofiler manually, it turns out they host container images themselves, so let’s reuse those.

# Install Python3.7
RUN apt-get update && apt-get install -y python3.7 python3.7-dev python3.7-venv
RUN python3.7 -m pip install --upgrade pip && python3.7 -m pip install Cython

This cellprofiler image is quite modern, but we need an older Python to work with the Cytomine/Biaflows libraries. So we install Python 3.7 (and the Cython package).

# ------------------------------------------------------------------------------
# Install Cytomine python client
RUN git clone https://github.com/cytomine-uliege/Cytomine-python-client.git && \
    cd Cytomine-python-client && git checkout tags/v2.7.3 && \
    python3.7 -m pip install . && \
    cd .. && \
    rm -r Cytomine-python-client

# ------------------------------------------------------------------------------
# Install BIAFLOWS utilities (annotation exporter, compute metrics, helpers,...)
RUN apt-get update && apt-get install libgeos-dev -y && apt-get clean
RUN git clone https://github.com/Neubias-WG5/biaflows-utilities.git && \
    cd biaflows-utilities/ && git checkout tags/v0.9.1 && python3.7 -m pip install .

# install utilities binaries
RUN chmod +x biaflows-utilities/bin/*
RUN cp biaflows-utilities/bin/* /usr/bin/ && \
    rm -r biaflows-utilities

These 2 parts install specific versions of the biaflows library and Cytomine library with Python 3.7.

# ------------------------------------------------------------------------------
# Add repository files: wrapper, command and descriptor
RUN mkdir /app
ADD wrapper.py /app/wrapper.py
ADD PLA-dot-counting-with-speckle-enhancement.cppipe /app/PLA-dot-counting-with-speckle-enhancement.cppipe
ADD descriptor.json /app/descriptor.json

ENTRYPOINT ["python3.7","/app/wrapper.py"]

Finally we add our own workflow to /app folder:

  • wrapper.py

  • .cppipe

  • descriptor.json

And we tell the image to call wrapper.py with python3.7 when we start it up using an ENTRYPOINT. This also forwards commandline parameters that you provide to the wrapper.py script, e.g. workflow parameters.
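
When Docker starts the image, everything typed after the image name is appended to the ENTRYPOINT, so it reaches wrapper.py as ordinary command-line arguments. In the real wrapper, BiaflowsJob.from_cli(argv) does the parsing for us. The sketch below is only a hypothetical stand-in (parse_wrapper_args is not part of any library) that uses the flags from the docker run commands in this tutorial, to show what arrives in the script.

```python
import argparse
from typing import List, Optional


def parse_wrapper_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical stand-in for the argument parsing that BiaflowsJob does."""
    parser = argparse.ArgumentParser(description="Toy wrapper entry point")
    parser.add_argument("--local", action="store_true",
                        help="run without a BIAFLOWS server")
    parser.add_argument("--infolder", required=True)
    parser.add_argument("--outfolder", required=True)
    parser.add_argument("--gtfolder", required=True)
    parser.add_argument("-nmc", action="store_true",
                        help="no metric computation")
    return parser.parse_args(argv)
```

Called without arguments, the function reads sys.argv, which is exactly what the ENTRYPOINT fills in when the container starts.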

b. Setup the metadata in descriptor.json

We actually don’t have any input parameters (except the default input/output) at this moment. Look at this extra chapter for more info on how to approach that.

So we can just use the basic descriptor.json that was given and remove the last 2 non-cytomine parameters. Mainly, update the name, description and where we will publish the container (your new dockerhub account).

Example full json
{
  "name": "SpotCounting-CellProfiler",
  "description": "Workflow for spot counting in CellProfiler",
  "container-image": {
    "image": "torecluik/w_spotcounting-cellprofiler",
    "type": "singularity"
  },
  "command-line": "python wrapper.py CYTOMINE_HOST CYTOMINE_PUBLIC_KEY CYTOMINE_PRIVATE_KEY CYTOMINE_ID_PROJECT CYTOMINE_ID_SOFTWARE",
  "inputs": [
    {
      "id": "cytomine_host",
      "value-key": "@ID",
      "command-line-flag": "--@id",
      "name": "BIAFLOWS host",
      "set-by-server": true,
      "optional": false,
      "type": "String"
    },
    {
      "id": "cytomine_public_key",
      "value-key": "@ID",
      "command-line-flag": "--@id",
      "name": "BIAFLOWS public key",
      "set-by-server": true,
      "optional": false,
      "type": "String"
    },
    {
      "id": "cytomine_private_key",
      "value-key": "@ID",
      "command-line-flag": "--@id",
      "name": "BIAFLOWS private key",
      "set-by-server": true,
      "optional": false,
      "type": "String"
    },
    {
      "id": "cytomine_id_project",
      "value-key": "@ID",
      "command-line-flag": "--@id",
      "name": "BIAFLOWS project ID",
      "set-by-server": true,
      "optional": false,
      "type": "Number"
    },
    {
      "id": "cytomine_id_software",
      "value-key": "@ID",
      "command-line-flag": "--@id",
      "name": "BIAFLOWS software ID",
      "set-by-server": true,
      "optional": false,
      "type": "Number"
    }
  ],

  "schema-version": "cytomine-0.1"
}
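
To get a feeling for how such a descriptor is consumed, here is a small sketch that lists the inputs a user would still have to provide. It assumes that "@id" in "command-line-flag" stands for the id of the input, and that inputs with "set-by-server": true are filled in automatically. The function name and the extra "diameter" input are hypothetical and only there for illustration.

```python
import json
from typing import Dict

# Shortened descriptor; the "diameter" input is a made-up example parameter.
EXAMPLE_DESCRIPTOR = """
{
  "name": "SpotCounting-CellProfiler",
  "inputs": [
    {"id": "cytomine_host", "command-line-flag": "--@id",
     "set-by-server": true, "type": "String"},
    {"id": "diameter", "command-line-flag": "--@id",
     "set-by-server": false, "type": "Number"}
  ]
}
"""


def user_parameter_flags(descriptor_text: str) -> Dict[str, str]:
    """Map each user-settable input id to its command-line flag."""
    descriptor = json.loads(descriptor_text)
    flags = {}
    for item in descriptor.get("inputs", []):
        if item.get("set-by-server", False):
            continue  # provided by the server, not by the user
        flags[item["id"]] = item["command-line-flag"].replace("@id", item["id"])
    return flags
```

For the full descriptor above, which only contains server-set inputs, this returns an empty dictionary: there is nothing for the user to fill in yet.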

c. Update the command in wrapper.py

So the wrapper gets called when the container starts. This is where we ‘wrap’ our pipeline by handling input/output and parameters. We also have to make sure that we call the pipeline correctly here.

Our changes to the wrapper

This first part we keep the same: the BiaflowsJob will parse the commandline parameters for us and provide those as bj.parameters.<param_name> if we did want them. But we don’t use any right now.

def main(argv):
    base_path = "{}".format(os.getenv("HOME")) # Mandatory for Singularity
    problem_cls = CLASS_OBJSEG

    with BiaflowsJob.from_cli(argv) as bj:
        bj.job.update(status=Job.RUNNING, progress=0, statusComment="Initialisation...")
        # 1. Prepare data for workflow
        in_imgs, gt_imgs, in_path, gt_path, out_path, tmp_path = prepare_data(problem_cls, bj, is_2d=True, **bj.flags)

The second part (where we call our pipeline) we can simplify a bit, as we don’t need to parse parameters for cellprofiler. See later for how to start handling that.

We specifically name the cppipe that we added to /app, and we use subprocess.run(...) to execute our cellprofiler headless on the commandline: cellprofiler -c -r -p ... -i ... -o ... -t.

In theory we could also use the cellprofiler python package here, for more control. But in general, we can run any commandline program with subprocess.run, so this wrapper will look similar for most workflows.

        pipeline = "/app/PLA-dot-counting-with-speckle-enhancement.cppipe"

        # 2. Run CellProfiler pipeline
        bj.job.update(progress=25, statusComment="Launching workflow...")
        
        ## If we want to allow parameters, we have to parse them into the pipeline here
        # mod_pipeline = parse_cellprofiler_parameters(bj, pipeline, tmp_path)
        mod_pipeline = pipeline

        shArgs = [
            "cellprofiler", "-c", "-r", "-p", mod_pipeline,
            "-i", in_path, "-o", out_path, "-t", tmp_path,
        ]
        # Pass the list directly (no shell), so paths with spaces stay intact
        status = run(shArgs)

Finally, we don’t change much to the rest of the script and just handle the return code. 0 means success, so then we just log to the logfile.
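
As a standalone illustration of that return-code handling, the sketch below runs a command and reports success or failure. It is not the code from wrapper.py: run_and_check and DEMO_COMMAND are hypothetical names, and the demo command is just a Python one-liner standing in for the CellProfiler call.

```python
import subprocess
import sys
from typing import List

# Stand-in command that exits with code 0; replace it with the CellProfiler call.
DEMO_COMMAND = [sys.executable, "-c", "print('pretend pipeline run')"]


def run_and_check(args: List[str]) -> int:
    """Run a command without a shell and report its return code."""
    completed = subprocess.run(args)
    if completed.returncode != 0:
        print("Workflow failed with return code {}".format(completed.returncode))
    else:
        print("Workflow finished successfully")
    return completed.returncode
```

Any non-zero code is treated as a failure here; what a specific code means depends on the program that was called.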

There is some built-in logic for Biaflows, like uploading results and metrics. We keep it in for the logs, but they are essentially a no-op because we will provide the command-line parameters --local and -nmc (no metric computation).

Full changes can be found here

d. Run locally

Now that we have a docker, we can run this locally or anywhere that we have docker installed, without the need for having the right version of cellprofiler, etc. Let’s try it out:

  1. Setup your data folder like this:

  • PLA as main folder

    • PLA_data with the 8 images, as subfolder

    • out as empty subfolder

    • gt as empty subfolder

  2. Build a container: docker build -t spotcounting-cp . (Note the . is important, it means this folder)

  3. Run the container on the PLA folder like this: docker run --rm -v <my-drive>\PLA\:/data-it spotcounting-cp --local --infolder /data-it/PLA_data --outfolder /data-it/out --gtfolder /data-it/gt  -nmc

This should work the same as before, with a bit of extra logging thrown in. Except now, we didn’t need to have cellprofiler installed! Anyone with Docker (or Podman or Singularity) can run this workflow now.
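
If you would rather script the folder setup from step 1, something like the sketch below would do. The function name create_data_layout is hypothetical; the folder names are the ones used in this tutorial.

```python
from pathlib import Path
from typing import Dict


def create_data_layout(root: str, data_folder: str = "PLA_data") -> Dict[str, Path]:
    """Create <root>/<data_folder>, <root>/out and <root>/gt if they are missing."""
    base = Path(root)
    layout = {
        "in": base / data_folder,
        "out": base / "out",
        "gt": base / "gt",
    }
    for folder in layout.values():
        folder.mkdir(parents=True, exist_ok=True)
    return layout
```

It only creates the folders; the 8 images still have to be copied into the PLA_data subfolder yourself.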

e. Publish to GitHub and DockerHub

So how do other people get to use our workflow?

  1. We publish the source online on Github:

  • Commit to git: git commit -m 'Update with spotcounting pipeline' -a

  • Push to github: git push

  • Setup automated release to Dockerhub:

    • First, create a free account on Dockerhub if you don’t have one

    • On Dockerhub, login and create a new Access Token via Account Settings / Security. Name it Github or something. Copy this token (to a file).

    • Back on your Github repository, add 2 secrets by going to Settings / Secrets and variables / Actions / New repository secret

      • First, add Name: DOCKERHUB_USERNAME and Secret: <your-dockerhub-username>

      • Also, add Name: DOCKERHUB_TOKEN and Secret: <token-that-you-copied>

  • Now, tag and release this as a new version on Github (and automatically Dockerhub):

    • Pretty easy to do from Github page: Releases > new release.

    • Add a tag like v1.0.0.

    • Now, the Github Action Docker Image CI will build the container for you and publish it on Dockerhub via the credentials you provided. This will take a few minutes, you can follow along at the Actions tab.

    • Now you can verify that it is available online: https://hub.docker.com/u/your-dockerhub-user

  • Great! now everybody (with internet access) can pull your workflow image and run it locally: docker run --rm -v <my-drive>\PLA\:/data-it <your-dockerhub-user>/w_spotcounting-cellprofiler:v1.0.0 --local --infolder /data-it/PLA_data --outfolder /data-it/out --gtfolder /data-it/gt -nmc

And this is what we will make OMERO do on the Slurm cluster next.

Optional: Manually publish the image on Dockerhub:

  • First, create an account on Dockerhub if you don’t have one

  • Login locally on the commandline to this account too: docker login

  • (Optional) Build your latest docker image if you didn’t do that yet (docker build -t spotcounting-cp .).

  • Tag your local Docker image with a new tag to match this Dockerhub account and release: docker tag spotcounting-cp:latest <your-dockerhub-user>/w_spotcounting-cellprofiler:v1.0.0

  • Push your tagged image to Dockerhub: docker push <your-dockerhub-user>/w_spotcounting-cellprofiler:v1.0.0

  • Now you can verify that it is available online: https://hub.docker.com/u/your-dockerhub-user

E.g. mine can be found @ https://hub.docker.com/r/torecluik/w_spotcounting-cellprofiler/tags


5. Add this workflow to the OMERO Slurm Client

  1. Let’s adjust the slurm-config.ini on our OMERO processor server.

In the [MODEL] section we add our new workflow:

# -------------------------------------
# CELLPROFILER SPOT COUNTING
# -------------------------------------
# The path to store the container on the slurm_images_path
cellprofiler_spot=cellprofiler_spot
# The (e.g. github) repository with the descriptor.json file
cellprofiler_spot_repo=https://github.com/TorecLuik/W_SpotCounting-CellProfiler/tree/v1.0.0
# The jobscript in the 'slurm_script_repo'
cellprofiler_spot_job=jobs/cellprofiler_spot.sh

Note that we link to the v1.0.0 specifically.

When using a new version, like v1.0.1, update this config again. For example, I had a bugfix, so I released my workflow to v1.0.1, using the release + push + update steps.
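
To make the naming convention of these three entries explicit, here is a sketch that reads them back with Python's configparser and splits the version off the repository URL. This is not how the library itself parses the file; read_workflow is a hypothetical helper that assumes the URL ends in /tree/<version>, as in the example above.

```python
import configparser
from typing import Dict

EXAMPLE_CONFIG = """
[MODEL]
cellprofiler_spot=cellprofiler_spot
cellprofiler_spot_repo=https://github.com/TorecLuik/W_SpotCounting-CellProfiler/tree/v1.0.0
cellprofiler_spot_job=jobs/cellprofiler_spot.sh
"""


def read_workflow(config_text: str, name: str) -> Dict[str, str]:
    """Collect the entries of one workflow and split the version off the repo URL."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    models = parser["MODEL"]
    # Everything after the last '/tree/' is taken to be the version tag.
    repo_url, _, version = models[name + "_repo"].rpartition("/tree/")
    return {
        "image_path": models[name],
        "repo": repo_url,
        "version": version,
        "job_script": models[name + "_job"],
    }
```

Releasing v1.0.1 then only changes the "version" field, which matches the single line you have to edit in the config.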

For me, updating is done by rebuilding my docker container for the processor worker: docker-compose up -d --build omeroworker-5

  2. Recreate the Slurm environment:

  • Run SlurmClient.from_config(init_slurm=True) on the OMERO processor server.

E.g. using this omero script
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Original work Copyright (C) 2014 University of Dundee
#                                   & Open Microscopy Environment.
#                    All Rights Reserved.
# Modified work Copyright 2022 Torec Luik, Amsterdam UMC
# Use is subject to license terms supplied in LICENSE.txt
#
# Example OMERO.script to instantiate an 'empty' Slurm connection.

import omero
import omero.gateway
from omero import scripts
from omero.rtypes import rstring, unwrap
from biomero import SlurmClient
import logging

logger = logging.getLogger(__name__)


def runScript():
    """
    The main entry point of the script
    """

    client = scripts.client(
        'Slurm Init',
        '''Will initiate the Slurm environment for workflow execution.

        You can provide a config file location, 
        and/or it will look for default locations:
        /etc/slurm-config.ini
        ~/slurm-config.ini
        ''',
        scripts.Bool("Init Slurm", grouping="01", default=True),
        scripts.String("Config file", optional=True, grouping="01.1",
                       description="The path to your configuration file. Optional."),
        namespaces=[omero.constants.namespaces.NSDYNAMIC],
    )

    try:
        message = ""
        init_slurm = unwrap(client.getInput("Init Slurm"))
        if init_slurm:
            configfile = unwrap(client.getInput("Config file"))
            if not configfile:
                configfile = ''
            with SlurmClient.from_config(configfile=configfile,
                                         init_slurm=True) as slurmClient:
                slurmClient.validate(validate_slurm_setup=True)
                message = "Slurm is setup:"
                models, data = slurmClient.get_all_image_versions_and_data_files()
                message += f"Models: {models}\nData:{data}"

        client.setOutput("Message", rstring(str(message)))

    finally:
        client.closeSession()


if __name__ == '__main__':
    runScript()

Now your Slurm cluster has

  • your image ‘v1.0.0’.

  • And also a job-script for Slurm, automatically generated (unless you changed that behaviour in the slurm-config).

6. Add an OMERO script to run this from the Web UI

  1. select a screen / dataset

  2. select workflow

  3. run workflow!

  4. check progress

  5. Import resulting data

I have created several OMERO scripts using this library, and the run_workflow can do this for us. It will attach the results as a zipfile attachment to the screen. Perhaps we can integrate with OMERO.Tables in the future.

Extra: How to add workflow parameters to cellprofiler?

So normally, adding workflow parameters to your commandline in wrapper.py is easy, like this:

        # Add here the code for running the analysis script
        #"--chan", "{:d}".format(nuc_channel)
        cmd = ["python", "-m", "cellpose", "--dir", tmp_path, "--pretrained_model", "nuclei", "--save_tif", "--no_npy", "--chan", "{:d}".format(nuc_channel), "--diameter", "{:f}".format(bj.parameters.diameter), "--cellprob_threshold", "{:f}".format(bj.parameters.prob_threshold)]
        status = subprocess.run(cmd)

Here we add bj.parameters.diameter (described here) as "--diameter", "{:f}".format(bj.parameters.diameter).

However, cellprofiler does not support changing pipeline parameters from the commandline. Maybe it will in the future. For now, we have 3 options:

  1. Edit the .cppipe file and override our parameters there automatically

  2. Use the Python cellprofiler library in wrapper.py and open and edit the pipeline.

  3. Add an extra python script that does number 2, which we call from the wrapper.py and which does accept commandline arguments.

For 1., this is where parseCPparam function comes in (in wrapper.py). I have updated it a bit in my version. It matches the name in descriptor.json literally with the same string in .cppipe, and then changes the values to the new ones provided on the commandline. However, if you use the same module twice (like in our example pipeline), it will overwrite both of them with the same value. In our example, that does not work properly, e.g. the size of a nucleus should NOT be the same as the size of a spot.
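
The problem with literal matching is easy to reproduce. The sketch below is not the parseCPparam implementation; it is a simplified illustration (override_setting is a hypothetical name, and TOY_PIPELINE is a heavily shortened, made-up excerpt in the style of a .cppipe file) of what happens when the same setting name occurs in two modules.

```python
# Made-up, shortened excerpt: the same module type is used for nuclei and spots.
TOY_PIPELINE = """IdentifyPrimaryObjects:[module_num:1]
    Name the primary objects to be identified:Nuclei
    Typical diameter of objects, in pixel units (Min,Max):10,40
IdentifyPrimaryObjects:[module_num:2]
    Name the primary objects to be identified:Spots
    Typical diameter of objects, in pixel units (Min,Max):2,6
"""


def override_setting(pipeline_text: str, setting_name: str, new_value: str) -> str:
    """Replace the value of every line whose setting name matches literally."""
    lines = []
    for line in pipeline_text.splitlines():
        if line.strip().startswith(setting_name + ":"):
            indent = line[: len(line) - len(line.lstrip())]
            line = "{}{}:{}".format(indent, setting_name, new_value)
        lines.append(line)
    return "\n".join(lines)
```

Overriding the diameter setting in TOY_PIPELINE changes both the nuclei module and the spots module, which is exactly the unwanted behaviour described above. A more robust approach would have to take the module number into account as well.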

Options 2 and 3 are an exercise for the reader. There is an example in the OMERO docs of using the CellProfiler Python API: Getting started with CellProfiler and OMERO.

Extra 2: We should add a LICENSE

See the importance of a license here:

You’re under no obligation to choose a license. However, without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work. If you’re creating an open source project, we strongly encourage you to include an open source license.

So, we are essentially not allowed to make all these changes and use their template without a license. We will just assume we have a license as they explain all these steps in their docs. To make this easier for the future, always add a license. I asked them to add one to the example workflows.

A nice permissive default is Apache 2.0. It allows people to generally use it however they want, private / commercial / open / closed etc.

But there is also copyleft, where people can only adapt your code if they also keep the same license on all their code; e.g. the GNU GPL. That is a bit more restrictive.


CellExpansion tutorial

Introduction

Different types of protein aggregates can form inside the nucleus or inside the cytoplasm of a cell. In our example, we have aggregates (spots) outside of the nucleus and we want to quantify these per cell.

1. Import data to OMERO

Import data as you would normally.

We use this image ‘Cells.tif’, shown as part of this png with a mask here:

[Figure: nuclei label image]

2. Extract masks with Cellpose

This process is actually 2 steps: we want the nuclei masks and also the aggregates masks. Luckily these were stained with different colors and are available in different channels:

  • Channel 3 = Nuclei

  • Channel 2 = Aggregates

So we can run 2 CellPose workflows on OMERO and retrieve both masks. We store them as images in a new dataset and particularly name them: “{original_file}NucleiLabels.{ext}” and “{original_file}GranulesLabels.{ext}”.

Combine both in the same dataset afterward; this will be our input dataset for the CellExpansion algorithm.

3. CellExpansion

To estimate the amount of aggregates per cell, we actually need the cytoplasm in our example. Then we can calculate overlap.

One could segment the cytoplasm, especially in this image (it’s just channel 1), but we have a Python script that does this algorithmically instead, for the fun of it.

We apply the CellExpansion algorithm on the nuclei mask and estimate the full reach of the cells with new masks.
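
To give an idea of what expanding a label mask means, here is a minimal sketch in plain Python. It is not the code of the CellExpansion workflow, which may use a different method; it assumes a simple breadth-first growth with 4-connectivity, and expand_labels_bfs is a hypothetical name. Libraries such as scikit-image offer a comparable ready-made function (skimage.segmentation.expand_labels).

```python
from collections import deque
from typing import List

Grid = List[List[int]]


def expand_labels_bfs(labels: Grid, max_distance: int) -> Grid:
    """Grow each non-zero label into background pixels, up to max_distance steps.

    Multi-source breadth-first search with 4-connectivity: every background
    pixel is claimed by the label that reaches it first.
    """
    height, width = len(labels), len(labels[0])
    result = [row[:] for row in labels]
    queue = deque(
        (r, c, 0)
        for r in range(height)
        for c in range(width)
        if labels[r][c] != 0
    )
    while queue:
        r, c, dist = queue.popleft()
        if dist == max_distance:
            continue  # this pixel is at the edge of the allowed reach
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width and result[nr][nc] == 0:
                result[nr][nc] = result[r][c]
                queue.append((nr, nc, dist + 1))
    return result
```

Two nuclei that grow towards each other stop where they meet, so every pixel ends up belonging to at most one estimated cell.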

[Figure: 4 images showing cell expansion]

For this, we have to first add it to OMERO: We could just add the Python code to an OMERO job script. But then the Processor needs to have the right Python libraries installed. Instead, we should package it in a lightweight container with the correct Python environment. This in turn makes the workflow more FAIR.

  1. I made this workflow container for it: github repo.

  2. Release a version and publish a docker image

  3. Add the workflow to Slurm and OMERO:

# -------------------------------------
# CELLEXPANSION SPOT COUNTING
# -------------------------------------
# The path to store the container on the slurm_images_path
cellexpansion=cellexpansion
# The (e.g. github) repository with the descriptor.json file
cellexpansion_repo=https://github.com/TorecLuik/W_CellExpansion/tree/v1.0.1
# The jobscript in the 'slurm_script_repo'
cellexpansion_job=jobs/cellexpansion.sh

  4. Run the workflow on our Nuclei mask. Output the new mask back as image in a new dataset.

Calculate overlap

We calculate overlap with another very short Python script. It outputs the overlap counts of 2 masks.

Example original code:

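import cv2
import numpy as np
import pandas as pd
from skimage import measure

# Load both label images (each pixel holds the label of its cell / granule)
imCellsCellLabels = cv2.imread('images/CellsNucleiLabels.tif', cv2.IMREAD_ANYDEPTH)
imCellsGranulesLabels = cv2.imread('images/CellsGranulesLabels.tif', cv2.IMREAD_ANYDEPTH)
numCells = int(np.max(imCellsCellLabels))
# One column: the estimated number of granules per cell
CellNumGranules = np.zeros([numCells, 1], dtype=np.int16)
# Centroid (row, col) of every granule, rounded to pixel coordinates
granulesStats = pd.DataFrame(measure.regionprops_table(imCellsGranulesLabels, properties=('centroid',)))
granulesStatsnp = np.round(granulesStats.to_numpy()).astype(np.uint16)
# Look up which cell label lies under each granule centroid
granulesStatsInCellLabel = imCellsCellLabels[granulesStatsnp[:, 0], granulesStatsnp[:, 1]]
for i in range(1, numCells + 1):
    CellNumGranules[i - 1, 0] = np.count_nonzero(granulesStatsInCellLabel == i)
pd.DataFrame(CellNumGranules, columns=['Estimated']).style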

I added this as a separate workflow at W_CountMaskOverlap.

  1. Add the workflow to config.

  2. Make one dataset with pairs of our mask files. We name them the same as the original image, but with an extra suffix. E.g. Cells_CellExpansion.tif and Cells_Aggregates.tif.

  3. Call the new workflow on this dataset / image selection, and supply the suffixes chosen (“_CellExpansion” and “_Aggregates”) as parameter. Then make sure to upload the result of the workflow as a zip, as it will be a csv file.

  4. Check the resulting csv for a count of aggregates per cell!
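
The pairing by suffix in steps 2 and 3 can be sketched as follows. This is not the code of W_CountMaskOverlap; pair_masks is a hypothetical helper that assumes both suffixes are non-empty and directly precede the file extension.

```python
from pathlib import PurePath
from typing import Dict, Iterable, Tuple


def pair_masks(filenames: Iterable[str], suffix_a: str,
               suffix_b: str) -> Dict[str, Tuple[str, str]]:
    """Pair files that share a base name but end in suffix_a or suffix_b."""
    first, second = {}, {}
    for name in filenames:
        stem = PurePath(name).stem  # file name without the extension
        if stem.endswith(suffix_a):
            first[stem[: -len(suffix_a)]] = name
        elif stem.endswith(suffix_b):
            second[stem[: -len(suffix_b)]] = name
    # Keep only base names for which both masks are present.
    return {
        base: (first[base], second[base])
        for base in sorted(first)
        if base in second
    }
```

Files without a partner are silently skipped here; a real workflow would probably want to warn about them.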

Workflow management?

Of course, this required knowing the naming conventions, manually renaming images, and supplying that metadata to the next workflow. Ideally, you would chain individual workflows together through their inputs and outputs, as workflow managers like NextFlow or Snakemake do. We are looking into it for a future version.

Extra

Out of memory

While running CellPose on the Aggregates, my job ran out of memory. So I had to bump up the default memory used by the generated job scripts, in slurm-config.ini:

# -------------------------------------
# CELLPOSE SEGMENTATION
# -------------------------------------
# The path to store the container on the slurm_images_path
cellpose=cellpose
# The (e.g. github) repository with the descriptor.json file
cellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7
# The jobscript in the 'slurm_script_repo'
cellpose_job=jobs/cellpose.sh
# Override the default job values for this workflow
# Or add a job value to this workflow
# If you don't want to override, comment out / delete the line.
# Run CellPose Slurm with 10 GB GPU
cellpose_job_gres=gpu:1g.10gb:1
# Run CellPose Slurm with 15 GB CPU memory
cellpose_job_mem=15GB

I added the ...mem=15GB configuration, which will add mem=15GB to the Slurm job command from now on for CellPose workflows. No need to restart the server, these changes get picked up whenever we start a new client from this config file (which is when we start a new script).
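
The naming pattern of these overrides (<workflow>_job_<key>=<value>) can be illustrated with a few lines of Python. How the library passes them on to Slurm internally may differ; job_overrides is a hypothetical helper that simply assumes each override becomes a --<key>=<value> option on the job command.

```python
from typing import Dict, List


def job_overrides(model_entries: Dict[str, str], workflow: str) -> List[str]:
    """Turn '<workflow>_job_<key>=<value>' entries into '--<key>=<value>' options."""
    prefix = workflow + "_job_"
    return [
        "--{}={}".format(key[len(prefix):], value)
        for key, value in sorted(model_entries.items())
        if key.startswith(prefix)
    ]
```

Note that the plain cellpose_job entry (the job script itself) does not match the prefix cellpose_job_, so it is not mistaken for an override.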

So after updating that ini file, I kickstart the workflow for channel 2 again and this time it works and returns the mask.


Local Slurm tutorial

Introduction

This library is meant to be used with some external HPC cluster using Slurm, to offload your (OMERO) compute to servers suited for it.

However, if you don’t have ready access (yet) to such a cluster, you might want to spin some test environment up locally and connect your (local) OMERO to it. This is what we will cover in this tutorial.

0. Requirements

To follow this tutorial, you need:

  • Git

  • Docker (Desktop for Windows)

  • OMERO Insight

  • > 18GB memory

  • > 8 CPU cores

Warning: I tested with Windows here, and I’ve heard of a few issues with (command-line) Linux:

  1. host.docker.internal address does not work to communicate via the host machine on (command-line) Linux.

  2. If you don’t run Docker as root, it won’t have access to the mounted SSH keys because of file rights.

    • As an example, we run a setup on (rootless) Podman where we add SSH keys as (podman) secrets instead.

System requirements could be less, but then you have to change some configurations for Slurm.

I provide a ready-to-go TL;DR for each chapter, and in the details of each chapter I walk through the steps I took to make these containers ready.

1. Setup Docker containers for Slurm

TL;DR:

  • Clone my example slurm-docker-cluster locally: here

Details

Always a good idea to stand on the shoulders of giants, so we want to spin up a ready-made Slurm container cluster. Here on Github is a nice example with an open source license. It uses Docker containers and Docker Compose to easily orchestrate their interactions.

This setup will spin up a few separate containers (on the same computer) to make 1 slurm cluster:

  • slurmdbd, the Slurm DataBase Daemon

  • slurmctld, the Slurm Control Daemon, our entrypoint

  • mysql, the actual database

  • c1 and c2, 2 compute nodes

Note: these compute nodes are not setup to use GPU, that is a whole other challenge that we will not get into. But even on CPU, Slurm can be useful for parallel processing and keeping track of a queue of jobs.

So let’s clone this repository to our local system:

git clone https://github.com/giovtorres/slurm-docker-cluster.git .

You can build and run these containers as described in their README. Then you can already play around with Slurm that way, so try it out!

However, we are missing an ingredient: SSH access!

2. Add SSH access

TL;DR:

  1. Copy your public SSH key (id_rsa.pub) into this git folder (it will get copied into the Docker image when you build it)

  2. Add a SSH config file, store it as ~/.ssh/config (no extension):

Host localslurm
	HostName host.docker.internal
	User slurm
	Port 2222
	IdentityFile ~/.ssh/id_rsa
	StrictHostKeyChecking no

Details

We need to setup our library with SSH access between OMERO and Slurm, but this is not built-in to these containers yet (because Docker actually has a built-in alternative, docker exec).

Luckily, people have already worked on SSH access into containers too, like here. So let’s borrow their OpenSSH setup and add it to the Dockerfile of the Slurm Control Daemon (slurmctld):

2a. Make a new Dockerfile for the slurmctld

We want to combine the 2 Dockerfiles. However, one is ubuntu and the other is rockylinux. The biggest difference is that rockylinux uses the yum package manager to install software, instead of apt. We will stick to the Slurm image as the base image and just add the OpenSSH on top of it.

Turns out, another difference is the use of systemd (systemctl), which caused all kinds of issues. So I spent the time to activate the OpenSSH server on Rocky Linux:

FROM rockylinux:8

... # all the Slurm stuff from original Dockerfile ... 

## ------- Setup SSH ------
RUN yum update -y && yum install -y openssh-server initscripts sudo
# Create a user “sshuser” and group “sshgroup” (not needed: we reuse the existing slurm user)
# RUN groupadd sshgroup && useradd -ms /bin/bash -g sshgroup sshuser
# Create the .ssh directory in the home of the slurm user
RUN mkdir -p /home/slurm/.ssh
# Copy the ssh public key in the authorized_keys file. The id_rsa.pub below is a public key file you get from ssh-keygen. They are under ~/.ssh directory by default.
COPY id_rsa.pub /home/slurm/.ssh/authorized_keys
# change ownership of the key file. 
RUN chown slurm:slurm /home/slurm/.ssh/authorized_keys && chmod 600 /home/slurm/.ssh/authorized_keys
# Start SSH service
# RUN service ssh start
# RUN /etc/init.d/sshd start
RUN /usr/bin/ssh-keygen -A
# Expose docker port 22
EXPOSE 22
CMD ["/usr/sbin/sshd","-D"]
# CMD ["slurmdbd"]

We have replaced the slurmdbd command (CMD) with our setup from sshdocker, starting an SSH daemon (sshd) with our SSH public key associated with the slurm user. This last part is important: to build this new version, you need to copy your public SSH key into this Docker image. This is performed in this line:

# Copy the ssh public key in the authorized_keys file. The id_rsa.pub below is a public key file you get from ssh-keygen. They are under ~/.ssh directory by default.
COPY id_rsa.pub /home/slurm/.ssh/authorized_keys

So, you need to add your id_rsa.pub public key to this directory, so Docker can copy it when it builds the image.

Turns out, we also need to change the entrypoint script:

... # other stuff from script

if [ "$1" = "slurmctld" ]
then
    echo "---> Starting the MUNGE Authentication service (munged) ..."
    gosu munge /usr/sbin/munged

    echo "---> Starting SSH Daemon (sshd) ..."
    # exec /usr/bin/ssh-keygen -A
    exec /usr/sbin/sshd -D &
    exec rm /run/nologin &
    exec chmod 777 /data &

    echo "---> Waiting for slurmdbd to become active before starting slurmctld ..."

    ... # other stuff from script

We added the command to start the SSH daemon on the CTLD here, where it is actually called. We also added some quick bugfixes to make the tutorial SSH work. If you still run into issues with permissions in /data, login as superuser and also apply write access again.

======= 2b. Tell Docker Compose to use the new Dockerfile for slurmctld =======

Currently, Docker Compose will spin up all containers from the same Dockerfile definition.

So we will change the Dockerfile for the slurmctld as defined in the docker-compose.yml, by replacing image with build:

slurmctld:
    # image: slurm-docker-cluster:${IMAGE_TAG:-21.08.6}
    # Build this image from current folder
    # Use a specific file: Dockerfile_slurmctld
    build: 
      context: ./
      dockerfile: Dockerfile_slurmctld
    command: ["slurmctld"]
    container_name: slurmctld
    hostname: slurmctld
    volumes:
      - etc_munge:/etc/munge
      - etc_slurm:/etc/slurm
      - slurm_jobdir:/data
      - var_log_slurm:/var/log/slurm
    expose:
      - "6817"
    ports:
      - "2222:22"
    depends_on:
      - "slurmdbd"

We also mapped port 22 (SSH) from the container to our localhost port 2222. So now we can connect SSH to our localhost and be forwarded to this Slurm container.
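Before trying an actual SSH login, it can help to check that the mapped port accepts TCP connections at all. The following is a minimal sketch using only the Python standard library; the helper name port_is_open is my own and not part of any library used in this tutorial.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 2222 is the host port that docker-compose.yml maps to the container's port 22.
    print("SSH port reachable:", port_is_open("localhost", 2222))
```

An open port only tells you that something is listening there; whether your key is accepted is still decided by the SSH login itself.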

Test it out:

  1. Fire up the Slurm cluster:

docker-compose up -d --build

  2. SSH into the control node:

ssh -i C:\Users\<you>\.ssh\id_rsa slurm@localhost -p 2222 -o UserKnownHostsFile=/dev/null

This should connect as the slurm user to the control container on port 2222 (type yes to connect, we will fix promptless login later).

Last login: Tue Aug  8 15:48:31 2023 from 172.21.0.1
[slurm@slurmctld ~]$

Congratulations!

======= 2c. Add SSH config for simple login =======

We can simplify the SSH command, though, and our library needs a simple way to log in.

For this, add this config file as your ~/.ssh/config, no extension. See here for more information.

Of course, first update the values with those you used to SSH before, e.g.:

Host slurm
	HostName localhost
	User slurm
	Port 2222
	IdentityFile ~/.ssh/id_rsa
	StrictHostKeyChecking no

Then try it out: ssh slurm
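If the alias does not behave as expected, it helps to see which values it actually carries. Below is a small sketch that parses a config like the one above with plain Python. It is a simplification under stated assumptions: it only understands plain Host blocks with "Key Value" lines, and ignores wildcards, Match blocks and Include. The function name parse_ssh_config is hypothetical. (With a recent OpenSSH, ssh -G slurm prints the fully resolved options and is the more authoritative check.)

```python
def parse_ssh_config(text: str) -> dict:
    """Parse a simple SSH config into {host_alias: {option: value}}.

    Only plain 'Host <alias>' blocks with 'Key Value' lines are handled.
    Option names are lowercased, since SSH treats them case-insensitively.
    """
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        if key == "host":
            current = hosts.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return hosts


EXAMPLE = """\
Host slurm
\tHostName localhost
\tUser slurm
\tPort 2222
\tIdentityFile ~/.ssh/id_rsa
\tStrictHostKeyChecking no
"""

config = parse_ssh_config(EXAMPLE)["slurm"]
print(f"{config['user']}@{config['hostname']}:{config['port']}")
```

For the example config this prints slurm@localhost:2222, which is the same target as the long ssh command we used before.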

======= StrictHostKeyChecking =======

Note that I added StrictHostKeyChecking no, as our Slurm container will have different keys all the time. A normal Slurm server likely does not, and won’t require this flag. This is also where we get our pretty warning from:

...> ssh slurm
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.

The host key changed =)

If you don’t add this flag, SSH will save you from danger and deny access. Of course, that is not very useful for our tutorial.

3. Test Slurm

TL;DR:

  1. Spin up the Slurm cluster: docker-compose up -d --build

  2. SSH into the control node: ssh slurm

  3. Start some filler jobs: sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname"

  4. Check the progress: squeue

  5. Check some output, e.g. job 1: cat slurm-1.out

Details

Now connect via SSH to Slurm, change to /data (our fileserver shared between the Slurm nodes) and let’s see if Slurm works:

[slurm@slurmctld ~]$ cd /data
[slurm@slurmctld data]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[slurm@slurmctld data]$

The queue is empty! Let’s fill it up with some short tasks:

[slurm@slurmctld data]$ sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname"
Submitted batch job 5
Submitted batch job 6
Submitted batch job 7
Submitted batch job 8
[slurm@slurmctld data]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 7    normal     wrap    slurm  R       0:01      1 c1
                 8    normal     wrap    slurm  R       0:01      1 c2
[slurm@slurmctld data]$

I fired off 4 jobs that each take about 5 seconds, so a few were still in the queue by the time I asked for an update. You can also see they were split over the 2 compute nodes c1 and c2.
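The squeue table is easy to read by eye, but if you want to check it from a script, a rough sketch like the following will do. It assumes the default squeue column layout shown above and job names without spaces; parse_squeue and jobs_per_node are names I made up for this illustration.

```python
from collections import defaultdict

SQUEUE_OUTPUT = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 7    normal     wrap    slurm  R       0:01      1 c1
                 8    normal     wrap    slurm  R       0:01      1 c2
"""


def parse_squeue(text: str) -> list:
    """Turn default squeue output into a list of {column: value} dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # Split each row into as many fields as there are header columns.
    return [dict(zip(header, line.split(None, len(header) - 1)))
            for line in lines[1:]]


def jobs_per_node(rows: list) -> dict:
    """Group job IDs by the node (or pending reason) they are listed under."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["NODELIST(REASON)"]].append(row["JOBID"])
    return dict(grouped)


print(jobs_per_node(parse_squeue(SQUEUE_OUTPUT)))
```

For the output above this gives one job on c1 and one on c2, matching what we saw in the table.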

The jobs wrote their stdout output in the current dir (/data, which is where permission issues might come in):

[slurm@slurmctld data]$ ls
slurm-3.out  slurm-4.out  slurm-5.out  slurm-6.out  slurm-7.out  slurm-8.out
[slurm@slurmctld data]$ cat slurm-7.out
c1
[slurm@slurmctld data]$ cat slurm-8.out
c2
[slurm@slurmctld data]$

They logged the output of the hostname command, which returned c1 for some and c2 for others, depending on which compute node ran the job.

Now let’s connect OMERO to our Slurm!

4. OMERO & OMERO Slurm Client

Ok, now we need an OMERO server and a correctly configured OMERO Slurm Client.

TL;DR:

  1. Clone my example docker-example-omero-grid-amc locally: git clone -b processors https://github.com/TorecLuik/docker-example-omero-grid-amc.git

  2. Fire up the OMERO containers: docker-compose up -d --build

  3. Go to OMERO.web (localhost:4080), login root pw omero

  4. Upload some images (to localhost) with OMERO.Insight (e.g. Cells.tiff).

  5. In web, run the slurm/init_environment script (here)

Details

======= OMERO in Docker =======

You can use your own OMERO setup, but for this tutorial I will refer to a dockerized OMERO that I am working with: get it here.

git clone -b processors https://github.com/TorecLuik/docker-example-omero-grid-amc.git

Let’s (build it and) fire it up:

docker-compose up -d --build

======= OMERO web =======

Once they are running, you should be able to access web at localhost:4080. Login with user root / pw omero.

Import some example data with OMERO Insight (connect with localhost).

======= Connect to Slurm =======

This container’s processor node (worker-5) already has our omero-slurm-client library installed.

======= Add ssh config to OMERO Processor =======

Ok, so localhost works fine from your machine, but we need the OMERO processing server worker-5 to be able to do it too, like we did before.

By some smart tricks, we have mounted our ~/.ssh folder to the worker container, so it knows and can use our SSH settings and config.

However, we need to change the HostName to match one that the container can understand. localhost works fine from our machine, but not from within a Docker container. Instead, we need to use host.docker.internal (documentation).

Host slurm
	HostName host.docker.internal
	User slurm
	Port 2222
	IdentityFile ~/.ssh/id_rsa
	StrictHostKeyChecking no

Restart your OMERO cluster if you already started it: docker-compose down && docker-compose up -d --build

Ok, so now we can connect from within the worker-5 to our Slurm cluster. We can try it out:

...\docker-example-omero-grid> docker-compose exec omeroworker-5 /bin/bash
bash-4.2$ ssh slurm
Last login: Wed Aug  9 13:08:54 2023 from 172.21.0.1
[slurm@slurmctld ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[slurm@slurmctld ~]$ exit
logout
Connection to host.docker.internal closed.
bash-4.2$ exit
exit

======= slurm-config.ini =======

Let us set up the library’s config file slurm-config.ini correctly.

Now, the omero-slurm-client library by default expects the Slurm ssh connection to be called slurm, but you can adjust it to whatever you named your ssh Host in config.

In this Docker setup, the config file is located at the worker-gpu folder and in the Dockerfile it is copied to /etc/, where the library will pick it up.

Let’s use these values:

[SSH]
# -------------------------------------
# SSH settings
# -------------------------------------
# The alias for the SLURM SSH connection
host=slurm
# Set the rest of your SSH configuration in your SSH config under this host name/alias
# Or in e.g. /etc/fabric.yml (see Fabric's documentation for details on config loading)

[SLURM]
# -------------------------------------
# Slurm settings
# -------------------------------------
# General settings for where to find things on the Slurm cluster.
# -------------------------------------
# PATHS
# -------------------------------------
# The path on SLURM entrypoint for storing datafiles
#
# Note: 
# This example is relative to the Slurm user's home dir
slurm_data_path=/data/my-scratch/data
# The path on SLURM entrypoint for storing container image files
#
# Note: 
# This example is relative to the Slurm user's home dir
slurm_images_path=/data/my-scratch/singularity_images/workflows
# The path on SLURM entrypoint for storing the slurm job scripts
#
# Note: 
# This example is relative to the Slurm user's home dir
slurm_script_path=/data/my-scratch/slurm-scripts

We have put all the storage paths on /data/my-scratch/ and named the SSH Host connection slurm.
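Since slurm-config.ini is a regular INI file, you can sanity-check your values with Python’s standard configparser before rebuilding the container. This is only a sketch for checking the file by hand; how the library itself loads the config is not shown in this tutorial. The inline text below is a shortened copy of the values above.

```python
import configparser

CONFIG_TEXT = """\
[SSH]
# The alias for the SLURM SSH connection
host=slurm

[SLURM]
slurm_data_path=/data/my-scratch/data
slurm_images_path=/data/my-scratch/singularity_images/workflows
slurm_script_path=/data/my-scratch/slurm-scripts
"""

parser = configparser.ConfigParser()
# To check a real file instead, use parser.read("<path-to>/slurm-config.ini").
parser.read_string(CONFIG_TEXT)

ssh_host = parser.get("SSH", "host")
paths = {key: value for key, value in parser["SLURM"].items()
         if key.endswith("_path")}

print("SSH alias:", ssh_host)
for key, value in paths.items():
    print(f"{key} -> {value}")
```

The SSH alias printed here should be the same name as the Host entry in your ~/.ssh/config.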

The other values we can keep as default, except we don’t have a GPU, so let’s turn that off for CellPose:

# -------------------------------------
# CELLPOSE SEGMENTATION
# -------------------------------------
# The path to store the container on the slurm_images_path
cellpose=cellpose
# The (e.g. github) repository with the descriptor.json file
cellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7
# The jobscript in the 'slurm_script_repo'
cellpose_job=jobs/cellpose.sh
# Override the default job values for this workflow
# Or add a job value to this workflow
# For more examples of such parameters, google SBATCH parameters.
# If you don't want to override, comment out / delete the line.
# Run CellPose Slurm with 10 GB GPU
# cellpose_job_gres=gpu:1g.10gb:1
# Run CellPose Slurm with 15 GB CPU memory
cellpose_job_mem=15GB

The gres line would request a 10GB GPU on the Slurm cluster, but our Docker Slurm cluster is CPU-only, so we leave it commented out.

We will also comment out some of the other algorithms, so we have to download fewer containers to our Slurm cluster and speed up the tutorial.

This brings us to the following configuration file:

[SSH]
# -------------------------------------
# SSH settings
# -------------------------------------
# The alias for the SLURM SSH connection
host=slurm
# Set the rest of your SSH configuration in your SSH config under this host name/alias
# Or in e.g. /etc/fabric.yml (see Fabric's documentation for details on config loading)

[SLURM]
# -------------------------------------
# Slurm settings
# -------------------------------------
# General settings for where to find things on the Slurm cluster.
# -------------------------------------
# PATHS
# -------------------------------------
# The path on SLURM entrypoint for storing datafiles
#
# Note: 
# This example is relative to the Slurm user's home dir
slurm_data_path=/data/my-scratch/data
# The path on SLURM entrypoint for storing container image files
#
# Note: 
# This example is relative to the Slurm user's home dir
slurm_images_path=/data/my-scratch/singularity_images/workflows
# The path on SLURM entrypoint for storing the slurm job scripts
#
# Note: 
# This example is relative to the Slurm user's home dir
slurm_script_path=/data/my-scratch/slurm-scripts
# -------------------------------------
# REPOSITORIES
# -------------------------------------
# A (github) repository to pull the slurm scripts from.
#
# Note: 
# If you provide no repository, we will generate scripts instead!
# Based on the job_template and the descriptor.json
#
# Example:
#slurm_script_repo=https://github.com/TorecLuik/slurm-scripts
slurm_script_repo=
# -------------------------------------
# Processing settings
# -------------------------------------
# General/default settings for processing jobs.
# Note: NOT YET IMPLEMENTED
# Note: If you need to change it for a specific case only,
# you should change the job script instead, either in OMERO or Slurm 


[MODELS]
# -------------------------------------
# Model settings
# -------------------------------------
# Settings for models/singularity images that we want to run on Slurm
#
# NOTE: keys have to be unique, and require a <key>_repo and <key>_image value as well.
#
# NOTE 2: Versions for the repo are highly encouraged! 
# Latest/master can change and cause issues with reproducability!
# We pickup the container version based on the version of the repository.
# For generic master branch, we pick up generic latest container.
# -------------------------------------
# CELLPOSE SEGMENTATION
# -------------------------------------
# The path to store the container on the slurm_images_path
cellpose=cellpose
# The (e.g. github) repository with the descriptor.json file
cellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7
# The jobscript in the 'slurm_script_repo'
cellpose_job=jobs/cellpose.sh
# Override the default job values for this workflow
# Or add a job value to this workflow
# For more examples of such parameters, google SBATCH parameters.
# If you don't want to override, comment out / delete the line.
# Run CellPose Slurm with 10 GB GPU
# cellpose_job_gres=gpu:1g.10gb:1
# Run CellPose Slurm with 15 GB CPU memory
cellpose_job_mem=15GB
# -------------------------------------
# # STARDIST SEGMENTATION
# # -------------------------------------
# # The path to store the container on the slurm_images_path
# stardist=stardist
# # The (e.g. github) repository with the descriptor.json file
# stardist_repo=https://github.com/Neubias-WG5/W_NucleiSegmentation-Stardist/tree/v1.3.2
# # The jobscript in the 'slurm_script_repo'
# stardist_job=jobs/stardist.sh
# -------------------------------------
# CELLPROFILER SEGMENTATION
# # -------------------------------------
# # The path to store the container on the slurm_images_path
# cellprofiler=cellprofiler
# # The (e.g. github) repository with the descriptor.json file
# cellprofiler_repo=https://github.com/Neubias-WG5/W_NucleiSegmentation-CellProfiler/tree/v1.6.4
# # The jobscript in the 'slurm_script_repo'
# cellprofiler_job=jobs/cellprofiler.sh
# -------------------------------------
# DEEPCELL SEGMENTATION
# # -------------------------------------
# # The path to store the container on the slurm_images_path
# deepcell=deepcell
# # The (e.g. github) repository with the descriptor.json file
# deepcell_repo=https://github.com/Neubias-WG5/W_NucleiSegmentation-DeepCell/tree/v.1.4.3
# # The jobscript in the 'slurm_script_repo'
# deepcell_job=jobs/deepcell.sh
# -------------------------------------
# IMAGEJ SEGMENTATION
# # -------------------------------------
# # The path to store the container on the slurm_images_path
# imagej=imagej
# # The (e.g. github) repository with the descriptor.json file
# imagej_repo=https://github.com/Neubias-WG5/W_NucleiSegmentation-ImageJ/tree/v1.12.10
# # The jobscript in the 'slurm_script_repo'
# imagej_job=jobs/imagej.sh
# # -------------------------------------
# # CELLPROFILER SPOT COUNTING
# # -------------------------------------
# The path to store the container on the slurm_images_path
cellprofiler_spot=cellprofiler_spot
# The (e.g. github) repository with the descriptor.json file
cellprofiler_spot_repo=https://github.com/TorecLuik/W_SpotCounting-CellProfiler/tree/v1.0.1
# The jobscript in the 'slurm_script_repo'
cellprofiler_spot_job=jobs/cellprofiler_spot.sh
# # -------------------------------------
# CELLEXPANSION SPOT COUNTING
# -------------------------------------
# The path to store the container on the slurm_images_path
cellexpansion=cellexpansion
# The (e.g. github) repository with the descriptor.json file
cellexpansion_repo=https://github.com/TorecLuik/W_CellExpansion/tree/v1.0.1
# The jobscript in the 'slurm_script_repo'
cellexpansion_job=jobs/cellexpansion.sh

======= Init environment =======

Now we go to OMERO web and run the slurm/init_environment script to apply this config and setup our Slurm. We will use the default location, no need to fill in anything, just run the script.

Slurm Init Busy

Slurm Init Done

Note, this will take a while, since it is downloading workflow docker images and building (singularity) containers from them.

Congratulations! We have set up workflows CellPose v1.2.7, Cellprofiler Spot v1.0.1 and CellExpansion v1.0.1. And there are no data files yet.
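The list of workflows and versions follows directly from the [MODELS] section of the config: every key that has a matching <key>_repo entry is a workflow, and the version is the tag at the end of the repository URL. As a rough sketch of that idea (not the library’s actual code; list_workflows is a hypothetical helper), using only the uncommented entries from our config:

```python
import configparser
import re

MODELS_TEXT = """\
[MODELS]
cellpose=cellpose
cellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7
cellpose_job=jobs/cellpose.sh
cellpose_job_mem=15GB
cellprofiler_spot=cellprofiler_spot
cellprofiler_spot_repo=https://github.com/TorecLuik/W_SpotCounting-CellProfiler/tree/v1.0.1
cellprofiler_spot_job=jobs/cellprofiler_spot.sh
cellexpansion=cellexpansion
cellexpansion_repo=https://github.com/TorecLuik/W_CellExpansion/tree/v1.0.1
cellexpansion_job=jobs/cellexpansion.sh
"""


def list_workflows(text: str) -> dict:
    """Return {workflow_key: version} for each key with a <key>_repo entry."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    models = parser["MODELS"]
    workflows = {}
    for key in models:
        repo = models.get(f"{key}_repo")
        if repo is None:
            continue  # skips the *_repo, *_job and *_job_mem keys themselves
        match = re.search(r"/tree/(v[\w.]+)$", repo)
        # Without a version tag in the URL we assume the generic latest.
        workflows[key] = match.group(1) if match else "latest"
    return workflows


print(list_workflows(MODELS_TEXT))
```

This yields cellpose v1.2.7, cellprofiler_spot v1.0.1 and cellexpansion v1.0.1, the same three workflows the init script reports.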

Let’s go run some segmentation workflow then!

5. Workflows!

TL;DR:

  1. In web, select your images and run script slurm/SLURM Run Workflow

    • Untick the E-mail box (not implemented in this Slurm docker setup)

    • For importing results, change 3a) Import into NEW Dataset to CellPose_Masks

    • For importing results, change 3b) Rename the imported images to {original_file}_cpmask.{ext}

    • Select cellpose, but untick use_gpu (sadly not available in this docker setup)

    • Click Run Script

  2. Check activity window (or get a coffee), it should take a few minutes (about 3m:30s for 4 256x256 images for me) and then say (a.o.): COMPLETE

    • Or it FAILED, in which case you should check all the details anyway and get your hands dirty with debugging! Or try fewer and smaller images.

  3. Refresh your Explore window, there should be a new dataset CellPose_Masks with a mask for every input image.

Details

So, I hope you added some data already; if not, import some images now.

Let’s run slurm/SLURM Run Workflow:

Slurm Run Workflow

You can see that this script recognized that we downloaded 3 workflows, and what their parameters are. For more information on this magic, follow the other tutorials.

Let’s select cellpose and untick use_gpu (sadly). Tune the other parameters as you like for your images. Also, for output let’s select Import into NEW Dataset by filling in a dataset name: cellpose_images. Click Run Script.

Slurm Run Cellpose

Result: job 1 FAILED. It turns out the compute nodes of our Slurm cluster don’t have enough resources to execute this operation.

======= Improve Slurm =======

Update the slurm.conf file in the git repository.

# COMPUTE NODES
NodeName=c[1-2] RealMemory=5120 CPUs=8 State=UNKNOWN

Here, 5GB and 8 CPU each should do the trick!
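To double-check what this line gives us in total, here is a small sketch that parses the NodeName definition. Note that RealMemory in slurm.conf is expressed in megabytes. The helpers expand_nodes and parse_node_line are hypothetical and only handle a simple range like c[1-2], not the full Slurm hostlist syntax.

```python
import re


def expand_nodes(pattern: str) -> list:
    """Expand a simple Slurm range like 'c[1-2]' into ['c1', 'c2']."""
    match = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", pattern)
    if not match:
        return [pattern]
    prefix, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    return [f"{prefix}{index}" for index in range(start, end + 1)]


def parse_node_line(line: str) -> dict:
    """Read node names, memory (MB) and CPU count from a NodeName line."""
    fields = dict(item.split("=", 1) for item in line.split())
    return {
        "nodes": expand_nodes(fields["NodeName"]),
        "memory_mb": int(fields["RealMemory"]),
        "cpus": int(fields["CPUs"]),
    }


node = parse_node_line("NodeName=c[1-2] RealMemory=5120 CPUs=8 State=UNKNOWN")
total_mb = node["memory_mb"] * len(node["nodes"])
print(node["nodes"], f"{total_mb} MB total, {node['cpus']} CPUs per node")
```

Comparing memory_mb with the memory a workflow asks for (for example cellpose_job_mem in slurm-config.ini) is a quick sanity check before you submit a job.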

Rebuild the containers. Note that the config is on a shared volume, so we have to destroy that volume too (it took some headbashing to find this out):

docker-compose down --volumes 
docker-compose up --build

That should take you through connecting OMERO with a local Slurm setup.

Batching

Try slurm/SLURM Run Workflow Batched (https://github.com/NL-BioImaging/omero-slurm-scripts/blob/master/workflows/SLURM_Run_Workflow_Batched.py) to see if there is any speedup from splitting your images over multiple jobs/batches.

We have installed 2 nodes in this Slurm cluster, so you could make 2 batches of half the images and get your results quicker. However, we are also limited to computing 2 jobs in parallel, so smaller (than half) batches will just wait in the queue (with some overhead) and probably take longer in total.

Note that there is always overhead cost, so the speedup will not be linear. However, the more time is in compute vs overhead, the more gains you should get by splitting over multiple jobs / nodes / CPUs.
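To make the idea of batching concrete, this is one way to split a list of images into a fixed number of near-equal batches. It is an illustrative sketch only, not how the batched script is implemented, and the image IDs are made up.

```python
def split_into_batches(items: list, n_batches: int) -> list:
    """Split items into up to n_batches contiguous chunks of near-equal size."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    size, remainder = divmod(len(items), n_batches)
    batches, start = [], 0
    for index in range(n_batches):
        # The first 'remainder' batches get one extra item.
        end = start + size + (1 if index < remainder else 0)
        if end > start:
            batches.append(items[start:end])
        start = end
    return batches


image_ids = [101, 102, 103, 104]  # hypothetical OMERO image IDs
print(split_into_batches(image_ids, 2))
```

With 2 compute nodes, 2 batches keep both nodes busy; asking for more batches than nodes only adds jobs that wait in the queue.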

Let’s check on the Slurm node:

$ sacct --starttime "2023-06-13T17:00:00" --format Jobid,State,start,end,JobName%-18,Elapsed -n -X --endtime "now"

In my latest example, it was about 1 minute (roughly 25%) faster to have 2 batches/jobs (32 & 33) vs 1 job (31):

31            COMPLETED 2023-08-23T08:41:28 2023-08-23T08:45:02 omero-job-cellpose   00:03:34

32            COMPLETED 2023-08-23T09:22:00 2023-08-23T09:24:27 omero-job-cellpose   00:02:27
33            COMPLETED 2023-08-23T09:22:03 2023-08-23T09:24:40 omero-job-cellpose   00:02:37
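To compare runs fairly, look at the wall time from the earliest start to the latest end of all jobs in a run, not at the individual Elapsed values. A small sketch with the start and end times copied from the sacct output above:

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S"


def wall_time(jobs: list) -> float:
    """Seconds from the earliest start to the latest end of the given jobs."""
    starts = [datetime.strptime(start, FMT) for start, _ in jobs]
    ends = [datetime.strptime(end, FMT) for _, end in jobs]
    return (max(ends) - min(starts)).total_seconds()


# (start, end) pairs taken from the sacct lines for jobs 31, 32 and 33.
single = [("2023-08-23T08:41:28", "2023-08-23T08:45:02")]
batched = [("2023-08-23T09:22:00", "2023-08-23T09:24:27"),
           ("2023-08-23T09:22:03", "2023-08-23T09:24:40")]

t_single = wall_time(single)
t_batched = wall_time(batched)
saved = t_single - t_batched
print(f"single: {t_single:.0f}s, batched: {t_batched:.0f}s, "
      f"saved: {saved:.0f}s ({saved / t_single:.0%})")
```

For these timings the batched run saves just under a minute of wall time.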

Google Cloud Slurm tutorial

Introduction

This library is meant to be used with some external HPC cluster using Slurm, to offload your (OMERO) compute to servers suited for it.

However, if you don’t have ready access (yet) to such a cluster, you might want to spin some test environment up in the Cloud and connect your (local) OMERO to it. This is what we will cover in this tutorial, specifically Google Cloud.

0. Requirements

To follow this tutorial, you need:

  • Git

  • Docker

  • OMERO Insight

  • A credit card (but we’ll work with free credits)

I use Windows here, but it should work on Linux/Mac too. If not, let me know.

I provide ready-to-go TL;DR, but in the details of each chapter I walk through the steps I took to make these containers ready.

1. Setup Google Cloud for Slurm

TL;DR:

  1. Follow this tutorial from Google Cloud. Click ‘guide me’.

  2. Make a new Google Account to do this, with free $300 credits to use Slurm for a bit. This requires the credit card (but no cost).

Details

So, we follow this tutorial and end up with a hpcsmall VM on Google Cloud.

However, we are missing an ingredient: SSH access!

2. Add SSH access

TL;DR:

  1. Add your public SSH key (~/.ssh/id_rsa.pub) to the Google Cloud instance, like here. Easiest is with Cloud shell, upload your public key, and run gcloud compute os-login ssh-keys add --key-file=id_rsa.pub

  2. Turn the firewall setting (e.g. hpc-small-net-fw-allow-iap-ingress) to allow 0.0.0.0/0 as IP ranges for tcp:22.

  3. Promote the login node’s IP address to a static one: here

  4. Copy that IP and your username.

  5. On your own computer, add a SSH config file, store it as ~/.ssh/config (no extension) with the ip and user filled in:

Host gcslurm
	HostName <fill-in-the-External-IP-of-VM-instance>
	User <fill-in-your-Google-Cloud-user>
	Port 22
	IdentityFile ~/.ssh/id_rsa

Details

We need to setup our library with SSH access between OMERO and Slurm, but this is not built-in to these Virtual Machines yet. We will forward our local SSH to our OMERO (in this tutorial), so we just need to setup SSH access to the Google Cloud VMs.

This sounds easier than it actually is.

Follow the steps here:

  1. Note that this tutorial by default seems to use the “OS Login” method, using the mail account you signed up with.

  2. Open a Cloud Shell

  3. Upload your public key to this Cloud Shell (with the ... button).

  4. Run the gcloud compute os-login ssh-keys add --key-file=id_rsa.pub command they show, pointing at your newly uploaded public key. Leave out the optional project and expire_time.

Then, we have to ensure that the firewall accepts requests from outside Google Cloud, if it doesn’t already.

Go to the firewall settings and edit the tcp:22 (e.g. hpc-small-net-fw-allow-iap-ingress) and add the 0.0.0.0/0 ip ranges.

Now we are ready:

  • ssh -i ~/.ssh/id_rsa <fill-in-your-Google-Cloud-user>@<fill-in-the-External-IP-of-VM-instance>

E.g. my Google Cloud user became t_t_luik_amsterdamumc_nl, related to the email I signed up with. The External IP was on the VM instances page for the login node hpcsmall-login-2aoamjs0-001.

Now to make this connection easy, we need 2 steps:

  1. Fix this external IP address, so that it will always be the same

  2. Fix a SSH config file for this SSH connection

For 1, we go here and follow the Console steps to promote the IP address to a static IP address. Now back in the All screen, your newly named Static IP address should show up. Copy that IP (it should be the same IP as before, but now it will not change when you restart the system).

For 2, on your own computer, add a SSH config file, store it as ~/.ssh/config (no extension) with the ip and user filled in:

Host gcslurm
	HostName <fill-in-the-External-IP-of-VM-instance>
	User <fill-in-your-Google-Cloud-user>
	Port 22
	IdentityFile ~/.ssh/id_rsa

Now you should be able to login with a simple: ssh gcslurm.

Congratulations!

3. Test Slurm

TL;DR:

  1. SSH into the login node: ssh gcslurm

  2. Start some filler jobs: sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname"

  3. Check the progress: squeue

  4. Check some output when it’s done, e.g. job 1: cat slurm-1.out

Details

Now connect via SSH to Google Cloud Slurm and let’s see if Slurm works:

[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname"
Submitted batch job 4
Submitted batch job 5
Submitted batch job 6
Submitted batch job 7
[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 4     debug     wrap t_t_luik CF       0:03      1 hpcsmall-debug-ghpc-3
                 5     debug     wrap t_t_luik PD       0:00      1 (Resources)
                 6     debug     wrap t_t_luik PD       0:00      1 (Priority)
                 7     debug     wrap t_t_luik PD       0:00      1 (Priority)

I fired off 4 jobs that take some seconds, so they are still in the queue by the time I call the squeue. Note that the first one might take a while since Google Cloud has to fire up a new compute node for the first time.
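The ST column uses Slurm’s compact job state codes: CF means CONFIGURING (the job has resources allocated, but the node is still booting), PD means PENDING and R means RUNNING. A quick sketch to summarize them, assuming the default squeue layout where ST is the fifth column (count_states is a name I made up):

```python
from collections import Counter

# A subset of Slurm's compact job state codes, as shown in the ST column.
STATE_NAMES = {
    "PD": "PENDING",
    "CF": "CONFIGURING",
    "R": "RUNNING",
    "CG": "COMPLETING",
}

SQUEUE_OUTPUT = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 4     debug     wrap t_t_luik CF       0:03      1 hpcsmall-debug-ghpc-3
                 5     debug     wrap t_t_luik PD       0:00      1 (Resources)
                 6     debug     wrap t_t_luik PD       0:00      1 (Priority)
                 7     debug     wrap t_t_luik PD       0:00      1 (Priority)
"""


def count_states(text: str) -> Counter:
    """Count jobs per state, translating known codes to their full names."""
    rows = [line.split() for line in text.splitlines()[1:] if line.strip()]
    # Index 4 is the ST column in the default squeue layout.
    return Counter(STATE_NAMES.get(row[4], row[4]) for row in rows)


print(dict(count_states(SQUEUE_OUTPUT)))
```

Here one job is CONFIGURING while Google Cloud boots its node, and the other three are PENDING behind it.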

The jobs wrote their stdout output in the current dir:

[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ ls
slurm-4.out  slurm-5.out  slurm-6.out  slurm-7.out
[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ cat slurm-4.out
hpcsmall-debug-ghpc-3
[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ cat slurm-5.out
hpcsmall-debug-ghpc-3

All on the same node that was spun up, on-demand, by Google Cloud. You should be able to see it still alive in the VM instances tab as well. It will be destroyed again if not used for a while, saving you costs.

3b. Install requirements: Singularity / Apptainer and 7zip

TL;DR:

  1. Follow this guide to install Singularity, but in step 5 install to /opt/apps instead! /apps is not actually shared with all nodes.

  2. Execute the following to update ~/.bashrc:

echo 'export PATH=/opt/apps/singularity/3.8.7/bin:/usr/sbin:${PATH}' >> ~/.bashrc && source ~/.bashrc

  3. Install 7zip: sudo yum install -y p7zip p7zip-plugins

Now we want to run containers on our Slurm cluster using singularity, but this is not installed by default.

Luckily the folks at Google have a guide for it, so let’s follow that one.

If the ssh connection to the login node doesn’t work from Google Cloud Shell, you can continue with the steps by using the SSH connection (ssh gcslurm) that we just built from your local commandline.

Use this URL for the singularity tar:

https://github.com/apptainer/singularity/releases/download/v3.8.7/singularity-3.8.7.tar.gz

export SINGULARITY_VERSION=3.8.7 && wget https://github.com/apptainer/singularity/releases/download/v${SINGULARITY_VERSION}/singularity-${SINGULARITY_VERSION}.tar.gz && tar -xzf singularity-${SINGULARITY_VERSION}.tar.gz && cd singularity-${SINGULARITY_VERSION}

The module step did not work for me, because it is the wrong directory in the guide!

In step 5, we need to install to /opt/apps instead! This is very important because the compute nodes that have to execute the job need to have access to this software too, and this directory is the actual shared directory:

./mconfig --prefix=/opt/apps/singularity/${SINGULARITY_VERSION} && \
    make -C ./builddir && \
    sudo make -C ./builddir install

Now module avail should list singularity.

So module load singularity and now singularity --version should give you singularity version 3.8.7.

Now let’s connect OMERO to our Slurm!

4. OMERO & OMERO Slurm Client

Ok, now we need an OMERO server and a correctly configured OMERO Slurm Client.

TL;DR:

  1. Clone my example docker-example-omero-grid-amc locally: git clone -b processors https://github.com/TorecLuik/docker-example-omero-grid-amc.git

  2. Change the worker-gpu/slurm-config.ini file to point to worker-gpu/slurm-config.gcslurm.ini file (if it is not the same file already)

  3. Fire up the OMERO containers: docker-compose up -d --build

  4. Go to OMERO.web (localhost:4080), login root pw omero

  5. Upload some images (to localhost) with OMERO.Insight (not included).

  6. In web, run the slurm/init_environment script

Details

======= OMERO in Docker =======

You can use your own OMERO setup, but for this tutorial I will refer to a dockerized OMERO that I am working with: get it here.

git clone -b processors https://github.com/TorecLuik/docker-example-omero-grid-amc.git

Change the worker-gpu/slurm-config.ini file to be the worker-gpu/slurm-config.gcslurm.ini file (if it is not the same file already).

What we did was point to the gcslurm profile (alternatively, rename your SSH profile to slurm):

[SSH]
# -------------------------------------
# SSH settings
# -------------------------------------
# The alias for the SLURM SSH connection
host=gcslurm

And we also set all directories to be relative to the home dir, and we reduced CellPose CPU drastically to fit into the small Slurm cluster we made in Google Cloud.

This way, it will use the right SSH setting to connect with our Google Cloud Slurm.

Let’s (build it and) fire it up:

docker-compose up -d --build

======= OMERO web =======

Once they are running, you should be able to access web at localhost:4080. Login with user root / pw omero.

Import some example data with OMERO Insight (connect with localhost).

======= Connect to Slurm =======

This container’s processor node (worker-5) already has our omero-slurm-client library installed.

======= Add ssh config to OMERO Processor =======

Ok, so SSH works fine from your machine, but we need the OMERO processing server worker-5 to be able to do it too.

By some smart tricks, we have mounted our ~/.ssh folder to the worker container, so it knows and can use our SSH settings and config.

Ok, so now we can connect from within the worker-5 to our Slurm cluster. We can try it out:

...\docker-example-omero-grid> docker-compose exec omeroworker-5 /bin/bash
bash-4.2$ ssh gcslurm

<pretty-slurm-art>

[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

======= Init environment =======

Now we go to OMERO web and run the slurm/init_environment script to apply this config and setup our Slurm. We will use the default location, no need to fill in anything, just run the script.

Slurm Init Busy

Slurm Init Done

Note, this will take a while, since it is downloading workflow docker images and building (singularity) containers from them.

Congratulations! We have set up workflows CellPose v1.2.7, Cellprofiler Spot v1.0.1 and CellExpansion v1.0.1. And there are no data files yet.

Let’s go run some segmentation workflow then!

5. Workflows!

TL;DR:

  1. In web, select your images and run script slurm/SLURM Run Workflow

    • Untick the E-mail box (not implemented in this Slurm setup)

    • For importing results, change 3a) Import into NEW Dataset to CellPose_Masks

    • For importing results, change 3b) Rename the imported images to {original_file}_cpmask.{ext}

    • Select cellpose, but untick use_gpu (sadly not available in this setup)

    • Click Run Script

  2. Now go get a coffee or something, it can take quite a few minutes (about 12m:30s for 4 256x256 images for me!) and then say (a.o.): COMPLETE

    • Or it FAILED, in which case you should check all the details anyway and get your hands dirty with debugging! Or try fewer and smaller images.

  3. Refresh your Explore window, there should be a new dataset CellPose_Masks with a mask for every input image.

Details

So, I hope you added some data already; if not, import some images now.

Let’s run slurm/SLURM Run Workflow:

Slurm Run Workflow

You can see that this script recognized that we downloaded 3 workflows, and what their parameters are. For more information on this magic, follow the other tutorials.

Let’s select cellpose and untick use_gpu (sadly). Tune the other parameters as you like for your images. Also, for output let’s select Import into NEW Dataset by filling in a dataset name: cellpose_images. Click Run Script.

Slurm Run Cellpose

This will take ages because we did not invest in good compute on the Slurm cluster. It took 12m:30s for 4 small images for me.

You can check the progress with the Slurm Get Update script.

That should take you through connecting OMERO with a Google Cloud Slurm setup!


Microsoft Azure Slurm tutorial

Introduction

This library is meant to be used with some external HPC cluster using Slurm, to offload your (OMERO) compute to servers suited for it.

However, if you don’t have ready access (yet) to such a cluster, you might want to spin some test environment up in the Cloud and connect your (local) OMERO to it. This is what we will cover in this tutorial, specifically Microsoft Azure.

0. Requirements

To follow this tutorial, you need:

  • OMERO Insight

  • An Azure account (and credits)

I try to provide a tl;dr when I can, otherwise I go step by step.

1. Setup Microsoft Azure for Slurm

TL;DR:

  1. Make a new Azure account if you don’t have one. Hopefully you get/have some free credits.

  2. Create a new App “BIOMERO” via “App registrations”

    • Copy Application ID

    • Copy Application Secret

  3. Assign roles to App “BIOMERO”:

    • “Azure Container Storage Operator” role on the Subscription (or probably on the Resource Group works too)

    • “Virtual Machine Contributor” role on the Resource Group (“biomero-public”)

    • “Network Contributor” role on the Resource Group (“biomero-public”)

  4. Create storageaccount “biomerostorage” in the “biomero-public” Resource Group

  5. Mainly: Follow this video tutorial from Microsoft Azure

    • However, note that I actually had trouble with their specific version of Slurm in CycleCloud, while the default version works fine. See the details below for more on this part.

  6. Probably use something cheaper than 4x the expensive Standard_ND96amsr_A100_v4 instances, unless you are really rich!

    • Note: Use Ds not Das or Des VM types, if you run into security type <null> errors in deployment.

  7. We need a Slurm accounting database for BIOMERO! See 1 - Addendum chapter below for setting one up, if you don’t have a database.

  8. Add a public key to your Azure CycleCloud profile. Probably use the hpc-slurm-cluster_key that you can find in your Resource Group.

  9. Now you should be able to login to the Slurm scheduler with something like ssh -i C:\<path-to-my>\hpc-slurm-cluster_key.pem azureadmin@<scheduler-vm-public-ip>

  10. Change the cloud-init to install Singularity and 7zip on your nodes.

Details

So, we follow this tutorial and end up with a VM on Microsoft Azure called hpc-slurm-cluster (that’s what I named it). It also downloaded the SSH private key for us (hpc-slurm-cluster_key.pem).


Suggested alternative: use a basic Slurm cluster

CycleCloud already comes with a basic Slurm setup that is more up-to-date than this specific GPU-powered version. That is especially attractive if you will not use a GPU anyway (because $$).

So, given you followed the video to get a CycleCloud VM up and running, let’s set up a basic Slurm cluster instead.

Let’s start that up:

  • Click + / Add for a new cluster and select Slurm (instead of cc-slurm-ngc)

  • We provide a new name biomero-cluster-basic

  • We change all the VM types:

    • Scheduler: Standard_DC4s_v3

    • HPC, HTC and Dyn: Standard_DC2s_v3

    • Login node we will not use so doesn’t matter (Num Login Nodes stays 0)

  • We change the scaling amount to only 4 cores each (instead of 100), and MaxVMs to 2.

  • Change the network to the default biomero network

  • Next, keep all the Network Attached Storage settings

  • Next, Advanced Settings

    • Here, we need to do 2 major things:

    • First, add the Slurm accounting database. See 1 - Addendum chapter below for setting that up.

    • Second, select appropriate VM images that will work for our software (mainly singularity for containers): Ubuntu 22.04 LTS worked for us.

  • Next, keep all the security settings

  • Finally, let’s change Cloud init for all nodes, to install singularity and 7zip:

#cloud-config  
package_upgrade: true
packages:  
  - htop
  - wget
  - p7zip-full
  - software-properties-common

runcmd:  
  - 'sudo add-apt-repository -y ppa:apptainer/ppa'
  - 'sudo apt update'
  - 'sudo apt install -y apptainer'

Apptainer is the successor of Singularity, and it also provides the singularity command (as an alias).
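Once a node has booted with this cloud-init, a quick check like the one below can confirm that the tools ended up on the PATH. This is only a minimal sketch: the helper name missing_tools is made up for this example, and I assume the 7zip binary is called 7z (which is what p7zip-full ships on Ubuntu).

```python
import shutil

# Tools the cloud-init above is expected to install on every node.
# "7z" is the binary name assumed for the p7zip-full package on Ubuntu.
REQUIRED_TOOLS = ("singularity", "7z")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing on this node:", ", ".join(missing))
    else:
        print("All required tools found.")
```

Run it on a compute node (for example through sbatch --wrap) rather than only on the scheduler, since the nodes are the ones that execute the containers.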

1 - Addendum - Setting up Slurm Accounting DB

  1. Expected price: 16.16 euro / month

  2. We follow this recent blog

    • Note that Western Europe has deployment issues on Azure for these DBs. See Details below for more on this.

Details

A. Create extra subnet on your virtual network

  • Go to your hpc-slurm-cluster-vnet

  • Go to Subnets Settings

  • Create a new subnet with + Subnet, named mysql (and default settings)

B. Azure database

We create a MySQL Flexible Server

  • Server name slurm-accounting-database (or whatever is available)

  • Region Western Europe (same as your server/VNET).

    • See Notes below about issues (and solutions) in Western Europe at the time of writing.

  • MySQL version 8.0

  • Workload type For development (unless you are being serious)

  • Authentication method MySQL authentication only

  • User credentials that you like, I used omero user.

  • Next: Networking

    • Connectivity method: Private access (VNet Integration)

    • Virtual network: select your existing hpc-slurm-cluster-vnet net

    • Subnet: select your new subnet hpc-slurm-cluster-vnet/mysql

    • Private DNS, let it create or use existing one.

  • Next: deploy it!

Next, let’s change some Server parameters according to the blog. I think this is optional though.

  1. Go to your slurm-accounting-database in the Azure portal

  2. Open on left-hand side Server parameters

  3. Click on All

  4. Filter on innodb_lock_wait_timeout

  5. Change value to 900

  6. Save changes

Note! Availability issue in Western Europe in March 2024:

We had an issue deploying the database in Western Europe; apparently it is full there. So we deployed the database in UK South instead. If your database and your cluster end up in different regions, you need to connect the VNETs of both regions, through something called peering!

For this to work, make sure the IPs of the subnets do not overlap, see A where we made an extra subnet with different IP.

  • We made some extra biomero-public-vnet with a mysql subnet on the 10.1.0.0/24 range. Make sure it is also Delegated to Microsoft.DBforMySQL/flexibleServers.

  • Remove the default subnet

  • Remove the 10.0.0.0/24 address space (as we will connect the other vnet here)

  • Then go to the biomero-public-vnet, Peerings and + Add a new peering.

  • First named hpc, default settings

  • Remote named acct, connecting to the hpc-slurm-cluster-vnet.
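Before adding the peering, it can help to double-check that the address ranges really do not overlap. The sketch below uses Python's standard ipaddress module; the function name overlapping is just for this example, and the second pair of ranges is a made-up counterexample.

```python
import ipaddress


def overlapping(cidr_a, cidr_b):
    """True if the two CIDR ranges share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# 10.0.0.0/24 is the address space we removed from the new vnet above and
# 10.1.0.0/24 is the mysql subnet; these two must not overlap.
print(overlapping("10.0.0.0/24", "10.1.0.0/24"))

# Hypothetical counterexample: a /16 that swallows a /24 inside it.
print(overlapping("10.0.0.0/16", "10.0.1.0/24"))
```

The first call should report no overlap and the second one should report an overlap. Fill in the actual ranges shown for your own vnets in the Azure portal.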

C. Slurm Accounting settings

Ok, now back in CycleCloud, we will set Slurm Accounting:

  1. Edit cluster

  2. Advanced Settings

  3. Check the Configure Slurm job accounting box

  • Slurm DBD URL will be your chosen Server name (check the Azure portal). For me it was slurm-accounting-database.mysql.database.azure.com.

  • Slurm DBD User and ... Password are what you entered when deploying the DB.

  • SSL Certificate URL is https://dl.cacerts.digicert.com/DigiCertGlobalRootCA.crt.pem

  4. (Re)start your Slurm cluster.

  5. Test out if the sacct command works!

2. Test Slurm

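  1. SSH into the Slurm scheduler node, as in chapter 1: ssh -i C:\<path-to-my>\hpc-slurm-cluster_key.pem azureadmin@<scheduler-vm-public-ip>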

  2. Start some filler jobs: sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname" &&  sbatch --wrap="sleep 5 && hostname"

  3. Check the progress: squeue (perhaps also check Azure CycleCloud to see your HPC VMs spinning up, takes a few min)

  4. Check some output when it’s done, e.g. job 1: cat slurm-1.out
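If you want to script these steps instead of typing them, the job number can be picked out of the sbatch confirmation line and turned into the output file name. The sketch below only does the text handling; parse_job_id and output_file are names invented for this example, and slurm-<jobid>.out is the default Slurm output name for a plain (non-array) batch job.

```python
import re

# sbatch prints a line like "Submitted batch job 1" on success.
SUBMITTED = re.compile(r"Submitted batch job (\d+)")


def parse_job_id(sbatch_stdout):
    """Extract the job ID from sbatch's confirmation line, or None."""
    match = SUBMITTED.search(sbatch_stdout)
    return int(match.group(1)) if match else None


def output_file(job_id):
    """Default output file name Slurm uses for a (non-array) batch job."""
    return f"slurm-{job_id}.out"


job_id = parse_job_id("Submitted batch job 1\n")
print(job_id, output_file(job_id))
```

An error message from sbatch does not match the pattern, so parse_job_id returns None in that case and you can stop early.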

3. Test Singularity on Slurm

For example, run:

sbatch -n 1 --wrap "hostname > lolcow.log && singularity run docker://godlovedc/lolcow >> lolcow.log"

This should say something like “Submitted batch job 1”. Then let’s tail the logfile:

tail -f lolcow.log

First we see the slurm node that is computing, and later we will see the funny cow.

[slurm@slurmctld data]$ tail -f lolcow.log
c1
 _______________________________________
/ Must I hold a candle to my shames?    \
|                                       |
| -- William Shakespeare, "The Merchant |
\ of Venice"                            /
 ---------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Exit logs with CTRL+C, and the server with exit, and enjoy your Azure Slurm cluster.

5. Setting up (BI)OMERO in Azure too (Optional)

We will install (BI)OMERO on the CycleCloud VM that we have running anyway. Alternatively, you can connect your local (BI)OMERO to this cluster now.

  1. SSH into your CycleCloud VM, hpc-slurm-cluster as azureuser

ssh -i C:\<path-to>\hpc-slurm-cluster_key.pem azureuser@<public-ip>

  2. Install a container runner like Docker

  3. Ensure it works, so you can look at the lolcow again: docker run godlovedc/lolcow

Ok, good enough.

Now let’s pull an easy BIOMERO setup from NL-BIOMERO onto our VM:

  1. git clone https://github.com/Cellular-Imaging-Amsterdam-UMC/NL-BIOMERO.git

  2. Let’s test it: docker compose up -d --build

  3. Now we need to open the OMERO web port, 4080, to view it.

  • First, go to Azure portal and click on your VM hpc-slurm-cluster

  • Second, go to Networking > Network settings

  • Third, Create port rules > Inbound port rule

    • Destination port ranges 4080, Protocol TCP, Name OMEROWEB. Add it. And wait a bit for it to take effect.

  4. Test it! Open your web browser at <public-ip>:4080 and login with root/omero
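If the browser just keeps spinning, it helps to know whether the port rule has taken effect at all. Here is a minimal sketch of a TCP reachability check using only the standard library; port_open is a name I made up, and it only tells you that something accepts connections on that port, not that OMERO web itself is healthy.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Call it as port_open("<public-ip>", 4080) with your VM's actual address filled in. The same check works for any other port you open on the VM.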

Good start!

Now let’s connect BIOMERO to our HPC Slurm cluster:

  1. Copy the SSH private key hpc-slurm-cluster_key.pem (from chapter 1) to the (CycleCloud/OMERO) server:

scp -i C:\<path>\hpc-slurm-cluster_key.pem C:\<path>\hpc-slurm-cluster_key.pem azureuser@<public-ip>:~

  2. Copy your key on the server into ~/.ssh, change its permissions, and log in:

  • cp hpc-slurm-cluster_key.pem .ssh

  • sudo chmod 700 .ssh/hpc-slurm-cluster_key.pem

  • ssh -i .ssh/hpc-slurm-cluster_key.pem azureadmin@<scheduler-ip>

  • Great, exit back to the CycleCloud server.

The IP of the scheduler (this changes whenever you create a new cluster!) is shown in the Azure CycleCloud screen, when you click on the active scheduler node.

  3. Create a config to set up an alias for the SSH connection

  • vi ~/.ssh/config

  • press i to insert text

  • copy paste / fill in the config:

Host localslurm
        HostName <scheduler-ip>
        User azureadmin
        Port 22
        IdentityFile ~/.ssh/hpc-slurm-cluster_key.pem
        StrictHostKeyChecking no

Fill in the actual ip, this is just a placeholder!

  • Save with escape followed by :wq

  • chmod the config to 700 too: sudo chmod 700 .ssh/config

  • Ready! ssh localslurm (or whatever you called the alias)
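Since the scheduler IP changes with every new cluster, you may end up rewriting this block often. As a convenience, here is a small sketch that renders the same Host block from parameters; ssh_config_entry is a hypothetical helper, and the IP address in the example is a documentation-only placeholder, not a real scheduler.

```python
def ssh_config_entry(alias, host, user, key_path, port=22):
    """Render one Host block in the same layout as the config above."""
    lines = [
        f"Host {alias}",
        f"        HostName {host}",
        f"        User {user}",
        f"        Port {port}",
        f"        IdentityFile {key_path}",
        "        StrictHostKeyChecking no",
    ]
    return "\n".join(lines) + "\n"


# 203.0.113.10 is a documentation-only placeholder address (RFC 5737);
# use the scheduler IP shown in CycleCloud instead.
print(ssh_config_entry("localslurm", "203.0.113.10", "azureadmin",
                       "~/.ssh/hpc-slurm-cluster_key.pem"))
```

The function only builds the text; writing it to ~/.ssh/config (and fixing the permissions afterwards) is still up to you.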

  4. Let’s edit the BIOMERO configuration slurm-config.ini, located in the biomeroworker node

  • vi ~/NL-BIOMERO/biomeroworker/slurm-config.ini

  • Change the host if you did not use the localslurm alias in the config above.

  • Change ALL the [SLURM] paths to match our new slurm setup:

[SLURM]
# -------------------------------------
# Slurm settings
# -------------------------------------
# General settings for where to find things on the Slurm cluster.
# -------------------------------------
# PATHS
# -------------------------------------
# The path on SLURM entrypoint for storing datafiles
#
# Note: 
# This example is relative to the Slurm user's home dir
slurm_data_path=data
# The path on SLURM entrypoint for storing container image files
#
# Note: 
# This example is relative to the Slurm user's home dir
slurm_images_path=singularity_images/workflows
# The path on SLURM entrypoint for storing the slurm job scripts
#
# Note: 
# This example is relative to the Slurm user's home dir
slurm_script_path=slurm-scripts

  • Save the file again with escape + :wq
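Because these paths are relative to the Slurm user's home dir, it can be handy to see where they end up on the cluster. The sketch below reads the three options with Python's configparser and joins them onto a home directory; resolve_slurm_paths is a name invented here, and /home/azureadmin is only my assumption for the home directory of the Slurm user.

```python
import configparser
from pathlib import PurePosixPath

# The three path options from the [SLURM] section shown above.
EXAMPLE = """
[SLURM]
slurm_data_path=data
slurm_images_path=singularity_images/workflows
slurm_script_path=slurm-scripts
"""


def resolve_slurm_paths(ini_text, home):
    """Map each *_path option in [SLURM] to an absolute path under `home`."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {
        key: str(PurePosixPath(home) / value)
        for key, value in parser["SLURM"].items()
        if key.endswith("_path")
    }


# /home/azureadmin is an assumed home directory for the Slurm user.
for key, path in resolve_slurm_paths(EXAMPLE, "/home/azureadmin").items():
    print(f"{key} -> {path}")
```

If you configure an absolute path instead, joining it onto the home directory leaves it unchanged, so the same helper covers both cases.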

  5. Now we need to do some Linux shenanigans to mount ssh properly into the container

  • First, create an empty .pub file that we are missing: touch ~/.ssh/empty.pub

  • Second, chmod the .ssh folder and its contents fully open so Docker can access it: chmod -R 777 ~/.ssh

  • Note, if you later want to SSH from the commandline again (instead of letting BIOMERO do it), just change the rights back to 700 (chmod -R 700 ~/.ssh). The open permissions are only needed temporarily, while building the Linux container.

  6. Now we will (re)start / (re)build the BIOMERO servers again

  • cd NL-BIOMERO

  • docker compose down

  • docker compose up -d --build

  • Now docker logs -f nl-biomero-biomeroworker-1 should show some good logs leading to: Starting node biomeroworker.

6. Showtime!

  1. Go to your OMERO web at http://<your-VM-ip>:4080/ (root/omero)

  2. Let’s initialize BIOMERO: Run Script > biomero > init > SLURM Init environment...; run that script

  3. While we’re waiting for that to complete, let’s check out the basic connection: Run Script > biomero > Example Minimal Slurm Script...;

  • Uncheck the Run Python box, as we didn’t install that

  • Check the Check SLURM status box

  • Check the Check Queue box

  • Run the script, you should get something like this, an empty queue:

JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)

  • You can try some other ones, e.g. check the Check Cluster box instead:

PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
dynamic   up    infinite      0 n/a
hpc*      up    infinite      2 idle~ biomero-cluster-basic-hpc-[1-2]
htc       up    infinite      2 idle~ biomero-cluster-basic-htc-[1-2]

  • Note if you click the i button next to the output, you can see the output printed in a lot more detail and better formatting. Especially if you ran multiple commands at the same time.
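To read the Check Cluster output: the * marks the default partition, and the ~ after idle is Slurm's marker for nodes that are powered down until a job needs them, which is what we want for on-demand cloud nodes. If you would like to process that output in a script, a rough sketch could look like this (nodes_by_partition is a made-up helper, and it assumes the default column layout shown above):

```python
SINFO_OUTPUT = """\
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
dynamic up infinite 0 n/a
hpc* up infinite 2 idle~ biomero-cluster-basic-hpc-[1-2]
htc up infinite 2 idle~ biomero-cluster-basic-htc-[1-2]
"""


def nodes_by_partition(sinfo_text):
    """Sum the NODES column per partition, keyed by partition name."""
    totals = {}
    for line in sinfo_text.splitlines()[1:]:      # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue                               # ignore blank/short lines
        partition = fields[0].rstrip("*")          # '*' marks the default
        totals[partition] = totals.get(partition, 0) + int(fields[3])
    return totals


print(nodes_by_partition(SINFO_OUTPUT))
```

Summing per partition matters on a busier cluster, where sinfo prints one row per state and the same partition can show up several times.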

  4. At some point, the init script will be done, or you get an Ice.ConnectionLostException (which means it took too long).

  • Let’s see what BIOMERO created! Open Run Script > biomero > Example Minimal Slurm Script... again;

  • Uncheck the Run Python box

  • Check the Run Other Command box

  • Change the Linux Command to ls -la **/* (we want to check all subfolders too).

  • Run it. Press the i button for proper formatting and scroll down to see what we made

=== stdout ===
-rw-r--r-- 1 azureadmin azureadmin 1336 Mar 21 17:44 slurm-scripts/convert_job_array.sh

my-scratch/singularity_images:
total 0
drwxrwxr-x 3 azureadmin azureadmin  24 Mar 21 17:31 .
drwxrwxr-x 3 azureadmin azureadmin  32 Mar 21 17:31 ..
drwxrwxr-x 2 azureadmin azureadmin 117 Mar 21 17:45 converters

singularity_images/workflows:
total 16
drwxrwxr-x 6 azureadmin azureadmin   126 Mar 21 17:31 .
drwxrwxr-x 3 azureadmin azureadmin    23 Mar 21 17:31 ..
drwxrwxr-x 2 azureadmin azureadmin    40 Mar 21 17:42 cellexpansion
drwxrwxr-x 2 azureadmin azureadmin    54 Mar 21 17:36 cellpose
drwxrwxr-x 2 azureadmin azureadmin    52 Mar 21 17:40 cellprofiler_spot
-rw-rw-r-- 1 azureadmin azureadmin   695 Mar 21 17:45 pull_images.sh
-rw-rw-r-- 1 azureadmin azureadmin 10802 Mar 21 17:45 sing.log
drwxrwxr-x 2 azureadmin azureadmin    43 Mar 21 17:44 spotcounting

slurm-scripts/jobs:
total 16
drwxrwxr-x 2 azureadmin azureadmin  100 Mar 21 17:31 .
drwxrwxr-x 3 azureadmin azureadmin   46 Mar 21 17:31 ..
-rw-rw-r-- 1 azureadmin azureadmin 3358 Mar 21 17:44 cellexpansion.sh
-rw-rw-r-- 1 azureadmin azureadmin 3406 Mar 21 17:44 cellpose.sh
-rw-rw-r-- 1 azureadmin azureadmin 3184 Mar 21 17:44 cellprofiler_spot.sh
-rw-rw-r-- 1 azureadmin azureadmin 3500 Mar 21 17:44 spotcounting.sh

  • Or better yet, run this Linux command for full info on all (non-hidden) subdirectories: find . -type d -not -path '*/.*' -exec ls -la {} +. This should show that we downloaded some of the workflows to our Slurm cluster already:

./singularity_images/workflows:
total 16
drwxrwxr-x 6 azureadmin azureadmin   126 Mar 21 17:31 .
drwxrwxr-x 3 azureadmin azureadmin    23 Mar 21 17:31 ..
drwxrwxr-x 2 azureadmin azureadmin    40 Mar 21 17:42 cellexpansion
drwxrwxr-x 2 azureadmin azureadmin    54 Mar 21 17:36 cellpose
drwxrwxr-x 2 azureadmin azureadmin    52 Mar 21 17:40 cellprofiler_spot
-rw-rw-r-- 1 azureadmin azureadmin   695 Mar 21 17:45 pull_images.sh
-rw-rw-r-- 1 azureadmin azureadmin 10802 Mar 21 17:45 sing.log
drwxrwxr-x 2 azureadmin azureadmin    43 Mar 21 17:44 spotcounting

./singularity_images/workflows/cellexpansion:
total 982536
drwxrwxr-x 2 azureadmin azureadmin         40 Mar 21 17:42 .
drwxrwxr-x 6 azureadmin azureadmin        126 Mar 21 17:31 ..
-rwxr-xr-x 1 azureadmin azureadmin 1006116864 Mar 21 17:42 w_cellexpansion_v2.0.1.sif

./singularity_images/workflows/cellpose:
total 4672820
drwxrwxr-x 2 azureadmin azureadmin         54 Mar 21 17:36 .
drwxrwxr-x 6 azureadmin azureadmin        126 Mar 21 17:31 ..
-rwxr-xr-x 1 azureadmin azureadmin 4784967680 Mar 21 17:36 t_nucleisegmentation-cellpose_v1.2.9.sif

./singularity_images/workflows/cellprofiler_spot:
total 2215916
drwxrwxr-x 2 azureadmin azureadmin         52 Mar 21 17:40 .
drwxrwxr-x 6 azureadmin azureadmin        126 Mar 21 17:31 ..
-rwxr-xr-x 1 azureadmin azureadmin 2269097984 Mar 21 17:40 w_spotcounting-cellprofiler_v1.0.1.sif

./singularity_images/workflows/spotcounting:
total 982720
drwxrwxr-x 2 azureadmin azureadmin         43 Mar 21 17:44 .
drwxrwxr-x 6 azureadmin azureadmin        126 Mar 21 17:31 ..
-rwxr-xr-x 1 azureadmin azureadmin 1006305280 Mar 21 17:44 w_countmaskoverlap_v1.0.1.sif

./slurm-scripts:
total 8
drwxrwxr-x  3 azureadmin azureadmin   46 Mar 21 17:31 .
drwxr-xr-x 12 azureadmin azureadmin 4096 Mar 21 17:31 ..
-rw-r--r--  1 azureadmin azureadmin 1336 Mar 21 17:44 convert_job_array.sh
drwxrwxr-x  2 azureadmin azureadmin  100 Mar 21 17:31 jobs

./slurm-scripts/jobs:
total 16
drwxrwxr-x 2 azureadmin azureadmin  100 Mar 21 17:31 .
drwxrwxr-x 3 azureadmin azureadmin   46 Mar 21 17:31 ..
-rw-rw-r-- 1 azureadmin azureadmin 3358 Mar 21 17:44 cellexpansion.sh
-rw-rw-r-- 1 azureadmin azureadmin 3406 Mar 21 17:44 cellpose.sh
-rw-rw-r-- 1 azureadmin azureadmin 3184 Mar 21 17:44 cellprofiler_spot.sh
-rw-rw-r-- 1 azureadmin azureadmin 3500 Mar 21 17:44 spotcounting.sh

  5. Ok, let’s get to some data! Upload a file with (a local installation of) OMERO Insight.

  • First, open up the OMERO port 4064 in Azure on your hpc-slurm-cluster, just like we did with port 4080: Add inbound security rule, destination 4064, Protocol TCP, Name OMEROINSIGHT.

  • Change the server to <cyclecloud-vm-ip>:4064

  • Login root/omero

  • Upload some Nuclei fluorescence images. For example, I uploaded the raw images from S-BSST265 into a Project TestProject and Dataset S-BSST265. Add to Queue, and import!

  6. IMPORTANT! Our default job script assumes 4 CPUs, but we have nodes with only 2 cores. So we have to lower this amount for the job script. Otherwise we get this error:

sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

We will do this ad-hoc, by changing the configuration for CellPose in the slurm-config.ini in our installation:

  • First, edit the config on the main VM with vi biomeroworker/slurm-config.ini

  • Add this line to your workflows <wf>_job_cpus-per-task=2, e.g. cellpose_job_cpus-per-task=2

  • save file (:wq)

  • Don’t forget to open your .ssh to the container: chmod -R 777 ~/.ssh (and close it later)

  • Restart the biomero container(s) (docker compose down && docker compose up -d --build, perhaps specifically for biomeroworker).

  • Check the logs to see if biomero started up properly: docker logs -f nl-biomero-biomeroworker-1
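The override above boils down to never asking for more CPUs per task than a node has cores. As a small illustration, the sketch below builds the config line for a workflow and caps it at the node size; cpus_override_line is a hypothetical helper, and the default of 4 requested CPUs is taken from the job script default mentioned above.

```python
def cpus_override_line(workflow, node_cores, requested=4):
    """Build the `<wf>_job_cpus-per-task=N` line, capped at node_cores."""
    if node_cores < 1:
        raise ValueError("node_cores must be at least 1")
    cpus = min(requested, node_cores)
    return f"{workflow}_job_cpus-per-task={cpus}"


# The default job script asks for 4 CPUs; our nodes have 2 cores.
print(cpus_override_line("cellpose", node_cores=2))
```

For our 2-core nodes this gives the same cellpose_job_cpus-per-task=2 line as in the step above. Repeat it for every workflow you plan to run on these small nodes.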

  7. Next, time to segment! Time to spin up those SLURM compute nodes:

  • First, select your newly imported dataset, then Run Script > biomero > workflows > SLURM Run Workflow...

  • At Select how to import your results (one or more), we will upload the masks back into a new dataset, so:

    • Change 3a) Import into NEW Dataset: into CellPoseMasks

    • Change 3c) Rename the imported images: into {original_file}_Mask_C1.{ext} (these are placeholder values)

  • Next, check the cellpose box and

    • Change nuc channel to 1

    • Uncheck the use gpu box (unless you paid for sweet GPU nodes from Azure)

  • Run Script!

    • We are running the cellpose workflow on channel 1 (with otherwise default parameters) on all the images of the dataset S-BSST265, and importing the output mask images back into OMERO as dataset CellPoseMasks.

  • Now, this will take a while again because we are cheap and do not have a GPU node at the ready.

    • Instead, Azure will on-demand create our compute node (to save us money when we are not using it), which only has a few CPUs as well!

    • So this is not a test of speed (unless you setup a nice Slurm cluster with always-available GPU nodes), but of the BIOMERO automation process.
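A quick word on the rename pattern from the import settings above: {original_file} and {ext} are placeholders that get filled in per image. The sketch below shows one plausible way such a pattern could expand. This is my assumption about the semantics (file name without extension, and extension without the dot), not BIOMERO's actual implementation, and expand_rename and the example file name are made up.

```python
from pathlib import PurePosixPath


def expand_rename(pattern, filename):
    """Fill {original_file} and {ext} from a file name (assumed semantics)."""
    path = PurePosixPath(filename)
    return pattern.format(original_file=path.stem, ext=path.suffix.lstrip("."))


# "nuclei_01.tif" is a hypothetical input image name.
print(expand_rename("{original_file}_Mask_C1.{ext}", "nuclei_01.tif"))
```

Under these assumptions the example name becomes nuclei_01_Mask_C1.tif. Check the names of the images that actually arrive in CellPoseMasks to see what your BIOMERO version does.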

Extra thoughts

  • Perhaps also make the Cluster have a static IP, instead of changing whenever you terminate it: https://learn.microsoft.com/en-us/azure/cyclecloud/how-to/network-security?view=cyclecloud-8