CellProfiler tutorial
CellProfiler is already known for excellent interoperability with OMERO: you can load images directly into CellProfiler pipelines.
CellProfiler also has options to run in batch mode and headless, for analyzing big data on compute clusters, which is what we want as well.
However, for our purposes this is insufficient, as we want to run it from OMERO, on a compute cluster that we can only reach via SSH.
In this tutorial I will show you how to add a CellProfiler pipeline as a workflow to OMERO and Slurm, using this client library.
0. Prerequisite: OMERO, Slurm and biomero
We assume you have these 3 components set up and connected. If not, follow the main README first.
1. Grab the data and pipeline
We want to try something ready-made, and we like spots here at the AMC.
So let’s grab this spot-counting example from a NeuBIAS CellProfiler practical: https://github.com/tischi/cellprofiler-practical-NeuBIAS-Lisbon-2017/blob/master/practical-handout.md
2. Try the pipeline locally
UI
It is always a good idea to test your algorithms locally before jumping to remote compute.
You can walk through the README, or open PLA-dot-counting-with-speckle-enhancement.cpproj. It seems to be a bit older, so we have to fix the threshold setting (change "(0.0" to "0.0") and change the input to our local file location.
Export pipeline file only
CellProfiler works with both .cpproj and .cppipe files. The project version hardcodes the file paths, which we don’t want. So go to File > Export > Pipeline and save this as a .cppipe file.
Another bonus is that .cppipe is human-readable and editable, which will be important later on.
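Because the .cppipe is plain text, a few lines of Python are enough to see which modules a pipeline contains, which helps when you later want to edit it automatically. This is a minimal sketch that assumes each module starts with a header line of the form ModuleName:[module_num:N|...]; check this against your own exported file.

```python
import re
from typing import List, Tuple

# Assumption: each module in a .cppipe starts with a header line such as
#   IdentifyPrimaryObjects:[module_num:5|svn_version:'Unknown'|...]
# while its settings follow on indented "Setting name:value" lines.
MODULE_HEADER = re.compile(r"^(\w+):\[module_num:(\d+)\|")


def list_modules(cppipe_text: str) -> List[Tuple[int, str]]:
    """Return (module_num, module_name) pairs in pipeline order."""
    modules = []
    for line in cppipe_text.splitlines():
        match = MODULE_HEADER.match(line)
        if match:
            modules.append((int(match.group(2)), match.group(1)))
    return modules
```

Read the file with pathlib (Path(...).read_text()) and pass the text to list_modules; the result should match the module list you see in the CellProfiler GUI.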
headless
After it works, let’s try it headless too:
./cellprofiler.exe -c -p '<path-to>\PLA-dot-counting-with-speckle-enhancement.cppipe' -o '<path-to>\cellprofiler_results' -i '<path-to>\PLA_data'
Here we provide the input images folder (-i), the output folder (-o), the pipeline (-p) and headless mode (-c).
See this blog for more info on the commandline parameters.
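If you want to script this local test (for example to rerun it after every pipeline change), the same call can be made from Python. This sketch only mirrors the flags used in the command above; the executable name (cellprofiler vs. the full path to cellprofiler.exe) is an assumption that depends on your installation.

```python
import subprocess
from pathlib import Path
from typing import List


def build_headless_command(pipeline: Path, input_dir: Path, output_dir: Path,
                           executable: str = "cellprofiler") -> List[str]:
    """Assemble the same flags as the command above: -c, -p, -i and -o."""
    return [
        executable,
        "-c",                   # headless: no GUI
        "-p", str(pipeline),    # the exported .cppipe
        "-i", str(input_dir),   # folder with the input images
        "-o", str(output_dir),  # folder for the results
    ]


def run_headless(pipeline: Path, input_dir: Path, output_dir: Path,
                 executable: str = "cellprofiler") -> None:
    output_dir.mkdir(parents=True, exist_ok=True)
    cmd = build_headless_command(pipeline, input_dir, output_dir, executable)
    # A list (instead of one string) avoids quoting problems with spaces in paths;
    # check=True raises CalledProcessError if CellProfiler exits with an error.
    subprocess.run(cmd, check=True)
```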
3. Upload the data to OMERO
Let’s make a screen out of these 8 wells, for fun.
Open the importer.
Since there is no screen metadata in the files, first create a project and dataset in OMERO.
Import the PLA_data folder there.
Go to the Web UI.
Select the new dataset.
Activate the script Dataset to Plate (under omero/util_scripts/).
Fill in 8 wells per row (optional).
Screen: PLA_data
Now we have a plate with 8 wells in a screen in OMERO.
4. Package the CellProfiler pipeline as a FAIR workflow
To create a FAIR workflow, let’s follow the steps from Biaflows for creating a new workflow, as they explained it quite well already: https://neubias-wg5.github.io/creating_bia_workflow_and_adding_to_biaflows_instance.html
We just ignore some parts specific to the BIAFLOWS server, like adding as a trusted source. We will add the workflow to OMERO and Slurm instead, as a final step.
0. Create a workflow Github repository
To kickstart, we can reuse some of the workflow setup for CellProfiler from Neubias github.
You can follow along, or just use my version at the end (https://github.com/TorecLuik/W_SpotCounting-CellProfiler)
Log in to (or create an account on) GitHub.
Go to the link above.
Click Use this template > Create a new repository.
Name it W_SpotCounting-CellProfiler.
Keep it Public.
Clone your new repository locally: Code > Clone > HTTPS > Copy, then run:
git clone https://github.com/<...>/W_SpotCounting-CellProfiler.git
Open the folder in your favorite editor.
Copy the pipeline we want into this folder, e.g. the exported PLA-dot-counting-with-speckle-enhancement.cppipe (the Dockerfile below adds the .cppipe, not the .cpproj).
a. Create a Dockerfile for cellprofiler
The Dockerfile installs our whole environment.
We want:
Cellprofiler
Cytomine/Biaflows helper libraries (for Input/Output)
Our workflow files:
wrapper.py (the logic to run our workflow)
descriptor.json (the metadata of our workflow)
*.cppipe (our CellProfiler pipeline)
Now it turns out that this Dockerfile uses an old version of CellProfiler (with Python 2). We want the newest one, so I rewrote the Dockerfile:
Our new/changed Dockerfile
FROM cellprofiler/cellprofiler
Instead of installing CellProfiler manually, it turns out they host container images themselves, so let’s reuse those.
# Install Python3.7
RUN apt-get update && apt-get install -y python3.7 python3.7-dev python3.7-venv
RUN python3.7 -m pip install --upgrade pip && python3.7 -m pip install Cython
This CellProfiler image is quite modern, but we need an older Python to work with the Cytomine/BIAFLOWS libraries. So we install Python 3.7 (and the Cython package).
# ------------------------------------------------------------------------------
# Install Cytomine python client
RUN git clone https://github.com/cytomine-uliege/Cytomine-python-client.git && \
cd Cytomine-python-client && git checkout tags/v2.7.3 && \
python3.7 -m pip install . && \
cd .. && \
rm -r Cytomine-python-client
# ------------------------------------------------------------------------------
# Install BIAFLOWS utilities (annotation exporter, compute metrics, helpers,...)
RUN apt-get update && apt-get install libgeos-dev -y && apt-get clean
RUN git clone https://github.com/Neubias-WG5/biaflows-utilities.git && \
cd biaflows-utilities/ && git checkout tags/v0.9.1 && python3.7 -m pip install .
# install utilities binaries
RUN chmod +x biaflows-utilities/bin/*
RUN cp biaflows-utilities/bin/* /usr/bin/ && \
rm -r biaflows-utilities
These 2 parts install specific versions of the biaflows library and Cytomine library with Python 3.7.
# ------------------------------------------------------------------------------
# Add repository files: wrapper, command and descriptor
RUN mkdir /app
ADD wrapper.py /app/wrapper.py
ADD PLA-dot-counting-with-speckle-enhancement.cppipe /app/PLA-dot-counting-with-speckle-enhancement.cppipe
ADD descriptor.json /app/descriptor.json
ENTRYPOINT ["python3.7","/app/wrapper.py"]
Finally we add our own workflow to the /app folder:
wrapper.py
.cppipe
descriptor.json
And we tell the image to call wrapper.py with python3.7 when we start it up, using an ENTRYPOINT. This also forwards command-line parameters that you provide to the wrapper.py script, e.g. workflow parameters.
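To make the forwarding concrete: whatever follows the image name in docker run is appended to the ENTRYPOINT, so wrapper.py receives it in sys.argv. The parser below is a hypothetical illustration built with argparse, using the flags we pass later in this tutorial; it is not the parsing that BiaflowsJob actually does.

```python
import argparse
from typing import List, Optional


def parse_forwarded_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical stand-in for the argument parsing done inside wrapper.py."""
    parser = argparse.ArgumentParser(prog="wrapper.py")
    parser.add_argument("--local", action="store_true",
                        help="run on local folders instead of a BIAFLOWS server")
    parser.add_argument("--infolder")
    parser.add_argument("--outfolder")
    parser.add_argument("--gtfolder")
    parser.add_argument("-nmc", action="store_true", help="no metric computation")
    # parse_known_args ignores extra workflow parameters we do not model here
    args, _unknown = parser.parse_known_args(argv)
    return args
```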
b. Set up the metadata in descriptor.json
We actually don’t have any input parameters (except the default input/output) at this moment. Look at this extra chapter for more info on how to approach that.
So we can just use the basic descriptor.json that was given and remove the last 2 non-Cytomine parameters.
Mainly, update the name, description and where we will publish the container (your new Docker Hub account).
Example full json
{
"name": "SpotCounting-CellProfiler",
"description": "Workflow for spot counting in CellProfiler",
"container-image": {
"image": "torecluik/w_spotcounting-cellprofiler",
"type": "singularity"
},
"command-line": "python wrapper.py CYTOMINE_HOST CYTOMINE_PUBLIC_KEY CYTOMINE_PRIVATE_KEY CYTOMINE_ID_PROJECT CYTOMINE_ID_SOFTWARE",
"inputs": [
{
"id": "cytomine_host",
"value-key": "@ID",
"command-line-flag": "--@id",
"name": "BIAFLOWS host",
"set-by-server": true,
"optional": false,
"type": "String"
},
{
"id": "cytomine_public_key",
"value-key": "@ID",
"command-line-flag": "--@id",
"name": "BIAFLOWS public key",
"set-by-server": true,
"optional": false,
"type": "String"
},
{
"id": "cytomine_private_key",
"value-key": "@ID",
"command-line-flag": "--@id",
"name": "BIAFLOWS private key",
"set-by-server": true,
"optional": false,
"type": "String"
},
{
"id": "cytomine_id_project",
"value-key": "@ID",
"command-line-flag": "--@id",
"name": "BIAFLOWS project ID",
"set-by-server": true,
"optional": false,
"type": "Number"
},
{
"id": "cytomine_id_software",
"value-key": "@ID",
"command-line-flag": "--@id",
"name": "BIAFLOWS software ID",
"set-by-server": true,
"optional": false,
"type": "Number"
}
],
"schema-version": "cytomine-0.1"
}
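Before publishing, a quick sanity check of descriptor.json can save a failed release. This sketch assumes only the keys visible in the example above (name, container-image, inputs, schema-version, and set-by-server per input); the checks themselves are my own suggestion, not a BIAFLOWS requirement. For the descriptor above, workflow_parameters returns an empty list, since every input is set by the server.

```python
import json
from pathlib import Path
from typing import List


def workflow_parameters(descriptor: dict) -> List[str]:
    """Return ids of inputs the user must supply (not filled in by the server)."""
    return [inp["id"] for inp in descriptor.get("inputs", [])
            if not inp.get("set-by-server", False)]


def check_descriptor(descriptor: dict) -> None:
    """Small sanity checks before publishing; raises ValueError on problems."""
    for key in ("name", "container-image", "inputs", "schema-version"):
        if key not in descriptor:
            raise ValueError(f"descriptor.json is missing '{key}'")
    ids = [inp["id"] for inp in descriptor["inputs"]]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate input ids in descriptor.json")


def load_descriptor(path: Path) -> dict:
    """Read descriptor.json from disk and validate it."""
    descriptor = json.loads(path.read_text(encoding="utf-8"))
    check_descriptor(descriptor)
    return descriptor
```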
c. Update the command in wrapper.py
So the wrapper gets called when the container starts. This is where we ‘wrap’ our pipeline by handling input/output and parameters. We also have to make sure that we call the pipeline correctly here.
Our changes to the wrapper
This first part we keep the same: the BiaflowsJob will parse the command-line parameters for us and provide those as bj.parameters.<param_name> if we did want them. But we don’t use any right now.
import os
from cytomine.models import Job
from biaflows import CLASS_OBJSEG
from biaflows.helpers import BiaflowsJob, prepare_data

def main(argv):
    base_path = "{}".format(os.getenv("HOME"))  # Mandatory for Singularity
    problem_cls = CLASS_OBJSEG
    with BiaflowsJob.from_cli(argv) as bj:
        bj.job.update(status=Job.RUNNING, progress=0, statusComment="Initialisation...")
        # 1. Prepare data for workflow
        in_imgs, gt_imgs, in_path, gt_path, out_path, tmp_path = prepare_data(problem_cls, bj, is_2d=True, **bj.flags)
The second part (where we call our pipeline) we can simplify a bit, as we don’t need to parse parameters for CellProfiler. See later for how to start handling that.
We specifically name the .cppipe that we added to /app, and we use subprocess.run(...) to execute CellProfiler headless on the command line: cellprofiler -c -r -p ... -i ... -o ... -t ... (-r runs the pipeline on startup, -t sets the temporary directory).
In theory we could also use the cellprofiler Python package here, for more control. But in general, we can run any command-line program with subprocess.run, so this wrapper will look similar for most workflows.
        pipeline = "/app/PLA-dot-counting-with-speckle-enhancement.cppipe"
        # 2. Run CellProfiler pipeline
        bj.job.update(progress=25, statusComment="Launching workflow...")
        ## If we want to allow parameters, we have to parse them into the pipeline here
        # mod_pipeline = parse_cellprofiler_parameters(bj, pipeline, tmp_path)
        mod_pipeline = pipeline
        shArgs = [
            "cellprofiler", "-c", "-r", "-p", mod_pipeline,
            "-i", in_path, "-o", out_path, "-t", tmp_path,
        ]
        # run is subprocess.run; passing the list directly (no shell) keeps paths with spaces intact
        status = run(shArgs)
Finally, we don’t change much in the rest of the script and just handle the return code: 0 means success, in which case we just log to the logfile.
There is some built-in logic for BIAFLOWS, like uploading results and metrics. We keep it in for the logs, but it is essentially a no-op because we will provide the command-line parameters --local and -nmc (no metric computation).
Full changes can be found here
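The return-code handling can be summarized in a small helper. This is a sketch of the idea rather than the code in my wrapper: subprocess.run returns a CompletedProcess whose returncode is 0 on success, and anything else should stop the job so the failure is visible in the logs.

```python
import subprocess
import sys
from typing import List


def run_step(cmd: List[str]) -> int:
    """Run one command-line step; raise if it exits with a non-zero code."""
    status = subprocess.run(cmd)
    if status.returncode != 0:
        # Non-zero means the tool reported an error: log it and stop the workflow
        print(f"{cmd[0]} failed with exit code {status.returncode}", file=sys.stderr)
        raise RuntimeError(f"{cmd[0]} exited with code {status.returncode}")
    return status.returncode
```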
d. Run locally
Now that we have a Docker image, we can run this locally or anywhere that we have Docker installed, without needing the right version of CellProfiler, etc. Let’s try it out:
Set up your data folder like this:
PLA as the main folder
PLA_data with the 8 images, as a subfolder
out as an empty subfolder
gt as an empty subfolder
Build the image:
docker build -t spotcounting-cp .
(Note: the . is important, it means this folder.)
Run the container on the PLA folder like this:
docker run --rm -v <my-drive>\PLA\:/data-it spotcounting-cp --local --infolder /data-it/PLA_data --outfolder /data-it/out --gtfolder /data-it/gt -nmc
This should work the same as before, with a bit of extra logging thrown in.
Except now, we didn’t need to have CellProfiler installed! Anyone with Docker (or Podman or Singularity) can run this workflow now.
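If you would rather script the data folder layout from step d than create it by hand, a pathlib sketch could look like this (the folder names are the ones used above; copying the 8 images into PLA_data is left to you).

```python
from pathlib import Path
from typing import Dict


def make_pla_layout(root: Path) -> Dict[str, Path]:
    """Create root/PLA_data, root/out and root/gt; return their paths by name."""
    folders = {name: root / name for name in ("PLA_data", "out", "gt")}
    for folder in folders.values():
        # parents=True also creates root itself; exist_ok makes reruns harmless
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```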
e. Publish to GitHub and DockerHub
So how do other people get to use our workflow?
We publish the source online on Github:
Commit to git:
git commit -m 'Update with spotcounting pipeline' -a
Push to github:
git push
Set up automated releases to Docker Hub:
First, create a free account on Docker Hub if you don’t have one.
On Docker Hub, log in and create a new Access Token via Account Settings / Security. Name it Github or something. Copy this token (to a file).
Back on your GitHub repository, add 2 secrets by going to Settings / Secrets and variables / Actions / New repository secret:
First, add Name: DOCKERHUB_USERNAME and Secret: <your-dockerhub-username>
Also, add Name: DOCKERHUB_TOKEN and Secret: <token-that-you-copied>
Now, tag and release this as a new version on GitHub (and automatically on Docker Hub):
This is easy to do from the GitHub page: Releases > New release.
Add a tag like v1.0.0.
Now, the GitHub Action Docker Image CI will build the container for you and publish it on Docker Hub via the credentials you provided. This will take a few minutes; you can follow along in the Actions tab.
Now you can verify that it is available online: https://hub.docker.com/u/your-dockerhub-user
Great! Now everybody (with internet access) can pull your workflow image and run it locally:
docker run --rm -v <my-drive>\PLA\:/data-it <your-dockerhub-user>/w_spotcounting-cellprofiler:v1.0.0 --local --infolder /data-it/PLA_data --outfolder /data-it/out --gtfolder /data-it/gt -nmc
And this is what we will make OMERO do on the Slurm cluster next.
Optional: Manually publish the image on Dockerhub:
First, create an account on Docker Hub if you don’t have one.
Log in locally on the command line to this account too:
docker login
(Optional) Build your latest Docker image if you didn’t do that yet (docker build -t spotcounting-cp .).
Tag your local Docker image with a new tag to match this Docker Hub account and release:
docker tag spotcounting-cp:latest <your-dockerhub-user>/w_spotcounting-cellprofiler:v1.0.0
Push your tagged image to Dockerhub:
docker push <your-dockerhub-user>/w_spotcounting-cellprofiler:v1.0.0
Now you can verify that it is available online: https://hub.docker.com/u/your-dockerhub-user
E.g. mine can be found @ https://hub.docker.com/r/torecluik/w_spotcounting-cellprofiler/tags
Great! Now everybody (with internet access) can pull your workflow image and run it locally:
docker run --rm -v <my-drive>\PLA\:/data-it <your-dockerhub-user>/w_spotcounting-cellprofiler:v1.0.0 --local --infolder /data-it/PLA_data --outfolder /data-it/out --gtfolder /data-it/gt -nmc
And this is what we will make OMERO do on the Slurm cluster next.
5. Add this workflow to the OMERO Slurm Client
Let’s adjust the slurm-config.ini on our OMERO processor server.
In the [MODEL] section we add our new workflow:
# -------------------------------------
# CELLPROFILER SPOT COUNTING
# -------------------------------------
# The path to store the container on the slurm_images_path
cellprofiler_spot=cellprofiler_spot
# The (e.g. github) repository with the descriptor.json file
cellprofiler_spot_repo=https://github.com/TorecLuik/W_SpotCounting-CellProfiler/tree/v1.0.0
# The jobscript in the 'slurm_script_repo'
cellprofiler_spot_job=jobs/cellprofiler_spot.sh
Note that we link to v1.0.0 specifically.
When using a new version, like v1.0.1, update this config again.
For example, I had a bugfix, so I released my workflow as v1.0.1, using the release + push + update steps.
For me, updating is done by rebuilding my Docker container for the processor worker:
docker-compose up -d --build omeroworker-5
and recreating the Slurm environment:
Run SlurmClient.from_config(init_slurm=True) on the OMERO processor server,
e.g. using this OMERO script:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Original work Copyright (C) 2014 University of Dundee
#                                   & Open Microscopy Environment.
#                    All Rights Reserved.
# Modified work Copyright 2022 Torec Luik, Amsterdam UMC
# Use is subject to license terms supplied in LICENSE.txt
#
# Example OMERO.script to instantiate a 'empty' Slurm connection.
import omero
import omero.gateway
from omero import scripts
from omero.rtypes import rstring, unwrap
from biomero import SlurmClient
import logging

logger = logging.getLogger(__name__)


def runScript():
    """
    The main entry point of the script
    """
    client = scripts.client(
        'Slurm Init',
        '''Will initiate the Slurm environment for workflow execution.
        You can provide a config file location,
        and/or it will look for default locations:
        /etc/slurm-config.ini
        ~/slurm-config.ini
        ''',
        scripts.Bool("Init Slurm", grouping="01", default=True),
        scripts.String("Config file", optional=True, grouping="01.1",
                       description="The path to your configuration file. Optional."),
        namespaces=[omero.constants.namespaces.NSDYNAMIC],
    )
    try:
        message = ""
        init_slurm = unwrap(client.getInput("Init Slurm"))
        if init_slurm:
            configfile = unwrap(client.getInput("Config file"))
            if not configfile:
                configfile = ''
            with SlurmClient.from_config(configfile=configfile,
                                         init_slurm=True) as slurmClient:
                slurmClient.validate(validate_slurm_setup=True)
                message = "Slurm is setup:"
                models, data = slurmClient.get_all_image_versions_and_data_files()
                message += f"Models: {models}\nData:{data}"
        client.setOutput("Message", rstring(str(message)))
    finally:
        client.closeSession()


if __name__ == '__main__':
    runScript()
Now your Slurm cluster has your image ‘v1.0.0’, and also a job script for Slurm, automatically generated (unless you changed that behaviour in the slurm-config).
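Since the *_repo URL pins the version, it can be handy to check which tag each workflow in slurm-config.ini points at after an update. A minimal sketch with the standard library configparser, assuming the [MODEL] entries follow the <name>, <name>_repo, <name>_job pattern shown above and that repo URLs end in /tree/<tag>; biomero itself may read the file differently.

```python
import configparser
from typing import Dict


def workflow_versions(config_text: str) -> Dict[str, str]:
    """Map each workflow in [MODEL] to the version tag in its <name>_repo URL."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    model = config["MODEL"]
    versions = {}
    for key in model:
        repo = model.get(f"{key}_repo")
        if repo is None:
            continue  # not a workflow name (e.g. a *_repo or *_job entry itself)
        # Assumption: URLs look like https://github.com/<user>/<repo>/tree/<tag>
        url = repo.rstrip("/")
        versions[key] = url.split("/tree/")[1] if "/tree/" in url else "unknown"
    return versions
```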
6. Add an OMERO script to run this from the Web UI
select a screen / dataset
select workflow
run workflow!
check progress
Import resulting data
I have created several OMERO scripts using this library, and the run_workflow script can do this for us.
It will attach the results as a zipfile attachment to the screen.
Perhaps we can integrate with OMERO.Tables in the future.
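Once the run_workflow script has attached the results zip, you can inspect it after downloading it from OMERO. A minimal sketch using only the standard library, assuming the results contain CSV files (their names depend on the export settings in your pipeline):

```python
import csv
import io
import zipfile
from typing import Dict, List


def read_result_csvs(zip_bytes: bytes) -> Dict[str, List[dict]]:
    """Return the rows of every .csv inside a results zip, keyed by file name."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                text = archive.read(name).decode("utf-8")
                # DictReader uses the first row as column names
                tables[name] = list(csv.DictReader(io.StringIO(text)))
    return tables
```

Usage: read_result_csvs(Path("results.zip").read_bytes()), where results.zip is whatever you saved the attachment as.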
Extra: How to add workflow parameters to cellprofiler?
So normally, adding workflow parameters to your command line in wrapper.py is easy, like this:
# Add here the code for running the analysis script
#"--chan", "{:d}".format(nuc_channel)
cmd = ["python", "-m", "cellpose", "--dir", tmp_path, "--pretrained_model", "nuclei", "--save_tif", "--no_npy", "--chan", "{:d}".format(nuc_channel), "--diameter", "{:f}".format(bj.parameters.diameter), "--cellprob_threshold", "{:f}".format(bj.parameters.prob_threshold)]
status = subprocess.run(cmd)
Here we add bj.parameters.diameter (described here) as "--diameter", "{:f}".format(bj.parameters.diameter).
However, CellProfiler does not support changing pipeline parameters from the command line. Maybe it will in the future. For now, we have 3 options:
1. Edit the .cppipe file and override our parameters there automatically.
2. Use the Python cellprofiler library in wrapper.py to open and edit the pipeline.
3. Add an extra Python script that does number 2, which we call from wrapper.py and which does accept command-line arguments.
For 1., this is where the parseCPparam function (in wrapper.py) comes in. I have updated it a bit in my version.
It literally matches the name in descriptor.json with the same string in the .cppipe, and then changes the values to the new ones provided on the command line.
However, if you use the same module twice (like in our example pipeline), it will overwrite both of them with the same value.
In our example, that does not work properly, e.g. the size of a nucleus should NOT be the same as the size of a spot.
Options 2 and 3 are an exercise for the reader. There is an example in the OMERO docs of using the CellProfiler Python API: Getting started with CellProfiler and OMERO.
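For option 1, one way to avoid overwriting both copies of a module is to target a single module by its module_num. The sketch below is hypothetical (it is not the parseCPparam from the template) and assumes module header lines of the form ModuleName:[module_num:N|...] followed by indented "Setting name:value" lines; the setting name in the usage is only an example, so copy the exact names from your own .cppipe.

```python
import re
from typing import Optional

MODULE_HEADER = re.compile(r"^(\w+):\[module_num:(\d+)\|")


def set_cppipe_setting(text: str, setting: str, value: str,
                       module_num: Optional[int] = None) -> str:
    """Replace 'setting:value' lines; only inside module_num when it is given."""
    out = []
    current = None  # module_num of the module block we are currently in
    for line in text.splitlines(keepends=True):
        header = MODULE_HEADER.match(line)
        if header:
            current = int(header.group(2))
        stripped = line.lstrip()
        if (not header and stripped.startswith(setting + ":")
                and (module_num is None or current == module_num)):
            indent = line[: len(line) - len(stripped)]
            newline = "\n" if line.endswith("\n") else ""
            line = f"{indent}{setting}:{value}{newline}"
        out.append(line)
    return "".join(out)
```

With module_num=None it behaves like the "overwrite everything" approach; passing a module number only changes, for example, the spot detector and leaves the nucleus detector alone.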
Extra 2: We should add a LICENSE
See the importance of a license here:
You’re under no obligation to choose a license. However, without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work. If you’re creating an open source project, we strongly encourage you to include an open source license.
So, we are essentially not allowed to make all these changes and use their template without a license. We will just assume we have a license as they explain all these steps in their docs. To make this easier for the future, always add a license. I asked them to add one to the example workflows.
A nice permissive default is Apache 2.0. It allows people to generally use it however they want, private / commercial / open / closed etc.
But there is also copyleft, where people can only adapt your code if they keep the same license on all their derived code, e.g. the GNU GPL. That is a bit more restrictive.
CellExpansion tutorial
Introduction
Different types of protein aggregates can form inside a nucleus or inside the cytoplasm of a cell. In our example, we have aggregates (spots) outside of the nucleus and we want to quantify these per cell.
1. Import data to OMERO
Import data as you would normally.
We use this image ‘Cells.tif’, shown as part of this png with a mask here:
2. Extract masks with Cellpose
This process is actually 2 steps: we want the nuclei masks and also the aggregates masks. Luckily these were stained with different colors and are available in different channels:
Channel 3 = Nuclei
Channel 2 = Aggregates
So we can run 2 Cellpose workflows on OMERO and retrieve both masks. We store them as images in a new dataset and name them specifically: “{original_file}NucleiLabels.{ext}” and “{original_file}GranulesLabels.{ext}”.
Combine both in the same dataset afterwards; this will be our input dataset for the CellExpansion algorithm.
3. CellExpansion
To estimate the amount of aggregates per cell, we actually need the cytoplasm in our example. Then we can calculate overlap.
One could segment the cytoplasm, especially in this image (it’s just channel 1), but we have a Python script that does this algorithmically instead, for the fun of it.
We apply the CellExpansion algorithm on the nuclei mask and estimate the full reach of the cells with new masks.
For this, we first have to add it to OMERO. We could just add the Python code to an OMERO job script, but then the Processor needs to have the right Python libraries installed. Instead, we package it in a lightweight container with the correct Python environment, which in turn makes the workflow more FAIR.
I made this workflow container for it: github repo.
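To give an idea of what cell expansion does, here is a small pure-Python stand-in: every nucleus label grows outward into the background up to a maximum number of steps, and each background pixel goes to the nucleus that reaches it first. This is not the code in the W_CellExpansion repository; scikit-image offers a comparable function, skimage.segmentation.expand_labels, which uses Euclidean distance instead of the step count used here.

```python
from collections import deque
from typing import List

Labels = List[List[int]]


def expand_labels(labels: Labels, distance: int) -> Labels:
    """Grow every nonzero label into background (0) pixels, up to `distance` steps.

    Multi-source breadth-first search with 4-connectivity: each background pixel
    is claimed by the first label to reach it, i.e. roughly the nearest nucleus.
    Ties go to whichever label was queued first. The input is not modified.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    queue = deque((r, c, 0) for r in range(rows) for c in range(cols)
                  if labels[r][c] != 0)
    while queue:
        r, c, d = queue.popleft()
        if d == distance:
            continue  # reached the maximum expansion for this front
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc] == 0:
                out[nr][nc] = out[r][c]
                queue.append((nr, nc, d + 1))
    return out
```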
Release a version and publish a docker image
Add the workflow to Slurm and OMERO:
# -------------------------------------
# CELLEXPANSION SPOT COUNTING
# -------------------------------------
# The path to store the container on the slurm_images_path
cellexpansion=cellexpansion
# The (e.g. github) repository with the descriptor.json file
cellexpansion_repo=https://github.com/TorecLuik/W_CellExpansion/tree/v1.0.1
# The jobscript in the 'slurm_script_repo'
cellexpansion_job=jobs/cellexpansion.sh
Run the workflow on our Nuclei mask. Output the new mask back as image in a new dataset.
Calculate overlap
We calculate overlap with another very short Python script. It outputs the overlap counts of 2 masks.
Example original code:
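import cv2
import numpy as np
import pandas as pd
from skimage import measure
# Read both label masks with their original bit depth
imCellsCellLabels=cv2.imread('images/CellsNucleiLabels.tif',cv2.IMREAD_ANYDEPTH)
imCellsGranulesLabels=cv2.imread('images/CellsGranulesLabels.tif',cv2.IMREAD_ANYDEPTH)
numCells=np.max(imCellsCellLabels)
CellNumGranules=np.zeros([numCells,2],dtype=np.int16)
# Centroid (row, col) of every granule, rounded to pixel coordinates
granulesStats=pd.DataFrame(measure.regionprops_table(imCellsGranulesLabels, properties=('centroid',)))
granulesStatsnp=np.ndarray.astype(np.round(granulesStats.to_numpy()),dtype=np.uint16)
# Cell label underneath each granule centroid
granulesStatsInCellLabel=imCellsCellLabels[granulesStatsnp[:,0],granulesStatsnp[:,1]]
for i in range(1,numCells+1):
    CellNumGranules[i-1,0]=np.count_nonzero(granulesStatsInCellLabel==i)
# Only the first column is filled, so label just that one (.style renders in a notebook)
pd.DataFrame(CellNumGranules[:,:1],columns=['Estimated']).style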
I added this as a separate workflow at W_CountMaskOverlap.
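To make the counting rule explicit, here is the same logic in plain Python for small label arrays (lists of lists), without OpenCV, scikit-image or pandas: compute each granule's centroid, round it to a pixel, and count it for the cell label found there. Python's round, like np.round, rounds halves to the nearest even number, so the rule matches the NumPy version; this sketch is for illustration, not a drop-in replacement for W_CountMaskOverlap.

```python
from typing import Dict, List

Labels = List[List[int]]


def count_granules_per_cell(cell_labels: Labels, granule_labels: Labels) -> Dict[int, int]:
    """Count granules whose rounded centroid falls inside each cell label."""
    # Accumulate row sum, column sum and pixel count per granule label
    sums: Dict[int, List[int]] = {}
    for r, row in enumerate(granule_labels):
        for c, g in enumerate(row):
            if g:
                s = sums.setdefault(g, [0, 0, 0])
                s[0] += r
                s[1] += c
                s[2] += 1
    num_cells = max(max(row) for row in cell_labels)
    counts = {cell: 0 for cell in range(1, num_cells + 1)}
    for row_sum, col_sum, n in sums.values():
        cell = cell_labels[round(row_sum / n)][round(col_sum / n)]
        if cell:  # centroids on background (label 0) are not counted
            counts[cell] += 1
    return counts
```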
Add the workflow to the config.
Make one dataset with pairs of our mask files. We name them the same as the original image, but with an extra suffix, e.g. Cells_CellExpansion.tif and Cells_Aggregates.tif.
Call the new workflow on this dataset / image selection, and supply the suffixes chosen (“_CellExpansion” and “_Aggregates”) as parameter. Then make sure to upload the result of the workflow as a zip, as it will be a csv file.
Check the resulting csv for a count of aggregates per cell!
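Because the overlap workflow relies on file names, it can help to check the pairs before launching it. The helper below is hypothetical (it is not part of W_CountMaskOverlap) and assumes exactly one file per suffix per original image:

```python
from typing import Dict, Iterable, Tuple


def pair_masks(names: Iterable[str], suffix_a: str,
               suffix_b: str) -> Dict[str, Tuple[str, str]]:
    """Group names like Cells_CellExpansion.tif / Cells_Aggregates.tif by base name."""
    found: Dict[str, Dict[str, str]] = {}
    for name in names:
        stem, dot, _ext = name.rpartition(".")
        if not dot:
            stem = name  # file name without an extension
        for suffix in (suffix_a, suffix_b):
            if stem.endswith(suffix):
                base = stem[: -len(suffix)]
                found.setdefault(base, {})[suffix] = name
    # Keep only complete pairs; an incomplete one usually means a naming mistake
    return {base: (files[suffix_a], files[suffix_b])
            for base, files in found.items()
            if suffix_a in files and suffix_b in files}
```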
Workflow management?
Of course, this required knowledge and manual renaming of images, and supplying that metadata to the next workflow. Ideally you would be able to chain individual workflows together via their inputs/outputs, like in Nextflow or Snakemake. We are looking into it for a future version.
Extra
Out of memory
While running Cellpose on the Aggregates, my job ran out of memory. So I had to increase the default memory used by the generated job scripts, in slurm-config.ini:
# -------------------------------------
# CELLPOSE SEGMENTATION
# -------------------------------------
# The path to store the container on the slurm_images_path
cellpose=cellpose
# The (e.g. github) repository with the descriptor.json file
cellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7
# The jobscript in the 'slurm_script_repo'
cellpose_job=jobs/cellpose.sh
# Override the default job values for this workflow
# Or add a job value to this workflow
# If you don't want to override, comment out / delete the line.
# Run CellPose Slurm with 10 GB GPU
cellpose_job_gres=gpu:1g.10gb:1
# Run CellPose Slurm with 15 GB CPU memory
cellpose_job_mem=15GB
I added the ...mem=15GB configuration, which will add mem=15GB to the Slurm job command from now on for Cellpose workflows.
No need to restart the server, these changes get picked up whenever we start a new client from this config file (which is when we start a new script).
So after updating that ini file, I kickstart the workflow for channel 2 again and this time it works and returns the mask.
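The naming pattern in this config section is <workflow>_job_<option>=<value>. The sketch below shows one way such overrides could be turned into sbatch-style flags; it is an assumption for illustration, not biomero's actual implementation (--mem and --gres are real sbatch options, but how biomero passes them may differ).

```python
import configparser
from typing import List


def job_overrides(config_text: str, workflow: str) -> List[str]:
    """Turn <workflow>_job_<option>=<value> entries into --option=value flags."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    prefix = f"{workflow}_job_"
    # "<workflow>_job" itself (the job script path) does not match the prefix
    return [f"--{key[len(prefix):]}={value}"
            for key, value in config["MODEL"].items()
            if key.startswith(prefix)]
```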
Local Slurm tutorial
Introduction
This library is meant to be used with some external HPC cluster using Slurm, to offload your (OMERO) compute to servers suited for it.
However, if you don’t have ready access (yet) to such a cluster, you might want to spin some test environment up locally and connect your (local) OMERO to it. This is what we will cover in this tutorial.
0. Requirements
To follow this tutorial, you need:
Git
Docker (Desktop for Windows)
OMERO Insight
> 18GB memory
> 8 CPU cores
Warning: I tested with Windows here, and I’ve heard of a few issues with (command-line) Linux:
The host.docker.internal address does not work to communicate via the host machine on (command-line) Linux.
If you don’t run Docker as root, it won’t have access to the mounted SSH keys because of file rights.
As an example, we run a setup on (rootless) Podman where we add SSH keys as (podman) secrets instead.
System requirements could be less, but then you have to change some configurations for Slurm.
I provide a ready-to-go TL;DR for each chapter, and in the details of each chapter I walk through the steps I took to make these containers ready.
1. Setup Docker containers for Slurm
TL;DR:
Clone my example slurm-docker-cluster locally: here
Details
Always a good idea to stand on the shoulders of giants, so we want to spin up a ready-made Slurm container cluster. Here on GitHub is a nice example with an open-source license. It uses Docker containers and Docker Compose to easily orchestrate their interactions.
This setup will spin up a few separate containers (on the same computer) to make 1 slurm cluster:
slurmdbd, the Slurm DataBase Daemon
slurmctld, the Slurm Control Daemon, our entrypoint
mysql, the actual database
c1 and c2, 2 compute nodes
Note: these compute nodes are not setup to use GPU, that is a whole other challenge that we will not get into. But even on CPU, Slurm can be useful for parallel processing and keeping track of a queue of jobs.
So let’s clone this repository to our local system:
git clone https://github.com/giovtorres/slurm-docker-cluster.git .
You can build and run these containers as described in their README. Then you can already play around with Slurm that way, so try it out!
However, we are missing an ingredient: SSH access!
2. Add SSH access
TL;DR:
Copy your public SSH key (id_rsa.pub) into this git folder (it will get copied into the Docker image when you build it).
Add an SSH config file, store it as ~/.ssh/config (no extension):
Host localslurm
HostName host.docker.internal
User slurm
Port 2222
IdentityFile ~/.ssh/id_rsa
StrictHostKeyChecking no
Details
We need to set up our library with SSH access between OMERO and Slurm, but this is not built into these containers yet (because Docker actually has a built-in alternative, docker exec).
Luckily, people have already worked on SSH access into containers too, like here. So let’s borrow their OpenSSH setup and add it to the Dockerfile of the Slurm Control Daemon (slurmctld):
======= 2a. Make a new Dockerfile for the slurmctld =======
We want to combine the 2 Dockerfiles. However, one is based on ubuntu and the other on rockylinux. The biggest difference is that rockylinux uses the yum package manager to install software, instead of apt. We will stick to the Slurm image as the base image and just add OpenSSH on top of it.
It turns out that another difference is the use of systemctl, causing all kinds of issues.
So I spent the time to get the OpenSSH server working on Rocky Linux:
FROM rockylinux:8
... # all the Slurm stuff from original Dockerfile ...
## ------- Setup SSH ------
RUN yum update -y && yum install openssh-server initscripts sudo -y
# Create a user “sshuser” and group “sshgroup”
# RUN groupadd sshgroup && useradd -ms /bin/bash -g sshgroup sshuser
# Create sshuser directory in home
RUN mkdir -p /home/slurm/.ssh
# Copy the ssh public key in the authorized_keys file. The idkey.pub below is a public key file you get from ssh-keygen. They are under ~/.ssh directory by default.
COPY id_rsa.pub /home/slurm/.ssh/authorized_keys
# change ownership of the key file.
RUN chown slurm:slurm /home/slurm/.ssh/authorized_keys && chmod 600 /home/slurm/.ssh/authorized_keys
# Start SSH service
# RUN service ssh start
# RUN /etc/init.d/sshd start
RUN /usr/bin/ssh-keygen -A
# Expose docker port 22
EXPOSE 22
CMD ["/usr/sbin/sshd","-D"]
# CMD ["slurmdbd"]
We have replaced the slurmdbd command (CMD) with our setup from sshdocker, starting an SSH daemon (sshd) with our SSH public key associated with the slurm user.
This last part is important: to build this new version, you need to copy your public SSH key into this Docker image.
This is performed in this line:
# Copy the ssh public key in the authorized_keys file. The idkey.pub below is a public key file you get from ssh-keygen. They are under ~/.ssh directory by default.
COPY id_rsa.pub /home/<user>/.ssh/authorized_keys
So, you need to add your id_rsa.pub public key to this directory, so Docker can copy it when it builds the image.
Turns out, we also need to change the entrypoint script:
... # other stuff from script
if [ "$1" = "slurmctld" ]
then
echo "---> Starting the MUNGE Authentication service (munged) ..."
gosu munge /usr/sbin/munged
echo "---> Starting SSH Daemon (sshd) ..."
# exec /usr/bin/ssh-keygen -A
exec /usr/sbin/sshd -D &
exec rm /run/nologin &
exec chmod 777 /data &
echo "---> Waiting for slurmdbd to become active before starting slurmctld ..."
... # other stuff from script
We added the command to start the SSH daemon on the CTLD here, where it is actually called.
We also added some quick bugfixes to make the tutorial SSH work.
If you still run into issues with permissions in /data, log in as superuser and apply write access again.
======= 2b. Tell Docker Compose to use the new Dockerfile for slurmctld =======
Currently, Docker Compose will spin up all containers from the same image definition.
So we will change the Dockerfile for the slurmctld as defined in the docker-compose.yml, by replacing image with build:
slurmctld:
  # image: slurm-docker-cluster:${IMAGE_TAG:-21.08.6}
  # Build this image from current folder
  # Use a specific file: Dockerfile_slurmctld
  build:
    context: ./
    dockerfile: Dockerfile_slurmctld
  command: ["slurmctld"]
  container_name: slurmctld
  hostname: slurmctld
  volumes:
    - etc_munge:/etc/munge
    - etc_slurm:/etc/slurm
    - slurm_jobdir:/data
    - var_log_slurm:/var/log/slurm
  expose:
    - "6817"
  ports:
    - "2222:22"
  depends_on:
    - "slurmdbd"
We also mapped port 22 (SSH) from the container to our localhost port 2222. So now we can connect SSH to our localhost and be forwarded to this Slurm container.
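Before trying SSH, you can check that the port mapping works. This minimal sketch only tests whether something accepts TCP connections on localhost:2222; it does not prove that sshd is healthy or that your key is accepted.

```python
import socket


def ssh_port_open(host: str = "localhost", port: int = 2222,
                  timeout: float = 3.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out or host unreachable
        return False
```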
Test it out:
Fire up the Slurm cluster:
docker-compose up -d --build
SSH into the control node:
ssh -i C:\Users\<you>\.ssh\id_rsa slurm@localhost -p 2222 -o UserKnownHostsFile=/dev/null
This should connect as the slurm user to the control container on port 2222 (type yes to connect; we will fix promptless login later).
Last login: Tue Aug 8 15:48:31 2023 from 172.21.0.1
[slurm@slurmctld ~]$
Congratulations!
======= 2c. Add SSH config for simple login =======
But we can simplify the SSH login, and our library needs a simple way to log in.
For this, add this config file as your ~/.ssh/config (no extension). See here for more information.
Of course, first update the values with those you used to SSH before, e.g.:
Host slurm
HostName localhost
User slurm
Port 2222
IdentityFile ~/.ssh/id_rsa
StrictHostKeyChecking no
Then try it out:
ssh slurm
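The Python sketch below shows what the alias does. It uses only the standard library, looks up the Host slurm block in a config like the one above, and builds the long-form ssh command the alias stands for. It is deliberately simplified: it matches an exact alias only and ignores wildcards, Match blocks and Key=Value syntax, all of which the real OpenSSH client handles. The function names are hypothetical.

```python
# Minimal, stdlib-only sketch: resolve an SSH alias from a config text.
# Simplified on purpose: exact alias match only, no wildcards, Match blocks
# or "Key=Value" syntax, unlike the real OpenSSH client.

SSH_CONFIG = """\
Host slurm
    HostName localhost
    User slurm
    Port 2222
    IdentityFile ~/.ssh/id_rsa
    StrictHostKeyChecking no
"""


def lookup_host(config_text, alias):
    """Return the options of the 'Host <alias>' block, keys lowercased."""
    options = {}
    in_block = False
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        key = parts[0].lower()
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "host":
            # A new Host block starts; remember whether it is the one we want.
            in_block = alias in value.split()
        elif in_block:
            # Like OpenSSH, the first value found for a key wins.
            options.setdefault(key, value)
    return options


def ssh_command(config_text, alias):
    """Build the long-form ssh command that 'ssh <alias>' stands for."""
    opts = lookup_host(config_text, alias)
    cmd = ["ssh", "-i", opts["identityfile"], "-p", opts.get("port", "22")]
    if "stricthostkeychecking" in opts:
        cmd += ["-o", "StrictHostKeyChecking=" + opts["stricthostkeychecking"]]
    cmd.append(opts["user"] + "@" + opts["hostname"])
    return cmd


if __name__ == "__main__":
    print(" ".join(ssh_command(SSH_CONFIG, "slurm")))
```

Running it prints the equivalent of our earlier manual command: ssh -i ~/.ssh/id_rsa -p 2222 -o StrictHostKeyChecking=no slurm@localhost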
======= StrictHostKeyChecking =======
Note that I added StrictHostKeyChecking no, because our Slurm container gets different host keys all the time (e.g. after a rebuild). A normal Slurm server likely keeps its keys and won’t require this flag. The changing keys are also the source of this pretty warning:
...> ssh slurm
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The host key changed =)
If you don’t add this flag, SSH will save you from danger and deny access. Of course, that is not very useful for our tutorial.
3. Test Slurm
TL;DR:
Spin up the Slurm cluster:
docker-compose up -d --build
SSH into the control node:
ssh slurm
Start some filler jobs:
sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname"
Check the progress:
squeue
Check some output, e.g. job 1:
cat slurm-1.out
Details
Now connect via SSH to Slurm, change to /data (our fileserver shared between the Slurm nodes) and let’s see if Slurm works:
[slurm@slurmctld ~]$ cd /data
[slurm@slurmctld data]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
[slurm@slurmctld data]$
The queue is empty! Let’s fill it up with some short tasks:
[slurm@slurmctld data]$ sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname"
Submitted batch job 5
Submitted batch job 6
Submitted batch job 7
Submitted batch job 8
[slurm@slurmctld data]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 normal wrap slurm R 0:01 1 c1
8 normal wrap slurm R 0:01 1 c2
[slurm@slurmctld data]$
I fired off 4 jobs that each take about 5 seconds, so some were still running when I checked the queue. You can also see that they were split over the 2 compute nodes c1 and c2.
The jobs wrote their stdout output to the current dir (/data, which is where permission issues might come in):
[slurm@slurmctld data]$ ls
slurm-3.out slurm-4.out slurm-5.out slurm-6.out slurm-7.out slurm-8.out
[slurm@slurmctld data]$ cat slurm-7.out
c1
[slurm@slurmctld data]$ cat slurm-8.out
c2
[slurm@slurmctld data]$
They logged the output of the hostname command, which returned c1 for some jobs and c2 for others, because those were the nodes the jobs ran on.
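If you want to script this smoke test, for example from the OMERO worker later on, the Python sketch below runs the same sbatch --wrap command over the slurm SSH alias. It collects the job IDs from the "Submitted batch job N" lines. It assumes passwordless login through the ~/.ssh/config entry above and a writable /data. The helper names are my own and are not part of the client library.

```python
import re
import subprocess

JOB_ID_RE = re.compile(r"^Submitted batch job (\d+)$")


def parse_job_id(sbatch_output):
    """Extract N from sbatch's 'Submitted batch job N' line."""
    match = JOB_ID_RE.match(sbatch_output.strip())
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_output!r}")
    return int(match.group(1))


def submit_filler_jobs(host="slurm", count=4, workdir="/data"):
    """Submit `count` short test jobs through 'ssh <host>'; return their IDs.

    Assumes the SSH alias works without prompts. ssh hands the command
    string to the remote shell, so the --wrap argument is quoted for it.
    """
    remote_cmd = f"cd {workdir} && sbatch --wrap='sleep 5 && hostname'"
    job_ids = []
    for _ in range(count):
        result = subprocess.run(["ssh", host, remote_cmd],
                                capture_output=True, text=True, check=True)
        job_ids.append(parse_job_id(result.stdout))
    return job_ids
```

Calling submit_filler_jobs() should return four job IDs. You can then look them up with squeue, or in the slurm-<id>.out files in /data.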
Now let’s connect OMERO to our Slurm!
4. OMERO & OMERO Slurm Client
Ok, now we need an OMERO server and a correctly configured OMERO Slurm Client.
TL;DR:
Clone my example docker-example-omero-grid-amc locally:
git clone -b processors https://github.com/TorecLuik/docker-example-omero-grid-amc.git
Fire up the OMERO containers:
docker-compose up -d --build
Go to OMERO.web (localhost:4080) and log in as root with password omero.
Upload some images (to localhost) with OMERO.Insight (e.g. Cells.tiff).
In web, run the slurm/init_environment script (here).
Details
======= OMERO in Docker =======
You can use your own OMERO setup, but for this tutorial I will refer to a dockerized OMERO that I am working with: get it here.
git clone -b processors https://github.com/TorecLuik/docker-example-omero-grid-amc.git
Let’s (build it and) fire it up:
docker-compose up -d --build
======= OMERO web =======
Once they are running, you should be able to access web at localhost:4080. Log in with user root / pw omero.
Import some example data with OMERO Insight (connect with localhost).
======= Connect to Slurm =======
This container’s processor node (worker-5) already has our omero-slurm-client library installed.
======= Add ssh config to OMERO Processor =======
Ok, so connecting via localhost works fine from your machine, but we need the OMERO processing server worker-5 to be able to connect too, just like we did before.
By some smart tricks, we have mounted our ~/.ssh folder to the worker container, so it knows and can use our SSH settings and config.
However, we need to change the HostName to one that the container can resolve. localhost works fine from our machine, but not from within a Docker container, where it refers to the container itself. Instead, we need to use host.docker.internal (documentation).
Host slurm
HostName host.docker.internal
User slurm
Port 2222
IdentityFile ~/.ssh/id_rsa
StrictHostKeyChecking no
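If you script your setup, you can make the same edit programmatically. The sketch below is a hypothetical helper, not part of the client library. It rewrites HostName only inside the named Host block, so other hosts in the same file keep pointing to localhost.

```python
# Hypothetical helper: point one Host alias at a different HostName,
# e.g. localhost -> host.docker.internal for use inside a container.

def rewrite_hostname(config_text, alias, new_hostname):
    """Replace HostName only inside the 'Host <alias>' block."""
    out_lines = []
    in_block = False
    for line in config_text.splitlines():
        parts = line.split(None, 1)
        keyword = parts[0].lower() if parts else ""
        if keyword == "host":
            # Track whether we are inside the block for our alias.
            in_block = len(parts) > 1 and alias in parts[1].split()
        elif in_block and keyword == "hostname":
            # Keep the original indentation of the line.
            indent = line[: len(line) - len(line.lstrip())]
            line = indent + "HostName " + new_hostname
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


LOCAL_CONFIG = """\
Host slurm
    HostName localhost
    User slurm
    Port 2222
    IdentityFile ~/.ssh/id_rsa
    StrictHostKeyChecking no

Host other
    HostName localhost
"""

if __name__ == "__main__":
    print(rewrite_hostname(LOCAL_CONFIG, "slurm", "host.docker.internal"))
```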
Restart your OMERO cluster if you already started it:
docker-compose down
docker-compose up -d --build
Now we can connect from within worker-5 to our Slurm cluster. Let’s try it out:
...\docker-example-omero-grid> docker-compose exec omeroworker-5 /bin/bash
bash-4.2$ ssh slurm
Last login: Wed Aug 9 13:08:54 2023 from 172.21.0.1
[slurm@slurmctld ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
[slurm@slurmctld ~]$ exit
logout
Connection to host.docker.internal closed.
bash-4.2$ exit
exit
======= slurm-config.ini =======
Let us set up the library’s config file slurm-config.ini correctly.
By default, the omero-slurm-client library expects the Slurm SSH connection to be called slurm, but you can adjust this to whatever you named your SSH Host in your config.
In this Docker setup, the config file is located in the worker-gpu folder, and the Dockerfile copies it to /etc/, where the library will pick it up.
Let’s use these values:
[SSH]
# -------------------------------------
# SSH settings
# -------------------------------------
# The alias for the SLURM SSH connection
host=slurm
# Set the rest of your SSH configuration in your SSH config under this host name/alias
# Or in e.g. /etc/fabric.yml (see Fabric's documentation for details on config loading)
[SLURM]
# -------------------------------------
# Slurm settings
# -------------------------------------
# General settings for where to find things on the Slurm cluster.
# -------------------------------------
# PATHS
# -------------------------------------
# The path on SLURM entrypoint for storing datafiles
#
# Note:
# This example is relative to the Slurm user's home dir
slurm_data_path=/data/my-scratch/data
# The path on SLURM entrypoint for storing container image files
#
# Note:
# This example is relative to the Slurm user's home dir
slurm_images_path=/data/my-scratch/singularity_images/workflows
# The path on SLURM entrypoint for storing the slurm job scripts
#
# Note:
# This example is relative to the Slurm user's home dir
slurm_script_path=/data/my-scratch/slurm-scripts
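We have put all the storage paths on /data/my-scratch/ and named the SSH Host connection slurm. These are absolute paths on the shared /data volume, so the template's note about the home dir does not apply here.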
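Because slurm-config.ini is plain INI, you can sanity-check it with Python's standard configparser before baking it into the worker image. Below is a sketch of such a check on a trimmed copy of the sections above. It is not how the library itself loads the file. The /etc/slurm-config.ini path in the comment is an assumption based on the file being copied to /etc/.

```python
import configparser

# A trimmed copy of the [SSH] and [SLURM] sections shown above.
SLURM_CONFIG = """\
[SSH]
# The alias for the SLURM SSH connection
host=slurm

[SLURM]
slurm_data_path=/data/my-scratch/data
slurm_images_path=/data/my-scratch/singularity_images/workflows
slurm_script_path=/data/my-scratch/slurm-scripts
"""

config = configparser.ConfigParser()
config.read_string(SLURM_CONFIG)
# On the worker you could read the real file instead (path is an assumption):
# config.read("/etc/slurm-config.ini")

ssh_alias = config.get("SSH", "host")
storage_paths = {
    key: value for key, value in config.items("SLURM") if key.endswith("_path")
}

print("SSH alias:", ssh_alias)
for key, value in storage_paths.items():
    print(f"{key} -> {value}")
```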
We can keep the other values at their defaults. The exception is the GPU: we don’t have one, so let’s turn that off for CellPose:
# -------------------------------------
# CELLPOSE SEGMENTATION
# -------------------------------------
# The path to store the container on the slurm_images_path
cellpose=cellpose
# The (e.g. github) repository with the descriptor.json file
cellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7
# The jobscript in the 'slurm_script_repo'
cellpose_job=jobs/cellpose.sh
# Override the default job values for this workflow
# Or add a job value to this workflow
# For more examples of such parameters, google SBATCH parameters.
# If you don't want to override, comment out / delete the line.
# Run CellPose Slurm with 10 GB GPU
# cellpose_job_gres=gpu:1g.10gb:1
# Run CellPose Slurm with 15 GB CPU memory
cellpose_job_mem=15GB
The gres line would request a 10 GB GPU on the Slurm cluster, but our Docker Slurm setup only has CPUs, so we keep that line commented out.
We will also comment out some of the other algorithms, so that we download fewer containers to our Slurm cluster and speed up the tutorial.
This brings us to the following configuration file:
[SSH]
# -------------------------------------
# SSH settings
# -------------------------------------
# The alias for the SLURM SSH connection
host=slurm
# Set the rest of your SSH configuration in your SSH config under this host name/alias
# Or in e.g. /etc/fabric.yml (see Fabric's documentation for details on config loading)
[SLURM]
# -------------------------------------
# Slurm settings
# -------------------------------------
# General settings for where to find things on the Slurm cluster.
# -------------------------------------
# PATHS
# -------------------------------------
# The path on SLURM entrypoint for storing datafiles
#
# Note:
# This example is relative to the Slurm user's home dir
slurm_data_path=/data/my-scratch/data
# The path on SLURM entrypoint for storing container image files
#
# Note:
# This example is relative to the Slurm user's home dir
slurm_images_path=/data/my-scratch/singularity_images/workflows
# The path on SLURM entrypoint for storing the slurm job scripts
#
# Note:
# This example is relative to the Slurm user's home dir
slurm_script_path=/data/my-scratch/slurm-scripts
# -------------------------------------
# REPOSITORIES
# -------------------------------------
# A (github) repository to pull the slurm scripts from.
#
# Note:
# If you provide no repository, we will generate scripts instead!
# Based on the job_template and the descriptor.json
#
# Example:
#slurm_script_repo=https://github.com/TorecLuik/slurm-scripts
slurm_script_repo=
# -------------------------------------
# Processing settings
# -------------------------------------
# General/default settings for processing jobs.
# Note: NOT YET IMPLEMENTED
# Note: If you need to change it for a specific case only,
# you should change the job script instead, either in OMERO or Slurm
[MODELS]
# -------------------------------------
# Model settings
# -------------------------------------
# Settings for models/singularity images that we want to run on Slurm
#
# NOTE: keys have to be unique, and require a <key>_repo and <key>_image value as well.
#
# NOTE 2: Versions for the repo are highly encouraged!
# Latest/master can change and cause issues with reproducibility!
# We pickup the container version based on the version of the repository.
# For generic master branch, we pick up generic latest container.
# -------------------------------------
# CELLPOSE SEGMENTATION
# -------------------------------------
# The path to store the container on the slurm_images_path
cellpose=cellpose
# The (e.g. github) repository with the descriptor.json file
cellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7
# The jobscript in the 'slurm_script_repo'
cellpose_job=jobs/cellpose.sh
# Override the default job values for this workflow
# Or add a job value to this workflow
# For more examples of such parameters, google SBATCH parameters.
# If you don't want to override, comment out / delete the line.
# Run CellPose Slurm with 10 GB GPU
# cellpose_job_gres=gpu:1g.10gb:1
# Run CellPose Slurm with 15 GB CPU memory
cellpose_job_mem=15GB
# -------------------------------------
# # STARDIST SEGMENTATION
# # -------------------------------------
# # The path to store the container on the slurm_images_path
# stardist=stardist
# # The (e.g. github) repository with the descriptor.json file
# stardist_repo=https://github.com/Neubias-WG5/W_NucleiSegmentation-Stardist/tree/v1.3.2
# # The jobscript in the 'slurm_script_repo'
# stardist_job=jobs/stardist.sh
# -------------------------------------
# CELLPROFILER SEGMENTATION
# # -------------------------------------
# # The path to store the container on the slurm_images_path
# cellprofiler=cellprofiler
# # The (e.g. github) repository with the descriptor.json file
# cellprofiler_repo=https://github.com/Neubias-WG5/W_NucleiSegmentation-CellProfiler/tree/v1.6.4
# # The jobscript in the 'slurm_script_repo'
# cellprofiler_job=jobs/cellprofiler.sh
# -------------------------------------
# DEEPCELL SEGMENTATION
# # -------------------------------------
# # The path to store the container on the slurm_images_path
# deepcell=deepcell
# # The (e.g. github) repository with the descriptor.json file
# deepcell_repo=https://github.com/Neubias-WG5/W_NucleiSegmentation-DeepCell/tree/v.1.4.3
# # The jobscript in the 'slurm_script_repo'
# deepcell_job=jobs/deepcell.sh
# -------------------------------------
# IMAGEJ SEGMENTATION
# # -------------------------------------
# # The path to store the container on the slurm_images_path
# imagej=imagej
# # The (e.g. github) repository with the descriptor.json file
# imagej_repo=https://github.com/Neubias-WG5/W_NucleiSegmentation-ImageJ/tree/v1.12.10
# # The jobscript in the 'slurm_script_repo'
# imagej_job=jobs/imagej.sh
# # -------------------------------------
# # CELLPROFILER SPOT COUNTING
# # -------------------------------------
# The path to store the container on the slurm_images_path
cellprofiler_spot=cellprofiler_spot
# The (e.g. github) repository with the descriptor.json file
cellprofiler_spot_repo=https://github.com/TorecLuik/W_SpotCounting-CellProfiler/tree/v1.0.1
# The jobscript in the 'slurm_script_repo'
cellprofiler_spot_job=jobs/cellprofiler_spot.sh
# # -------------------------------------
# CELLEXPANSION SPOT COUNTING
# -------------------------------------
# The path to store the container on the slurm_images_path
cellexpansion=cellexpansion
# The (e.g. github) repository with the descriptor.json file
cellexpansion_repo=https://github.com/TorecLuik/W_CellExpansion/tree/v1.0.1
# The jobscript in the 'slurm_script_repo'
cellexpansion_job=jobs/cellexpansion.sh
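To double-check which workflows this config actually enables, here is a small Python sketch. It is a hypothetical helper, not the library's own loader, and it applies the rule from the comments above. A key counts as a workflow if a matching <key>_repo exists. Its version is taken from the /tree/<tag> part of the repo URL, with latest as the fallback. Commented-out workflows such as stardist are skipped automatically, because configparser ignores comment lines.

```python
import configparser
import re

# Trimmed [MODELS] section from the config above.
MODELS_INI = """\
[MODELS]
cellpose=cellpose
cellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7
cellpose_job=jobs/cellpose.sh
# cellpose_job_gres=gpu:1g.10gb:1
cellpose_job_mem=15GB
# stardist=stardist
# stardist_repo=https://github.com/Neubias-WG5/W_NucleiSegmentation-Stardist/tree/v1.3.2
cellprofiler_spot=cellprofiler_spot
cellprofiler_spot_repo=https://github.com/TorecLuik/W_SpotCounting-CellProfiler/tree/v1.0.1
cellprofiler_spot_job=jobs/cellprofiler_spot.sh
cellexpansion=cellexpansion
cellexpansion_repo=https://github.com/TorecLuik/W_CellExpansion/tree/v1.0.1
cellexpansion_job=jobs/cellexpansion.sh
"""


def enabled_workflows(ini_text):
    """Map each workflow key that has a matching <key>_repo to its version tag."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    models = parser["MODELS"]
    workflows = {}
    for key in models:
        repo = models.get(key + "_repo")
        if repo is None:
            continue  # e.g. cellpose_repo or cellpose_job: not a workflow key
        match = re.search(r"/tree/([^/]+)$", repo)
        workflows[key] = match.group(1) if match else "latest"
    return workflows


if __name__ == "__main__":
    for name, version in enabled_workflows(MODELS_INI).items():
        print(name, version)
```

For this file it should list cellpose v1.2.7, cellprofiler_spot v1.0.1 and cellexpansion v1.0.1. These are the same three workflows that init_environment sets up below.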
======= Init environment =======
Now we go to OMERO web and run the slurm/init_environment script to apply this config and set up our Slurm. We will use the default location, so there is no need to fill in anything; just run the script.
Note that this will take a while, since it downloads the workflow Docker images and builds (Singularity) containers from them.
Congratulations! We have set up the workflows CellPose v1.2.7, Cellprofiler Spot v1.0.1 and CellExpansion v1.0.1. There are no data files yet.
Let’s go run a segmentation workflow then!
5. Workflows!
TL;DR:
In web, select your images and run the script slurm/SLURM Run Workflow
Untick the E-mail box (not implemented in this Slurm docker setup)
For importing results, change 3a) Import into NEW Dataset to CellPose_Masks
For importing results, change 3b) Rename the imported images to {original_file}_cpmask.{ext}
Select cellpose, but untick use_gpu (sadly not implemented in this docker setup)
Click Run Script
Check the activity window (or get a coffee). It should take a few minutes (about 3m:30s for 4 256x256 images for me) and then say (a.o.): COMPLETE
Or it says FAILED, in which case you should check all the details anyway and get your hands dirty with debugging! Or try fewer and smaller images.
Refresh your Explore window; there should be a new dataset CellPose_Masks with a mask for every input image.
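To preview what the rename pattern from step 3b produces, here is a tiny Python sketch. It assumes {original_file} is the file name without its extension and {ext} is the extension. That is my reading of the placeholders; the workflow script's exact substitution may differ.

```python
from pathlib import Path


def renamed(original_name, pattern="{original_file}_cpmask.{ext}"):
    """Fill the rename pattern for one image file name.

    Assumption: {original_file} is the stem and {ext} the extension
    without the dot; the real script may substitute differently.
    """
    path = Path(original_name)
    return pattern.format(original_file=path.stem, ext=path.suffix.lstrip("."))


if __name__ == "__main__":
    print(renamed("Cells.tiff"))
```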
Details
So, I hope you added some data already; if not, import some images now.
Let’s run slurm/SLURM Run Workflow:
You can see that this script recognized that we downloaded 3 workflows, and what their parameters are. For more information on this magic, follow the other tutorials.
Let’s select cellpose and untick use gpu (sadly). Tune the other parameters as you like for your images. Also, for output, let’s select Import into NEW Dataset by filling in a dataset name: cellpose_images. Click Run Script.
Result: Job 1 FAILED. It turns out our Slurm compute nodes are not configured with enough resources to execute this job.
======= Improve Slurm =======
Update the slurm.conf file in the git repository:
# COMPUTE NODES
NodeName=c[1-2] RealMemory=5120 CPUs=8 State=UNKNOWN
Here, 5 GB of memory (RealMemory is given in MB) and 8 CPUs per node should do the trick!
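The Python sketch below shows what that line declares. It expands the hostlist c[1-2] into the individual node names and reads the per-node resources. It only handles a single bracket group with numeric ranges, which is a small subset of Slurm's hostlist syntax. The helper names are hypothetical.

```python
import re


def expand_hostlist(expr):
    """Expand a simple Slurm hostlist such as 'c[1-2]' or 'c[01,03-04]'.

    Sketch only: supports one bracket group with numeric ranges.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([0-9,\-]+)\]", expr)
    if match is None:
        return [expr]  # a plain host name
    prefix, ranges = match.groups()
    hosts = []
    for part in ranges.split(","):
        start, _, end = part.partition("-")
        end = end or start
        width = len(start)  # keep zero padding, e.g. c01, c02
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}")
    return hosts


def parse_node_line(line):
    """Parse a 'NodeName=... RealMemory=... CPUs=...' line from slurm.conf."""
    fields = dict(item.split("=", 1) for item in line.split())
    nodes = expand_hostlist(fields["NodeName"])
    return nodes, int(fields["RealMemory"]), int(fields["CPUs"])


if __name__ == "__main__":
    nodes, mem_mb, cpus = parse_node_line(
        "NodeName=c[1-2] RealMemory=5120 CPUs=8 State=UNKNOWN"
    )
    print(f"{len(nodes)} nodes {nodes}: {mem_mb} MB and {cpus} CPUs each, "
          f"{len(nodes) * mem_mb} MB and {len(nodes) * cpus} CPUs in total")
```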
Rebuild the containers. Note that the config is on a shared volume, so we have to destroy that volume too (it took some headbashing to find this out):
docker-compose down --volumes
docker-compose up --build
That should take you through connecting OMERO with a local Slurm setup.
Batching
Try slurm/SLURM Run Workflow Batched (here: https://github.com/NL-BioImaging/omero-slurm-scripts/blob/master/workflows/SLURM_Run_Workflow_Batched.py) to see if there is any speedup from splitting your images over multiple jobs/batches.
We have installed 2 nodes in this Slurm cluster, so you could make 2 batches of half the images each and get your results quicker. However, we can also only compute 2 jobs in parallel. Smaller batches (less than half the images each) will therefore just wait in the queue (with some overhead) and will probably take longer in total.
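The trade-off above fits in a few lines of Python. The sketch splits the selected images into near-equal batches and estimates how many rounds the cluster needs when only 2 jobs run at the same time. It is a hypothetical helper for reasoning about batch sizes, not the logic of the SLURM Run Workflow Batched script itself. The parallel_jobs=2 default mirrors our 2-node cluster.

```python
import math


def split_into_batches(items, n_batches):
    """Split items into at most n_batches contiguous, near-equal batches."""
    if not items:
        return []
    n_batches = max(1, min(n_batches, len(items)))
    size, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        end = start + size + (1 if i < extra else 0)  # spread the remainder
        batches.append(items[start:end])
        start = end
    return batches


def rounds_needed(n_batches, parallel_jobs=2):
    """Rounds of jobs the cluster runs if only parallel_jobs fit at once."""
    return math.ceil(n_batches / parallel_jobs)
```

For example, with 4 images, 2 batches of 2 finish in a single round (rounds_needed(2) == 1). With 4 batches of 1, rounds_needed(4) == 2, so half of the jobs wait in the queue.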
Let’s check on the Slurm node:
$ sacct --starttime "2023-06-13T17:00:00" --format Jobid,State,start,end,JobName%-18,Elapsed -n -X --endtime "now"
In my latest example, it was about 1 minute (~30%) faster to run 2 batches/jobs (32 & 33) than 1 job (31):
31 COMPLETED 2023-08-23T08:41:28 2023-08-23T08:45:02 omero-job-cellpose 00:03:34
32 COMPLETED 2023-08-23T09:22:00 2023-08-23T09:24:27 omero-job-cellpose 00:02:27
33 COMPLETED 2023-08-23T09:22:03 2023-08-23T09:24:40 omero-job-cellpose 00:02:37
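The Python sketch below puts numbers on this by parsing the three sacct lines and comparing wall-clock times. For the batched run, I measure from the first job's start to the last job's end, because that is how long you actually wait for all results. It only handles completed jobs, where start and end are real timestamps.

```python
from datetime import datetime

# The three sacct lines from above: job ID, state, start, end, name, elapsed.
SACCT_OUTPUT = """\
31 COMPLETED 2023-08-23T08:41:28 2023-08-23T08:45:02 omero-job-cellpose 00:03:34
32 COMPLETED 2023-08-23T09:22:00 2023-08-23T09:24:27 omero-job-cellpose 00:02:27
33 COMPLETED 2023-08-23T09:22:03 2023-08-23T09:24:40 omero-job-cellpose 00:02:37
"""


def parse_sacct(text):
    """Map job ID -> (start, end) datetimes from 'sacct -n -X' style lines."""
    jobs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        job_id, _state, start, end, *_rest = line.split()
        jobs[job_id] = (datetime.fromisoformat(start), datetime.fromisoformat(end))
    return jobs


def wall_clock_seconds(jobs, job_ids):
    """Seconds from the first job start to the last job end."""
    starts = [jobs[job_id][0] for job_id in job_ids]
    ends = [jobs[job_id][1] for job_id in job_ids]
    return (max(ends) - min(starts)).total_seconds()


jobs = parse_sacct(SACCT_OUTPUT)
single = wall_clock_seconds(jobs, ["31"])
batched = wall_clock_seconds(jobs, ["32", "33"])
print(f"1 job: {single:.0f} s, 2 batches: {batched:.0f} s, "
      f"saved {single - batched:.0f} s, speedup {single / batched:.2f}x")
```

For these lines, that gives 214 s for the single job versus 160 s for the two batches. That is 54 s saved, a speedup of roughly 1.3x.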
Google Cloud Slurm tutorial
Introduction
This library is meant to be used with some external HPC cluster using Slurm, to offload your (OMERO) compute to servers suited for it.
However, if you don’t have ready access (yet) to such a cluster, you might want to spin some test environment up in the Cloud and connect your (local) OMERO to it. This is what we will cover in this tutorial, specifically Google Cloud.
0. Requirements
To follow this tutorial, you need:
Git
Docker
OMERO Insight
A credit card (but we’ll work with free credits)
I use Windows here, but it should work on Linux/Mac too. If not, let me know.
I provide a ready-to-go TL;DR, but in the details of each chapter I walk through the steps I took to make these containers ready.
1. Setup Google Cloud for Slurm
TL;DR:
Follow this tutorial from Google Cloud. Click ‘guide me’.
Make a new Google Account to do this, with $300 of free credits to use Slurm for a bit. This requires a credit card (but costs nothing).
Details
So, we follow this tutorial and end up with an hpcsmall VM on Google Cloud.
However, we are missing an ingredient: SSH access!
2. Add SSH access
TL;DR:
Add your public SSH key (~/.ssh/id_rsa.pub) to the Google Cloud instance, like here. The easiest way is with Cloud Shell: upload your public key and run:
gcloud compute os-login ssh-keys add --key-file=id_rsa.pub
Change the firewall setting (e.g. hpc-small-net-fw-allow-iap-ingress) to allow 0.0.0.0/0 as IP ranges for tcp:22.
Promote the login node’s IP address to a static one: here
Copy that IP and your username.
On your own computer, add an SSH config file and store it as ~/.ssh/config (no extension), with the IP and user filled in:
Host gcslurm
HostName <fill-in-the-External-IP-of-VM-instance>
User <fill-in-your-Google-Cloud-user>
Port 22
IdentityFile ~/.ssh/id_rsa
Details
Our library needs SSH access between OMERO and Slurm, but this is not built into these Virtual Machines yet. In this tutorial, we forward our local SSH setup to our OMERO, so we just need to set up SSH access to the Google Cloud VMs.
This sounds easier than it actually is.
Follow the steps here:
Note that this tutorial by default seems to use the “OS Login” method, using the mail account you signed up with.
Open a Cloud Shell
Upload your public key to this Cloud Shell (with the ... button).
Run the gcloud compute os-login ssh-keys add --key-file=id_rsa.pub command they show, pointing at your newly uploaded public key. Leave out the optional project and expire_time.
Then, we have to ensure that the firewall accepts requests from outside Google Cloud, if it doesn’t already.
Go to the firewall settings, edit the rule for tcp:22 (e.g. hpc-small-net-fw-allow-iap-ingress) and add 0.0.0.0/0 to its IP ranges.
Now we are ready:
ssh -i ~/.ssh/id_rsa <fill-in-your-Google-Cloud-user>@<fill-in-the-External-IP-of-VM-instance>
E.g. my Google Cloud user became t_t_luik_amsterdamumc_nl, derived from the email I signed up with.
The External IP was on the VM instances page for the login node hpcsmall-login-2aoamjs0-001.
Now, to make this connection easy, we need 2 steps:
Fix this external IP address, so that it will always be the same
Create an SSH config file for this SSH connection
For 1, go here and follow the Console steps to promote the IP address to a static IP address. Back in the All screen, your newly named static IP address should show up. Copy that IP (it should be the same IP as before, but now it will not change when you restart the system).
For 2, on your own computer, add an SSH config file and store it as ~/.ssh/config (no extension), with the IP and user filled in:
Host gcslurm
HostName <fill-in-the-External-IP-of-VM-instance>
User <fill-in-your-Google-Cloud-user>
Port 22
IdentityFile ~/.ssh/id_rsa
Now you should be able to log in with a simple ssh gcslurm.
Congratulations!
3. Test Slurm
TL;DR:
SSH into the login node:
ssh gcslurm
Start some filler jobs:
sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname"
Check the progress:
squeue
Check some output when it’s done, e.g. job 1:
cat slurm-1.out
Details
Now connect via SSH to Google Cloud Slurm and let’s see if Slurm works:
[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname"
Submitted batch job 4
Submitted batch job 5
Submitted batch job 6
Submitted batch job 7
[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4 debug wrap t_t_luik CF 0:03 1 hpcsmall-debug-ghpc-3
5 debug wrap t_t_luik PD 0:00 1 (Resources)
6 debug wrap t_t_luik PD 0:00 1 (Priority)
7 debug wrap t_t_luik PD 0:00 1 (Priority)
I fired off 4 jobs that take a few seconds each, so they were still in the queue when I called squeue. Note that the first one might take a while, since Google Cloud has to spin up a new compute node for it first.
The jobs wrote their stdout output in the current dir:
[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ ls
slurm-4.out slurm-5.out slurm-6.out slurm-7.out
[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ cat slurm-4.out
hpcsmall-debug-ghpc-3
[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ cat slurm-5.out
hpcsmall-debug-ghpc-3
They all ran on the same node, which Google Cloud spun up on demand. You should still see it alive in the VM instances tab as well. It will be destroyed again if it is not used for a while, saving you costs.
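The ST column uses short state codes. CF means the job is waiting for its node to be configured (here, booted by Google Cloud). PD means pending, with the reason in brackets. If you want to follow this from a script, the Python sketch below turns default-format squeue output into dictionaries. It assumes no column value contains spaces, which holds for the default columns unless a job name has spaces in it.

```python
# Common squeue short state codes (see the squeue man page).
STATE_NAMES = {
    "PD": "PENDING",
    "CF": "CONFIGURING",
    "R": "RUNNING",
    "CG": "COMPLETING",
    "CD": "COMPLETED",
    "CA": "CANCELLED",
    "F": "FAILED",
}

SQUEUE_OUTPUT = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4 debug wrap t_t_luik CF 0:03 1 hpcsmall-debug-ghpc-3
5 debug wrap t_t_luik PD 0:00 1 (Resources)
6 debug wrap t_t_luik PD 0:00 1 (Priority)
7 debug wrap t_t_luik PD 0:00 1 (Priority)
"""


def parse_squeue(text):
    """Turn default-format squeue output (with header) into a list of dicts."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    jobs = []
    for row in body:
        job = dict(zip(header, row))
        # Add a readable state name; unknown codes are kept as-is.
        job["STATE"] = STATE_NAMES.get(job["ST"], job["ST"])
        jobs.append(job)
    return jobs


if __name__ == "__main__":
    for job in parse_squeue(SQUEUE_OUTPUT):
        print(job["JOBID"], job["STATE"], job["NODELIST(REASON)"])
```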
3b. Install requirements: Singularity / Apptainer and 7zip
TL;DR:
Follow this guide to install Singularity, but in step 5 please install in /opt/apps! /apps is not actually shared with all nodes.
Execute the following to update ~/.bashrc:
echo 'export PATH=/opt/apps/singularity/3.8.7/bin:/usr/sbin:${PATH}' >> ~/.bashrc && source ~/.bashrc
Install 7zip:
sudo yum install -y p7zip p7zip-plugins
Now we want to run containers on our Slurm cluster using singularity, but it is not installed by default.
Luckily the folks at Google have a guide for it, so let’s follow that one.
If the SSH connection to the login node doesn’t work from Google Cloud Shell, you can continue with the steps using the SSH connection (ssh gcslurm) that we just set up from your local command line.
Use this URL for the singularity tar:
https://github.com/apptainer/singularity/releases/download/v3.8.7/singularity-3.8.7.tar.gz
export SINGULARITY_VERSION=3.8.7 && wget https://github.com/apptainer/singularity/releases/download/v3.8.7/singularity-3.8.7.tar.gz && tar -xzf singularity-${SINGULARITY_VERSION}.tar.gz && cd singularity-${SINGULARITY_VERSION}
The module step did not work for me, because the guide uses the wrong directory!
In step 5, we need to install to /opt/apps instead! This is very important, because the compute nodes that execute the jobs need access to this software too, and this is the directory that is actually shared:
./mconfig --prefix=/opt/apps/singularity/${SINGULARITY_VERSION} && \
make -C ./builddir && \
sudo make -C ./builddir install
Now module avail should list singularity.
So run module load singularity, and singularity --version should then give you singularity version 3.8.7.
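If you want to check this from a script (for example on each node), the Python sketch below runs singularity --version when the command is on the PATH and parses the result. It also accepts an apptainer version line, in case the singularity command on your nodes comes from Apptainer. The helper names are hypothetical.

```python
import re
import shutil
import subprocess

VERSION_RE = re.compile(r"(singularity|apptainer) version (\d+)\.(\d+)\.(\d+)")


def parse_version(output):
    """Parse 'singularity version 3.8.7' (or an apptainer equivalent)."""
    match = VERSION_RE.search(output)
    if match is None:
        raise ValueError(f"unrecognised version output: {output!r}")
    name, *numbers = match.groups()
    return name, tuple(int(n) for n in numbers)


def installed_version(command="singularity"):
    """Run '<command> --version' if it is on PATH; return None otherwise."""
    executable = shutil.which(command)
    if executable is None:
        return None
    result = subprocess.run([executable, "--version"],
                            capture_output=True, text=True, check=True)
    return parse_version(result.stdout)


if __name__ == "__main__":
    print(installed_version())
```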
Now let’s connect OMERO to our Slurm!
4. OMERO & OMERO Slurm Client
Ok, now we need an OMERO server and a correctly configured OMERO Slurm Client.
TL;DR:
Clone my example docker-example-omero-grid-amc locally:
git clone -b processors https://github.com/TorecLuik/docker-example-omero-grid-amc.git
Change the worker-gpu/slurm-config.ini file to be the worker-gpu/slurm-config.gcslurm.ini file (if it is not the same file already)
Fire up the OMERO containers:
docker-compose up -d --build
Go to OMERO.web (localhost:4080) and log in as root with password omero.
Upload some images (to localhost) with OMERO.Insight (not included).
In web, run the slurm/init_environment script.
Details
======= OMERO in Docker =======
You can use your own OMERO setup, but for this tutorial I will refer to a dockerized OMERO that I am working with: get it here.
git clone -b processors https://github.com/TorecLuik/docker-example-omero-grid-amc.git
Change the worker-gpu/slurm-config.ini file to be the worker-gpu/slurm-config.gcslurm.ini file (if it is not the same file already).
What we did was point to the gcslurm profile (or rename your SSH profile to slurm):
[SSH]
# -------------------------------------
# SSH settings
# -------------------------------------
# The alias for the SLURM SSH connection
host=gcslurm
This way, it will use the right SSH settings to connect to our Google Cloud Slurm.
We also set all directories to be relative to the home dir, and we drastically reduced the CellPose CPU requirements to fit into the small Slurm cluster we made in Google Cloud.
Let’s (build it and) fire it up:
docker-compose up -d --build
======= OMERO web =======
Once they are running, you should be able to access web at localhost:4080. Log in with user root / pw omero.
Import some example data with OMERO Insight (connect with localhost).
======= Connect to Slurm =======
This container’s processor node (worker-5) already has our omero-slurm-client library installed.
======= Add ssh config to OMERO Processor =======
Ok, so SSH works fine from your machine, but we need the OMERO processing server worker-5 to be able to connect too.
By some smart tricks, we have mounted our ~/.ssh folder to the worker container, so it knows and can use our SSH settings and config.
Now we can connect from within worker-5 to our Slurm cluster. Let’s try it out:
...\docker-example-omero-grid> docker-compose exec omeroworker-5 /bin/bash
bash-4.2$ ssh gcslurm
<pretty-slurm-art>
[t_t_luik_amsterdamumc_nl@hpcsmall-login-2aoamjs0-001 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
======= Init environment =======
Now we go to OMERO web and run the slurm/init_environment script to apply this config and set up our Slurm. We will use the default location, so there is no need to fill in anything; just run the script.
Note that this will take a while, since it downloads the workflow Docker images and builds (Singularity) containers from them.
Congratulations! We have set up the workflows CellPose v1.2.7, Cellprofiler Spot v1.0.1 and CellExpansion v1.0.1. There are no data files yet.
Let’s go run a segmentation workflow then!
5. Workflows!
TL;DR:
In web, select your images and run the script slurm/SLURM Run Workflow
Untick the E-mail box (not implemented in this Slurm docker setup)
For importing results, change 3a) Import into NEW Dataset to CellPose_Masks
For importing results, change 3b) Rename the imported images to {original_file}_cpmask.{ext}
Select cellpose, but untick use_gpu (sadly not implemented in this docker setup)
Click Run Script
Now go get a coffee or something. It should take quite a few minutes (about 12m:30s for 4 256x256 images for me!) and then say (a.o.): COMPLETE
Or it says FAILED, in which case you should check all the details anyway and get your hands dirty with debugging! Or try fewer and smaller images.
Refresh your Explore window; there should be a new dataset CellPose_Masks with a mask for every input image.
Details
So, I hope you added some data already; if not, import some images now.
Let’s run slurm/SLURM Run Workflow:
You can see that this script recognized that we downloaded 3 workflows, and what their parameters are. For more information on this magic, follow the other tutorials.
Let’s select cellpose and untick use gpu (sadly). Tune the other parameters as you like for your images. Also, for output, let’s select Import into NEW Dataset by filling in a dataset name: cellpose_images. Click Run Script.
This will take ages because we did not invest in good compute on the Slurm cluster. It took 12m:30s for 4 small images for me.
You can check the progress with the Slurm Get Update script.
That should take you through connecting OMERO with a Google Cloud Slurm setup!
Microsoft Azure Slurm tutorial
Introduction
This library is meant to be used with some external HPC cluster using Slurm, to offload your (OMERO) compute to servers suited for it.
However, if you don’t have ready access (yet) to such a cluster, you might want to spin some test environment up in the Cloud and connect your (local) OMERO to it. This is what we will cover in this tutorial, specifically Microsoft Azure.
0. Requirements
To follow this tutorial, you need:
OMERO Insight
An Azure account (and credits)
I try to provide a tl;dr when I can, otherwise I go step by step.
1. Setup Microsoft Azure for Slurm
TL;DR:
Make a new Azure account if you don’t have one. Hopefully you get/have some free credits.
Create a new App “BIOMERO” via “App registrations”
Copy Application ID
Copy Application Secret
Assign roles to App “BIOMERO”:
“Azure Container Storage Operator” role on the Subscription (or probably on the Resource Group works too)
“Virtual Machine Contributor” role on the Resource Group (“biomero-public”)
“Network Contributor” role on the Resource Group (“biomero-public”)
Create storage account “biomerostorage” in the “biomero-public” Resource Group
Mainly: Follow this video tutorial from Microsoft Azure
However, note that I had trouble with their specific version of Slurm in CycleCloud, while the default version works fine. Check out the details below for more on this part.
Probably use something cheaper than 4x the expensive Standard_ND96amsr_A100_v4 instances, unless you are really rich!
Note: Use Ds, not Das or Des, VM types if you run into security type <null> errors in deployment.
We need a Slurm accounting database for BIOMERO! See the 1 - Addendum chapter below for setting one up, if you don’t have a database.
Add a public key to your Azure CycleCloud profile. Probably use the hpc-slurm-cluster_key that you can find in your Resource Group.
Now you should be able to log in to the Slurm scheduler with something like:
ssh -i C:\<path-to-my>\hpc-slurm-cluster_key.pem azureadmin@<scheduler-vm-public-ip>
Change the cloud-init to install Singularity and 7zip on your nodes.
Details
So, we follow this tutorial and end up with an hpc-slurm-cluster VM (that’s what I named the VM) on Microsoft Azure. It also downloaded the SSH private key for us (hpc-slurm-cluster_key.pem).
Suggested alternative: use a basic Slurm cluster
CycleCloud already comes with a basic Slurm setup that is more up-to-date than this specific GPU-powered version. This matters especially if you will not use GPUs anyway (because $$).
So, given that you followed the video to get a CycleCloud VM up and running, let’s set up a basic Slurm cluster instead.
Let’s start that up:
Click + / Add for a new cluster and select Slurm (instead of cc-slurm-ngc).
We provide a new name: biomero-cluster-basic
We change all the VM types:
Scheduler: Standard_DC4s_v3
HPC, HTC and Dyn: Standard_DC2s_v3
Login node: we will not use it, so it doesn’t matter (Num Login Nodes stays 0)
We change the scaling amount to only 4 cores each (instead of 100), and MaxVMs to 2.
Change the network to the default biomero network
Next, keep all the Network Attached Storage settings.
Next, Advanced Settings. Here, we need to do 2 major things:
First, add the Slurm accounting database. See the 1 - Addendum chapter below for setting that up.
Second, select appropriate VM images that will work for our software (mainly singularity for containers): Ubuntu 22.04 LTS worked for us.
Next, keep all the security settings.
Finally, let’s change the cloud-init for all nodes to install singularity and 7zip:
#cloud-config
package_upgrade: true
packages:
- htop
- wget
- p7zip-full
- software-properties-common
runcmd:
- 'sudo add-apt-repository -y ppa:apptainer/ppa'
- 'sudo apt update'
- 'sudo apt install -y apptainer'
Apptainer is the renamed Singularity project and provides the singularity command as an alias.
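To confirm that cloud-init actually did its job, you can check on a compute node that the expected commands are on the PATH, for example from an interactive srun --pty bash session. A minimal bash sketch; check_node_tools is a hypothetical helper name, and we assume the packages provide the commands singularity (apptainer) and 7z (p7zip-full).

```shell
#!/usr/bin/env bash
# Report each requested command as "ok" or "missing";
# the exit status is non-zero if anything is missing.
check_node_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Usage on a node: check_node_tools singularity apptainer 7z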
1 - Addendum - Setting up Slurm Accounting DB
Expected price: 16.16 euro / month
We follow this recent blog
Note that Western Europe has deployment issues on Azure for these databases. See the Details below for more.
Details
A. Create extra subnet on your virtual network
Go to your hpc-slurm-cluster-vnet
Go to Subnets (under Settings)
Create a new subnet with + Subnet, named mysql (and default settings)
B. Azure database
We create a MySQL Flexible Server:
Server name: slurm-accounting-database (or whatever is available)
Region: Western Europe (same as your server/VNET). See the Notes below about issues (and solutions) in Western Europe at the time of writing.
MySQL version: 8.0
Workload type: For development (unless you are being serious)
Authentication method: MySQL authentication only
User credentials that you like; I used the omero user.
Next: Networking
Connectivity method: Private access (VNet Integration)
Virtual network: select your existing hpc-slurm-cluster-vnet net
Subnet: select your new subnet hpc-slurm-cluster-vnet/mysql
Private DNS: let it create one, or use an existing one.
Next: deploy it!
Next, let’s change some server parameters according to the blog. I think this is optional, though.
Go to your slurm-accounting-database in the Azure portal
Open Server parameters on the left-hand side
Click on All
Filter on innodb_lock_wait_timeout
Change the value to 900
Save changes
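The same parameter change can be sketched with the Azure CLI command az mysql flexible-server parameter set. This is a bash dry-run sketch: the run wrapper and set_lock_wait_timeout are hypothetical helpers, and the resource group biomero-public is an assumption based on the earlier steps. Adjust the names if yours differ.

```shell
#!/usr/bin/env bash
# Dry-run wrapper (hypothetical helper): prints the command unless DRY_RUN=0
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    local quoted
    quoted="$(printf '%q ' "$@")"
    echo "${quoted% }"
  else
    "$@"
  fi
}

# $1 = resource group, $2 = server name, $3 = timeout value in seconds
set_lock_wait_timeout() {
  local rg="${1:-biomero-public}" server="${2:-slurm-accounting-database}" value="${3:-900}"
  run az mysql flexible-server parameter set \
    --resource-group "$rg" --server-name "$server" \
    --name innodb_lock_wait_timeout --value "$value"
}
```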
Note! Availability issue in Western Europe in March 2024:
We had an issue deploying the database in Western Europe; apparently it is full there. So we deployed the database in UK South instead. If you use different regions, you need to connect the VNETs of both regions through something called peering!
For this to work, make sure the IP ranges of the subnets do not overlap; see A, where we made an extra subnet with a different IP range.
We made an extra biomero-public-vnet with a mysql subnet on the 10.1.0.0/24 range. Make sure it is also Delegated to Microsoft.DBforMySQL/flexibleServers.
Remove the default subnet.
Remove the 10.0.0.0/24 address space (as we will connect the other vnet here).
Then go to the biomero-public-vnet, Peerings, and + Add a new peering.
First peering link named hpc, default settings.
Remote peering link named acct, connecting to the hpc-slurm-cluster-vnet.
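Before peering, you can double-check that two address ranges do not overlap. Below is a small pure-bash sketch (the helper names ip_to_int, cidr_bounds and cidrs_overlap are ours) that converts IPv4 CIDR blocks to integer ranges and compares them.

```shell
#!/usr/bin/env bash
# Convert a dotted IPv4 address to a 32-bit integer
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print the first and last address (as integers) of a CIDR block
cidr_bounds() {
  local ip="${1%/*}" prefix="${1#*/}"
  local ipnum mask start
  ipnum=$(ip_to_int "$ip")
  mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  start=$(( ipnum & mask ))
  echo "$start $(( start + (1 << (32 - prefix)) - 1 ))"
}

# Succeeds (exit 0) if the two CIDR blocks share any address
cidrs_overlap() {
  local s1 e1 s2 e2
  read -r s1 e1 <<< "$(cidr_bounds "$1")"
  read -r s2 e2 <<< "$(cidr_bounds "$2")"
  # Two ranges overlap when each one starts before the other ends
  [ "$s1" -le "$e2" ] && [ "$s2" -le "$e1" ]
}
```

For the ranges used above, cidrs_overlap 10.0.0.0/24 10.1.0.0/24 || echo "no overlap, safe to peer" prints the message. Plug in the actual address space of your hpc-slurm-cluster-vnet.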
C. Slurm Accounting settings
Ok, now back in CycleCloud, we will set up Slurm Accounting:
Edit cluster
Advanced Settings
Check the Configure Slurm job accounting box
Slurm DBD URL will be your chosen Server name (check the Azure portal). For me it was slurm-accounting-database.mysql.database.azure.com.
Slurm DBD User and ... Password are what you entered during deployment of the DB.
SSL Certificate URL is https://dl.cacerts.digicert.com/DigiCertGlobalRootCA.crt.pem
(Re)start your Slurm cluster.
Test if the sacct command works!
2. Test Slurm
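SSH into the Slurm scheduler node (gcslurm is just the SSH alias used here; use your own alias, or the ssh -i ... azureadmin@<scheduler-vm-public-ip> command from above):
ssh gcslurm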
Start some filler jobs:
sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname" && sbatch --wrap="sleep 5 && hostname"
Check the progress:
squeue
(perhaps also check Azure CycleCloud to see your HPC VMs spinning up; this takes a few minutes)
Check some output when it’s done, e.g. job 1:
cat slurm-1.out
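Instead of chaining four sbatch calls with &&, you can submit the filler jobs in a loop and keep their job IDs. A minimal bash sketch (submit_fillers is a hypothetical helper). It relies on sbatch --parsable, which prints only the job ID (optionally followed by ";cluster"), so the IDs can be reused directly.

```shell
#!/usr/bin/env bash
# Submit N short filler jobs and print one job ID per line
submit_fillers() {
  local n="${1:-4}" i id
  for (( i = 0; i < n; i++ )); do
    id="$(sbatch --parsable --wrap="sleep 5 && hostname")" || return 1
    # Strip an optional ";clustername" suffix
    echo "${id%%;*}"
  done
}
```

Usage: ids="$(submit_fillers 4)", then squeue -j "$(paste -sd, - <<< "$ids")" shows only these jobs, and each job writes its output to slurm-<id>.out as before.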
3. Test Singularity on Slurm
For example, run:
sbatch -n 1 --wrap "hostname > lolcow.log && singularity run docker://godlovedc/lolcow >> lolcow.log"
This should say something like “Submitted batch job <jobid>”. Then let’s tail the log file:
tail -f lolcow.log
First we see the slurm node that is computing, and later we will see the funny cow.
[slurm@slurmctld data]$ tail -f lolcow.log
c1
 _______________________________________
/ Must I hold a candle to my shames?    \
|                                       |
| -- William Shakespeare, "The Merchant |
\ of Venice"                            /
 ---------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Exit the logs with CTRL+C and the server with exit, and enjoy your Azure Slurm cluster.
5. Setting up (BI)OMERO in Azure too (Optional)
We will install (BI)OMERO on the CycleCloud VM that we have running anyway. Alternatively, you can connect your local (BI)OMERO to this cluster now.
SSH into your CycleCloud VM, hpc-slurm-cluster, as azureuser:
ssh -i C:\<path-to>\hpc-slurm-cluster_key.pem azureuser@<public-ip>
Ensure Docker works, so you can look at the lolcow again:
docker run godlovedc/lolcow
Ok, good enough.
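Now let’s pull an easy BIOMERO setup from NL-BIOMERO onto our VM:
git clone https://github.com/Cellular-Imaging-Amsterdam-UMC/NL-BIOMERO.git
Let’s test it (docker compose must run from inside the cloned folder; we go back to the home folder afterwards):
cd NL-BIOMERO
docker compose up -d --build
cd ~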
Now we need to open the OMERO web port 4080 to view it.
First, go to the Azure portal and click on your VM hpc-slurm-cluster.
Second, go to Networking > Network settings.
Third, Create port rules > Inbound port rule:
Destination port ranges 4080, Protocol TCP, Name OMEROWEB. Add it, and wait a bit for it to take effect.
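The same inbound rules can be sketched with az vm open-port. This bash sketch also covers port 4064, which we open later for OMERO.insight. It prints the commands by default. Assumptions: the VM hpc-slurm-cluster lives in resource group biomero-public, and priorities 1010/1020 are free in its network security group. The run and open_omero_ports helpers are hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run wrapper (hypothetical helper): prints the command unless DRY_RUN=0
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    local quoted
    quoted="$(printf '%q ' "$@")"
    echo "${quoted% }"
  else
    "$@"
  fi
}

open_omero_ports() {
  local rg="${1:-biomero-public}" vm="${2:-hpc-slurm-cluster}"
  # 4080: OMERO.web from NL-BIOMERO; 4064: OMERO server for OMERO.insight
  # Each rule needs a unique priority within the NSG
  run az vm open-port --resource-group "$rg" --name "$vm" --port 4080 --priority 1010
  run az vm open-port --resource-group "$rg" --name "$vm" --port 4064 --priority 1020
}
```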
Test it! Open your web browser at <public-ip>:4080 and log in with root / omero.
Good start!
Now let’s connect BIOMERO to our HPC Slurm cluster:
Copy the SSH private key hpc-slurm-cluster_key.pem (from chapter 1) to the (CycleCloud/OMERO) server:
scp -i C:\<path>\hpc-slurm-cluster_key.pem C:\<path>\hpc-slurm-cluster_key.pem azureuser@<public-ip>:~
Copy your key on the server into ~/.ssh, change its permissions, and log in to the scheduler:
cp hpc-slurm-cluster_key.pem .ssh
sudo chmod 700 .ssh/hpc-slurm-cluster_key.pem
ssh -i .ssh/hpc-slurm-cluster_key.pem azureadmin@<scheduler-ip>
Great, exit back to the CycleCloud server.
The IP of the scheduler (this changes whenever you create a new cluster!) is shown in the Azure CycleCloud screen, when you click on the active scheduler node.
Create a config to set up an alias for SSH:
vi ~/.ssh/config
Press i to insert text, then copy-paste / fill in the config:
Host localslurm
HostName <scheduler-ip>
User azureadmin
Port 22
IdentityFile ~/.ssh/hpc-slurm-cluster_key.pem
StrictHostKeyChecking no
Fill in the actual IP; this is just a placeholder!
Save with Escape followed by :wq
chmod the config to 700 too:
sudo chmod 700 .ssh/config
Ready!
ssh localslurm
(or whatever you called the alias)
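If you rebuild the cluster often (the scheduler IP changes each time), you could write this Host block with a small script instead of vi. A bash sketch, where add_ssh_alias is a hypothetical helper. The user, port and StrictHostKeyChecking setting mirror the config above. It uses mode 600, the usual choice; ssh only requires that the file is not writable by group or others, so the 700 above works as well.

```shell
#!/usr/bin/env bash
# Append a Host block for the Slurm scheduler to an SSH config file.
# $1 = scheduler IP (required), $2 = alias, $3 = config file, $4 = private key
add_ssh_alias() {
  local ip="$1" alias="${2:-localslurm}" config="${3:-$HOME/.ssh/config}"
  local key="${4:-$HOME/.ssh/hpc-slurm-cluster_key.pem}"

  if [ -z "$ip" ]; then
    echo "usage: add_ssh_alias <scheduler-ip> [alias] [config] [key]" >&2
    return 2
  fi
  # Refuse to add a duplicate Host entry
  if [ -f "$config" ] && grep -qx "Host $alias" "$config"; then
    echo "Host $alias already exists in $config" >&2
    return 1
  fi

  mkdir -p "$(dirname "$config")"
  cat >> "$config" <<EOF
Host $alias
    HostName $ip
    User azureadmin
    Port 22
    IdentityFile $key
    StrictHostKeyChecking no
EOF
  chmod 600 "$config"
}
```

Usage: add_ssh_alias <scheduler-ip>, then ssh localslurm. If the alias already exists, edit its HostName instead.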
Let’s edit the BIOMERO configuration slurm-config.ini, located in the biomeroworker folder:
vi ~/NL-BIOMERO/biomeroworker/slurm-config.ini
Change the host if you did not use the localslurm alias in the config above.
Change ALL the [SLURM] paths to match our new Slurm setup:
[SLURM]
# -------------------------------------
# Slurm settings
# -------------------------------------
# General settings for where to find things on the Slurm cluster.
# -------------------------------------
# PATHS
# -------------------------------------
# The path on SLURM entrypoint for storing datafiles
#
# Note:
# This example is relative to the Slurm user's home dir
slurm_data_path=data
# The path on SLURM entrypoint for storing container image files
#
# Note:
# This example is relative to the Slurm user's home dir
slurm_images_path=singularity_images/workflows
# The path on SLURM entrypoint for storing the slurm job scripts
#
# Note:
# This example is relative to the Slurm user's home dir
slurm_script_path=slurm-scripts
Save the file again with Escape + :wq
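The [SLURM] edits can also be scripted, which helps when you redeploy. Below is a minimal bash sketch of a section-aware setter (set_ini_value is our own helper, not part of BIOMERO). It replaces a key inside the given section, or appends it at the end of that section if missing. It assumes plain key=value lines without spaces around =, as in the file above, and leaves the file unchanged if the section does not exist.

```shell
#!/usr/bin/env bash
# set_ini_value FILE SECTION KEY VALUE
set_ini_value() {
  local file="$1" section="$2" key="$3" value="$4" tmp
  tmp="$(mktemp)" || return 1
  awk -v section="[$section]" -v key="$key" -v value="$value" '
    # Append the key at the end of the section if it was not seen
    function flush() { if (in_section && !written) { print key "=" value; written = 1 } }
    /^\[.*\]$/ { flush(); in_section = ($0 == section) }
    in_section && index($0, key "=") == 1 { print key "=" value; written = 1; next }
    { print }
    END { flush() }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the original file permissions
  rm -f "$tmp"
}
```

Usage: set_ini_value ~/NL-BIOMERO/biomeroworker/slurm-config.ini SLURM slurm_data_path data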
Now we need to do some Linux shenanigans to mount .ssh properly into the container.
First, create an empty .pub file that we are missing:
touch ~/.ssh/empty.pub
Second, chmod the .ssh folder and its contents fully open so Docker can access it:
chmod -R 777 ~/.ssh
Note: if you later want to SSH from the command line again (instead of letting BIOMERO do it), just change the rights back to 700 (chmod -R 700 ~/.ssh). This is just a temporary permission workaround for building the Linux container.
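Since you will flip these permissions back and forth a few times in this tutorial, a tiny bash helper can save typing. This is a sketch: ssh_perms is a hypothetical name, and it uses exactly the 777 / 700 modes from the text. "close" matters because the ssh client refuses private keys that are readable by others.

```shell
#!/usr/bin/env bash
# ssh_perms open|close [dir]  (dir defaults to ~/.ssh)
ssh_perms() {
  local dir="${2:-$HOME/.ssh}"
  case "$1" in
    open)  chmod -R 777 "$dir" ;;   # let the container read the mounted keys
    close) chmod -R 700 "$dir" ;;   # back to owner-only for normal ssh use
    *) echo "usage: ssh_perms open|close [dir]" >&2; return 2 ;;
  esac
}
```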
Now we will (re)start / (re)build the BIOMERO servers again
cd NL-BIOMERO
docker compose down
docker compose up -d --build
Now the following command should show some good logs, leading to Starting node biomeroworker:
docker logs -f nl-biomero-biomeroworker-1
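Instead of watching the logs by hand, you can poll until the startup line appears. A generic bash sketch (wait_for_log is a hypothetical helper): pass a fixed-string pattern, a timeout in seconds, and any command that prints the logs. It checks every 5 seconds.

```shell
#!/usr/bin/env bash
# wait_for_log PATTERN TIMEOUT_SECONDS COMMAND [ARGS...]
wait_for_log() {
  local pattern="$1" timeout="$2"; shift 2
  local waited=0
  # Re-run the log command until its output contains the pattern
  until "$@" 2>&1 | grep -qF -- "$pattern"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $pattern" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "found: $pattern"
}
```

Usage: wait_for_log "Starting node biomeroworker" 300 docker logs nl-biomero-biomeroworker-1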
6. Showtime!
Go to your OMERO web at http://<your-VM-ip>:4080/ (root / omero).
Let’s initialize BIOMERO: Run Script > biomero > init > SLURM Init environment...; run that script.
While we’re waiting for that to complete, let’s check out the basic connection:
Run Script > biomero > Example Minimal Slurm Script...;
Uncheck the Run Python box, as we didn’t install that
Check the Check SLURM status box
Check the Check Queue box
Run the script; you should get something like this, an empty queue:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
You can try some other ones, e.g. check the Check Cluster box instead:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
dynamic up infinite 0 n/a
hpc* up infinite 2 idle~ biomero-cluster-basic-hpc-[1-2]
htc up infinite 2 idle~ biomero-cluster-basic-htc-[1-2]
Note: if you click the i button next to the output, you can see the output printed in much more detail and with better formatting, especially if you ran multiple commands at the same time.
At some point, the init script will be done, or you get an Ice.ConnectionLostException (which means it took too long).
Let’s see what BIOMERO created! Run Run Script > biomero > Example Minimal Slurm Script...;
Uncheck the Run Python box
Check the Run Other Command box
Change the Linux Command to ls -la **/* (we want to check all subfolders too).
Run it. Press the i button for proper formatting and scroll down to see what we made:
=== stdout ===
-rw-r--r-- 1 azureadmin azureadmin 1336 Mar 21 17:44 slurm-scripts/convert_job_array.sh
my-scratch/singularity_images:
total 0
drwxrwxr-x 3 azureadmin azureadmin 24 Mar 21 17:31 .
drwxrwxr-x 3 azureadmin azureadmin 32 Mar 21 17:31 ..
drwxrwxr-x 2 azureadmin azureadmin 117 Mar 21 17:45 converters
singularity_images/workflows:
total 16
drwxrwxr-x 6 azureadmin azureadmin 126 Mar 21 17:31 .
drwxrwxr-x 3 azureadmin azureadmin 23 Mar 21 17:31 ..
drwxrwxr-x 2 azureadmin azureadmin 40 Mar 21 17:42 cellexpansion
drwxrwxr-x 2 azureadmin azureadmin 54 Mar 21 17:36 cellpose
drwxrwxr-x 2 azureadmin azureadmin 52 Mar 21 17:40 cellprofiler_spot
-rw-rw-r-- 1 azureadmin azureadmin 695 Mar 21 17:45 pull_images.sh
-rw-rw-r-- 1 azureadmin azureadmin 10802 Mar 21 17:45 sing.log
drwxrwxr-x 2 azureadmin azureadmin 43 Mar 21 17:44 spotcounting
slurm-scripts/jobs:
total 16
drwxrwxr-x 2 azureadmin azureadmin 100 Mar 21 17:31 .
drwxrwxr-x 3 azureadmin azureadmin 46 Mar 21 17:31 ..
-rw-rw-r-- 1 azureadmin azureadmin 3358 Mar 21 17:44 cellexpansion.sh
-rw-rw-r-- 1 azureadmin azureadmin 3406 Mar 21 17:44 cellpose.sh
-rw-rw-r-- 1 azureadmin azureadmin 3184 Mar 21 17:44 cellprofiler_spot.sh
-rw-rw-r-- 1 azureadmin azureadmin 3500 Mar 21 17:44 spotcounting.sh
Or better yet, run this Linux command for full info on all (non-hidden) subdirectories:
find . -type d -not -path '*/.*' -exec ls -la {} +
This should show that we downloaded some of the workflows to our Slurm cluster already:
./singularity_images/workflows:
total 16
drwxrwxr-x 6 azureadmin azureadmin 126 Mar 21 17:31 .
drwxrwxr-x 3 azureadmin azureadmin 23 Mar 21 17:31 ..
drwxrwxr-x 2 azureadmin azureadmin 40 Mar 21 17:42 cellexpansion
drwxrwxr-x 2 azureadmin azureadmin 54 Mar 21 17:36 cellpose
drwxrwxr-x 2 azureadmin azureadmin 52 Mar 21 17:40 cellprofiler_spot
-rw-rw-r-- 1 azureadmin azureadmin 695 Mar 21 17:45 pull_images.sh
-rw-rw-r-- 1 azureadmin azureadmin 10802 Mar 21 17:45 sing.log
drwxrwxr-x 2 azureadmin azureadmin 43 Mar 21 17:44 spotcounting
./singularity_images/workflows/cellexpansion:
total 982536
drwxrwxr-x 2 azureadmin azureadmin 40 Mar 21 17:42 .
drwxrwxr-x 6 azureadmin azureadmin 126 Mar 21 17:31 ..
-rwxr-xr-x 1 azureadmin azureadmin 1006116864 Mar 21 17:42 w_cellexpansion_v2.0.1.sif
./singularity_images/workflows/cellpose:
total 4672820
drwxrwxr-x 2 azureadmin azureadmin 54 Mar 21 17:36 .
drwxrwxr-x 6 azureadmin azureadmin 126 Mar 21 17:31 ..
-rwxr-xr-x 1 azureadmin azureadmin 4784967680 Mar 21 17:36 t_nucleisegmentation-cellpose_v1.2.9.sif
./singularity_images/workflows/cellprofiler_spot:
total 2215916
drwxrwxr-x 2 azureadmin azureadmin 52 Mar 21 17:40 .
drwxrwxr-x 6 azureadmin azureadmin 126 Mar 21 17:31 ..
-rwxr-xr-x 1 azureadmin azureadmin 2269097984 Mar 21 17:40 w_spotcounting-cellprofiler_v1.0.1.sif
./singularity_images/workflows/spotcounting:
total 982720
drwxrwxr-x 2 azureadmin azureadmin 43 Mar 21 17:44 .
drwxrwxr-x 6 azureadmin azureadmin 126 Mar 21 17:31 ..
-rwxr-xr-x 1 azureadmin azureadmin 1006305280 Mar 21 17:44 w_countmaskoverlap_v1.0.1.sif
./slurm-scripts:
total 8
drwxrwxr-x 3 azureadmin azureadmin 46 Mar 21 17:31 .
drwxr-xr-x 12 azureadmin azureadmin 4096 Mar 21 17:31 ..
-rw-r--r-- 1 azureadmin azureadmin 1336 Mar 21 17:44 convert_job_array.sh
drwxrwxr-x 2 azureadmin azureadmin 100 Mar 21 17:31 jobs
./slurm-scripts/jobs:
total 16
drwxrwxr-x 2 azureadmin azureadmin 100 Mar 21 17:31 .
drwxrwxr-x 3 azureadmin azureadmin 46 Mar 21 17:31 ..
-rw-rw-r-- 1 azureadmin azureadmin 3358 Mar 21 17:44 cellexpansion.sh
-rw-rw-r-- 1 azureadmin azureadmin 3406 Mar 21 17:44 cellpose.sh
-rw-rw-r-- 1 azureadmin azureadmin 3184 Mar 21 17:44 cellprofiler_spot.sh
-rw-rw-r-- 1 azureadmin azureadmin 3500 Mar 21 17:44 spotcounting.sh
Ok, let’s get to some data! Upload a file with (a local installation of) OMERO.insight.
First, open up the OMERO port 4064 in Azure on your hpc-slurm-cluster, just like we did with port 4080: add an inbound security rule, destination 4064, Protocol TCP, Name OMEROINSIGHT.
Change the server to <cyclecloud-vm-ip>:4064
Log in with root / omero
Upload some nuclei fluorescence images. For example, I uploaded the raw images from S-BSST265 into a Project TestProject and Dataset S-BSST265. Add to Queue, and import!
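If you would rather use the command line than OMERO.insight, the upload can be sketched with the OMERO CLI from omero-py (assumed to be installed locally). This bash sketch prints the commands by default. The dataset ID is a placeholder that you look up in OMERO.web after creating the dataset. The helpers run and import_dataset are hypothetical, and root / omero are the demo credentials from this tutorial.

```shell
#!/usr/bin/env bash
# Dry-run wrapper (hypothetical helper): prints the command unless DRY_RUN=0
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    local quoted
    quoted="$(printf '%q ' "$@")"
    echo "${quoted% }"
  else
    "$@"
  fi
}

# $1 = CycleCloud VM IP, $2 = existing Dataset ID, $3 = local folder with images
import_dataset() {
  local host="$1" dataset_id="$2" dir="$3"
  run omero login -s "$host" -p 4064 -u root -w omero
  run omero import -d "$dataset_id" "$dir"
}
```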
IMPORTANT! Our default job script assumes 4 CPUs, but we have nodes with only 2 cores. So we have to lower this amount for the job script. Otherwise we get this error:
sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
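To see up front which partitions can satisfy a given CPU request, you can query Slurm on the scheduler with sinfo -h -o "%P %c" (partition name and CPUs per node, without a header). A bash sketch; partitions_with_cpus is a hypothetical helper. It strips the * that marks the default partition.

```shell
#!/usr/bin/env bash
# Print the partitions whose nodes have at least $1 CPUs
partitions_with_cpus() {
  local requested="$1"
  # "$2 + 0" also handles values like "2+" (mixed node sizes in a partition)
  sinfo -h -o "%P %c" | awk -v req="$requested" '$2 + 0 >= req { sub(/\*$/, "", $1); print $1 }'
}
```

With the 2-core nodes from this setup, partitions_with_cpus 4 should print nothing, which explains the error above.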
We will do this ad hoc, by changing the configuration for CellPose in the slurm-config.ini in our installation:
First, edit the config on the main VM with
vi biomeroworker/slurm-config.ini
Add this line to your workflows: <wf>_job_cpus-per-task=2, e.g. cellpose_job_cpus-per-task=2
Save the file (:wq)
Don’t forget to open your .ssh to the container (and close it later):
chmod -R 777 ~/.ssh
Restart the biomero container(s) (docker compose down and docker compose up -d --build, perhaps specifically for biomeroworker).
Check the logs to see if biomero started up properly:
docker logs -f nl-biomero-biomeroworker-1
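To lower the CPUs for several workflows at once, here is a bash sketch (set_workflow_cpus is a hypothetical helper). It assumes each workflow already has a <wf>_job=... line in slurm-config.ini, such as cellpose_job=jobs/cellpose.sh. That line name is an assumption about the file layout, so check your file first. The override is inserted right after that line, so it stays in the same section. An existing override is replaced.

```shell
#!/usr/bin/env bash
# set_workflow_cpus FILE CPUS WORKFLOW [WORKFLOW...]
set_workflow_cpus() {
  local file="$1" cpus="$2" wf tmp; shift 2
  for wf in "$@"; do
    tmp="$(mktemp)" || return 1
    if awk -v wf="$wf" -v cpus="$cpus" '
      BEGIN { key = wf "_job_cpus-per-task=" }
      index($0, key) == 1 { next }          # drop an old override
      { print }
      index($0, wf "_job=") == 1 { print key cpus; found = 1 }
      END { if (!found) exit 1 }            # no <wf>_job= line: signal failure
    ' "$file" > "$tmp"; then
      cat "$tmp" > "$file"
    else
      echo "warning: no ${wf}_job= line in $file, skipped $wf" >&2
    fi
    rm -f "$tmp"
  done
}
```

Usage: set_workflow_cpus biomeroworker/slurm-config.ini 2 cellpose cellprofiler_spot, then restart the containers as described above.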
Next, time to segment! Time to spin up those SLURM compute nodes:
First, select your newly imported dataset, then Run Script > biomero > workflows > SLURM Run Workflow...
At Select how to import your results (one or more), we will upload the masks back into a new dataset, so:
Change 3a) Import into NEW Dataset: into CellPoseMasks
Change 3c) Rename the imported images: into {original_file}_Mask_C1.{ext} (these are placeholder values)
Next, check the cellpose box and:
Change nuc channel to 1
Uncheck the use gpu box (unless you paid for sweet GPU nodes from Azure)
Run Script!
We are running the cellpose workflow on channel 1 (with otherwise default parameters) on all the images of the dataset S-BSST265, and importing the output mask images back into OMERO as dataset CellPoseMasks.
Now, this will take a while again because we are cheap and do not have a GPU node at the ready.
Instead, Azure will create our compute node on demand (to save us money when we are not using it), and it only has a few CPUs as well! So this is not a test of speed (unless you set up a nice Slurm cluster with always-available GPU nodes), but of the BIOMERO automation process.
Extra thoughts
Perhaps also give the cluster a static IP, instead of one that changes whenever you terminate it: https://learn.microsoft.com/en-us/azure/cyclecloud/how-to/network-security?view=cyclecloud-8