Eventsourcing and Views (Developer) =================================== This page explains how BIOMERO tracks workflow execution using eventsourcing, and how read models ("views") are maintained and migrated. Overview -------- - Event side: domain aggregates emit immutable events and are stored in an event store (via eventsourcing_sqlalchemy). - View side: lightweight SQLAlchemy models are updated by ProcessApplications that listen to events and persist denormalized rows for fast queries and dashboards. Event side ---------- - Aggregates: see ``biomero.eventsourcing`` - ``WorkflowRun`` (create/start/complete/fail; holds list of task IDs) - ``Task`` (create/start/complete/fail; adds Slurm job IDs; status/progress; results) - Application service: ``WorkflowTracker`` orchestrates aggregate lifecycle methods and commits using ``EngineManager``. - Persistence: the event store is managed by the eventsourcing library. Do not write Alembic migrations for its tables. Environment ~~~~~~~~~~~ - Required env vars: - ``PERSISTENCE_MODULE=eventsourcing_sqlalchemy`` - ``SQLALCHEMY_URL=postgresql+psycopg2://...`` (or sqlite for tests) - Engine wiring: ``EngineManager.create_scoped_session()`` configures the SQLAlchemy engine/session used both by eventsourcing and the views. Versioning aggregates ~~~~~~~~~~~~~~~~~~~~~ When changing aggregate or event schemas, keep backward compatibility with stored events: - Bump ``INITIAL_VERSION`` (or class_version) as appropriate. - Add ``upcast_vX_vY(state)`` static methods on Aggregate/Event classes to adapt older event snapshots to the new shape. - See the eventsourcing docs for versioning patterns. View side --------- Views are updated by ProcessApplications that consume events and persist into BIOMERO-owned tables (all start with ``biomero_...``): - ``biomero.views.JobAccounting`` -> ``biomero_job_view`` (user/group + task_id per Slurm job) - ``biomero.views.JobProgress`` -> ``biomero_job_progress_view`` (status/progress per Slurm job) - ``biomero.views.WorkflowProgress`` -> ``biomero_workflow_progress_view`` (status/progress/name/user/group/task) - ``biomero.views.WorkflowAnalytics`` -> ``biomero_task_execution`` (per-task analytics, timings, failures) These are standard SQLAlchemy models defined in ``biomero.database``. They are the only BIOMERO tables covered by Alembic migrations. Rebuilding views (reprojection) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Views are derived data and can be safely rebuilt from the event log: 1) Truncate view tables (optional): - ``biomero_job_view``, ``biomero_job_progress_view``, ``biomero_workflow_progress_view``, ``biomero_task_execution`` 2) Reprocess events from the start with a System runner: :: from eventsourcing.system import System, SingleThreadedRunner from biomero import WorkflowTracker from biomero.views import JobAccounting, JobProgress, WorkflowProgress, WorkflowAnalytics system = System(pipes=[[WorkflowTracker, JobAccounting], [WorkflowTracker, JobProgress], [WorkflowTracker, WorkflowProgress], [WorkflowTracker, WorkflowAnalytics]]) runner = SingleThreadedRunner(system) runner.start() # Optionally call: runner.get(WorkflowTracker).pull_and_process(leader_name=WorkflowTracker.__name__) runner.stop() Notes: - View upserts use ``session.merge(...)`` or primary keys to stay idempotent. - If you change primary keys or uniqueness, do a one-off cleanup before reprojecting. Schema changes and migrations (views only) ----------------------------------------- BIOMERO ships with an Alembic track scoped to its own tables. The event store tables are excluded and managed by the library. - Config: ``biomero/migrations`` (version table: ``alembic_version_biomero``) - Startup migrator: ``biomero.db_migrate.run_migrations_on_startup()`` - Controlled by env: - ``BIOMERO_RUN_MIGRATIONS=1`` (default) to auto-upgrade on start - ``BIOMERO_ALLOW_AUTO_STAMP=1`` to adopt an existing schema (stamps head once if BIOMERO tables already exist) Typical workflow (autogenerate a revision for view changes): 1) Edit SQLAlchemy models in ``biomero.database`` (BIOMERO view tables only). 2) Generate a migration: :: # Ensure SQLALCHEMY_URL points to a dev DB that is up-to-date python -m alembic -c biomero/migrations/alembic.ini revision --autogenerate -m "view change" 3) Commit the migration version file to Git to be packaged with the release Gotchas ------- - Do not include eventsourcing tables in Alembic. Our env.py filters to BIOMERO view tables only. - When changing aggregates, add upcasters so old events can still be rehydrated. - Rebuilding views is safe; prefer that over complex data migrations. - In production, the startup migrator uses a Postgres advisory lock to avoid concurrent upgrades.