In active development, public beta open by request

A configurable gating pipeline for flow cytometry data.

Cytometrika reads FCS and CSV files, applies a sequence of gates defined in TOML, and writes per-gate event counts, population statistics, and figures. The default pipeline targets bacterial flow cytometry with two fluorescent reporters.


Inputs: FCS (2.0, 3.0, 3.1), CSV
Pipeline format: TOML (gates as a DAG)
Outputs: CSV tables, PNG figures
Implementation: Python, scikit-learn, scipy

Fig. 1. Population separation: two-component Gaussian mixture on arcsinh-transformed fluorescence (cofactor 150). Dashed contours mark the 95% contour of each mixture component.

Legend: Population A (FITC⁺); Population B (mCherry⁺); events excluded by upstream gates.

§1. Default pipeline

Twelve gates available, six in the default pipeline.

Gates form a DAG. Each gate declares its parameters, its parent (the after field), and an enabled flag. Disabling a gate is one line of TOML; reordering means changing the after field. Every gate emits its own event count for verification. The six gates below make up the default pipeline; the other six (classification, embedding, manual polygons, statistical filters, compensation) are described in §2.

01 Time gate

Rolling-window event-rate filtering. Time bins whose acquisition rate falls more than threshold standard deviations from the median are removed.

window_size: 1000
threshold: 2.5
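
One way to read this gate, as a minimal sketch rather than Cytometrika's implementation: it assumes events are sorted by acquisition time, that bins are non-overlapping runs of window_size consecutive events, and that the rate of a bin is its event count divided by its elapsed time. The function name time_gate and these binning details are assumptions.

```python
import numpy as np

def time_gate(time, window_size=1000, threshold=2.5):
    """Return a boolean mask that is False for events in anomalous time bins."""
    time = np.asarray(time, dtype=float)
    n = time.size
    keep = np.ones(n, dtype=bool)
    n_bins = n // window_size
    if n_bins < 3:
        return keep  # too few bins to estimate a spread; keep everything
    # Assumption: consecutive, non-overlapping bins of window_size events.
    # Events after the last full bin are left untouched.
    edges = np.arange(n_bins + 1) * window_size
    first, last = edges[:-1], edges[1:] - 1
    elapsed = np.maximum(time[last] - time[first], np.finfo(float).eps)
    rates = (window_size - 1) / elapsed  # intervals per unit time in each bin
    # Flag bins whose rate is more than `threshold` standard deviations
    # away from the median rate.
    bad = np.abs(rates - np.median(rates)) > threshold * rates.std()
    for b in np.flatnonzero(bad):
        keep[edges[b]:edges[b + 1]] = False
    return keep
```

The mask can be applied to the event matrix directly, for example events[time_gate(events_time)], where events and events_time are placeholder names.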

02 Debris gate

Percentile thresholds on forward and side scatter. Events below the configured percentile in either channel are removed.

fsc_min_percentile: 5.0
ssc_min_percentile: 5.0
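
The percentile rule can be sketched in a few lines of NumPy. This is an illustration under the assumption that percentiles are computed on the events entering the gate; the function name debris_gate is hypothetical.

```python
import numpy as np

def debris_gate(fsc, ssc, fsc_min_percentile=5.0, ssc_min_percentile=5.0):
    """Return a boolean mask of events that clear both scatter thresholds."""
    fsc = np.asarray(fsc, dtype=float)
    ssc = np.asarray(ssc, dtype=float)
    fsc_cut = np.percentile(fsc, fsc_min_percentile)
    ssc_cut = np.percentile(ssc, ssc_min_percentile)
    # An event is removed if it falls below the cut in either channel,
    # so it is kept only when it is at or above the cut in both.
    return (fsc >= fsc_cut) & (ssc >= ssc_cut)
```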

03 Singlet gate

Linear regression of FSC-A on FSC-Width; events whose residual exceeds the configured z-score threshold are flagged as doublets and removed.

z_threshold: 3.0
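
A rough sketch of the regression step, assuming an ordinary least-squares fit and a two-sided test on the standardised residual (whether the actual gate is one-sided or two-sided is not stated here). The function name singlet_gate is an assumption.

```python
import numpy as np

def singlet_gate(fsc_area, fsc_width, z_threshold=3.0):
    """Return a boolean mask of events treated as singlets."""
    area = np.asarray(fsc_area, dtype=float)
    width = np.asarray(fsc_width, dtype=float)
    # Least-squares line: area ~ slope * width + intercept
    slope, intercept = np.polyfit(width, area, deg=1)
    residuals = area - (slope * width + intercept)
    z = (residuals - residuals.mean()) / residuals.std()
    # Assumption: both unusually high and unusually low residuals are rejected.
    return np.abs(z) <= z_threshold
```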

04 Main population

Mahalanobis distance on FSC-A, SSC-A. Events outside an n_std-sigma ellipse are excluded.

method: ellipse
n_std: 3.0
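
The ellipse test amounts to thresholding the squared Mahalanobis distance at n_std². The sketch below assumes the centre and covariance are the plain sample mean and sample covariance of the incoming events; a robust estimator may be used in practice. The function name is hypothetical.

```python
import numpy as np

def main_population_gate(fsc_area, ssc_area, n_std=3.0):
    """Return a boolean mask of events inside the n_std-sigma ellipse."""
    X = np.column_stack([fsc_area, ssc_area]).astype(float)
    center = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    diff = X - center
    # Squared Mahalanobis distance: d2_i = diff_i^T cov^-1 diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return d2 <= n_std ** 2
```

For two-dimensional Gaussian data, a 3-sigma Mahalanobis ellipse contains about 98.9% of events (1 − exp(−4.5)), not the 99.7% familiar from the one-dimensional case.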

05 Fluorescence debris

Three-cluster k-means on arcsinh-transformed FITC and mCherry. The lowest-intensity cluster is treated as fluorescence debris and removed.

method: 3cluster_kmeans
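
A sketch of this step with scikit-learn's KMeans. Two things are assumptions here: the cofactor of 150 (borrowed from the population-separation gate) and the definition of "lowest-intensity" as the cluster whose centroid has the smallest summed arcsinh coordinates. The function name is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def fluorescence_debris_gate(fitc, mcherry, cofactor=150.0, random_state=0):
    """Return a boolean mask that is False for the dimmest k-means cluster."""
    X = np.arcsinh(np.column_stack([fitc, mcherry]).astype(float) / cofactor)
    km = KMeans(n_clusters=3, n_init=10, random_state=random_state).fit(X)
    # Assumption: the debris cluster is the one with the lowest summed centroid.
    debris_label = np.argmin(km.cluster_centers_.sum(axis=1))
    return km.labels_ != debris_label
```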

06 Population separation

Two-component Gaussian mixture on arcsinh-transformed fluorescence (k-means is available as an alternative). Components are sorted by intensity so labels remain stable across files.

method: gmm
n_components: 2
covariance_type: full
cofactor: 150
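
The parameters above map naturally onto scikit-learn's GaussianMixture. The sketch below is illustrative only: the function name is hypothetical, and "sorted by intensity" is interpreted as sorting components by descending mean on the FITC axis so that label 0 is always the FITC-bright population.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def separate_populations(fitc, mcherry, cofactor=150.0, n_components=2,
                         covariance_type="full", random_state=0):
    """Return integer labels with a stable ordering across files."""
    X = np.arcsinh(np.column_stack([fitc, mcherry]).astype(float) / cofactor)
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type=covariance_type,
                          random_state=random_state).fit(X)
    raw_labels = gmm.predict(X)
    # Assumption: order components by descending FITC mean, so the label
    # does not depend on the arbitrary component order returned by the fit.
    order = np.argsort(-gmm.means_[:, 0])
    remap = np.empty(n_components, dtype=int)
    remap[order] = np.arange(n_components)
    return remap[raw_labels]
```

Without the remapping step, the same biological population could be labelled 0 in one file and 1 in the next, because mixture components have no inherent order.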

§2. Classification and embedding

Multiple classifiers, swappable as gates.

Beyond the default GMM, Cytometrika exposes a range of clustering, classification, and embedding methods as drop-in pipeline gates. They can be chained, compared on the same input, or evaluated against manual gating.

DensiTree

Fuzue library

SPADE clustering, downsample-driven, density-aware.

DensiTree is a Python library developed by Fuzue that provides density-aware spanning-tree clustering for cytometry data (see densitree.fuzue.tech). It wraps SPADE with a sensible default cofactor and integrates with the Cytometrika pipeline as a labelling gate: it adds a cluster column without removing events.

|diag|-guided seeded GMM

Fuzue method

Scale-invariant bacterial population separation.

A custom method for separating two bacterial populations in FITC vs mCherry. It uses the arcsinh-space diagonal distance |y − x| as a density-independent debris discriminator: real populations lie on opposite sides of the identity diagonal, while debris sits on it. K-means identifies the debris cluster, a hard filter removes residual diagonal events with |diag| < 0.5, and a seeded GMM with chi-square 99% outlier removal classifies the survivors.
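
A partial sketch of the later stages may help make the idea concrete. It is not the Fuzue implementation: the k-means debris step is omitted, the cofactor of 150 is assumed, the GMM is seeded with the mean of the events on each side of the diagonal, and removed events are labelled -1. All names are hypothetical, and both sides of the diagonal are assumed to be non-empty after filtering.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.mixture import GaussianMixture

def diag_seeded_gmm(fitc, mcherry, cofactor=150.0, diag_min=0.5, random_state=0):
    """Label events 0 (FITC side), 1 (mCherry side) or -1 (removed)."""
    x = np.arcsinh(np.asarray(fitc, dtype=float) / cofactor)
    y = np.arcsinh(np.asarray(mcherry, dtype=float) / cofactor)
    diag = y - x
    labels = np.full(x.size, -1)
    # Hard filter: drop events that sit close to the identity diagonal.
    keep = np.abs(diag) >= diag_min
    X = np.column_stack([x, y])[keep]
    # Assumption: seed one component per side of the diagonal.
    fitc_side = diag[keep] < 0
    seeds = np.vstack([X[fitc_side].mean(axis=0), X[~fitc_side].mean(axis=0)])
    gmm = GaussianMixture(n_components=2, covariance_type="full",
                          means_init=seeds, random_state=random_state).fit(X)
    comp = gmm.predict(X)
    # Chi-square 99% outlier removal: squared Mahalanobis distance to the
    # assigned component, compared with the 99% quantile for 2 dimensions.
    diff = X - gmm.means_[comp]
    d2 = np.einsum("ni,nij,nj->n", diff, gmm.precisions_[comp], diff)
    comp[d2 > chi2.ppf(0.99, df=2)] = -1
    labels[keep] = comp
    return labels
```

Because |y − x| is computed after the arcsinh transform, the filter depends on the ratio between the two channels at high intensities rather than on how many events fall in each region.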

UMAP

umap-learn

Non-linear 2D embedding.

Standard UMAP embedding on arcsinh-transformed channels. The model is fitted on a configurable subsample for speed and then used to transform all events. It adds umap_x and umap_y columns, which are useful both for visual exploration and as input to a downstream clustering gate.

FlowSOM

flowsom

Self-organising map with metaclustering.

FlowSOM for unsupervised population discovery. Trains a self-organising map on arcsinh-transformed channels, then collapses nodes into n_clusters metaclusters. Every event is labelled; no events are removed.

Scyan

scyan

Marker-table-driven supervised classification.

Scyan assigns each event to a population defined by an expected marker-expression table (positive, negative, or NaN per channel). Outputs a population label and a log-probability. Events the model cannot assign with confidence are labelled unassigned.

Manual polygons

built-in

User-drawn regions in arcsinh space.

For workflows where a polygon gate is preferred, Cytometrika accepts one or more polygons in arcsinh-transformed coordinates. An event is retained if it falls inside the union of the defined polygons. The polygon vertices live in the same TOML file as the rest of the pipeline.
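
The union rule can be illustrated with a standard ray-casting point-in-polygon test in plain Python. The function names and the vertex representation (a list of (x, y) tuples per polygon) are assumptions for this sketch.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside

def polygon_gate(points, polygons):
    """Keep a point if it falls inside at least one polygon (the union)."""
    return [any(point_in_polygon(x, y, poly) for poly in polygons)
            for x, y in points]
```

Coordinates are expected to be in the same arcsinh-transformed space in which the polygons were drawn.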

LOF and KDE filters

scikit-learn, scipy

Statistical outlier and low-density removal.

Local Outlier Factor and KDE-based density filters are available as standalone gates for removing low-density or outlier events before downstream classification. They operate on the arcsinh-transformed feature matrix of the user's choosing.
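
Both filters can be sketched with the libraries named above. The parameter names and defaults (n_neighbors, contamination, min_density_percentile) are illustrative assumptions, not Cytometrika's configuration keys.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.neighbors import LocalOutlierFactor

def lof_gate(X, n_neighbors=20, contamination=0.01):
    """Return a boolean mask of events that LOF considers inliers."""
    lof = LocalOutlierFactor(n_neighbors=n_neighbors, contamination=contamination)
    return lof.fit_predict(X) == 1  # fit_predict returns 1 (inlier) or -1 (outlier)

def kde_gate(X, min_density_percentile=5.0):
    """Return a boolean mask that drops the lowest-density events."""
    X = np.asarray(X, dtype=float)
    # gaussian_kde expects an array of shape (n_features, n_events).
    density = gaussian_kde(X.T)(X.T)
    return density >= np.percentile(density, min_density_percentile)
```

Evaluating a Gaussian KDE at every event scales quadratically with the number of events, so on large files it may be preferable to fit on a subsample.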

§3. Capabilities

What the tool does.

FCS and CSV input

FCS files are read with fcsparser. Channel metadata (short name, long name, gain, voltage), instrument identifier, and acquisition date are preserved alongside the event matrix. CSV input is also supported.

Pipelines as TOML

The full pipeline (gate types, parameters, ordering, enabled flags) is captured in a TOML file. The same TOML and the same input file produce the same gating output.

[[pipeline.gates]]
id    = "main_pop_gate"
after = "singlets_gate"
type  = "main_population"
[pipeline.gates.params]
fsc_role = "fsc_area"
ssc_role = "ssc_area"
method   = "ellipse"
n_std    = 3.0

Channel role mapping

Gates reference channel roles (fsc_area, ssc_area, fsc_width, time, marker names) rather than specific column names. Auto-detected mappings can be overridden by a .channels.json sidecar, so the same pipeline runs on data from different instruments.
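
The override logic might look like the following sketch. The alias table used for auto-detection and the sidecar schema (a flat JSON object mapping role to column name) are both assumptions; the actual .channels.json format is not specified here.

```python
import json

# Hypothetical auto-detection table: role -> column names to look for.
DEFAULT_ALIASES = {
    "fsc_area": ("FSC-A", "FSC_A"),
    "ssc_area": ("SSC-A", "SSC_A"),
    "fsc_width": ("FSC-W", "FSC-Width"),
    "time": ("Time", "TIME"),
}

def resolve_roles(columns, sidecar_json=None):
    """Map channel roles to column names; sidecar entries win over auto-detection."""
    roles = {}
    for role, aliases in DEFAULT_ALIASES.items():
        for name in aliases:
            if name in columns:
                roles[role] = name
                break
    if sidecar_json is not None:
        # Assumed sidecar schema: {"role": "column name", ...}
        roles.update(json.loads(sidecar_json))
    missing = sorted(col for col in roles.values() if col not in columns)
    if missing:
        raise KeyError(f"mapped columns not present in data: {missing}")
    return roles
```

With a mapping like this, a gate that asks for fsc_area never needs to know whether the instrument exported the column as FSC-A or under another name.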

Batch processing

Cytometrika processes individual files, directories of files, or grouped experiments in a single run. Per-file statistics are concatenated into combined tables for downstream analysis.

§4. Outputs

Per-gate event counts and population statistics.

Every run produces a per-gate accounting of the input events. Below: a representative run on a single FCS file. Final populations carry per-channel summary statistics (count, percentage, mean, median, standard deviation, coefficient of variation).

Gate	Events in	Events out	% retained
Raw input	148,302	148,302	100.0
Time	148,302	141,887	95.7
Debris	141,887	118,402	83.4
Singlets	118,402	109,815	92.7
Main population	109,815	96,221	87.6
Fluorescence debris	96,221	92,408	96.0
Population A	n/a	54,118	58.6
Population B	n/a	38,290	41.4
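
The % retained column is each gate's output count divided by its input count; for the two final populations it is the share of the events that survived the last removal gate. A short sketch using the counts from the table (the function and variable names are illustrative):

```python
GATE_COUNTS = [
    ("Raw input", 148302),
    ("Time", 141887),
    ("Debris", 118402),
    ("Singlets", 109815),
    ("Main population", 96221),
    ("Fluorescence debris", 92408),
]
POPULATIONS = [("Population A", 54118), ("Population B", 38290)]

def gate_accounting(gate_counts):
    """Return (gate, events_in, events_out, percent_retained) rows."""
    rows = []
    events_in = gate_counts[0][1]
    for name, events_out in gate_counts:
        rows.append((name, events_in, events_out,
                     round(100 * events_out / events_in, 1)))
        events_in = events_out  # the next gate receives this gate's output
    return rows

def population_shares(populations, total):
    """Return (population, count, percent_of_total) rows."""
    return [(name, n, round(100 * n / total, 1)) for name, n in populations]

rows = gate_accounting(GATE_COUNTS)
shares = population_shares(POPULATIONS, GATE_COUNTS[-1][1])
```

Checking that the population counts sum to the last gate's output is a quick sanity test that no events were lost between gating and classification.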

Files written per run

statistics.csv: per-gate counts, per-population summary statistics
gate_histograms/: 1D histograms for every gate
scatter_fsc_ssc.png: gated FSC vs SSC
scatter_fluorescence.png: FITC vs mCherry with population colours
temporal_evolution.png: per-experiment time series
fluorescence_trajectories.png: two-dimensional fluorescence evolution

§5. Beta access

Join the beta program.

Cytometrika is in active development. We are prioritising research groups working on bacterial flow cytometry for early access. Tell us about your data, your instrument, and your downstream analysis. We will respond before sending a build.

Pre-release builds with release notes
Channel of communication with the development team
Input on the public release pipeline and default parameters
