In active development, public beta open by request

A configurable gating pipeline for flow cytometry data.

Cytometrika reads FCS and CSV files, applies a sequence of gates defined in TOML, and writes per-gate event counts, population statistics, and figures. The default pipeline targets bacterial flow cytometry with two fluorescent reporters.

Inputs: FCS (2.0, 3.0, 3.1), CSV
Pipeline format: TOML (gates as a DAG)
Outputs: CSV tables, PNG figures
Implementation: Python, scikit-learn, scipy
Fig. 1. Population separation: two-component Gaussian mixture on arcsinh-transformed fluorescence (cofactor 150). Dashed contours mark the 95% probability ellipses of the two mixture components.
Legend: Population A (FITC+), Population B (mCherry+), events excluded by upstream gates.

§1. Default pipeline

Twelve gates available, six in the default pipeline.

Gates form a DAG. Each gate declares its parameters, its parent, and an enabled flag. Disabling a gate takes one line of TOML; reordering means changing its after field. Every gate reports its own event count for verification. The six gates below make up the default pipeline. The remaining six cover classification, embedding, manual polygons, statistical filters, and compensation; §2 describes the classification, embedding, polygon, and filter gates.
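As a rough illustration of how a DAG of gates can be turned into an execution order, the sketch below uses Python's standard graphlib. The gate dictionaries mirror the id, after, and enabled fields described above, but the rule for disabled gates (their children re-attach to the nearest enabled ancestor) is an assumption made for this example, not documented Cytometrika behaviour.

```python
from graphlib import TopologicalSorter


def execution_order(gates):
    """Return enabled gate ids in a valid execution order.

    Assumptions (hypothetical): each gate is a dict with "id", an optional
    "after" (parent id) and an optional "enabled" flag defaulting to True;
    a disabled gate is bypassed, so its children attach to its nearest
    enabled ancestor.
    """
    by_id = {g["id"]: g for g in gates}

    def enabled_parent(gate):
        parent, seen = gate.get("after"), set()
        # Walk up past disabled ancestors
        while parent is not None and not by_id[parent].get("enabled", True):
            if parent in seen:
                raise ValueError(f"cycle through disabled gate {parent!r}")
            seen.add(parent)
            parent = by_id[parent].get("after")
        return parent

    graph = {}
    for g in gates:
        if g.get("enabled", True):
            p = enabled_parent(g)
            graph[g["id"]] = {p} if p is not None else set()
    # static_order() raises graphlib.CycleError if the "after" links form a cycle
    return list(TopologicalSorter(graph).static_order())


gates = [
    {"id": "pop_gate", "after": "main_pop_gate"},
    {"id": "time_gate"},
    {"id": "singlets_gate", "after": "time_gate"},
    {"id": "main_pop_gate", "after": "singlets_gate", "enabled": False},
]
print(execution_order(gates))  # ['time_gate', 'singlets_gate', 'pop_gate']
```

Listing the gates out of order in the example shows that only the after links, not file order, determine execution.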

  1. Time gate

    Rolling-window event-rate filtering. Time bins whose acquisition rate deviates from the median by more than threshold standard deviations are removed.

    window_size = 1000
    threshold = 2.5
  2. Debris gate

    Percentile thresholds on forward and side scatter. Events below the configured percentile in either channel are removed.

    fsc_min_percentile = 5.0
    ssc_min_percentile = 5.0
  3. Singlet gate

    Linear regression of FSC-A on FSC-Width; events whose residual z-score exceeds z_threshold in absolute value are flagged as doublets and removed.

    z_threshold = 3.0
  4. Main population

    Mahalanobis distance in the FSC-A/SSC-A plane. Events whose distance exceeds n_std, i.e. that fall outside the n_std-sigma ellipse, are excluded.

    method = "ellipse"
    n_std = 3.0
  5. Fluorescence debris

    Three-cluster k-means on arcsinh-transformed FITC and mCherry. The lowest-intensity cluster is treated as fluorescence debris and removed.

    method = "3cluster_kmeans"
  6. Population separation

    Two-component Gaussian mixture on arcsinh-transformed fluorescence (k-means is available as an alternative). Components are sorted by intensity so labels remain stable across files.

    method = "gmm"
    n_components = 2
    covariance_type = "full"
    cofactor = 150
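The six default gates can be sketched with NumPy and scikit-learn as functions that each return a boolean keep-mask (or, for the last gate, population labels). This is a minimal illustration under stated assumptions, not Cytometrika's implementation: the function names are hypothetical, the time gate uses fixed time bins sized to hold about window_size events on average (a simplification of the rolling window), "lowest intensity" is taken as the smallest centroid coordinate sum, and GMM components are ordered by mean FITC so that label 0 is the FITC+ population.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture


def asinh(x, cofactor=150.0):
    # arcsinh transform with the default-pipeline cofactor
    return np.arcsinh(np.asarray(x, dtype=float) / cofactor)


def time_gate(time, window_size=1000, threshold=2.5):
    # Simplification: fixed time bins holding ~window_size events on average
    time = np.asarray(time, dtype=float)
    n_bins = max(1, len(time) // window_size)
    counts, edges = np.histogram(time, bins=n_bins)
    sd = counts.std()
    bad = (np.abs(counts - np.median(counts)) > threshold * sd
           if sd > 0 else np.zeros(n_bins, dtype=bool))
    bin_of_event = np.searchsorted(edges[1:-1], time, side="right")
    return ~bad[bin_of_event]


def debris_gate(fsc, ssc, fsc_min_percentile=5.0, ssc_min_percentile=5.0):
    # Keep events at or above the percentile cut in BOTH scatter channels
    return ((fsc >= np.percentile(fsc, fsc_min_percentile))
            & (ssc >= np.percentile(ssc, ssc_min_percentile)))


def singlet_gate(fsc_area, fsc_width, z_threshold=3.0):
    # Regress FSC-A on FSC-Width, then z-score the residuals
    slope, intercept = np.polyfit(fsc_width, fsc_area, 1)
    resid = fsc_area - (slope * fsc_width + intercept)
    z = (resid - resid.mean()) / resid.std()
    return np.abs(z) <= z_threshold


def main_population_gate(fsc, ssc, n_std=3.0):
    # Mahalanobis distance from the FSC-A/SSC-A centroid
    X = np.column_stack([fsc, ssc])
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(d2) <= n_std


def fluorescence_debris_gate(fitc, mcherry, cofactor=150.0, seed=0):
    X = np.column_stack([asinh(fitc, cofactor), asinh(mcherry, cofactor)])
    km = KMeans(n_clusters=3, n_init=10, random_state=seed).fit(X)
    # Assumption: "lowest intensity" = smallest centroid coordinate sum
    debris = np.argmin(km.cluster_centers_.sum(axis=1))
    return km.labels_ != debris


def separate_populations(fitc, mcherry, n_components=2, covariance_type="full",
                         cofactor=150.0, seed=0):
    X = np.column_stack([asinh(fitc, cofactor), asinh(mcherry, cofactor)])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type=covariance_type,
                          random_state=seed).fit(X)
    # Relabel so component 0 has the highest mean FITC (stable across files)
    rank = np.argsort(-gmm.means_[:, 0])
    relabel = np.empty_like(rank)
    relabel[rank] = np.arange(n_components)
    return relabel[gmm.predict(X)]
```

Chained in the default order, each mask is applied to the events that survived the previous gate, and counting the survivors after each step gives per-gate event counts of the kind shown in §4.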

§2. Classification and embedding

Multiple classifiers, swappable as gates.

Beyond the default GMM, Cytometrika exposes a range of clustering, classification, and embedding methods as drop-in pipeline gates. They can be chained, compared on the same input, or evaluated against manual gating.

DensiTree

Fuzue library

SPADE clustering, downsample-driven, density-aware.

A Python library developed by Fuzue providing density-aware spanning-tree clustering for cytometry data. See densitree.fuzue.tech. Wraps SPADE with a sensible default cofactor and integrates with the Cytometrika pipeline as a labelling gate. Adds a cluster column without removing events.

|diag|-guided seeded GMM

Fuzue method

Scale-invariant bacterial population separation.

A custom method for separating two bacterial populations in FITC vs mCherry. It uses the arcsinh diagonal |y − x| as a density-independent debris discriminator: the two real populations lie on opposite sides of the identity diagonal, while debris sits on it. K-means identifies the debris cluster, a hard |diag| < 0.5 filter removes residual near-diagonal events, and a seeded GMM with chi-square outlier removal at the 99% level then classifies the survivors.
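A sketch of these four steps, assuming three k-means clusters, that the debris cluster is the one whose centroid lies closest to the identity diagonal, and that the GMM is seeded with the two remaining centroids. Events are dropped when their squared Mahalanobis distance to the assigned component exceeds the 99% quantile of a chi-square distribution with 2 degrees of freedom. The function name and defaults are hypothetical; this is an illustration of the described idea, not Fuzue's code.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture


def diag_seeded_gmm(fitc, mcherry, cofactor=150.0, diag_cut=0.5, alpha=0.99, seed=0):
    """Label events 0 (FITC side), 1 (mCherry side) or -1 (removed)."""
    x = np.arcsinh(np.asarray(fitc, dtype=float) / cofactor)
    y = np.arcsinh(np.asarray(mcherry, dtype=float) / cofactor)
    X = np.column_stack([x, y])
    diag = y - x  # debris has |diag| close to 0

    # 1. k-means; assumed debris = centroid closest to the identity diagonal
    km = KMeans(n_clusters=3, n_init=10, random_state=seed).fit(X)
    c = km.cluster_centers_
    debris = np.argmin(np.abs(c[:, 1] - c[:, 0]))
    # 2. hard |diag| filter on top of the k-means debris removal
    keep = (km.labels_ != debris) & (np.abs(diag) >= diag_cut)

    # 3. seed the GMM with the two remaining centroids, FITC side first
    seeds = np.delete(c, debris, axis=0)
    seeds = seeds[np.argsort(seeds[:, 1] - seeds[:, 0])]
    gmm = GaussianMixture(2, covariance_type="full", means_init=seeds,
                          random_state=seed).fit(X[keep])
    lab = gmm.predict(X[keep])

    # 4. chi-square outlier removal on squared Mahalanobis distance
    diff = X[keep] - gmm.means_[lab]
    inv = np.linalg.inv(gmm.covariances_)[lab]
    d2 = np.einsum("ni,nij,nj->n", diff, inv, diff)
    lab[d2 > chi2.ppf(alpha, df=2)] = -1

    labels = np.full(len(X), -1)
    labels[keep] = lab
    return labels
```

Seeding from the k-means centroids is what keeps component 0 on the FITC side from file to file, so the labels need no post-hoc matching when the two populations are well separated.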

UMAP

umap-learn

Non-linear 2D embedding.

Standard UMAP embedding on arcsinh-transformed channels. Fits on a configurable subsample for speed, transforms all events. Adds umap_x and umap_y columns; useful both for visual exploration and as input to a downstream clustering gate.

FlowSOM

flowsom

Self-organising map with metaclustering.

FlowSOM for unsupervised population discovery. Trains a self-organising map on arcsinh-transformed channels, then collapses nodes into n_clusters metaclusters. Every event is labelled; no events are removed.

Scyan

scyan

Marker-table-driven supervised classification.

Scyan assigns each event to a population defined by an expected marker-expression table (positive, negative, or NaN per channel). Outputs a population label and a log-probability. Events the model cannot assign with confidence are labelled unassigned.

Manual polygons

built-in

User-drawn regions in arcsinh space.

For workflows where a polygon gate is preferred, Cytometrika accepts one or more polygons in arcsinh-transformed coordinates. A point is retained if it falls inside the union of the defined polygons. The polygon vertices live in the same TOML file as the rest of the pipeline.
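Membership in a union of polygons can be computed with the even-odd (ray casting) rule per polygon, followed by a logical OR. The sketch below is one way to do this in NumPy, assuming each polygon is a list of (x, y) vertices in arcsinh coordinates and that raw channels are transformed with the default cofactor of 150; the function names are hypothetical.

```python
import numpy as np


def in_polygon(points, vertices):
    """Even-odd (ray casting) test for one polygon; points has shape (n, 2)."""
    x, y = points[:, 0], points[:, 1]
    inside = np.zeros(len(points), dtype=bool)
    vx, vy = np.asarray(vertices, dtype=float).T
    for i in range(len(vx)):
        # Edge from vertex i-1 to vertex i (i = 0 closes the polygon)
        x1, y1, x2, y2 = vx[i - 1], vy[i - 1], vx[i], vy[i]
        crosses = (y1 > y) != (y2 > y)  # edge straddles the point's horizontal
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (x < x_cross)  # toggle on each crossing to the right
    return inside


def polygon_gate(x, y, polygons, cofactor=150.0):
    # Vertices live in arcsinh space, so transform the raw channels first
    pts = np.column_stack([np.arcsinh(np.asarray(x, dtype=float) / cofactor),
                           np.arcsinh(np.asarray(y, dtype=float) / cofactor)])
    keep = np.zeros(len(pts), dtype=bool)
    for verts in polygons:
        keep |= in_polygon(pts, verts)  # union of all polygons
    return keep
```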

LOF and KDE filters

scikit-learn, scipy

Statistical outlier and low-density removal.

Local Outlier Factor and KDE-based density filters are available as standalone gates for removing low-density or outlier events before downstream classification. They operate on an arcsinh-transformed feature matrix built from user-selected channels.
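A minimal sketch of both filters using scikit-learn's LocalOutlierFactor and SciPy's gaussian_kde. The wrapper functions and the min_density_quantile threshold are illustrative assumptions; Cytometrika's gates may parameterise these filters differently.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.neighbors import LocalOutlierFactor


def lof_filter(X, n_neighbors=20, contamination=0.01):
    # fit_predict returns +1 for inliers and -1 for outliers
    labels = LocalOutlierFactor(n_neighbors=n_neighbors,
                                contamination=contamination).fit_predict(X)
    return labels == 1


def kde_filter(X, min_density_quantile=0.05):
    # gaussian_kde expects data shaped (n_dims, n_events)
    density = gaussian_kde(X.T)(X.T)
    # Hypothetical rule: drop the lowest-density fraction of events
    return density >= np.quantile(density, min_density_quantile)
```

Both functions return a keep-mask over the rows of X, so they compose with other gates by logical AND.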


§3. Capabilities

What the tool does.

FCS and CSV input

FCS files are read with fcsparser. Channel metadata (short name, long name, gain, voltage), instrument identifier, and acquisition date are preserved alongside the event matrix. CSV input is also supported.

Pipelines as TOML

The full pipeline (gate types, parameters, ordering, enabled flags) is captured in a TOML file. The same TOML and the same input file produce the same gating output.

```toml
[[pipeline.gates]]
id    = "main_pop_gate"
after = "singlets_gate"
type  = "main_population"
[pipeline.gates.params]
fsc_role = "fsc_area"
ssc_role = "ssc_area"
method   = "ellipse"
n_std    = 3.0
```

Channel role mapping

Gates reference channel roles (fsc_area, ssc_area, fsc_width, time, marker names) rather than specific column names. Auto-detected mappings can be overridden by a .channels.json sidecar, so the same pipeline runs on data from different instruments.
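Role resolution can be pictured as auto-detection followed by sidecar overrides. In the sketch below, the candidate column names and the sidecar layout (a flat JSON object mapping role to column name) are hypothetical; the actual .channels.json schema may differ.

```python
import json
from pathlib import Path

# Hypothetical auto-detection rules: role -> candidate column names
DEFAULT_ROLES = {
    "fsc_area": ["FSC-A", "FSC"],
    "ssc_area": ["SSC-A", "SSC"],
    "fsc_width": ["FSC-W", "FSC-Width"],
    "time": ["Time", "TIME"],
}


def resolve_roles(columns, sidecar=None):
    roles = {}
    for role, candidates in DEFAULT_ROLES.items():
        match = next((c for c in candidates if c in columns), None)
        if match is not None:
            roles[role] = match
    # Assumed sidecar format: {"role": "column name", ...}
    if sidecar is not None and Path(sidecar).exists():
        overrides = json.loads(Path(sidecar).read_text())
        for role, column in overrides.items():
            if column not in columns:
                raise KeyError(f"sidecar maps {role!r} to unknown column {column!r}")
            roles[role] = column
    return roles
```

Validating overrides against the file's columns means a sidecar written for one instrument fails loudly rather than silently on data from another.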

Batch processing

Cytometrika processes individual files, directories of files, or grouped experiments in a single run. Per-file statistics are concatenated into combined tables for downstream analysis.
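Combining per-file tables amounts to tagging each table with its source and concatenating. A minimal pandas sketch; the function, the input structure, and the source_file column name are hypothetical.

```python
import pandas as pd


def combine_statistics(per_file):
    """Concatenate per-file statistics tables into one combined table.

    per_file maps a source file name to that file's statistics DataFrame.
    """
    frames = [df.assign(source_file=name) for name, df in per_file.items()]
    return pd.concat(frames, ignore_index=True)


a = pd.DataFrame({"gate": ["raw", "time"], "events_out": [100, 95]})
b = pd.DataFrame({"gate": ["raw", "time"], "events_out": [200, 180]})
combined = combine_statistics({"run1.fcs": a, "run2.fcs": b})
print(combined)
```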


§4. Outputs

Per-gate event counts and population statistics.

Every run produces a per-gate accounting of the input events. Below: a representative run on a single FCS file. Final populations carry per-channel summary statistics (count, percentage, mean, median, standard deviation, coefficient of variation).

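| Gate | Events in | Events out | % retained |
|---|---:|---:|---:|
| Raw input | 148,302 | 148,302 | 100.0 |
| Time | 148,302 | 141,887 | 95.7 |
| Debris | 141,887 | 118,402 | 83.4 |
| Singlets | 118,402 | 109,815 | 92.7 |
| Main population | 109,815 | 96,221 | 87.6 |
| Fluorescence debris | 96,221 | 92,408 | 96.0 |
| Population A | n/a | 54,118 | 58.6 |
| Population B | n/a | 38,290 | 41.4 |

% retained is relative to each gate's input; for Population A and B it is the share of the 92,408 events that pass the fluorescence debris gate.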
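The % retained column can be recomputed from the counts: each gate's output divided by its input, and each population's count divided by the 92,408 events that reach population separation. A short check in Python, using the counts from the run above:

```python
def percent_retained(n_in, n_out):
    # Share of a gate's input that survives it, to one decimal place
    return round(100 * n_out / n_in, 1)


# (gate, events in, events out) from the representative run
GATES = [
    ("Raw input", 148_302, 148_302),
    ("Time", 148_302, 141_887),
    ("Debris", 141_887, 118_402),
    ("Singlets", 118_402, 109_815),
    ("Main population", 109_815, 96_221),
    ("Fluorescence debris", 96_221, 92_408),
]
for gate, n_in, n_out in GATES:
    print(f"{gate:<20}{percent_retained(n_in, n_out):>7.1f}")

parent = 92_408  # events passing the fluorescence debris gate
populations = {"Population A": 54_118, "Population B": 38_290}
assert sum(populations.values()) == parent  # every surviving event is labelled
for name, n in populations.items():
    print(f"{name:<20}{percent_retained(parent, n):>7.1f}")
```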

Files written per run

  • statistics.csv: per-gate counts, per-population summary statistics
  • gate_histograms/: 1D histograms for every gate
  • scatter_fsc_ssc.png: gated FSC vs SSC
  • scatter_fluorescence.png: FITC vs mCherry with population colours
  • temporal_evolution.png: per-experiment time series
  • fluorescence_trajectories.png: two-dimensional fluorescence evolution
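The per-population statistics in statistics.csv (count, percentage, mean, median, standard deviation, coefficient of variation) can be sketched with a pandas groupby over one channel at a time. The column names are hypothetical, SD is pandas' default sample standard deviation (ddof=1), and CV is reported as a percentage of the mean; both are assumptions about how Cytometrika defines them.

```python
import pandas as pd


def population_summary(values, labels, total=None):
    """Per-population count, percentage, mean, median, SD and CV for one channel."""
    df = pd.DataFrame({"value": values, "population": labels})
    g = df.groupby("population")["value"]
    out = g.agg(count="count", mean="mean", median="median", std="std")
    total = len(df) if total is None else total
    out["percentage"] = 100 * out["count"] / total
    out["cv_percent"] = 100 * out["std"] / out["mean"]  # CV = SD / mean
    return out


print(population_summary([1, 2, 3, 10, 20, 30], ["A"] * 3 + ["B"] * 3))
```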

§5. Beta access

Join the beta program.

Cytometrika is in active development, and we are prioritising research groups working on bacterial flow cytometry for early access. Tell us about your data, your instrument, and your downstream analysis; we will reply to you before sending a build.

  • Pre-release builds with release notes
  • A direct line of communication with the development team
  • Input on the public release pipeline and default parameters