lunapi.gpa

lunapi.gpa — Python interface to Luna’s GPA association-analysis commands.

Two underlying Luna commands are exposed:

--gpa-prep

build a binary data matrix from tabular input files

--gpa

run linear association models on that matrix

Both are invoked in-process via the lunapi0 C++ bindings (no subprocess).

Functions

gpa_prep(→ str)

Run --gpa-prep to build a binary GPA data matrix.

gpa_manifest(→ pandas.DataFrame)

Return the variable manifest for a .dat file as a DataFrame.

gpa_run(→ Dict[str, pandas.DataFrame])

Run GPA association analysis against a pre-built .dat file.

gpa_dump(→ pandas.DataFrame)

Dump the raw data matrix from a .dat file as a DataFrame.

gpa_get_xy_partial(xvar, yvar, zvars)

Return (ids, x_resid, y_resid) after regressing zvars out of both axes.

gpa_get_xy(xvar, yvar)

Return (ids, x_vals, y_vals) from the cached GPA analysis matrix.

gpa_clear_cache()

Release the cached GPA analysis matrix to free memory.

Module Contents

lunapi.gpa.gpa_prep(dat_path: str, specs: List[dict] | None = None, specs_path: str | None = None) str[source]

Run --gpa-prep to build a binary GPA data matrix.

Exactly one of specs or specs_path should be supplied.

Parameters:
  • dat_path (str) – Output path for the binary .dat file.

  • specs (list[dict] or None) – Structured input-file specification list. Each dict may contain file, group, vars, facs, fixed, mappings. The list is serialised to a temporary JSON file and passed as specs=<tmpfile> to --gpa-prep.

  • specs_path (str or None) – Path to an existing JSON specs file.

Returns:

Manifest text captured from stdout (tab-delimited, same columns as gpa_manifest() output). Empty if no manifest was produced.

Return type:

str

Raises:

RuntimeError – Propagated from Helper::halt() inside the Luna C++ library.
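
As a rough illustration of the specs list and its serialisation to JSON, the sketch below builds a small list and writes it to a temporary file. The input file names, group labels, and the value types used for vars and facs are assumptions made for the example, not documented formats. The page only says that the list is serialised, so the exact on-disk layout Luna expects should be checked before writing a specs_path file by hand. The gpa_prep() call is left as a comment because it needs Luna and real input files.

```python
import json
import os
import tempfile

# Hypothetical specs: file names, group labels, and value types are assumptions.
specs = [
    {"file": "demo.txt", "group": "demo", "vars": ["age", "male"]},
    {"file": "psd.txt", "group": "psd", "facs": ["CH", "F"]},
]

# Serialise the list to a temporary JSON file, mirroring the step described above.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    json.dump(specs, fh, indent=2)
    specs_file = fh.name

# Read the file back to confirm the specs round-trip unchanged.
with open(specs_file) as fh:
    reloaded = json.load(fh)

os.remove(specs_file)  # tidy up the temporary file

# With lunapi installed, the list itself would be passed directly (not run here):
#   manifest_text = gpa_prep("out.dat", specs=specs)
```

Passing specs directly avoids managing the temporary file yourself; specs_path is the better fit when the same specification is reused across runs.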

lunapi.gpa.gpa_manifest(dat_path: str) pandas.DataFrame[source]

Return the variable manifest for a .dat file as a DataFrame.

Runs --gpa manifest and parses the tab-delimited stdout.

Columns always include NV, VAR, NI, GRP, BASE, plus any factor columns present in the dataset (e.g. CH, F, SS).
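
Because gpa_prep() returns manifest text with these same columns, that text can also be inspected without re-reading the .dat file. The following is a minimal sketch using only the standard library. The helper name parse_manifest and the sample rows are hypothetical; only the column names come from the description above.

```python
import csv
import io

# Hypothetical manifest text: column names follow the description above,
# the values are invented for illustration.
manifest_text = (
    "NV\tVAR\tNI\tGRP\tBASE\tCH\n"
    "1\tage\t100\tdemo\tage\t.\n"
    "2\tPSD_CH_CZ\t98\tpsd\tPSD\tCZ\n"
)

def parse_manifest(text):
    """Parse tab-delimited manifest text into a list of row dicts."""
    if not text.strip():
        return []  # gpa_prep() may return an empty string
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)

rows = parse_manifest(manifest_text)

# Example query: names of all variables in the (hypothetical) "psd" group.
psd_vars = [r["VAR"] for r in rows if r["GRP"] == "psd"]
```

All values come back as strings here; gpa_manifest() is the more convenient route when a typed DataFrame is wanted.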

lunapi.gpa.gpa_run(dat_path: str, X: str | List[str] | None = None, Y: str | List[str] | None = None, Z: str | List[str] | None = None, Xg: str | List[str] | None = None, Yg: str | List[str] | None = None, Zg: str | List[str] | None = None, mode: str = 'assoc', nreps: int = 0, fdr: bool = True, bonf: bool = False, holm: bool = False, fdr_by: bool = False, adj_all_x: bool = False, x_factors: bool = False, p: float | None = None, padj: float | None = None, vars: str | None = None, xvars: str | None = None, grps: str | None = None, xgrps: str | None = None, facs: str | None = None, xfacs: str | None = None, faclvls: str | None = None, xfaclvls: str | None = None, n_prop: float | None = None, n_req: int | None = None, knn: int | None = None, winsor: float | None = None, subset: str | None = None, inc_ids: str | None = None, ex_ids: str | None = None, verbose: bool = False) Dict[str, pandas.DataFrame][source]

Run GPA association analysis against a pre-built .dat file.

Parameters:
  • dat_path (str) – Binary data file created by gpa_prep().

  • X (str | list[str] | None) – Predictor variable names. A list is joined with commas and passed as a single X=a,b,c argument.

  • Y (str | list[str] | None) – Outcome variable names. A list is joined with commas and passed as a single Y=a,b,c argument.

  • Z (str | list[str] | None) – Covariate variable names. A list is joined with commas and passed as a single Z=a,b,c argument.

  • Xg (str | list[str] | None) – Select predictor variables by group name.

  • Yg (str | list[str] | None) – Select outcome variables by group name.

  • Zg (str | list[str] | None) – Select covariate variables by group name.

  • mode ("assoc" | "stats" | "comp") –

    • "assoc" — linear association models (default)

    • "stats" — descriptive statistics only

    • "comp" — comparison-style enrichment tests

  • nreps (int) – Permutation replicates (0 = asymptotic p-values only).

  • fdr (bool) – Apply FDR(B&H) correction (default True; pass fdr=False to disable).

  • bonf (bool) – Add Bonferroni-corrected p-values to the output.

  • holm (bool) – Add Holm-corrected p-values to the output.

  • fdr_by (bool) – Add FDR (Benjamini & Yekutieli) corrected p-values to the output.

  • adj_all_x (bool) – Adjust p-values across all X variables jointly instead of per-X.

  • x_factors (bool) – Append X-variable manifest columns (XBASE, XGROUP, XSTRAT) to output.

  • p (float | None) – Only return rows whose nominal p-value is below this threshold.

  • padj (float | None) – Only return rows whose adjusted p-value is below this threshold.

  • vars (str | None) – Comma-separated list of variables to include.

  • xvars (str | None) – Comma-separated list of variables to exclude.

  • grps (str | None) – Groups to include.

  • xgrps (str | None) – Groups to exclude.

  • facs (str | None) – Factors to include.

  • xfacs (str | None) – Factors to exclude.

  • faclvls (str | None) – Factor levels to include (CH/FZ|CZ syntax).

  • xfaclvls (str | None) – Factor levels to exclude (CH/FZ|CZ syntax).

  • n_prop (float | None) – Drop columns with more than this proportion of missing values.

  • n_req (int | None) – Drop columns with fewer than this many non-missing values.

  • knn (int | None) – k for kNN imputation of missing values.

  • winsor (float | None) – Winsorisation proportion applied before modelling.

  • subset (str | None) – Include only subjects positive for these variables (+VAR syntax).

  • inc_ids (str | None) – Comma-separated list of subject IDs to include.

  • ex_ids (str | None) – Comma-separated list of subject IDs to exclude.

  • verbose (bool)
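
The comma-joining described for X, Y, and Z can be pictured with the sketch below. It is not taken from the lunapi source: join_arg and build_selection are hypothetical helpers, and the variable names are invented.

```python
def join_arg(name, value):
    """Format one selection as 'NAME=a,b,c', or return None if it is unset."""
    if value is None:
        return None
    if isinstance(value, str):
        value = [value]  # a single name is treated like a one-element list
    return f"{name}={','.join(value)}"

def build_selection(**selections):
    """Return 'NAME=...' tokens for every selection that was supplied."""
    tokens = (join_arg(name, value) for name, value in selections.items())
    return [t for t in tokens if t is not None]

# Hypothetical variable names; Z is left unset and therefore dropped.
args = build_selection(X=["age", "male"], Y="PSD", Z=None)
# args == ["X=age,male", "Y=PSD"]
```

In practice this means gpa_run(dat_path, X=["age", "male"]) and gpa_run(dat_path, X="age,male") should select the same predictors.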

Returns:

Keys follow the "GPA: STRATA" convention, e.g.:

  • "GPA: X,Y" – main association results

  • "GPA: VAR" – descriptive statistics (mode="stats")

  • "GPA: X" – comparison test results (mode="comp")

Return type:

dict[str, pd.DataFrame]
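
For readers unfamiliar with the B&H correction applied by the fdr option, the sketch below shows the standard Benjamini-Hochberg step-up adjustment in plain Python. It is written from the textbook definition rather than from Luna's source, so Luna's handling of ties, missing values, or the adj_all_x grouping may differ. The function name bh_adjust and the p-values are illustrative.

```python
def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.20]  # invented nominal p-values
padj = bh_adjust(pvals)

# Keep only tests whose adjusted p-value passes a 0.05 threshold,
# analogous to filtering results with padj=0.05.
significant = [p for p, q in zip(pvals, padj) if q < 0.05]
```

With these inputs only the smallest p-value survives the 0.05 threshold after adjustment, although three of the four are nominally below 0.05.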

lunapi.gpa.gpa_dump(dat_path: str, **filter_opts) pandas.DataFrame[source]

Dump the raw data matrix from a .dat file as a DataFrame.

Any keyword argument is forwarded as a Luna parameter string, e.g. X="male", lvars="PSD_CH_CZ_F_13.5".

lunapi.gpa.gpa_get_xy_partial(xvar: str, yvar: str, zvars: List[str])[source]

Return (ids, x_resid, y_resid) after regressing zvars out of both axes.

Uses the same Rz = I - Z(Z'Z)^{-1}Z' projection as the GPA linear model, so the residual scatter exactly matches what went into the regression. Falls back to gpa_get_xy() when zvars is empty.

Raises RuntimeError if no matrix is cached (call gpa_run() first).
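
To make the projection concrete, the sketch below computes Rz v = v - Z(Z'Z)^{-1}Z'v for small plain-Python lists by solving the normal equations. It illustrates the algebra only and is not the library's implementation. The helper names are hypothetical, and including an explicit intercept column in Z is an assumption of the example. For real data, a linear-algebra library would normally replace the hand-written solver.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def residualize(v, Z):
    """Return v - Z (Z'Z)^{-1} Z'v, where each row of Z is one subject."""
    k = len(Z[0])
    ZtZ = [[sum(row[a] * row[b] for row in Z) for b in range(k)] for a in range(k)]
    Ztv = [sum(row[a] * vi for row, vi in zip(Z, v)) for a in range(k)]
    beta = solve(ZtZ, Ztv)
    return [vi - sum(bj * zj for bj, zj in zip(beta, row)) for row, vi in zip(Z, v)]

z = [1.0, 2.0, 3.0, 4.0]
Z = [[1.0, zi] for zi in z]   # intercept plus one covariate (an assumption)
x = [3.0, 5.0, 7.0, 9.0]      # exactly 1 + 2*z, so its residuals vanish
y = [1.0, 3.0, 2.0, 5.0]

x_resid = residualize(x, Z)
y_resid = residualize(y, Z)
```

The residuals are orthogonal to every column of Z, which is why a scatter of x_resid against y_resid shows the association that remains after covariate adjustment.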

lunapi.gpa.gpa_get_xy(xvar: str, yvar: str)[source]

Return (ids, x_vals, y_vals) from the cached GPA analysis matrix.

Filters to rows where both xvar and yvar are non-NaN — the exact same subjects used in the most recent gpa_run() call.

Raises RuntimeError if no matrix is cached (call gpa_run() first).
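
The pairwise filtering described here can be sketched in plain Python. This illustrates the rule of keeping a subject only when both values are present; it is not the library's code, and complete_cases and the sample IDs are hypothetical.

```python
import math

def complete_cases(ids, x, y):
    """Keep only subjects whose x and y values are both non-NaN."""
    kept = [(i, a, b) for i, a, b in zip(ids, x, y)
            if not (math.isnan(a) or math.isnan(b))]
    if not kept:
        return [], [], []
    out_ids, out_x, out_y = (list(col) for col in zip(*kept))
    return out_ids, out_x, out_y

nan = float("nan")
ids, xs, ys = complete_cases(
    ["s1", "s2", "s3", "s4"],   # invented subject IDs
    [1.0, nan, 3.0, 4.0],       # s2 is missing x
    [2.0, 5.0, nan, 8.0],       # s3 is missing y
)
# ids == ["s1", "s4"], xs == [1.0, 4.0], ys == [2.0, 8.0]
```

Because the three returned sequences stay aligned, they can be passed straight to a plotting call and labelled by subject ID.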

lunapi.gpa.gpa_clear_cache()[source]

Release the cached GPA analysis matrix to free memory.