lunapi.gpa
==========

.. py:module:: lunapi.gpa

.. autoapi-nested-parse::

   lunapi.gpa — Python interface to Luna's GPA association-analysis commands.

   Two underlying Luna commands are exposed:

     --gpa-prep  build a binary data matrix from tabular input files
     --gpa       run linear association models on that matrix

   Both are invoked in-process via the lunapi0 C++ bindings (no subprocess).


Functions
---------

.. autoapisummary::

   lunapi.gpa.gpa_prep
   lunapi.gpa.gpa_manifest
   lunapi.gpa.gpa_run
   lunapi.gpa.gpa_dump
   lunapi.gpa.gpa_get_xy_partial
   lunapi.gpa.gpa_get_xy
   lunapi.gpa.gpa_clear_cache


Module Contents
---------------

.. py:function:: gpa_prep(dat_path: str, specs: Optional[List[dict]] = None, specs_path: Optional[str] = None) -> str

   Run ``--gpa-prep`` to build a binary GPA data matrix.

   Exactly one of *specs* or *specs_path* should be supplied.

   :param dat_path: Output path for the binary ``.dat`` file.
   :type dat_path: str
   :param specs: Structured input-file specification list.  Each dict may contain
                 ``file``, ``group``, ``vars``, ``facs``, ``fixed``, ``mappings``.
                 The list is serialised to a temporary JSON file and passed as
                 ``specs=<tmpfile>`` to ``--gpa-prep``.
   :type specs: list[dict] or None
   :param specs_path: Path to an existing JSON specs file.
   :type specs_path: str or None

   :returns: Manifest text captured from stdout (tab-delimited, same columns as
             :func:`gpa_manifest` output).  Empty if no manifest was produced.
   :rtype: str

   :raises RuntimeError: Propagated from ``Helper::halt()`` inside the Luna C++ library.
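
As a rough sketch of the *specs* serialisation described above, the snippet below writes a specs list to a temporary JSON file and forms the ``specs=<tmpfile>`` argument. Only the key names come from the parameter description. The file name, group label, variable names, and the shape of each value are hypothetical, and ``write_specs`` is an illustrative helper, not part of ``lunapi``. The call into ``gpa_prep`` itself is left out so the snippet runs without Luna installed.

```python
import json
import os
import tempfile

# Hypothetical specs: key names follow the parameter description above;
# every value (path, group label, variable and factor names) is made up.
specs = [
    {
        "file": "psd.txt",
        "group": "PSD",
        "vars": ["PSD"],
        "facs": ["CH", "F"],
    },
]


def write_specs(spec_list):
    """Serialise a specs list to a temporary JSON file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
        json.dump(spec_list, fh)
        return fh.name


specs_file = write_specs(specs)
luna_arg = f"specs={specs_file}"  # form of the argument handed to --gpa-prep

# Read the file back to confirm the round trip, then clean up.
with open(specs_file) as fh:
    round_trip = json.load(fh)
os.remove(specs_file)
```

Passing *specs_path* instead skips this step, because the JSON file already exists on disk.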


.. py:function:: gpa_manifest(dat_path: str) -> pandas.DataFrame

   Return the variable manifest for a ``.dat`` file as a DataFrame.

   Runs ``--gpa manifest`` and parses the tab-delimited stdout.

   Columns always include ``NV``, ``VAR``, ``NI``, ``GRP``, ``BASE``, plus
   any factor columns present in the dataset (e.g. ``CH``, ``F``, ``SS``).
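
The same tab-delimited layout is what ``gpa_prep`` returns as plain text, so it can be inspected without building a DataFrame. The sketch below parses such text with the standard ``csv`` module. The column names are those listed above; the rows, the ``.`` placeholder for a missing factor level, and the helper name ``parse_manifest`` are invented for illustration.

```python
import csv
import io

# Hypothetical manifest text: real column names, invented rows.
# The "." missing-level marker is an assumption, not documented behaviour.
manifest_text = (
    "NV\tVAR\tNI\tGRP\tBASE\tCH\n"
    "0\tAGE\t100\tDEMO\tAGE\t.\n"
    "1\tPSD_CH_CZ\t98\tPSD\tPSD\tCZ\n"
)


def parse_manifest(text):
    """Parse tab-delimited manifest text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = parse_manifest(manifest_text)

# Select the variables that belong to one group.
psd_vars = [row["VAR"] for row in rows if row["GRP"] == "PSD"]
```

All values come back as strings here; a DataFrame-based parse would normally convert the numeric columns.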


.. py:function:: gpa_run(dat_path: str, X: Union[str, List[str], None] = None, Y: Union[str, List[str], None] = None, Z: Union[str, List[str], None] = None, Xg: Union[str, List[str], None] = None, Yg: Union[str, List[str], None] = None, Zg: Union[str, List[str], None] = None, mode: str = 'assoc', nreps: int = 0, fdr: bool = True, bonf: bool = False, holm: bool = False, fdr_by: bool = False, adj_all_x: bool = False, x_factors: bool = False, p: Optional[float] = None, padj: Optional[float] = None, vars: Optional[str] = None, xvars: Optional[str] = None, grps: Optional[str] = None, xgrps: Optional[str] = None, facs: Optional[str] = None, xfacs: Optional[str] = None, faclvls: Optional[str] = None, xfaclvls: Optional[str] = None, n_prop: Optional[float] = None, n_req: Optional[int] = None, knn: Optional[int] = None, winsor: Optional[float] = None, subset: Optional[str] = None, inc_ids: Optional[str] = None, ex_ids: Optional[str] = None, verbose: bool = False) -> Dict[str, pandas.DataFrame]

   Run GPA association analysis against a pre-built ``.dat`` file.

   :param dat_path: Binary data file created by :func:`gpa_prep`.
   :type dat_path: str
   :param X: Predictor variable names.
             Lists are joined with commas and passed as a single ``X=a,b,c`` argument.
   :type X: str | list[str] | None
   :param Y: Outcome variable names.
             Lists are joined with commas and passed as a single ``Y=a,b,c`` argument.
   :type Y: str | list[str] | None
   :param Z: Covariate variable names.
             Lists are joined with commas and passed as a single ``Z=a,b,c`` argument.
   :type Z: str | list[str] | None
   :param Xg: Group-based selection of predictor variables.
   :type Xg: str | list[str] | None
   :param Yg: Group-based selection of outcome variables.
   :type Yg: str | list[str] | None
   :param Zg: Group-based selection of covariate variables.
   :type Zg: str | list[str] | None
   :param mode:
                * ``"assoc"``  — linear association models (default)
                * ``"stats"``  — descriptive statistics only
                * ``"comp"``   — comparison-style enrichment tests
   :type mode: "assoc" | "stats" | "comp"
   :param nreps: Permutation replicates (0 = asymptotic p-values only).
   :type nreps: int
   :param fdr: Apply Benjamini-Hochberg FDR correction (default True; pass ``fdr=False`` to disable).
   :type fdr: bool
   :param bonf: Add Bonferroni-adjusted p-values to the output.
   :type bonf: bool
   :param holm: Add Holm-adjusted p-values to the output.
   :type holm: bool
   :param fdr_by: Add Benjamini-Yekutieli FDR-adjusted p-values to the output.
   :type fdr_by: bool
   :param adj_all_x: Adjust p-values across all X variables jointly instead of per-X.
   :type adj_all_x: bool
   :param x_factors: Append X-variable manifest columns (XBASE, XGROUP, XSTRAT) to output.
   :type x_factors: bool
   :param p: Only return rows whose nominal p-value is below this threshold.
   :type p: float | None
   :param padj: Only return rows whose adjusted p-value is below this threshold.
   :type padj: float | None
   :param vars: Variables to include (comma-separated).
   :type vars: str | None
   :param xvars: Variables to exclude (comma-separated).
   :type xvars: str | None
   :param grps: Groups to include.
   :type grps: str | None
   :param xgrps: Groups to exclude.
   :type xgrps: str | None
   :param facs: Factors to include.
   :type facs: str | None
   :param xfacs: Factors to exclude.
   :type xfacs: str | None
   :param faclvls: Factor levels to include (``CH/FZ|CZ`` syntax).
   :type faclvls: str | None
   :param xfaclvls: Factor levels to exclude (``CH/FZ|CZ`` syntax).
   :type xfaclvls: str | None
   :param n_prop: Drop columns with more than this proportion of missing values.
   :type n_prop: float | None
   :param n_req: Drop columns with fewer than this many non-missing values.
   :type n_req: int | None
   :param knn: k for kNN imputation of missing values.
   :type knn: int | None
   :param winsor: Winsorisation proportion applied before modelling.
   :type winsor: float | None
   :param subset: Include only subjects positive for these variables (``+VAR`` syntax).
   :type subset: str | None
   :param inc_ids: Comma-separated subject IDs to include.
   :type inc_ids: str | None
   :param ex_ids: Comma-separated subject IDs to exclude.
   :type ex_ids: str | None
   :param verbose:
   :type verbose: bool

   :returns: Result tables keyed by the ``"GPA: STRATA"`` convention, e.g.:

             * ``"GPA: X,Y"``: main association results
             * ``"GPA: VAR"``: descriptive statistics (``mode="stats"``)
             * ``"GPA: X"``: comparison test results (``mode="comp"``)
   :rtype: dict[str, pd.DataFrame]
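
To illustrate the argument convention described for ``X``, ``Y`` and ``Z``, the sketch below joins list values with commas and emits ``name=value`` strings, skipping options left as ``None``. The helpers ``luna_arg`` and ``build_args`` are hypothetical and written only for this example; they are not lunapi's internal code, and the variable names are placeholders.

```python
from typing import List, Optional, Union


def luna_arg(name: str, value: Union[str, List[str], None]) -> Optional[str]:
    """Format one option as ``name=a,b,c``; return None when it is unset."""
    if value is None:
        return None
    if isinstance(value, str):
        return f"{name}={value}"
    return f"{name}={','.join(value)}"


def build_args(**options) -> List[str]:
    """Collect formatted arguments for every option that is set."""
    formatted = (luna_arg(key, val) for key, val in options.items())
    return [arg for arg in formatted if arg is not None]


# Placeholder variable names; the factor-level filter uses the CH/FZ|CZ syntax.
args = build_args(X=["male", "age"], Y="PSD", Z=None, faclvls="CH/FZ|CZ")
```

With Luna installed, the equivalent ``gpa_run`` call would return a dict in which the main association table is looked up under the ``"GPA: X,Y"`` key.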


.. py:function:: gpa_dump(dat_path: str, **filter_opts) -> pandas.DataFrame

   Dump the raw data matrix from a ``.dat`` file as a DataFrame.

   Any keyword argument is forwarded as a Luna parameter string, e.g.
   ``X="male"``, ``lvars="PSD_CH_CZ_F_13.5"``.


.. py:function:: gpa_get_xy_partial(xvar: str, yvar: str, zvars: List[str])

   Return (ids, x_resid, y_resid) after regressing *zvars* out of both axes.

   Uses the same projection :math:`R_Z = I - Z(Z^\top Z)^{-1}Z^\top` as the GPA linear model,
   so the residual scatter exactly matches what went into the regression.
   Falls back to :func:`gpa_get_xy` when *zvars* is empty.

   Raises ``RuntimeError`` if no matrix is cached (call :func:`gpa_run` first).
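
The projection described above can be sketched in plain Python as follows. This is a minimal illustration, not Luna's implementation: it solves the normal equations by Gaussian elimination and subtracts the fitted values, which is equivalent to applying the residual projection to a vector. The helper names and data are made up, and adding an explicit intercept column to ``Z`` is an assumption about how covariates are set up.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("Z'Z is singular; covariates are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def residualize(v, Z):
    """Return v - Z (Z'Z)^{-1} Z'v, i.e. v with the columns of Z regressed out."""
    k = len(Z[0])
    ZtZ = [[sum(row[i] * row[j] for row in Z) for j in range(k)] for i in range(k)]
    Ztv = [sum(row[i] * vi for row, vi in zip(Z, v)) for i in range(k)]
    beta = solve_linear(ZtZ, Ztv)
    return [vi - sum(b * z for b, z in zip(beta, row)) for row, vi in zip(Z, v)]


# Invented data: one covariate z plus an intercept column (an assumption).
z = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [2.1, 3.9, 6.2, 7.8, 10.1]
y = [1.0, 0.5, 2.5, 2.0, 3.5]
Z = [[1.0, zi] for zi in z]

x_resid = residualize(x, Z)
y_resid = residualize(y, Z)
```

Both residual vectors are orthogonal to every column of ``Z``, which is why a scatter of ``y_resid`` against ``x_resid`` reflects only the covariate-adjusted relationship.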


.. py:function:: gpa_get_xy(xvar: str, yvar: str)

   Return (ids, x_vals, y_vals) from the cached GPA analysis matrix.

   Filters to rows where both *xvar* and *yvar* are non-NaN — the exact
   same subjects used in the most recent :func:`gpa_run` call.

   Raises ``RuntimeError`` if no matrix is cached (call :func:`gpa_run` first).
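
The row filter described above amounts to keeping complete pairs. A minimal sketch on invented data, with a hypothetical helper ``complete_pairs`` standing in for the cached-matrix lookup:

```python
import math


def complete_pairs(ids, x, y):
    """Keep only rows where both x and y are non-NaN."""
    keep = [
        i for i, (a, b) in enumerate(zip(x, y))
        if not (math.isnan(a) or math.isnan(b))
    ]
    return (
        [ids[i] for i in keep],
        [x[i] for i in keep],
        [y[i] for i in keep],
    )


# Invented subject IDs and values; NaN marks a missing observation.
ids = ["id1", "id2", "id3", "id4"]
x = [1.0, float("nan"), 3.0, 4.0]
y = [2.0, 5.0, float("nan"), 8.0]

kept_ids, kept_x, kept_y = complete_pairs(ids, x, y)
```

Rows ``id2`` and ``id3`` are dropped because each is missing one of the two values.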


.. py:function:: gpa_clear_cache()

   Release the cached GPA analysis matrix to free memory.