lunapi.gpa
==========

.. py:module:: lunapi.gpa

.. autoapi-nested-parse::

   lunapi.gpa: Python interface to Luna's GPA association-analysis commands.

   Two underlying Luna commands are exposed:

   * ``--gpa-prep``: build a binary data matrix from tabular input files
   * ``--gpa``: run linear association models on that matrix

   Both are invoked in-process via the lunapi0 C++ bindings (no subprocess).


Functions
---------

.. autoapisummary::

   lunapi.gpa.gpa_prep
   lunapi.gpa.gpa_manifest
   lunapi.gpa.gpa_run
   lunapi.gpa.gpa_dump
   lunapi.gpa.gpa_get_xy_partial
   lunapi.gpa.gpa_get_xy
   lunapi.gpa.gpa_clear_cache


Module Contents
---------------

.. py:function:: gpa_prep(dat_path: str, specs: Optional[List[dict]] = None, specs_path: Optional[str] = None) -> str

   Run ``--gpa-prep`` to build a binary GPA data matrix.

   Exactly one of *specs* or *specs_path* should be supplied.

   :param dat_path: Output path for the binary ``.dat`` file.
   :type dat_path: str
   :param specs: Structured input-file specification list. Each dict may contain
                 ``file``, ``group``, ``vars``, ``facs``, ``fixed``, ``mappings``.
                 The list is serialised to a temporary JSON file and passed
                 as ``specs=`` to ``--gpa-prep``.
   :type specs: list[dict] or None
   :param specs_path: Path to an existing JSON specs file.
   :type specs_path: str or None

   :returns: Manifest text captured from stdout (tab-delimited, same columns as
             :func:`gpa_manifest` output). Empty if no manifest was produced.
   :rtype: str

   :raises RuntimeError: Propagated from ``Helper::halt()`` inside the Luna C++ library.


.. py:function:: gpa_manifest(dat_path: str) -> pandas.DataFrame

   Return the variable manifest for a ``.dat`` file as a DataFrame.

   Runs ``--gpa manifest`` and parses the tab-delimited stdout.
   Columns always include ``NV``, ``VAR``, ``NI``, ``GRP``, ``BASE``, plus any
   factor columns present in the dataset (e.g. ``CH``, ``F``, ``SS``).


.. py:function:: gpa_run(dat_path: str, X: Union[str, List[str], None] = None, Y: Union[str, List[str], None] = None, Z: Union[str, List[str], None] = None, Xg: Union[str, List[str], None] = None, Yg: Union[str, List[str], None] = None, Zg: Union[str, List[str], None] = None, mode: str = 'assoc', nreps: int = 0, fdr: bool = True, bonf: bool = False, holm: bool = False, fdr_by: bool = False, adj_all_x: bool = False, x_factors: bool = False, p: Optional[float] = None, padj: Optional[float] = None, vars: Optional[str] = None, xvars: Optional[str] = None, grps: Optional[str] = None, xgrps: Optional[str] = None, facs: Optional[str] = None, xfacs: Optional[str] = None, faclvls: Optional[str] = None, xfaclvls: Optional[str] = None, n_prop: Optional[float] = None, n_req: Optional[int] = None, knn: Optional[int] = None, winsor: Optional[float] = None, subset: Optional[str] = None, inc_ids: Optional[str] = None, ex_ids: Optional[str] = None, verbose: bool = False) -> Dict[str, pandas.DataFrame]

   Run GPA association analysis against a pre-built ``.dat`` file.

   :param dat_path: Binary data file created by :func:`gpa_prep`.
   :type dat_path: str
   :param X: Predictor variable name(s). A list is joined with commas and
             passed as a single ``X=a,b,c`` argument.
   :type X: str | list[str] | None
   :param Y: Outcome variable name(s). Lists are joined in the same way as for *X*.
   :type Y: str | list[str] | None
   :param Z: Covariate variable name(s). Lists are joined in the same way as for *X*.
   :type Z: str | list[str] | None
   :param Xg: Group-based selection of predictor variables.
   :type Xg: str | list[str] | None
   :param Yg: Group-based selection of outcome variables.
   :type Yg: str | list[str] | None
   :param Zg: Group-based selection of covariate variables.
   :type Zg: str | list[str] | None
   :param mode: * ``"assoc"``: linear association models (default)
                * ``"stats"``: descriptive statistics only
                * ``"comp"``: comparison-style enrichment tests
   :type mode: "assoc" | "stats" | "comp"
   :param nreps: Number of permutation replicates (0 = asymptotic p-values only).
   :type nreps: int
   :param fdr: Apply FDR(B&H) correction (default True; pass ``fdr=False`` to disable).
   :type fdr: bool
   :param bonf: Add Bonferroni-corrected p-values to the output.
   :type bonf: bool
   :param holm: Add Holm-corrected p-values to the output.
   :type holm: bool
   :param fdr_by: Add Benjamini-Yekutieli FDR-corrected p-values to the output.
   :type fdr_by: bool
   :param adj_all_x: Adjust p-values across all X variables jointly instead of per X.
   :type adj_all_x: bool
   :param x_factors: Append X-variable manifest columns (``XBASE``, ``XGROUP``,
                     ``XSTRAT``) to the output.
   :type x_factors: bool
   :param p: Only return rows whose nominal p-value is below this threshold.
   :type p: float | None
   :param padj: Only return rows whose adjusted p-value is below this threshold.
   :type padj: float | None
   :param vars: Comma-separated list of variables to include.
   :type vars: str | None
   :param xvars: Comma-separated list of variables to exclude.
   :type xvars: str | None
   :param grps: Groups to include.
   :type grps: str | None
   :param xgrps: Groups to exclude.
   :type xgrps: str | None
   :param facs: Factors to include.
   :type facs: str | None
   :param xfacs: Factors to exclude.
   :type xfacs: str | None
   :param faclvls: Factor levels to include (``CH/FZ|CZ`` syntax).
   :type faclvls: str | None
   :param xfaclvls: Factor levels to exclude (``CH/FZ|CZ`` syntax).
   :type xfaclvls: str | None
   :param n_prop: Drop columns with more than this proportion of missing values.
   :type n_prop: float | None
   :param n_req: Drop columns with fewer than this many non-missing values.
   :type n_req: int | None
   :param knn: k for kNN imputation of missing values.
   :type knn: int | None
   :param winsor: Winsorisation proportion applied before modelling.
   :type winsor: float | None
   :param subset: Include only subjects positive for these variables (``+VAR`` syntax).
   :type subset: str | None
   :param inc_ids: Comma-separated list of subject IDs to include.
   :type inc_ids: str | None
   :param ex_ids: Comma-separated list of subject IDs to exclude.
   :type ex_ids: str | None
   :param verbose:
   :type verbose: bool

   :returns: Result tables keyed by the ``"GPA: STRATA"`` convention, e.g.:

             * ``"GPA: X,Y"``: main association results
             * ``"GPA: VAR"``: descriptive statistics (``mode="stats"``)
             * ``"GPA: X"``: comparison test results (``mode="comp"``)
   :rtype: dict[str, pd.DataFrame]


.. py:function:: gpa_dump(dat_path: str, **filter_opts) -> pandas.DataFrame

   Dump the raw data matrix from a ``.dat`` file as a DataFrame.

   Any keyword argument is forwarded as a Luna parameter string,
   e.g. ``X="male"``, ``lvars="PSD_CH_CZ_F_13.5"``.


.. py:function:: gpa_get_xy_partial(xvar: str, yvar: str, zvars: List[str])

   Return ``(ids, x_resid, y_resid)`` after regressing *zvars* out of both axes.

   Uses the same :math:`R_z = I - Z(Z'Z)^{-1}Z'` projection as the GPA linear
   model, so the residual scatter exactly matches what went into the regression.
   Falls back to :func:`gpa_get_xy` when *zvars* is empty.

   Raises ``RuntimeError`` if no matrix is cached (call :func:`gpa_run` first).


.. py:function:: gpa_get_xy(xvar: str, yvar: str)

   Return ``(ids, x_vals, y_vals)`` from the cached GPA analysis matrix.

   Filters to rows where both *xvar* and *yvar* are non-NaN, i.e. the same
   subjects used in the most recent :func:`gpa_run` call.

   Raises ``RuntimeError`` if no matrix is cached (call :func:`gpa_run` first).


.. py:function:: gpa_clear_cache()

   Release the cached GPA analysis matrix to free memory.
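
The residual projection described for :func:`gpa_get_xy_partial` can be illustrated without lunapi. The sketch below is not the library's implementation: it applies ``Rz = I - Z(Z'Z)^{-1}Z'`` to a vector in plain Python, using Gram-Schmidt orthogonalisation instead of an explicit matrix inverse. The helper names ``dot`` and ``residualise``, the toy data, and the explicit intercept column are assumptions made for illustration; whether GPA itself adds an intercept to *Z* is not stated here.

```python
from math import sqrt


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def residualise(v, z_cols, tol=1e-12):
    """Return the part of v orthogonal to every column in z_cols.

    Hypothetical helper, not part of lunapi. For full-rank Z this equals
    (I - Z (Z'Z)^{-1} Z') v; building an orthonormal basis of the columns
    avoids forming the inverse explicitly.
    """
    basis = []
    for col in z_cols:
        w = [float(c) for c in col]
        # Remove the components already spanned by earlier columns.
        for q in basis:
            coef = dot(w, q)
            w = [wi - coef * qi for wi, qi in zip(w, q)]
        norm = sqrt(dot(w, w))
        if norm > tol:  # skip columns that are linearly dependent
            basis.append([wi / norm for wi in w])
    resid = [float(c) for c in v]
    # Subtract the projection of v onto each orthonormal basis vector.
    for q in basis:
        coef = dot(resid, q)
        resid = [ri - coef * qi for ri, qi in zip(resid, q)]
    return resid


# Toy data (made up): an intercept column and one covariate.
ones = [1.0, 1.0, 1.0, 1.0, 1.0]
age = [31.0, 45.0, 52.0, 38.0, 60.0]
z_cols = [ones, age]

x = [2.0, 4.1, 5.9, 3.2, 7.8]
y = [1.1, 1.9, 3.2, 1.4, 3.9]

x_resid = residualise(x, z_cols)
y_resid = residualise(y, z_cols)

# Residuals are orthogonal to every covariate column, up to rounding.
for z in z_cols:
    assert abs(dot(x_resid, z)) < 1e-9
    assert abs(dot(y_resid, z)) < 1e-9
```

Plotting ``x_resid`` against ``y_resid`` gives the partial scatter; the tuple returned by :func:`gpa_get_xy_partial` additionally carries the subject IDs.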