lunapi.moonbeam

NSRR data access via the official sleepdata.org API.

This module exports moonbeam, a helper for querying NSRR datasets, downloading EDF/annotation assets into a local cache, and creating Luna instances. Dataset/file mappings are driven by a curated TSV manifest maintained at https://github.com/remnrem/luna-api (nsrr/MANIFEST).

Classes

moonbeam

Client for the NSRR sleepdata.org data catalog.

Module Contents

class lunapi.moonbeam.moonbeam(nsrr_tok=None, cdir=None)

Client for the NSRR sleepdata.org data catalog.

File/ID mappings are read from a curated TSV manifest rather than crawled at runtime, so loading a cohort requires no traversal of the remote dataset and does not depend on how each study lays out its files.

Parameters:

nsrr_tok (str, optional) – Personal NSRR API token (obtain at https://sleepdata.org/token). If omitted, the token saved by a previous call is used.
cdir (str, optional) – Local download cache. Defaults to luna-nsrr inside the system temp directory.

Examples

>>> from lunapi.moonbeam import moonbeam
>>> mb = moonbeam('my-token')          # first use: token is cached
>>> mb = moonbeam()                    # later sessions: cached token loaded automatically
>>> moonbeam.clear_token()             # remove cached token
>>> mb.cohorts()
>>> mb.cohort('cfs')                   # all subcohorts
>>> mb.cohort('cfs', 'cfs-visit5')     # one subcohort
>>> p  = mb.inst('cfs-visit5-800001')
>>> mb.pull_many(['cfs-visit5-800001', 'cfs-visit5-800002'])
>>> mb.status()

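nsrr_tok = None

NSRR API token used for requests.

df1 = None

DataFrame returned by the most recent call to cohorts().

df2 = None

DataFrame returned by the most recent call to cohort().

curr_cohort = None

Active cohort slug, set by cohort().

curr_subcohort = None

Active subcohort label, set by cohort() and updated by pull().

curr_id = None

ID of the individual most recently fetched by pull().

curr_edf = None

EDF file of the individual most recently fetched by pull().

curr_annots = []

Annotation files of the individual most recently fetched by pull(); inst() attaches only the first.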

static save_token(token)

Save token to the local cache so that future sessions can omit the nsrr_tok argument.

The token is obfuscated by XOR-ing it with a SHA-256 key derived from the current username and hostname; the result is base64-encoded and written to ~/.config/lunapi/.token with permissions 0600 (owner read/write only). This is obfuscation, not cryptographic encryption: the file is not human-readable, and because the key depends on the username and hostname, a copy of the file will not decode correctly under a different user account or on a different machine.

Call clear_token() to remove the cached token.

Parameters:

token (str) – NSRR API token (obtain at https://sleepdata.org/token).
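To make the obfuscation scheme concrete, here is a minimal sketch of an XOR-then-base64 round trip and an owner-only file write. It is not lunapi's implementation: the key material (the string `"user@host"` hashed with SHA-256) and the helper names `obfuscate`, `deobfuscate` and `write_private` are assumptions for illustration.

```python
import base64
import hashlib
import os
from itertools import cycle


def _key(user: str, host: str) -> bytes:
    # ASSUMPTION: hash "user@host"; lunapi's exact key material is not documented.
    return hashlib.sha256(f"{user}@{host}".encode("utf-8")).digest()


def obfuscate(token: str, user: str, host: str) -> str:
    """XOR the token bytes with the repeating 32-byte key, then base64-encode."""
    key = _key(user, host)
    raw = bytes(b ^ k for b, k in zip(token.encode("utf-8"), cycle(key)))
    return base64.b64encode(raw).decode("ascii")


def deobfuscate(blob: str, user: str, host: str) -> bytes:
    """Undo obfuscate(); XOR with the same key is its own inverse."""
    key = _key(user, host)
    raw = base64.b64decode(blob)
    # Returns bytes: with the wrong user/host the result is usually not valid UTF-8.
    return bytes(b ^ k for b, k in zip(raw, cycle(key)))


def write_private(path: str, text: str) -> None:
    """Write text to path, readable and writable by the owner only (0600)."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(text)
    # The mode given to os.open only applies on creation; enforce it for existing files too.
    os.chmod(path, 0o600)
```

In practice `user` and `host` would come from `getpass.getuser()` and `socket.gethostname()`. Because XOR is its own inverse, the same key both hides and recovers the token, which is also why anyone able to recompute the key can read the file.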

static clear_token()

Remove the cached NSRR token from ~/.config/lunapi/.token.

After calling this method, an explicit nsrr_tok argument will be required when constructing a new moonbeam instance.

set_cache(cdir)

Set the local folder used to cache downloaded files.

Parameters:

cdir (str) – Path to the desired cache directory. The directory is created if it does not exist.

cached(rel_path)

Return whether rel_path already exists in the local cache.

Parameters:

rel_path (str) – Path relative to the cache root, i.e. {cohort}/{remote_path} (e.g. 'cfs/polysomnography/edfs/cfs-visit5-800001.edf').

Returns:

True if the file is present on disk; False otherwise.

Return type:

bool
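The cache layout described for set_cache() and cached() can be sketched with `pathlib`. The helper names are hypothetical, and treating "present on disk" as `Path.is_file()` is an assumption about what cached() checks.

```python
import tempfile
from pathlib import Path


def default_cache_dir() -> Path:
    # Documented default: a "luna-nsrr" folder inside the system temp directory.
    return Path(tempfile.gettempdir()) / "luna-nsrr"


def set_cache_dir(cdir) -> Path:
    # Like set_cache(): create the folder (and any parents) if missing.
    path = Path(cdir)
    path.mkdir(parents=True, exist_ok=True)
    return path


def cache_path(cdir, cohort: str, remote_path: str) -> Path:
    # Local files mirror the remote layout: {cdir}/{cohort}/{remote_path}
    return Path(cdir) / cohort / remote_path


def is_cached(cdir, rel_path: str) -> bool:
    # rel_path is relative to the cache root, e.g.
    # 'cfs/polysomnography/edfs/cfs-visit5-800001.edf'
    return (Path(cdir) / rel_path).is_file()
```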

clear_cache(cohort=None)

Delete downloaded files from the local cache.

The cached manifest (.manifest) is always preserved so that the next session does not need to re-fetch it from GitHub.

Parameters:

cohort (str, optional) – If given, remove only that cohort’s sub-folder (e.g. 'cfs'). If omitted, all cohort sub-folders are removed.
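A sketch of the clear_cache() behaviour, assuming every sub-directory of the cache root is a cohort folder and .manifest is a plain file at the root. `clear_cache_dir` is a hypothetical name.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def clear_cache_dir(cdir, cohort: Optional[str] = None) -> List[str]:
    """Remove cohort sub-folders under cdir, never touching the .manifest file.

    Returns the names of the folders that were removed.
    """
    root = Path(cdir)
    if cohort is not None:
        targets = [root / cohort]
    else:
        # ASSUMPTION: every sub-directory of the cache root is a cohort folder.
        targets = [p for p in root.iterdir() if p.is_dir()]
    removed = []
    for folder in targets:
        if folder.is_dir():
            shutil.rmtree(folder)
            removed.append(folder.name)
    # Files at the root (including .manifest) are left in place.
    return sorted(removed)
```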

status(cohort=None)

Print a tree of downloaded files with sizes.

Lists every file under each cohort sub-folder, grouped by cohort, with a grand total at the end.

Parameters:

cohort (str, optional) – Restrict the report to one cohort (e.g. 'cfs'). If omitted, all cohorts present in the cache are shown.
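A sketch of a status()-style report built with `Path.rglob`. The function names, output format and binary size units (1 KB = 1024 bytes) are assumptions; the real report may look different.

```python
from pathlib import Path
from typing import Optional


def human_size(n: float) -> str:
    # Format a byte count using binary units.
    for unit in ("B", "KB", "MB", "GB"):
        if n < 1024:
            return f"{n:.0f} {unit}" if unit == "B" else f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TB"


def print_status(cdir, cohort: Optional[str] = None) -> int:
    """Print files per cohort folder with sizes; return the grand total in bytes."""
    root = Path(cdir)
    cohorts = [cohort] if cohort else sorted(p.name for p in root.iterdir() if p.is_dir())
    total = 0
    for name in cohorts:
        folder = root / name
        if not folder.is_dir():
            continue
        print(f"{name}/")
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                size = path.stat().st_size
                total += size
                print(f"  {path.relative_to(folder)}  ({human_size(size)})")
    print(f"Total: {human_size(total)}")
    return total
```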

refresh_manifest()

Re-download the manifest from GitHub, replacing the cached copy.

Use this after new datasets or individuals have been added to the upstream manifest at nsrr/MANIFEST in the luna-api repository. The in-memory _mf dict and the local .manifest file are both updated immediately.

allowed_cohorts(refresh=False)

Return the dataset slugs visible to the current NSRR token.

This queries the NSRR dataset listing API and caches the resulting slug set on the instance. Public datasets and datasets explicitly granted to the token are both included because both are downloadable.

Parameters:

refresh (bool, optional) – Force a fresh API query even if cached results are available.

Returns:

Dataset slugs visible to the token.

Return type:

set[str]
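The caching behaviour of allowed_cohorts() (query once, reuse, re-query on refresh=True) can be sketched as below. `fetch` is a hypothetical stand-in for the NSRR dataset-listing request; no real endpoint is shown.

```python
from typing import Callable, Iterable, Optional, Set


class SlugCache:
    """Cache the dataset slugs visible to a token, re-querying only on demand.

    `fetch` is a hypothetical callable standing in for the NSRR dataset-listing
    request; it should return the slugs (public and granted) for the token.
    """

    def __init__(self, fetch: Callable[[], Iterable[str]]):
        self._fetch = fetch
        self._slugs: Optional[Set[str]] = None

    def allowed_cohorts(self, refresh: bool = False) -> Set[str]:
        if self._slugs is None or refresh:
            self._slugs = set(self._fetch())
        return set(self._slugs)  # return a copy so callers cannot mutate the cache
```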

cohorts()

Return a DataFrame of cohorts defined in the manifest.

Membership counts come from the manifest; the Authorized column comes from allowed_cohorts(), whose slug set is cached on the instance. The result is also stored as self.df1.

Returns:

One row per cohort with columns:

Cohort: NSRR dataset slug (e.g. 'cfs').
Subcohorts: Comma-separated list of subcohort labels defined for this cohort.
N: Total number of individuals across all subcohorts.
Cached: Number of individuals whose EDF is already on disk.
Authorized: True if the cohort appears in the NSRR dataset listing for the current token (so its files can be downloaded), False otherwise.

Return type:

pandas.DataFrame
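How the cohorts() columns relate to the manifest can be sketched with plain dictionaries. The row keys, the sorted subcohort order, and counting one individual per manifest row are assumptions; wrapping the result in `pandas.DataFrame(...)` would give a table shaped like self.df1.

```python
from typing import Callable, Dict, List, Set


def summarize_cohorts(rows: List[Dict[str, str]],
                      allowed: Set[str],
                      is_cached: Callable[[str, str], bool]) -> List[Dict[str, object]]:
    """Build one summary record per cohort, mirroring the columns of cohorts().

    rows: manifest entries with (assumed) keys 'cohort', 'subcohort', 'id', 'edf'.
    allowed: slugs visible to the token, e.g. from allowed_cohorts().
    is_cached: callable(cohort, edf_path) -> bool.
    """
    by_cohort: Dict[str, List[Dict[str, str]]] = {}
    for row in rows:
        by_cohort.setdefault(row["cohort"], []).append(row)
    summary = []
    for cohort in sorted(by_cohort):
        members = by_cohort[cohort]
        subs = sorted({r["subcohort"] for r in members})
        summary.append({
            "Cohort": cohort,
            "Subcohorts": ",".join(subs),
            "N": len(members),
            "Cached": sum(is_cached(cohort, r["edf"]) for r in members),
            "Authorized": cohort in allowed,
        })
    return summary
```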

cohort(cohort1, subcohort=None)

Set the active cohort and return its individual manifest.

Sets self.curr_cohort (and self.curr_subcohort when subcohort is given). The result is also stored as self.df2. Does not contact the network.

Parameters:

cohort1 (str or int) – NSRR dataset slug (e.g. 'cfs') or integer row index into the DataFrame returned by cohorts().
subcohort (str, optional) – If given, restrict the view to this subcohort (e.g. 'baseline') and record it as the current subcohort. When omitted, all subcohorts are included and curr_subcohort is cleared.

Returns:

One row per individual with columns:

Subcohort: Subcohort label for this row.
ID: Subject identifier (e.g. 'cfs-visit5-800001').
EDF: Remote path to the EDF file relative to the cohort root.
Annot: Remote path to the primary annotation file, or '.' if none is defined.

Return type:

pandas.DataFrame
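A sketch of how cohort() could answer from the cached manifest without touching the network. The column order in `FIELDS` is hypothetical (the real nsrr/MANIFEST layout is not described here), as are the function names.

```python
import csv
import io
from typing import Dict, List, Optional

# ASSUMPTION: hypothetical manifest column order; the real nsrr/MANIFEST may differ.
FIELDS = ["cohort", "subcohort", "id", "edf", "annot"]


def read_manifest(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated manifest lines into dicts (no header row assumed)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(FIELDS, rec)) for rec in reader if rec]


def select_cohort(rows: List[Dict[str, str]], cohort: str,
                  subcohort: Optional[str] = None) -> List[Dict[str, str]]:
    """Return the rows cohort() would show: Subcohort, ID, EDF, Annot."""
    out = []
    for r in rows:
        if r["cohort"] != cohort:
            continue
        if subcohort is not None and r["subcohort"] != subcohort:
            continue
        out.append({"Subcohort": r["subcohort"], "ID": r["id"],
                    "EDF": r["edf"], "Annot": r.get("annot") or "."})
    return out
```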

pull(iid, subcohort=None)

Download EDF and annotation files for one individual.

Files already present in the cache are skipped. For compressed EDF files (.edf.gz or .edfz), the companion .idx index file is also downloaded. Updates curr_id, curr_edf, curr_annots, and curr_subcohort.

A call to cohort() must have been made first to set the active cohort.

Parameters:

iid (str or int) – Individual ID string, or integer row index into self.df2.
subcohort (str, optional) – Subcohort label. Defaults to curr_subcohort; must be supplied explicitly when the same ID appears in more than one subcohort.

Raises:

RuntimeError – If no cohort has been set, or if iid is ambiguous across subcohorts.
KeyError – If iid is not found in the manifest.
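The lookup rules for iid and the list of files pull() fetches can be sketched as follows. Rows use the column names returned by cohort(); naming the index file by appending `.idx` to the EDF path is an assumption, and handling of several annotation files per individual is omitted.

```python
from typing import Dict, List, Optional, Union

COMPRESSED = (".edf.gz", ".edfz")


def resolve(rows: List[Dict[str, str]], iid: Union[str, int],
            subcohort: Optional[str] = None) -> Dict[str, str]:
    """Pick one manifest row, following the rules documented for pull()."""
    if isinstance(iid, int):
        return rows[iid]  # integer = row index into the cohort table
    hits = [r for r in rows if r["ID"] == iid]
    if subcohort is not None:
        hits = [r for r in hits if r["Subcohort"] == subcohort]
    if not hits:
        raise KeyError(iid)
    if len(hits) > 1:
        raise RuntimeError(f"{iid} appears in several subcohorts; pass subcohort=")
    return hits[0]


def files_for(row: Dict[str, str]) -> List[str]:
    """Remote paths to fetch for one row: EDF, optional .idx, annotation."""
    files = [row["EDF"]]
    if row["EDF"].lower().endswith(COMPRESSED):
        # ASSUMPTION: the index is named by appending '.idx' to the EDF path.
        files.append(row["EDF"] + ".idx")
    if row["Annot"] != ".":
        files.append(row["Annot"])
    return files
```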

pull_file(cohort, remote_path)

Download a single file from NSRR into the local cache.

The file is stored at {cdir}/{cohort}/{remote_path}, mirroring the remote directory structure. A tqdm progress bar shows download progress. If the file is already on disk, the download is silently skipped.

Parameters:

cohort (str) – NSRR dataset slug (e.g. 'cfs').
remote_path (str) – Path of the file within the dataset, relative to the dataset root (e.g. 'polysomnography/edfs/cfs-visit5-800001.edf').

Raises:

RuntimeError – If the server returns a non-200 HTTP status code.
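A sketch of the skip-if-present and status-check logic of pull_file(). The transport is a hypothetical `open_remote` callable so the example runs offline; writing to a `.part` file and renaming it with `os.replace` is a design choice of this sketch (it keeps interrupted downloads from looking cached), not documented lunapi behaviour. The tqdm progress bar is omitted.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, Tuple

# Hypothetical transport: returns (HTTP status, iterable of byte chunks).
Opener = Callable[[str, str], Tuple[int, Iterable[bytes]]]


def pull_file_sketch(cdir, cohort: str, remote_path: str, open_remote: Opener) -> Path:
    """Download one file into {cdir}/{cohort}/{remote_path}, skipping if present."""
    dest = Path(cdir) / cohort / remote_path
    if dest.exists():
        return dest  # already cached: skip silently
    status, chunks = open_remote(cohort, remote_path)
    if status != 200:
        raise RuntimeError(f"HTTP {status} for {cohort}/{remote_path}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    with open(tmp, "wb") as fh:
        for chunk in chunks:  # stream chunk by chunk instead of buffering the whole file
            fh.write(chunk)
    os.replace(tmp, dest)  # only a complete download ever appears under dest
    return dest
```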

pull_many(iids, subcohort=None, cohort=None, max_workers=_MAX_WORKERS)

Download files for multiple individuals using parallel connections.

Builds a flat list of all EDF, annotation, and (where applicable) .idx files required by iids, then fetches them concurrently using a thread pool. Files already present in the cache are skipped before a thread is even allocated. A summary line is printed on completion; individual failures are reported inline and do not abort remaining downloads.

A call to cohort() must have been made first.

Parameters:

iids (list of str or int) – Individual IDs to download. Integer entries are resolved as row indices into self.df2.
subcohort (str, optional) – Subcohort label applied to all IDs. Defaults to curr_subcohort. IDs that are ambiguous across subcohorts and have no subcohort specified are skipped with a warning.
cohort (str, optional) – Dataset slug. Defaults to curr_cohort.
max_workers (int, optional) – Maximum number of simultaneous download connections (default: 4).
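The parallel strategy of pull_many() (filter out cached files, fan the rest out to a thread pool, report failures without aborting) can be sketched with `concurrent.futures`. `fetch` and `is_cached` are hypothetical stand-ins for pull_file() and cached(), and the summary format is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Tuple


def pull_many_sketch(jobs: Iterable[Tuple[str, str]],
                     fetch: Callable[[str, str], None],
                     is_cached: Callable[[str, str], bool],
                     max_workers: int = 4) -> Tuple[int, int, int]:
    """Fetch (cohort, remote_path) jobs in parallel.

    Returns (downloaded, skipped, failed). `fetch` and `is_cached` are
    hypothetical stand-ins for pull_file() and cached().
    """
    jobs = list(jobs)
    todo = [j for j in jobs if not is_cached(*j)]  # filter before using any thread
    skipped = len(jobs) - len(todo)
    ok = failed = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, *job): job for job in todo}
        for fut in as_completed(futures):
            cohort, path = futures[fut]
            try:
                fut.result()
                ok += 1
            except Exception as exc:  # report inline and keep going
                failed += 1
                print(f"  failed: {cohort}/{path}: {exc}")
    print(f"{ok} downloaded, {skipped} already cached, {failed} failed")
    return ok, skipped, failed
```

Threads suit this workload because each download spends most of its time waiting on the network, so the GIL is not a bottleneck.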

inst(iid, subcohort=None)

Return a Luna instance for one individual, downloading if needed.

Calls pull() to ensure the EDF and all annotation files are in the cache, then creates and returns an inst object with the EDF attached. If the manifest lists several annotation files, only the first is attached; the full list is available via self.curr_annots.

A call to cohort() must have been made first.

Parameters:

iid (str or int) – Individual ID string, or integer row index into self.df2.
subcohort (str, optional) – Subcohort label. Defaults to curr_subcohort.

Returns:

A fully attached instance ready for Luna commands, or None if no cohort has been set.

Return type:

lunapi.instance.inst or None
