lunapi.moonbeam
NSRR data access via the official sleepdata.org API.
This module exports moonbeam, a helper for querying NSRR datasets,
downloading EDF/annotation assets into a local cache, and creating Luna
instances. Dataset/file mappings are driven by a curated TSV manifest
maintained at https://github.com/remnrem/luna-api (nsrr/MANIFEST).
Classes
moonbeam – Client for the NSRR sleepdata.org data catalog.
Module Contents
- class lunapi.moonbeam.moonbeam(nsrr_tok=None, cdir=None)
Client for the NSRR sleepdata.org data catalog.
File/ID mappings are read from a curated TSV manifest rather than crawled at runtime, so cohort loading needs no per-file catalog queries and does not depend on how each study lays out its files.
- Parameters:
nsrr_tok (str, optional) – Personal NSRR API token (obtain at https://sleepdata.org/token). If omitted, the token saved by a previous call is used.
cdir (str, optional) – Local download cache. Defaults to luna-nsrr inside the system temp directory.
Examples
>>> mb = moonbeam('my-token')   # first use: token is cached
>>> mb = moonbeam()             # subsequent use: token loaded automatically
>>> moonbeam.clear_token()      # remove cached token
>>> mb.cohorts()
>>> mb.cohort('cfs')                 # all subcohorts
>>> mb.cohort('cfs', 'cfs-visit5')   # one subcohort
>>> p = mb.inst('cfs-visit5-800001')
>>> mb.pull_many(['cfs-visit5-800001', 'cfs-visit5-800002'])
>>> mb.status()
- static save_token(token)
Save token to the local cache so that future sessions can omit it.
The token is obfuscated using XOR with a SHA-256 key derived from the current username and hostname, then base64-encoded and written to ~/.config/lunapi/.token with permissions 0600 (owner read/write only). This is not cryptographic encryption, but the file is not human-readable and is bound to the specific user account and machine: a copy of the file will not decode on a different system.
Call clear_token() to remove the cached token.
- Parameters:
token (str) – NSRR API token (obtain at https://sleepdata.org/token).
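The obfuscation scheme described for save_token() can be illustrated with a minimal sketch. It is not the library's code: the key string ("username@hostname") and the helper names obfuscate, deobfuscate, and machine_key are assumptions made for illustration, and the file write with 0600 permissions is left out.

```python
import base64
import getpass
import hashlib
import socket
from itertools import cycle


def machine_key() -> bytes:
    # Assumption: the key is SHA-256 of "username@hostname"; the exact
    # string that lunapi hashes is not documented here and may differ.
    ident = f"{getpass.getuser()}@{socket.gethostname()}"
    return hashlib.sha256(ident.encode("utf-8")).digest()


def obfuscate(token: str, key: bytes) -> str:
    # XOR each token byte with the repeating key, then base64-encode.
    raw = bytes(b ^ k for b, k in zip(token.encode("utf-8"), cycle(key)))
    return base64.b64encode(raw).decode("ascii")


def deobfuscate(blob: str, key: bytes) -> str:
    # XOR is its own inverse, so the same key recovers the token.
    raw = base64.b64decode(blob)
    return bytes(b ^ k for b, k in zip(raw, cycle(key))).decode("utf-8")
```

Because the key depends on the account and host, a blob produced on one machine decodes to different bytes under another machine's key, which matches the "bound to the specific user account and machine" behaviour described above. Anyone who knows the username and hostname can rebuild the key, so this hides the token from casual viewing only.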
- static clear_token()
Remove the cached NSRR token from ~/.config/lunapi/.token.
After calling this method, an explicit nsrr_tok argument will be required when constructing a new moonbeam instance.
- set_cache(cdir)
Set the local folder used to cache downloaded files.
- Parameters:
cdir (str) – Path to the desired cache directory. Created automatically.
- cached(rel_path)
Return whether rel_path already exists in the local cache.
- Parameters:
rel_path (str) – Path relative to the cache root, i.e. {cohort}/{remote_path} (e.g. 'cfs/polysomnography/edfs/cfs-visit5-800001.edf').
- Returns:
True if the file is present on disk; False otherwise.
- Return type:
bool
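The check described for cached() amounts to joining the cache root with the relative path and testing for a file. The sketch below shows that idea against a throwaway directory; is_cached and demo_cdir are hypothetical names, and the real method reads the cache root from the moonbeam instance rather than taking it as an argument.

```python
import os
import tempfile


def is_cached(cdir: str, rel_path: str) -> bool:
    # rel_path is "{cohort}/{remote_path}", relative to the cache root.
    return os.path.isfile(os.path.join(cdir, rel_path))


# Demonstration against a temporary cache directory.
demo_cdir = tempfile.mkdtemp(prefix="luna-nsrr-demo-")
rel = "cfs/polysomnography/edfs/cfs-visit5-800001.edf"
before = is_cached(demo_cdir, rel)

# Create an empty placeholder standing in for a downloaded EDF.
os.makedirs(os.path.dirname(os.path.join(demo_cdir, rel)), exist_ok=True)
with open(os.path.join(demo_cdir, rel), "wb") as fh:
    fh.write(b"")
after = is_cached(demo_cdir, rel)
```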
- clear_cache(cohort=None)
Delete downloaded files from the local cache.
The cached manifest (.manifest) is always preserved so that the next session does not need to re-fetch it from GitHub.
- Parameters:
cohort (str, optional) – If given, remove only that cohort’s sub-folder (e.g. 'cfs'). If omitted, all cohort sub-folders are removed.
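One way to get the behaviour described for clear_cache() is to delete only directories under the cache root, which leaves root-level files untouched. The sketch below assumes .manifest is a file directly under the cache root; that location, the function name clear_cache_sketch, and the second cohort name 'other' are illustrative assumptions.

```python
import os
import shutil
import tempfile
from typing import Optional


def clear_cache_sketch(cdir: str, cohort: Optional[str] = None) -> None:
    # Remove one cohort folder, or every sub-folder when cohort is None.
    # Only directories are deleted, so a root-level ".manifest" survives.
    names = [cohort] if cohort is not None else os.listdir(cdir)
    for name in names:
        path = os.path.join(cdir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)


# Demonstration: two cohort folders plus a manifest file.
root = tempfile.mkdtemp(prefix="luna-nsrr-demo-")
open(os.path.join(root, ".manifest"), "w").close()
for name in ("cfs", "other"):
    os.makedirs(os.path.join(root, name, "polysomnography"))

clear_cache_sketch(root, "cfs")
remaining_after_one = sorted(os.listdir(root))
clear_cache_sketch(root)
remaining_after_all = sorted(os.listdir(root))
```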
- status(cohort=None)
Print a tree of downloaded files with sizes.
Lists every file under each cohort sub-folder, grouped by cohort, with a grand total at the end.
- Parameters:
cohort (str, optional) – Restrict the report to one cohort (e.g.
'cfs'). If omitted, all cohorts present in the cache are shown.
- refresh_manifest()
Re-download the manifest from GitHub, replacing the cached copy.
Use this after new datasets or individuals have been added to the upstream manifest at nsrr/MANIFEST in the luna-api repository. The in-memory _mf dict and the local .manifest file are both updated immediately.
- allowed_cohorts(refresh=False)
Return the dataset slugs visible to the current NSRR token.
This queries the NSRR dataset listing API and caches the resulting slug set on the instance. Public datasets and datasets explicitly granted to the token are both included because both are downloadable.
- Parameters:
refresh (bool, optional) – Force a fresh API query even if cached results are available.
- Returns:
Dataset slugs visible to the token.
- Return type:
set[str]
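The caching behaviour of allowed_cohorts() (query once, reuse the result, re-query when refresh=True) can be sketched as follows. The class SlugCache and the injected fetch callable are hypothetical stand-ins; the real method performs the NSRR API request itself.

```python
from typing import Callable, Optional, Set


class SlugCache:
    """Caches the result of a slug-listing call on the instance."""

    def __init__(self, fetch: Callable[[], Set[str]]):
        self._fetch = fetch  # hypothetical stand-in for the API query
        self._allowed: Optional[Set[str]] = None

    def allowed_cohorts(self, refresh: bool = False) -> Set[str]:
        # Query only on first use or when a refresh is forced.
        if refresh or self._allowed is None:
            self._allowed = set(self._fetch())
        return self._allowed


calls = []


def fake_fetch() -> Set[str]:
    # Records each invocation so the caching can be observed.
    calls.append(1)
    return {"cfs"}


cache = SlugCache(fake_fetch)
cache.allowed_cohorts()
cache.allowed_cohorts()
n_after_two = len(calls)
cache.allowed_cohorts(refresh=True)
n_after_refresh = len(calls)
```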
- cohorts()
Return a DataFrame of cohorts defined in the manifest.
Uses the manifest for cohort membership counts and the cached result of allowed_cohorts() for authorization annotations. The result is also stored as self.df1.
- Returns:
One row per cohort with columns:
Cohort – NSRR dataset slug (e.g. 'cfs').
Subcohorts – Comma-separated list of subcohort labels defined for this cohort.
N – Total number of individuals across all subcohorts.
Cached – Number of individuals whose EDF is already on disk.
Authorized – True when the current NSRR token can see/download the cohort in the NSRR dataset listing, False otherwise.
- Return type:
pandas.DataFrame
- cohort(cohort1, subcohort=None)
Set the active cohort and return its individual manifest.
Sets self.curr_cohort (and self.curr_subcohort when subcohort is given). The result is also stored as self.df2. Does not contact the network.
- Parameters:
cohort1 (str or int) – NSRR dataset slug (e.g. 'cfs') or integer row index into the DataFrame returned by cohorts().
subcohort (str, optional) – If given, restrict the view to this subcohort (e.g. 'baseline') and record it as the current subcohort. When omitted, all subcohorts are included and curr_subcohort is cleared.
- Returns:
One row per individual with columns:
Subcohort – Subcohort label for this row.
ID – Subject identifier (e.g. 'cfs-visit5-800001').
EDF – Remote path to the EDF file relative to the cohort root.
Annot – Remote path to the primary annotation file, or '.' if none is defined.
- Return type:
pandas.DataFrame
- pull(iid, subcohort=None)
Download EDF and annotation files for one individual.
Files already present in the cache are skipped. For compressed EDF files (.edf.gz / .edfz) the companion .idx index file is downloaded automatically. Updates curr_id, curr_edf, curr_annots, and curr_subcohort.
A call to cohort() must have been made first to set the active cohort.
- Parameters:
iid (str or int) – Individual ID string, or integer row index into self.df2.
subcohort (str, optional) – Subcohort label. Defaults to curr_subcohort; must be supplied explicitly when the same ID appears in more than one subcohort.
- Raises:
RuntimeError – If no cohort has been set, or if iid is ambiguous across subcohorts.
KeyError – If iid is not found in the manifest.
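The subcohort rules and the two exceptions listed for pull() can be sketched as a small resolver. The row layout (subcohort, ID), the function name resolve_subcohort, and the sample IDs are hypothetical; raising KeyError when an explicit subcohort does not contain the ID is also an assumption, since the text above only covers the not-found and ambiguous cases.

```python
from typing import List, Optional, Tuple

# Hypothetical manifest rows: (subcohort, individual ID).
Rows = List[Tuple[str, str]]


def resolve_subcohort(rows: Rows, iid: str,
                      subcohort: Optional[str] = None) -> str:
    matches = [sc for sc, rid in rows if rid == iid]
    if not matches:
        # ID is absent from the manifest.
        raise KeyError(iid)
    if subcohort is not None:
        if subcohort not in matches:
            # Assumption: a mismatched explicit subcohort is a lookup error.
            raise KeyError(f"{iid} not in subcohort {subcohort}")
        return subcohort
    if len(set(matches)) > 1:
        # Same ID in several subcohorts and no explicit choice given.
        raise RuntimeError(f"{iid} is ambiguous; pass subcohort explicitly")
    return matches[0]


rows = [("visit1", "id-001"), ("visit2", "id-001"), ("visit2", "id-002")]
unique = resolve_subcohort(rows, "id-002")
explicit = resolve_subcohort(rows, "id-001", "visit1")
```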
- pull_file(cohort, remote_path)
Download a single file from NSRR into the local cache.
The file is stored at {cdir}/{cohort}/{remote_path}, mirroring the remote directory structure. Download progress is shown via a tqdm progress bar. If the file is already present on disk the download is silently skipped.
- Parameters:
cohort (str) – NSRR dataset slug (e.g. 'cfs').
remote_path (str) – Path of the file within the dataset, relative to the dataset root (e.g. 'polysomnography/edfs/cfs-visit5-800001.edf').
- Raises:
RuntimeError – If the server returns a non-200 HTTP status code.
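The control flow described for pull_file() (mirror the remote path under the cache, skip files already on disk, raise on a non-200 status) can be sketched without any network access. The transport is injected as a hypothetical http_get callable returning (status, body); the real method issues the NSRR request itself and shows a tqdm progress bar, both of which are omitted here.

```python
import os
import tempfile
from typing import Callable, Tuple


def pull_file_sketch(cdir: str, cohort: str, remote_path: str,
                     http_get: Callable[[str, str], Tuple[int, bytes]]) -> bool:
    """Return True if a download happened, False if the file was cached."""
    # Destination mirrors the remote layout: {cdir}/{cohort}/{remote_path}.
    dest = os.path.join(cdir, cohort, remote_path)
    if os.path.isfile(dest):
        return False  # already cached: skip silently
    status, body = http_get(cohort, remote_path)
    if status != 200:
        raise RuntimeError(f"HTTP {status} for {cohort}/{remote_path}")
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    with open(dest, "wb") as fh:
        fh.write(body)
    return True


def fake_get(cohort: str, remote_path: str) -> Tuple[int, bytes]:
    # Hypothetical transport: 404 for paths containing "missing".
    if "missing" in remote_path:
        return 404, b""
    return 200, b"EDF-bytes"


demo_cdir = tempfile.mkdtemp(prefix="luna-nsrr-demo-")
first = pull_file_sketch(demo_cdir, "cfs", "polysomnography/edfs/x.edf", fake_get)
second = pull_file_sketch(demo_cdir, "cfs", "polysomnography/edfs/x.edf", fake_get)
```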
- pull_many(iids, subcohort=None, cohort=None, max_workers=_MAX_WORKERS)
Download files for multiple individuals using parallel connections.
Builds a flat list of all EDF, annotation, and (where applicable) .idx files required by iids, then fetches them concurrently using a thread pool. Files already present in the cache are skipped before a thread is even allocated. A summary line is printed on completion; individual failures are reported inline and do not abort remaining downloads.
A call to cohort() must have been made first.
- Parameters:
iids (list of str or int) – Individual IDs to download. Integer entries are resolved as row indices into self.df2.
subcohort (str, optional) – Subcohort label applied to all IDs. Defaults to curr_subcohort. IDs that are ambiguous across subcohorts and have no subcohort specified are skipped with a warning.
cohort (str, optional) – Dataset slug. Defaults to curr_cohort.
max_workers (int, optional) – Maximum number of simultaneous download connections (default: 4).
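The download strategy described for pull_many() (build a flat file list, drop cached files up front, fetch the rest in a thread pool, keep going after failures) can be sketched with the standard library. The names files_for, fetch_all, and fake_fetch are hypothetical, the downloader is a stub that writes empty files, and naming the index by appending ".idx" to the EDF path is an assumption.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Tuple


def files_for(edf: str, annot: str) -> List[str]:
    # EDF, plus an index for compressed EDFs, plus the annotation if any.
    # Assumption: the index is named by appending ".idx" to the EDF path.
    paths = [edf]
    if edf.endswith((".edf.gz", ".edfz")):
        paths.append(edf + ".idx")
    if annot != ".":  # '.' marks "no annotation" in the manifest view
        paths.append(annot)
    return paths


def fetch_all(cdir: str, cohort: str, rel_paths: List[str],
              fetch: Callable[[str, str], None],
              max_workers: int = 4) -> Tuple[int, int, int]:
    # Drop files already on disk before any thread is allocated.
    todo = [p for p in rel_paths
            if not os.path.isfile(os.path.join(cdir, cohort, p))]
    skipped = len(rel_paths) - len(todo)
    done = failed = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, cohort, p): p for p in todo}
        for fut in as_completed(futures):
            try:
                fut.result()
                done += 1
            except Exception as exc:
                # Report inline; the remaining downloads continue.
                print(f"failed: {futures[fut]}: {exc}")
                failed += 1
    print(f"{done} downloaded, {skipped} skipped, {failed} failed")
    return done, skipped, failed


demo_root = tempfile.mkdtemp(prefix="luna-nsrr-demo-")


def fake_fetch(cohort: str, remote_path: str) -> None:
    # Hypothetical downloader: writes an empty file, fails on "bad" paths.
    if "bad" in remote_path:
        raise RuntimeError("HTTP 404")
    dest = os.path.join(demo_root, cohort, remote_path)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    with open(dest, "wb"):
        pass


paths = files_for("edfs/a.edfz", "annots/a.xml") + files_for("edfs/bad.edf", ".")
first = fetch_all(demo_root, "cfs", paths, fake_fetch)
second = fetch_all(demo_root, "cfs", paths, fake_fetch)
```

Filtering before submission keeps the pool free for files that actually need fetching, and collecting results with as_completed lets one failed file be reported without cancelling the others.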
- inst(iid, subcohort=None)
Return a Luna instance for one individual, downloading if needed.
Calls pull() to ensure the EDF (and all annotation files) are present in the cache, then creates and returns a fully attached inst object. When multiple annotations are listed in the manifest, only the first is attached; the full list is available via self.curr_annots.
A call to cohort() must have been made first.
- Parameters:
iid (str or int) – Individual ID string, or integer row index into self.df2.
subcohort (str, optional) – Subcohort label. Defaults to curr_subcohort.
- Returns:
A fully attached instance ready for Luna commands, or None if no cohort has been set.
- Return type:
lunapi.instance.inst or None