API Reference
This section documents the core modules of esrf-data-compressor.
esrf_data_compressor.finder
- esrf_data_compressor.finder.finder.discover_datasets(path_components, base_root)
- Parameters:
path_components (List[str])
base_root (str)
- Return type:
List[str]
- esrf_data_compressor.finder.finder.find_vds_files(path_components, base_root, filter_expr, *, max_workers=None)
- Discover each dataset HDF5 file, then for each top-level group (e.g. “1.1”):
treat each filter key “A/B/C” as a dataset path under that group, i.e. grp["A"]["B"]["C"][()].
if any filter’s desired substring is found in the dataset’s value, classify that group’s VDS sources as TO COMPRESS, with reason=“grp/A/B/C contains ‘val’”.
otherwise classify them as REMAINING, with reason=“grp/A/B/C=<actual>”.
Files whose datasets are already compressed with the JP2KCompressor’s Blosc2/Grok filter (ID 32026) are classified as REMAINING with reason “<already compressed>”.
Returns two lists of (vds_source_path, reason) tuples.
- Parameters:
path_components (List[str])
base_root (str)
filter_expr (Optional[str])
max_workers (Optional[int])
- Return type:
Tuple[List[Tuple[str,str]],List[Tuple[str,str]]]
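The classification rule described for find_vds_files can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's code: nested dicts stand in for HDF5 groups, the filters are passed as a hypothetical dict mapping each “A/B/C” key to its desired substring (the real filter_expr string format is not documented here), and classify_group is a made-up helper name. When no filter matches, the sketch reports the last filter checked, which is also an assumption.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (vds_source_path, reason)


def classify_group(
    group_name: str,
    group: dict,
    filters: Dict[str, str],
    vds_sources: List[str],
) -> Tuple[List[Entry], List[Entry]]:
    """Classify one top-level group's VDS sources (hypothetical helper).

    Assumes `filters` is non-empty and that every key exists in `group`.
    """
    matched = False
    reason = ""
    for key, wanted in filters.items():
        # Walk "A/B/C" as nested lookups: group["A"]["B"]["C"].
        node = group
        for part in key.split("/"):
            node = node[part]
        actual = str(node)
        if wanted in actual:
            matched = True
            reason = f"{group_name}/{key} contains '{wanted}'"
            break
        # No match yet: remember the actual value for the REMAINING reason.
        reason = f"{group_name}/{key}={actual}"

    to_compress: List[Entry] = []
    remaining: List[Entry] = []
    target = to_compress if matched else remaining
    for src in vds_sources:
        target.append((src, reason))
    return to_compress, remaining
```

With a group such as {"instrument": {"detector": {"type": "marana"}}} and the filter {"instrument/detector/type": "mara"}, every VDS source of that group lands in the first list; with a non-matching substring it lands in the second.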
esrf_data_compressor.checker
- esrf_data_compressor.checker.run_check.run_ssim_check(raw_files, method, report_path, layout='sibling')
- Given a list of raw HDF5 file paths, partitions them into:
to_check → those with an expected compressed counterpart according to layout
missing → those without one
- Writes a report to report_path:
‘=== NOT COMPRESSED FILES ===’ listing each missing file
then, for each to_check pair, computes SSIM in parallel and appends per-dataset SSIM lines under ‘=== <stem> ===’ with full paths
- Parameters:
raw_files (list[str])
method (str)
report_path (str)
layout (str)
- Return type:
None
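The partition step and the first report section can be sketched as follows. This is a minimal illustration, not the implementation of run_ssim_check: partition_raw_files and write_missing_section are hypothetical names, and the mapping from a raw file to its expected compressed counterpart is passed in as a callable so that the sketch makes no claim about how each layout resolves paths.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def partition_raw_files(
    raw_files: List[str],
    expected_path: Callable[[str], str],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split raw files into (raw, compressed) pairs and files with no counterpart."""
    to_check: List[Tuple[str, str]] = []
    missing: List[str] = []
    for raw in raw_files:
        comp = expected_path(raw)  # where the compressed file should be
        if Path(comp).is_file():
            to_check.append((raw, comp))
        else:
            missing.append(raw)
    return to_check, missing


def write_missing_section(report_path: str, missing: List[str]) -> None:
    """Start the report with the section that lists uncompressed files."""
    with open(report_path, "w", encoding="utf-8") as fh:
        fh.write("=== NOT COMPRESSED FILES ===\n")
        for raw in missing:
            fh.write(f"{raw}\n")
```

The per-pair SSIM sections would then be appended to the same file once the parallel computation finishes.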
- esrf_data_compressor.checker.ssim.compute_ssim_for_dataset_pair(orig_path, comp_path, dataset_relpath)
Given two HDF5 files and the relative 3D dataset path (e.g., ‘entry_0000/ESRF-ID11/marana/data’), compute SSIM on the first (z=0) and last (z=Z-1) slices. Returns (ssim_first, ssim_last). If a slice is constant, SSIM = 1.0.
- Parameters:
orig_path (str)
comp_path (str)
dataset_relpath (str)
- Return type:
tuple[float,float]
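The first/last-slice comparison can be illustrated with a dependency-free sketch. Several assumptions apply: nested lists stand in for the 3D HDF5 dataset, the score is the single-window (global) form of the SSIM formula with the conventional constants K1 = 0.01 and K2 = 0.03 rather than the windowed SSIM an image library would normally compute, the data range is taken from the original slice, and “constant” is checked on the original slice. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Slice2D = List[List[float]]


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float) -> float:
    """Single-window SSIM over two equally sized, flattened slices."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_first_last(orig: List[Slice2D], comp: List[Slice2D]) -> Tuple[float, float]:
    """Return (ssim_first, ssim_last) for slices z=0 and z=Z-1."""

    def one(a: Slice2D, b: Slice2D) -> float:
        flat_a = [v for row in a for v in row]
        flat_b = [v for row in b for v in row]
        lo, hi = min(flat_a), max(flat_a)
        if hi == lo:
            # Constant slice: follow the documented convention SSIM = 1.0.
            return 1.0
        return global_ssim(flat_a, flat_b, hi - lo)

    return one(orig[0], comp[0]), one(orig[-1], comp[-1])
```

Checking only the two end slices keeps the comparison cheap for large stacks; intermediate slices are not examined.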
- esrf_data_compressor.checker.ssim.compute_ssim_for_file_pair(orig_path, comp_path)
Compute SSIM for every 3D dataset under orig_path vs. comp_path. Returns (basename, [report_lines…]), where each line is either: “<dataset_relpath>: SSIM_first=… SSIM_last=…” or an error message.
- Parameters:
orig_path (str)
comp_path (str)
- Return type:
tuple[str,list[str]]
esrf_data_compressor.compressors
- class esrf_data_compressor.compressors.base.Compressor
Bases: object
Abstract base class. Subclasses must implement compress_file().
- class esrf_data_compressor.compressors.base.CompressorManager(workers=None, cratio=10, method='jp2k', layout='sibling')
Bases: object
Manages parallel compression and overwrite.
Each worker process is given up to 2 Blosc2 threads (or fewer if the machine has fewer than 4 cores). The number of worker processes is then total_cores // threads_per_worker (at least 1). If the user explicitly passes workers, we cap it to total_cores, then recompute threads_per_worker = min(2, total_cores // workers).
- Usage:
mgr = CompressorManager(cratio=10, method='jp2k')
mgr.compress_files([…])
mgr.overwrite_files([…])
- Parameters:
workers (int|None)
cratio (int)
method (str)
layout (str)
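The worker and thread arithmetic described for CompressorManager can be written out as a small function. This is a sketch of the stated rule, not the class's code: plan_parallelism is a hypothetical name, “fewer than 2 threads on machines with fewer than 4 cores” is read as exactly 1 thread, and the lower bounds of 1 are added defensively.

```python
from typing import Optional, Tuple


def plan_parallelism(total_cores: int, workers: Optional[int] = None) -> Tuple[int, int]:
    """Return (workers, threads_per_worker) for a machine with `total_cores` cores."""
    if workers is None:
        # Default: up to 2 Blosc2 threads per worker, 1 on small machines.
        threads_per_worker = 2 if total_cores >= 4 else 1
        workers = max(1, total_cores // threads_per_worker)
    else:
        # Explicit request: cap at the core count, then recompute the threads.
        workers = max(1, min(workers, total_cores))
        threads_per_worker = max(1, min(2, total_cores // workers))
    return workers, threads_per_worker
```

On an 8-core machine this gives 4 workers with 2 threads each by default, and 8 workers with 1 thread each if 16 workers are requested. In practice total_cores would come from something like os.cpu_count().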
- compress_files(file_list)
Compress each .h5 file in file_list in parallel:
- sibling layout: produce <basename>_<method>.h5 next to each source.
- mirror layout: write compressed files to RAW_DATA_COMPRESSED with the same file names.
Does not overwrite the originals. At the end, prints the total elapsed time and the data rate in MB/s.
- Parameters:
file_list (list[str])
- Return type:
None
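The two output layouts and the final throughput figure can be sketched with pathlib. The sibling naming follows the description above. The mirror mapping is an assumption: the sketch supposes that a RAW_DATA path component is swapped for RAW_DATA_COMPRESSED, which the description does not state explicitly. Treating 1 MB as 10^6 bytes is likewise an assumption, and all three function names are hypothetical.

```python
from pathlib import PurePosixPath


def sibling_output(src: str, method: str) -> PurePosixPath:
    """<basename>_<method>.h5 next to the source file."""
    p = PurePosixPath(src)
    return p.with_name(f"{p.stem}_{method}.h5")


def mirror_output(src: str) -> PurePosixPath:
    """Same file name under RAW_DATA_COMPRESSED.

    ASSUMPTION: the source path contains a RAW_DATA component that is
    replaced; the real resolution logic may differ.
    """
    p = PurePosixPath(src)
    parts = ["RAW_DATA_COMPRESSED" if part == "RAW_DATA" else part for part in p.parts]
    return PurePosixPath(*parts)


def data_rate_mb_s(total_bytes: int, elapsed_s: float) -> float:
    """Throughput in MB/s, with 1 MB taken as 10**6 bytes (assumption)."""
    return total_bytes / 1e6 / elapsed_s
```

Neither mapping ever returns the source path itself for a path containing RAW_DATA, which is consistent with originals not being overwritten during compression.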
- class esrf_data_compressor.compressors.jp2k.JP2KCompressor
Bases: object
Uses Blosc2+Grok (JPEG2000) to compress each z-slice of a 3D HDF5 dataset.
esrf_data_compressor.utils
- esrf_data_compressor.utils.hdf5_helpers.copy_attrs(src, dst)
Copy all attributes from src to dst.
- Parameters:
src (AttributeManager)
dst (AttributeManager)
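Since an h5py AttributeManager behaves like a mapping, the copy can be pictured as a loop over key/value pairs. The sketch below is an illustration with a hypothetical name (copy_attrs_sketch), not the helper's source; plain dicts work as stand-ins for the attribute managers.

```python
from typing import Any, Mapping, MutableMapping


def copy_attrs_sketch(src: Mapping[str, Any], dst: MutableMapping[str, Any]) -> None:
    """Copy every attribute from src to dst, overwriting same-named entries."""
    for name, value in src.items():
        dst[name] = value
```

Attributes already present on dst under other names are left untouched in this sketch.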