Hello World Tutorial
This tutorial demonstrates how to use the esrf-data-compressor Python API in a short script. We will:
Locate all raw .h5 dataset files for a given experiment.
Compress them with JP2K at a specified compression ratio.
Run an SSIM check to generate a consistency report.
Overwrite the originals after inspecting the SSIM report.
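For intuition about what an SSIM score expresses, here is a minimal sketch of the SSIM formula applied once over a whole signal. It is a simplification: standard SSIM averages this quantity over local windows, and nothing here is taken from the implementation behind run_ssim_check. The function name global_ssim and the data_range default are assumptions made for illustration.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equal-length sequences of pixel values.

    Illustrative only: not the esrf-data-compressor implementation.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Stabilising constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Population variances and covariance.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical inputs score 1, and the score drops as the compressed data diverges from the original, which is why the report is worth reading before any originals are overwritten.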
Prerequisites
pip install esrf-data-compressor
(or, during development, from the repository root)
pip install -e .
Quickstart Script
Create a file named compress_experiment.py with the following content:
from esrf_data_compressor.finder.finder import discover_datasets
from esrf_data_compressor.compressors.base import CompressorManager
from esrf_data_compressor.checker.run_check import run_ssim_check
# 1) Define path components and base root
path_components = ["ma5567"]
base_root = "/data/visitor"
# 2) Locate raw HDF5 dataset files
raw_files = discover_datasets(path_components, base_root)
# 3) Compress all files in parallel with the JP2K method (cratio=10)
mgr = CompressorManager(cratio=10, method="jp2k")
mgr.compress_files(raw_files)
# 4) Run SSIM-based consistency check
report_file = "ma5567_jp2k_ssim_report.txt"
run_ssim_check(raw_files, "jp2k", report_file)
print(f"SSIM report written to {report_file}")
# 5) Overwrite originals. Run this step only after verifying the report;
#    as written, the script reaches it without pausing.
mgr.overwrite_files(raw_files)
print("Originals overwritten; backups saved with .bak extension")
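Because the script runs from top to bottom, step 5 executes immediately after the report is written. One way to leave room for inspection is an explicit confirmation gate. The sketch below is an assumption rather than part of the esrf-data-compressor API: confirm_overwrite and its ask parameter are hypothetical names.

```python
def confirm_overwrite(report_file, ask=input):
    """Return True only if the user explicitly answers 'yes'.

    Hypothetical helper, not part of esrf-data-compressor. `ask` defaults
    to the built-in input() and can be swapped out for testing.
    """
    answer = ask(f"Reviewed {report_file}? Type 'yes' to overwrite originals: ")
    return answer.strip().lower() == "yes"


# Possible use in compress_experiment.py, in place of the bare call in step 5:
#
# if confirm_overwrite(report_file):
#     mgr.overwrite_files(raw_files)
# else:
#     print("Originals left untouched.")
```

Any answer other than "yes" leaves the originals in place, so an accidental Enter does not trigger the overwrite.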
Running the Tutorial
python compress_experiment.py
Upon completion, you will have:
Compressed siblings: each raw file gets a sibling named <basename>_jp2k.h5 in the same directory.
SSIM report: ma5567_jp2k_ssim_report.txt in your working directory.
Backups: original .h5 files renamed to .h5.bak after overwrite.
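The naming conventions listed above can be turned into a quick check of which outputs are on disk. This is a sketch based only on the file names described in this tutorial. The helper expected_outputs and the example path are hypothetical, and which files exist at a given moment depends on whether the overwrite step has already run.

```python
from pathlib import Path


def expected_outputs(raw_path, method="jp2k"):
    """Derive the sibling and backup paths described in this tutorial.

    Hypothetical helper: it only applies the naming rules stated above
    (<basename>_<method>.h5 and <name>.h5.bak) and does not call the library.
    """
    raw = Path(raw_path)
    return {
        "compressed": raw.with_name(f"{raw.stem}_{method}{raw.suffix}"),
        "backup": raw.with_name(raw.name + ".bak"),
    }


if __name__ == "__main__":
    # Example path is made up for illustration.
    for label, path in expected_outputs("/data/visitor/ma5567/scan_0001.h5").items():
        status = "found" if path.exists() else "missing"
        print(f"{label}: {path} ({status})")
```

In the tutorial script, the same function could be applied to each entry of raw_files.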