Hello World Tutorial

This tutorial demonstrates how to use the esrf-data-compressor Python API in a short script. We will:

  1. Locate all raw .h5 dataset files for a given experiment.

  2. Compress them with JP2K at a specified ratio.

  3. Run an SSIM check to generate a consistency report.

  4. Overwrite the originals after inspecting the SSIM report.
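To make step 3 more concrete, here is a minimal sketch of the structural similarity index (SSIM) that the consistency check is named after. It is not the implementation used by esrf-data-compressor: the function name global_ssim is hypothetical, and it computes one score over the whole image from global statistics, whereas practical SSIM implementations usually average the score over local sliding windows. The constants follow the commonly used defaults (K1 = 0.01, K2 = 0.03).

```python
from statistics import fmean


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over two equal-length lists of pixel values.

    Hypothetical helper for illustration only; not part of esrf-data-compressor.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Stabilising constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


original = [10, 50, 90, 130, 170, 210]
degraded = [12, 47, 95, 128, 160, 215]
print(global_ssim(original, original))  # identical inputs score 1 (up to rounding)
print(global_ssim(original, degraded))  # any difference pushes the score below 1
```

A score close to 1 indicates that the compressed data preserves the structure of the original. What threshold is acceptable depends on the experiment.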

Prerequisites

pip install esrf-data-compressor

Or, during development, install the current directory in editable mode:

pip install -e .

Quickstart Script

Create a file named compress_experiment.py with the following content:

from esrf_data_compressor.finder.finder import discover_datasets
from esrf_data_compressor.compressors.base import CompressorManager
from esrf_data_compressor.checker.run_check import run_ssim_check

# 1) Define path components and base root
path_components = ["ma5567"]
base_root = "/data/visitor"

# 2) Locate raw HDF5 dataset files
raw_files = discover_datasets(path_components, base_root)

# 3) Compress all files in parallel with the JP2K method (cratio=10)
mgr = CompressorManager(cratio=10, method="jp2k")
mgr.compress_files(raw_files)

# 4) Run SSIM-based consistency check
report_file = "ma5567_jp2k_ssim_report.txt"
run_ssim_check(raw_files, "jp2k", report_file)
print(f"SSIM report written to {report_file}")

# 5) Overwrite originals. Run this step only after verifying the report;
#    on a first pass, comment out the two lines below and inspect the report.
mgr.overwrite_files(raw_files)
print("Originals overwritten; backups saved with .bak extension")
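As written, the script reaches step 5 immediately after the report is produced, with no pause for inspection. One way to enforce the "verify first" intent is to gate the overwrite behind an explicit confirmation. The sketch below is an assumption about how you might structure this; should_overwrite is a hypothetical helper, not part of esrf-data-compressor.

```python
def should_overwrite(answer: str) -> bool:
    """Return True only for an explicit 'y' or 'yes' (case-insensitive).

    Hypothetical helper: anything else, including an empty answer,
    is treated as "do not overwrite".
    """
    return answer.strip().lower() in {"y", "yes"}


# In compress_experiment.py, step 5 could then become:
#
#     answer = input(f"Reviewed {report_file}? Overwrite originals [y/N]: ")
#     if should_overwrite(answer):
#         mgr.overwrite_files(raw_files)
#     else:
#         print("Originals left untouched.")
```

Defaulting to "no" keeps the destructive step opt-in, which suits an operation that replaces raw data.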

Running the Tutorial

python compress_experiment.py

Upon completion, you will have:

  • Compressed siblings: each raw file gains a <basename>_jp2k.h5 next to it.

  • SSIM report: ma5567_jp2k_ssim_report.txt in your working directory.

  • Backups: original .h5 files renamed to .h5.bak after overwrite.
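The naming conventions listed above can be turned into a small helper for checking results on disk. This is a sketch based only on the patterns stated in this tutorial (<basename>_jp2k.h5 and .h5.bak). The function expected_outputs and the file name scan_0001.h5 are hypothetical, and which of the paths exist at a given moment depends on whether step 5 has run.

```python
from pathlib import Path


def expected_outputs(raw_file, method="jp2k"):
    """Derive the sibling and backup paths described in this tutorial.

    Hypothetical helper; it only builds paths and does not touch the disk.
    """
    raw = Path(raw_file)
    return {
        # <basename>_<method>.h5 next to the raw file
        "compressed": raw.with_name(f"{raw.stem}_{method}{raw.suffix}"),
        # original file name with .bak appended
        "backup": raw.with_name(raw.name + ".bak"),
    }


paths = expected_outputs("/data/visitor/ma5567/scan_0001.h5")
for label, path in paths.items():
    print(f"{label}: {path} (exists: {path.exists()})")
```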