Using `bffile`

Quick Start

The simplest way to read an image is with imread:

from bffile import imread

data = imread("image.nd2")
print(data.shape, data.dtype)
# (10, 2, 5, 512, 512) uint16

# or, to read a specific series/resolution-level:
data = imread("image.nd2", series=1, resolution=0)

This reads the specified series/resolution into memory as a numpy array with shape (T, C, Z, Y, X). For most other use cases, you'll want more control — that's where BioFile comes in.

import numpy as np

from bffile import BioFile

with BioFile("image.nd2") as bf:
    arr = bf.as_array()       # lazy array accessor
    plane = arr[0, 0, 2]      # indexing is just a view — no data read yet!
    data = np.asarray(plane)  # call np.asarray to read the data into memory

For info on extracting data, see:

The LazyBioArray API, for reading pixel data with lazy loading and sub-region slicing.
Using as a zarr store, for interoperability with the OME-Zarr ecosystem without file conversion.
Using dask for lazy computation, which wraps the lazy array in a dask array for parallelized computations.
Presenting as an xarray DataArray, which makes it easy to work with labeled dimensions and coordinates parsed from the OME metadata.

Opening Files with BioFile

BioFile manages the lifecycle of the underlying Java reader and the associated file handle. The recommended pattern is a context manager:

with BioFile("image.nd2") as bf:
    data = bf.read_plane()
# reader fully cleaned up here

You can also manage the lifecycle explicitly:

bf = BioFile("image.nd2")
bf.open()
# ... use bf ...
bf.close()    # release file handle (fast reopen later)
bf.open()     # cheap — just reacquires the file handle
# ... use bf again ...
bf.destroy()  # full cleanup (or let GC handle it)

Critical: Some operations require an open file

BioFile does not attempt to magically open/close files for you as needed. However, some methods like as_array() and to_dask() return objects that require the file to be open when indexed or computed. You are responsible for ensuring the file is open while using those objects.

from bffile import BioFile

with BioFile("image.nd2") as bf:
    arr = bf.as_array()

try:
    arr[0, 0, 2]  # ERROR!!
except RuntimeError as e:
    print(e)  # "File not open - call open() first"

Understanding Lifecycle

BioFile has three states:

UNINITIALIZED: The Python BioFile object exists, but the file handle is not open, and no Java resources are allocated.
OPEN: The file is open and the Java reader is initialized. You can read data and metadata.
SUSPENDED: The file handle is released but the Java reader and all parsed metadata remain in memory. You cannot read data, but you can still access metadata. Re-opening the file from this state is fast.

---
title: BioFile Lifecycle
---
stateDiagram-v2
    direction LR
    [*] --> UNINITIALIZED : <code>\_\_init__()</code>
    UNINITIALIZED --> OPEN : <code>open()</code>
    OPEN --> SUSPENDED : <code>close()</code>
    SUSPENDED --> OPEN : <code>open()</code>
    OPEN --> UNINITIALIZED : <code>destroy()</code>
    SUSPENDED --> UNINITIALIZED : <code>destroy()</code>

Transition	What happens
`__init__()`	Creates the `BioFile` object but does not open the file or initialize the reader.
`open()` (first call)	Full initialization — format detection, header parsing (`setId` in Java). Slow.
`close()`	Releases the OS file handle but keeps all parsed state in memory.
`open()` (after `close()`)	Just reopens the file handle (`reopenFile` in Java). Fast.
`destroy()` / `__exit__()`	Full teardown — Java reader and all cached state released.

close() is lightweight: metadata (via core_metadata(), len(), etc.) remains accessible while the file handle is released. This is useful when you want to avoid holding file descriptors open but plan to read more data later.

Context manager vs explicit close

The context manager (with) calls destroy() on exit — full cleanup. If you want the fast-reopen behavior, use explicit open() / close() calls instead.
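
The transitions above can be summarized with a small toy model. This is a sketch for building intuition only: `ToyFile`, `State`, and the `full_inits` counter are hypothetical names, and nothing here reflects how bffile is actually implemented.

```python
from enum import Enum, auto


class State(Enum):
    UNINITIALIZED = auto()
    OPEN = auto()
    SUSPENDED = auto()


class ToyFile:
    """Toy stand-in that mimics the three lifecycle states (not bffile code)."""

    def __init__(self):
        self.state = State.UNINITIALIZED
        self.full_inits = 0  # counts slow, from-scratch initializations

    def open(self):
        if self.state is State.UNINITIALIZED:
            self.full_inits += 1  # slow path: format detection, header parsing
        # from SUSPENDED, only the file handle is reacquired (fast path)
        self.state = State.OPEN

    def close(self):
        if self.state is State.OPEN:
            self.state = State.SUSPENDED  # parsed state is kept in memory

    def destroy(self):
        self.state = State.UNINITIALIZED  # everything is released

    def read(self):
        if self.state is not State.OPEN:
            raise RuntimeError("File not open - call open() first")
        return "pixels"

    def __enter__(self):
        self.open()
        return self

    def __exit__(self, *exc):
        self.destroy()  # the context manager performs a full teardown


f = ToyFile()
f.open()
f.close()
f.open()  # fast reopen: no second full initialization
print(f.full_inits)  # 1
```

In this model, only `destroy()` (or leaving a `with` block) forces the next `open()` back onto the slow path.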

Memoization

Memoization speeds up future calls to open() from an uninitialized state, even across different Python sessions.

The memoize parameter controls whether the initialized reader state is cached to a .bfmemo file on disk. This can improve performance for the UNINITIALIZED → OPEN transition (i.e., when the Java reader is fully initialized from scratch) for subsequent reads of the same file in a new Python session, or after destroy() has been called.

# First open: full init + saves .bfmemo file to disk
with BioFile("image.nd2", memoize=True) as bf:
    ...

# Subsequent opens are faster: loads from .bfmemo instead of re-parsing
with BioFile("image.nd2", memoize=True) as bf:
    ...

You must have memoize=True on both the initial open and subsequent opens for this to work.

BIOFORMATS_MEMO_DIR

By default, the .bfmemo file is saved in the same directory as the original image. You must have write permission to this directory. You can change this with the BIOFORMATS_MEMO_DIR environment variable.
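
As a minimal sketch, the variable can be set from Python before any file is opened. The cache location and the `open_with_memo` helper below are hypothetical, and the example assumes the variable is read when the reader is initialized, so setting it as early as possible is the safe choice.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical cache location; any writable directory would do.
memo_dir = Path(tempfile.gettempdir()) / "bffile_memo"
memo_dir.mkdir(parents=True, exist_ok=True)

# Assumption: the variable is consulted when the reader is initialized,
# so it is set before any BioFile is created.
os.environ["BIOFORMATS_MEMO_DIR"] = str(memo_dir)


def open_with_memo(path):
    """Return a BioFile with memoization enabled (hypothetical helper)."""
    # Imported lazily so this sketch can be loaded without bffile installed.
    from bffile import BioFile

    return BioFile(path, memoize=True)
```

A caller would then write `with open_with_memo("image.nd2") as bf: ...`, and the .bfmemo files should land in `memo_dir` rather than next to the image.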

The Series Data Model

Bio-Formats models files as a sequence of series (e.g., wells in a plate, fields of view, or tiles in a mosaic). Each series is a 5D dataset with shape (T, C, Z, Y, X), and may have multiple resolution levels (pyramid layers).

Mental model

BioFile                    # Container (usually a file, but possibly multiple)
├── Series 0               # e.g., first field of view
│   ├── Resolution 0       # full-resolution data
│   └── Resolution 1       # downsampled pyramid level (if present)
├── Series 1
│   └── ...
└── ...

`bffile` is Stateless

If you're familiar with the Bio-Formats Java API, you will be used to using setSeries to change the active series before following up with calls to read data or metadata.

bffile.BioFile aims for a stateless API: all methods that pertain to a specific series or resolution level take an explicit series argument and an optional resolution level. Omitting these arguments defaults to series=0 and resolution=0. As a convenience, BioFile also provides a Series proxy object, described below.

Accessing Series

A Series object is a lightweight proxy that pre-fills the series= argument on all calls back to the parent BioFile.

BioFile implements Sequence[bffile.Series], so you can index, slice, and iterate over it:

with BioFile("multi_scene.czi") as bf:
    print(len(bf))         # number of series

    s = bf[0]              # first series
    print(s.shape)             # (10, 2, 5, 512, 512)
    print(s.dtype)             # uint16
    print(s.is_rgb)            # False
    print(s.resolution_count)  # 1

    s = bf[-1]             # last series
    first_two = bf[0:2]    # slice returns list[Series]

    for series in bf:      # iterate over all series
        print(series.shape, series.dtype)

Series objects also have methods that mirror those on BioFile, but with the series argument pre-filled:

with BioFile("image.nd2") as bf:
    s = bf[0]
    arr = s.as_array()         # same as bf.as_array(series=0)
    meta = s.core_metadata()   # same as bf.core_metadata(series=0)
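
The proxy idea can be illustrated with a toy container. `ToyContainer` and `ToySeries` are hypothetical stand-ins written for this sketch; they only mimic the pattern of a stateless parent plus a proxy that pre-fills `series=`, and are not bffile's classes.

```python
from collections.abc import Sequence


class ToySeries:
    """Proxy that remembers its index and forwards calls to the parent."""

    def __init__(self, parent, index):
        self._parent = parent
        self.index = index

    @property
    def shape(self):
        # same as parent.shape(series=self.index)
        return self._parent.shape(series=self.index)


class ToyContainer(Sequence):
    """Stateless parent: every method takes an explicit series argument."""

    def __init__(self, shapes):
        self._shapes = list(shapes)  # one shape per series

    def shape(self, series=0):
        return self._shapes[series]

    def __len__(self):
        return len(self._shapes)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [ToySeries(self, i) for i in range(len(self))[index]]
        # range indexing normalizes negatives and raises IndexError when out of range
        return ToySeries(self, range(len(self))[index])


c = ToyContainer([(10, 2, 5, 512, 512), (1, 3, 1, 256, 256)])
print(c[-1].shape)           # (1, 3, 1, 256, 256)
print([s.index for s in c])  # [0, 1]
```

Because the parent never stores a "current series", two proxies can be used side by side without interfering with each other.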

Critical: Series objects require an open file

The BioFile must be open while you use any Series objects obtained from it. If you don't want to use the context manager, you should manage the lifecycle explicitly with open() and close().

Reading Data with LazyBioArray

The recommended way to read pixel data is through LazyBioArray, obtained via as_array(). This object behaves like a numpy array but reads the minimal amount of data from disk when you index into it (including sub-plane/XY slicing).

with BioFile("image.nd2") as bf:
    arr = bf[0].as_array()  # no data read yet!
    print(arr)
    # LazyBioArray(shape=(10, 2, 5, 512, 512), dtype=uint16, file='image.nd2')

No data is loaded when you create the array. Indexing creates lazy views without reading data. Data is read from disk only when you materialize the view with np.asarray() or other numpy operations:

with BioFile("image.nd2") as bf:
    arr = bf[0].as_array()  # no reading yet!

    # create a lazy view of a single plane (t=0, c=0, z=2)
    plane_view = arr[0, 0, 2]              # LazyBioArray, shape: (512, 512)
    plane = np.asarray(plane_view)         # NOW data is read from disk

    # create lazy view of all timepoints for one channel and z-slice
    timeseries_view = arr[:, 0, 2]         # LazyBioArray, shape: (10, 512, 512)
    timeseries = np.asarray(timeseries_view)  # reads data

    # lazy view of a (100, 100) sub-region within the YX plane
    # in the third timepoint, for all channels, and the first z-slice
    roi_view = arr[2, :, 0, 100:200, 50:150]  # LazyBioArray, shape: (2, 100, 100)
    roi = np.asarray(roi_view)             # only the requested pixels are read

    # materialize the full dataset
    full = np.asarray(arr)                 # shape: (10, 2, 5, 512, 512)

LazyBioArray indexing creates lazy views with the following behavior:

Integer indexing squeezes that dimension: arr[0, 0, 2] returns a LazyBioArray view with shape (Y, X) instead of (1, 1, 1, Y, X).
Slice indexing keeps the dimension: arr[0:1, 0:1, 2:3] returns a view with shape (1, 1, 1, Y, X).
Ellipsis: arr[..., 100:200, 50:150] works as expected.
Negative indices: arr[-1] creates a view of the last timepoint.

Unsupported indexing

Step slicing (arr[::2]), fancy indexing (arr[[0, 2]]), and boolean masks (arr[arr > 100]) are not supported and will raise NotImplementedError.
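
The indexing rules above can be expressed as a small shape calculator. `view_shape` is a hypothetical helper written for this sketch. It only predicts the shape of the resulting view under the rules listed here and performs no I/O, and it is not part of bffile.

```python
def view_shape(shape, key):
    """Predict the shape of a view produced by basic indexing with `key`."""
    if not isinstance(key, tuple):
        key = (key,)
    if any(k is Ellipsis for k in key):
        # expand the (first) ellipsis into as many full slices as needed
        pos = next(i for i, k in enumerate(key) if k is Ellipsis)
        fill = len(shape) - (len(key) - 1)
        key = key[:pos] + (slice(None),) * fill + key[pos + 1:]
    if len(key) > len(shape):
        raise IndexError("too many indices")
    # missing trailing dimensions are taken whole
    key = key + (slice(None),) * (len(shape) - len(key))

    out = []
    for size, k in zip(shape, key):
        if isinstance(k, int):
            if not -size <= k < size:
                raise IndexError(f"index {k} out of range for size {size}")
            continue  # an integer index squeezes the dimension
        if isinstance(k, slice):
            start, stop, step = k.indices(size)
            if step != 1:
                raise NotImplementedError("step slicing is not supported")
            out.append(max(0, stop - start))  # a slice keeps the dimension
        else:
            raise NotImplementedError(f"unsupported index: {k!r}")
    return tuple(out)


shape = (10, 2, 5, 512, 512)
print(view_shape(shape, (0, 0, 2)))  # (512, 512)
print(view_shape(shape, (..., slice(100, 200), slice(50, 150))))  # (10, 2, 5, 100, 100)
```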

Sub-region reads are efficient

When you slice the Y or X dimensions and then materialize the view, only the requested pixels are read from disk — not the full plane:

# create a view of a 100x100 pixel region
roi_view = arr[:, :, :, 200:300, 300:400]  # lazy view, no I/O yet
roi = np.asarray(roi_view)  # reads only the 100x100 region from each plane

This makes LazyBioArray well-suited for exploring large images without loading everything into memory.
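
To get a feel for the savings, here is a back-of-the-envelope calculation for the example dataset used throughout this page. It counts uncompressed pixel bytes only; the actual I/O depends on the file format (compression, tiling, strip layout), so treat the numbers as a rough estimate rather than a measurement. `pixel_bytes` is a hypothetical helper.

```python
def pixel_bytes(n_planes, height, width, itemsize):
    """Uncompressed bytes needed for `n_planes` regions of height x width."""
    return n_planes * height * width * itemsize


shape = (10, 2, 5, 512, 512)  # (T, C, Z, Y, X)
itemsize = 2                  # uint16 is 2 bytes per pixel
n_planes = shape[0] * shape[1] * shape[2]

full = pixel_bytes(n_planes, shape[3], shape[4], itemsize)
roi = pixel_bytes(n_planes, 100, 100, itemsize)

print(n_planes)     # 100
print(full)         # 52428800
print(roi)          # 2000000
print(full // roi)  # 26
```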

Numpy integration

LazyBioArray implements the __array__ protocol, so you can pass it directly to numpy functions:

full_data = np.array(arr)           # materialize to ndarray
max_proj = np.max(arr, axis=2)      # z-projection (reads all data)

Keep the file open

The parent BioFile must remain open while you use LazyBioArray. Always use it inside the with block (or between explicit open()/close() calls).

Third-party array types

We support casting LazyBioArray to various third-party array types for interoperability with their ecosystems:

to_xarray() → xarray.DataArray
to_zarr_store() → zarr.abc.store.Store
to_dask() → dask.array.Array

You will find each of these methods on:

BioFile: where you can specify series and resolution as arguments
Series: where the series argument is pre-filled, and you can specify resolution if needed.
LazyBioArray: where the returned object shares the same lazy view onto the data.

Labeled dimensions/coordinates with `xarray`

xarray provides a powerful data structure for working with labeled, multi-dimensional arrays. Bio-Formats data is naturally represented as an xarray.DataArray with dimensions "T", "C", "Z", "Y", "X" (and sometimes "S" for RGB). Metadata is parsed and used to populate the .dims and .coords attributes. As you index into the array, the lazy reading behavior is preserved, so you can explore large datasets without loading everything into memory. Semantics for coords are as follows:

T: delta_t timestamps taken from OME pixels.planes
C: Channel names from OME pixels.channels (e.g. "DAPI", "FITC")
Z, Y, X: physical pixel sizes from OME pixels.physical_size_...
S: RGB/RGBA channels, if applicable.

with BioFile("image.nd2") as bf:
    xarr = bf.to_xarray(series=0)  # xarray.DataArray with dims and coords
    print(xarr.dims)       # ('T', 'C', 'Z', 'Y', 'X')
    print(xarr.coords)     # coordinates parsed from OME metadata

    # indexing still creates lazy views
    plane_view = xarr[0, 0, 2]  # xarray.DataArray view of a single plane
    assert "T" not in plane_view.dims  # dimension is squeezed out by integer indexing

    # you can also use named indexing with .sel and .isel
    plane_view = xarr.isel(T=0, C=0, Z=2)  # same plane view using named indexing
    red_channel = xarr.sel(C="Widefield Red")  # select by channel name (if available)

    # the full ome-types.OME metadata is available in the .attrs of the DataArray
    ome = xarr.attrs["ome_metadata"]

Complete virtual OME-Zarr view

The to_zarr_store() method returns a zarr store (zarr.abc.store.Store) that can be passed to zarr.open(). When you cast a complete BioFile to a zarr store without specifying a series or resolution, the returned store provides a virtual view of the entire file as a spec-compliant OME-Zarr. This is similar to what you would get by converting the file with bioformats2raw, but without requiring any conversion or additional disk space.

Groups follow the bioformats2raw.layout transitional spec.

It's not 'free'

While viewing any Bio-Formats-supported file as an OME-Zarr without conversion is a powerful feature, you should not assume that you will get anywhere near the same performance as a native OME-Zarr directory store. Performance depends entirely on the native file structure, and many formats are not as optimized for chunked access as a purpose-built, rechunked, and compressed OME-Zarr.

This pattern is intended for quick interoperability and convenience, not for high performance.

from bffile import open_ome_zarr_group
import zarr

ome_zarr = open_ome_zarr_group("image.nd2")
assert isinstance(ome_zarr, zarr.Group)

ome_meta = ome_zarr.attrs["ome"]
assert "bioformats2raw.layout" in ome_meta

series0 = ome_zarr["0"]  # group for series 0
assert "multiscales" in series0.attrs["ome"]  # multiscales metadata is present
level0 = series0["0"]    # array for resolution level 0
assert isinstance(level0, zarr.Array)
print(level0.shape, level0.dtype)
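
The group layout used above (a series group containing one array per resolution level) can be enumerated with a short helper. `array_paths` is a hypothetical function for illustration; it only builds the `<series>/<resolution>` path strings seen in the example and does not touch any file.

```python
def array_paths(resolution_counts):
    """List '<series>/<resolution>' array paths.

    `resolution_counts[i]` is the number of resolution levels in series i.
    """
    return [
        f"{series}/{level}"
        for series, n_levels in enumerate(resolution_counts)
        for level in range(n_levels)
    ]


# two series: the first with a two-level pyramid, the second with one level
print(array_paths([2, 1]))  # ['0/0', '0/1', '1/0']
```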

Lazy computation with `dask`

For computations over large datasets, BioFile.to_dask wraps LazyBioArray in a dask array:

with BioFile("image.nd2") as bf:
    darr = bf.to_dask(chunks=(1, 1, 1, -1, -1))
    result = darr.mean(axis=2).compute()  # lazy z-projection

You don't gain any additional data reading "laziness" here. But you can use dask's rich ecosystem of chunked, parallelized computations and out-of-core algorithms on top of the lazy reading provided by LazyBioArray.
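
To see what `chunks=(1, 1, 1, -1, -1)` means for the example dataset, the sketch below counts chunks per axis. `chunk_grid` is a hypothetical helper; it relies on the dask convention that `-1` means "the whole axis in one chunk".

```python
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each axis (-1 means the whole axis, as in dask)."""
    return tuple(
        1 if c == -1 else math.ceil(n / c)
        for n, c in zip(shape, chunks)
    )


shape = (10, 2, 5, 512, 512)  # (T, C, Z, Y, X)
grid = chunk_grid(shape, (1, 1, 1, -1, -1))

print(grid)             # (10, 2, 5, 1, 1)
print(math.prod(grid))  # 100
```

With this chunking, each chunk corresponds to exactly one full YX plane, so a reduction over Z combines five single-plane chunks per (T, C) position.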

File must be open

Remember that the BioFile must be open when you .compute() the dask array.

You can also use tile-based chunking for very large planes:

with BioFile("image.nd2") as bf:
    darr = bf.to_dask(tile_size=(512, 512))       # explicit tile size
    darr = bf.to_dask(tile_size="auto")           # query Bio-Formats for optimal size

Dask is optional

Dask is not installed by default. Install it with:

pip install dask

# or, to get a version that we guarantee is compatible with bffile
# install the extra:
pip install bffile[dask]

Metadata

OME Metadata

The ome_metadata attribute returns a rich, structured ome_types.OME object, with all of the metadata parsed and organized according to the OME data model.

with BioFile("image.nd2") as bf:
    ome = bf.ome_metadata        # parsed OME object
    xml_str = bf.ome_xml         # raw OME-XML string

    # explore structured metadata
    print(ome.images[0].pixels.size_x)
    print(ome.images[0].pixels.physical_size_x)

Core Metadata

core_metadata() returns a CoreMetadata dataclass with shape, dtype, and acquisition flags for a given series/resolution:

with BioFile("image.nd2") as bf:
    meta = bf.core_metadata(series=0, resolution=0)
    print(meta.shape)             # OMEShape(t=10, c=2, z=5, y=512, x=512, rgb=1)
    print(meta.dtype)             # uint16
    print(meta.dimension_order)   # "XYCZT"
    print(meta.is_little_endian)  # True
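
The dimension_order string describes how planes are ordered in the underlying file. As a sketch, assuming the usual Bio-Formats convention that the letters after "XY" are listed from fastest-varying to slowest-varying, the position of a plane can be computed as follows. `plane_index` is a hypothetical helper, and RGB/interleaved channels are ignored here.

```python
def plane_index(order, sizes, z, c, t):
    """Rasterized plane index for a dimension order such as 'XYCZT'.

    `sizes` maps 'Z', 'C', 'T' to the size of each axis.
    """
    coords = {"Z": z, "C": c, "T": t}
    index, stride = 0, 1
    for axis in order[2:]:  # skip the leading 'XY'
        index += coords[axis] * stride
        stride *= sizes[axis]
    return index


sizes = {"Z": 5, "C": 2, "T": 10}
print(plane_index("XYCZT", sizes, z=2, c=0, t=0))  # 4
print(plane_index("XYCZT", sizes, z=4, c=1, t=9))  # 99
```

This is independent of the (T, C, Z, Y, X) axis order presented by as_array(), which stays the same regardless of how planes are stored.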

Global Metadata

global_metadata() returns reader/file-specific key/value pairs:

with BioFile("image.nd2") as bf:
    for key, value in bf.global_metadata().items():
        print(f"{key}: {value}")

Reader Discovery

You can query Bio-Formats for supported formats without opening a file:

from bffile import BioFile

# Bio-Formats version
print(BioFile.bioformats_version())  # "8.1.1"

# all supported file extensions
suffixes = BioFile.list_supported_suffixes()  # {"nd2", "czi", "tiff", ...}

# detailed reader info
for reader in BioFile.list_available_readers():
    print(f"{reader.format}: {reader.suffixes} (GPL={reader.is_gpl})")
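
One possible use of the suffix set is a quick pre-check before attempting to open a file. `has_supported_suffix` is a hypothetical helper; it assumes the suffixes are given without a leading dot, as in the example above, and a matching suffix does not guarantee that the file can actually be read.

```python
from pathlib import Path


def has_supported_suffix(path, suffixes):
    """Return True if the file name ends with one of the given suffixes."""
    name = Path(path).name.lower()
    return any(name.endswith("." + s.lower().lstrip(".")) for s in suffixes)


# In practice the set would come from BioFile.list_supported_suffixes();
# a literal set is used here so the sketch runs on its own.
suffixes = {"nd2", "czi", "tiff"}

print(has_supported_suffix("image.ND2", suffixes))      # True
print(has_supported_suffix("scan.ome.tiff", suffixes))  # True
print(has_supported_suffix("notes.txt", suffixes))      # False
```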

Environment Variables

Variable	Description	Default
`BIOFORMATS_VERSION`	Bio-Formats version or full Maven coordinate (e.g. `"7.0.0"` or `"ome:formats-gpl:7.0.0"`)	`"ome:formats-gpl:RELEASE"`
`BIOFORMATS_MEMO_DIR`	Directory for `.bfmemo` cache files	same as file
`BFF_JAVA_VERSION`	Java version to use (e.g. `11`, `17`, `21`)	`11`
`BFF_JAVA_VENDOR`	Java vendor (e.g. `zulu-jre`, `temurin`, `adoptium`)	`zulu-jre`
`BFF_JAVA_FETCH`	Java fetch behavior: `always`, `never`, or `auto` (See scyjava docs).	`always`
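
The two accepted forms of `BIOFORMATS_VERSION` (a bare version or a full Maven coordinate) can be related with a small helper. `bioformats_coordinate` is hypothetical and only illustrates the relationship described in the table; how bffile actually resolves the value is not shown on this page, so treat the expansion rule as an assumption.

```python
import os

DEFAULT_COORDINATE = "ome:formats-gpl:RELEASE"  # default listed in the table


def bioformats_coordinate(value=None):
    """Expand a bare version like '7.0.0' into a full Maven coordinate.

    Assumption: a value containing ':' is already a full coordinate.
    """
    if value is None:
        value = os.environ.get("BIOFORMATS_VERSION", DEFAULT_COORDINATE)
    return value if ":" in value else f"ome:formats-gpl:{value}"


print(bioformats_coordinate("7.0.0"))                 # ome:formats-gpl:7.0.0
print(bioformats_coordinate("ome:formats-gpl:7.0.0")) # ome:formats-gpl:7.0.0
```

Environment variables like these are typically set in the shell or via `os.environ` before the library starts the JVM.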
