annbatch.abc.Sampler

class annbatch.abc.Sampler

Base sampler class.

Samplers control how data is batched and loaded from the underlying datasets.

Attributes table

batch_size

The batch size for data loading.

shuffle

Whether data is shuffled.

Methods table

_sample(n_obs)

Implementation of the sample method.

sample(n_obs)

Sample load requests given the total number of observations.

validate(n_obs)

Validate the sampler configuration against the given n_obs.

Attributes

Sampler.batch_size

The batch size for data loading.

Note

This property is used only when splits is not supplied in the annbatch.types.LoadRequest. When splits is explicitly provided, it determines the batch boundaries instead.

Returns:

int – The number of observations per batch.

Sampler.shuffle

Whether data is shuffled.

If batch_size is provided and annbatch.types.LoadRequest.splits is not, this property determines whether the data loaded into memory is shuffled.

Shuffling of on-disk data is left to the user, who controls it through the chunks parameter of annbatch.types.LoadRequest.

Returns:

bool – True if data should be shuffled, False otherwise.
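The interaction between batch_size, shuffle, and splits described above can be sketched in plain Python. This is an illustration only, not annbatch code: plan_batches is a hypothetical helper, splits is assumed to be a list of index sequences into the in-memory data, and keeping the final partial batch is an assumption as well.

```python
import random


def plan_batches(n_loaded, batch_size, shuffle, splits=None, seed=0):
    """Return one list of in-memory row indices per batch (hypothetical helper)."""
    if splits is not None:
        # Explicit splits take precedence: batch_size and shuffle are not consulted.
        return [list(s) for s in splits]
    order = list(range(n_loaded))
    if shuffle:
        # Shuffle only the rows already loaded into memory.
        random.Random(seed).shuffle(order)
    # Cut the (possibly shuffled) order into consecutive batches of batch_size.
    return [order[i:i + batch_size] for i in range(0, n_loaded, batch_size)]


print(plan_batches(5, 2, shuffle=False))
print(plan_batches(5, 2, shuffle=True, splits=[[0, 4], [1, 2, 3]]))
```

In this sketch the second call ignores both batch_size and shuffle, because the splits alone define the batch boundaries.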

Methods

abstractmethod Sampler._sample(n_obs)

Implementation of the sample method.

sample() calls this method to perform the actual sampling once validation has passed.

Parameters:

n_obs (int) – The total number of observations available.

Yields:

LoadRequest – Load requests for batching data.

Return type:

Iterator[LoadRequest]

Sampler.sample(n_obs)

Sample load requests given the total number of observations.

The base implementation simply calls validate() and then yields from _sample().

Parameters:

n_obs (int) – The total number of observations available.

Yields:

LoadRequest – Load requests for batching data.

Return type:

Iterator[LoadRequest]
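The validate-then-delegate flow can be sketched with stand-in classes. The code below is a minimal illustration under stated assumptions, not the annbatch source: SketchSampler and SequentialSketch are hypothetical names, and load requests are modelled as plain dicts with a "chunks" list of slices.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterator


class SketchSampler(ABC):
    """Hypothetical stand-in for the base class; not annbatch source code."""

    def sample(self, n_obs: int) -> Iterator[dict]:
        self.validate(n_obs)  # runs before any request is yielded
        yield from self._sample(n_obs)

    @abstractmethod
    def validate(self, n_obs: int) -> None: ...

    @abstractmethod
    def _sample(self, n_obs: int) -> Iterator[dict]: ...


class SequentialSketch(SketchSampler):
    """Hypothetical subclass: walks the observations in fixed-size chunks."""

    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size

    def validate(self, n_obs: int) -> None:
        if self.chunk_size <= 0:
            raise ValueError(f"chunk_size must be positive, got {self.chunk_size}")

    def _sample(self, n_obs: int) -> Iterator[dict]:
        for start in range(0, n_obs, self.chunk_size):
            stop = min(start + self.chunk_size, n_obs)
            # Assumed request shape: a dict holding the on-disk slices to load.
            yield {"chunks": [slice(start, stop)]}


for request in SequentialSketch(chunk_size=4).sample(n_obs=10):
    print(request)
```

Because sample() is written as a generator in this sketch, validate() runs when iteration begins rather than when sample() is called.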

abstractmethod Sampler.validate(n_obs)

Validate the sampler configuration against the given n_obs.

This method is called at the start of each sample() call. Override this method to add custom validation for sampler parameters.

Parameters:

n_obs (int) – The total number of observations in the loader.

Raises:

ValueError – If the sampler configuration is invalid for the given n_obs.

Return type:

None
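A custom validation hook might look like the following sketch. FixedBatchSketch is a hypothetical, standalone class that shows only the validate() contract (return None on success, raise ValueError otherwise); the specific checks are illustrative assumptions, not rules that annbatch is known to enforce.

```python
class FixedBatchSketch:
    """Hypothetical sampler-like class; only the validation hook is shown."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size

    def validate(self, n_obs: int) -> None:
        # Reject configurations that cannot produce a single full batch.
        if self.batch_size <= 0:
            raise ValueError(f"batch_size must be positive, got {self.batch_size}")
        if self.batch_size > n_obs:
            raise ValueError(
                f"batch_size ({self.batch_size}) exceeds n_obs ({n_obs})"
            )


FixedBatchSketch(batch_size=4).validate(n_obs=10)  # passes silently

try:
    FixedBatchSketch(batch_size=20).validate(n_obs=10)
except ValueError as err:
    print(f"invalid configuration: {err}")
```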