# Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

## [0.0.8]

- `Loader` accepts an `rng` argument now.
## [0.0.7]

- Make the in-memory concatenation strategy of `annbatch.Loader.__iter__()` configurable via a `concat_strategy` argument to `__init__`. Sparse on-disk data will be concatenated and then shuffled/yielded (faster, higher memory usage), while dense data will be shuffled and then concatenated/yielded (lower memory usage).
- Downcast `indices` of sparse matrices if possible when writing to disk, via `anndata.settings.write_csr_csc_indices_with_min_possible_dtype`.
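The two orderings described above can be sketched with plain Python lists. This is a toy illustration only — annbatch operates on sparse/dense array chunks, and all names below are made up:

```python
import random

random.seed(0)
# Toy stand-ins for on-disk chunks of rows.
chunks = [list(range(i * 4, (i + 1) * 4)) for i in range(3)]

# Concatenate, then shuffle (the order described for sparse data):
# everything is materialized at once, so it is faster but uses more memory.
concatenated = [x for chunk in chunks for x in chunk]
random.shuffle(concatenated)

# Shuffle, then concatenate/yield (the order described for dense data):
# each chunk is shuffled and yielded on its own, keeping less in memory at once.
def shuffle_then_yield(chunks):
    for chunk in chunks:
        shuffled = chunk[:]
        random.shuffle(shuffled)
        yield from shuffled

streamed = list(shuffle_then_yield(chunks))
```

Both orders yield the same rows overall; they trade peak memory against speed and shuffle quality.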
## [0.0.6]

- Don't concatenate all I/O-ed chunks in memory; instead, yield from individual chunks as though they were concatenated (i.e., not a breaking change with respect to the `annbatch.abc.Sampler` API). Should improve memory performance, especially for dense data.
## [0.0.5]

- Fix bug with bringing the nullable/categorical columns into memory by default.
### Breaking

- `annbatch.Loader` now expects `preload_nchunks * chunk_size % batch_size == 0` for simplification and efficiency.
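The constraint can be checked up front. The numbers below are arbitrary examples, not library defaults:

```python
# Example values satisfying the constraint above (arbitrary, not defaults).
preload_nchunks, chunk_size, batch_size = 4, 256, 128

buffer_size = preload_nchunks * chunk_size  # 1024 rows held in memory at once
# The preloaded buffer must split evenly into whole batches.
assert buffer_size % batch_size == 0  # 1024 % 128 == 0 -> 8 full batches
```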
### Added

- Introduced an `annbatch.abc.Sampler` abstract base class. Users can implement and pass any instance of a subclass to the `batch_sampler` argument of `annbatch.Loader`.
- Exposed the older default sampling scheme as `annbatch.ChunkSampler`, which is used internally to match older behavior when `batch_sampler` isn't provided to `annbatch.Loader`.
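A self-contained sketch of the subclass-and-pass pattern. The `Sampler` below is a stand-in — the real `annbatch.abc.Sampler`'s method names and signatures may differ, and `StridedSampler` is a hypothetical example, not part of annbatch:

```python
from abc import ABC, abstractmethod

class Sampler(ABC):
    """Stand-in for annbatch.abc.Sampler (real API may differ)."""

    @abstractmethod
    def __iter__(self):
        """Yield lists of row indices, one list per batch."""

class StridedSampler(Sampler):
    """Hypothetical sampler yielding every other row in fixed-size batches."""

    def __init__(self, n_obs: int, batch_size: int):
        self.indices = list(range(0, n_obs, 2))
        self.batch_size = batch_size

    def __iter__(self):
        for start in range(0, len(self.indices), self.batch_size):
            yield self.indices[start : start + self.batch_size]

# An instance like this would be passed as the batch_sampler argument.
batches = list(StridedSampler(n_obs=10, batch_size=3))
```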
## [0.0.4]

- Load nullables/categoricals from `obs` into memory by default when shuffling (i.e., when no custom `load_adata` argument is passed to `annbatch.DatasetCollection.add_adatas()`).
## [0.0.3]

### Breaking

- Revert `h5ad` shuffling into one big store (i.e., go back to sharding into individual files) and add a warning that `h5ad` is not fully supported by annbatch. The `is_collection_h5ad` argument must be passed when initializing `annbatch.DatasetCollection` in order to use a preshuffled collection of `h5ad` files, for reading or writing.
- Renamed `annbatch.types.LoaderOutput["labels"]` and `["data"]` to `["obs"]` and `["X"]`, respectively.
## [0.0.2]

### Breaking

- `ZarrSparseDataset` and `ZarrDenseDataset` have been consolidated into `annbatch.Loader`.
- `create_anndata_collection` and `add_to_collection` have been moved into the `annbatch.DatasetCollection.add_adatas()` method.
- Default reading of input data is now fully lazy in `annbatch.DatasetCollection.add_adatas()`, so the shuffle process may now be slower, although it has better memory properties. Use the `load_adata` argument of `annbatch.DatasetCollection.add_adatas()` to customize this behavior.
- Files shuffled under the old `create_anndata_collection` will not be recognized by `annbatch.DatasetCollection` and are therefore not usable with the new `annbatch.Loader.use_collection` API. At the moment, the file metadata we maintain is only for internal purposes; however, if you wish to migrate so that you can use `annbatch.DatasetCollection` in conjunction with `annbatch.Loader.use_collection`, the root folder of the old collection must have attrs `{"encoding-type": "annbatch-preshuffled", "encoding-version": "0.1.0"}` and be a `zarr.Group`. The subfolders (i.e., datasets) must be called `dataset_([0-9]*)`. Otherwise you can use `annbatch.Loader.add_anndatas()` as before.
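A minimal standard-library sketch of the metadata and naming checks implied above. `looks_migratable` is a hypothetical helper, not part of annbatch, and the attrs are shown as a plain dict rather than a real `zarr.Group`:

```python
import re

# Metadata the migrated root group must carry (values from the entry above).
REQUIRED_ATTRS = {
    "encoding-type": "annbatch-preshuffled",
    "encoding-version": "0.1.0",
}

# Subfolder (dataset) names must match dataset_([0-9]*).
DATASET_NAME = re.compile(r"dataset_([0-9]*)$")

def looks_migratable(attrs: dict, subfolders: list) -> bool:
    """Hypothetical check of a collection's root attrs and dataset folder names."""
    has_attrs = all(attrs.get(k) == v for k, v in REQUIRED_ATTRS.items())
    return has_attrs and all(DATASET_NAME.match(name) for name in subfolders)

ok = looks_migratable(dict(REQUIRED_ATTRS), ["dataset_0", "dataset_12"])
```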
### Changed

- `preload_to_gpu` now defaults based on whether `cupy` is installed, instead of defaulting to `True`.
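The described default can be reproduced with a standard-library check. This is a sketch of the idea; annbatch's actual detection logic may differ:

```python
import importlib.util

# Default preload_to_gpu to whether cupy is importable, per the entry above.
preload_to_gpu = importlib.util.find_spec("cupy") is not None
```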
## [0.0.1]

### Added

First release