cutcutcodec.core.nn.loader.Dataset

class cutcutcodec.core.nn.loader.Dataset(root: str | bytes | Path, selector: Callable[[Path], bool], **kwargs)

Select files from a directory tree and manage the probability with which each one is drawn.

Examples

>>> from cutcutcodec.core.nn.loader import Dataset
>>> from cutcutcodec.utils import get_project_root
>>> def selector(path):
...     return path.suffix == ".py"
...
>>> dataset = Dataset(get_project_root(), selector, max_size=128)
>>> len(dataset)
128
>>> dataset[0].relative_to(get_project_root())
PosixPath('__init__.py')
>>> dataset[1].relative_to(get_project_root())
PosixPath('__main__.py')
>>> dataset[2].relative_to(get_project_root())
PosixPath('utils.py')
>>> dataset[3].relative_to(get_project_root())
PosixPath('config/__init__.py')
>>> dataset[4].relative_to(get_project_root())
PosixPath('core/__init__.py')
>>> dataset[5].relative_to(get_project_root())
PosixPath('testing/__init__.py')
>>> dataset[6].relative_to(get_project_root())
PosixPath('config/config.py')

Initialise and create the class.

Parameters

root : pathlike

The root folder containing all the files of the dataset.

selector : callable

Function that takes the pathlib.Path of a file and returns True to keep it or False to reject it.
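As an illustration of the selector contract described above, here is a minimal sketch of a selector that keeps files by extension. The name video_selector and the set of suffixes are hypothetical and chosen only for the example; any callable that maps a pathlib.Path to a bool should work the same way.

```python
from pathlib import Path

# Hypothetical set of extensions, chosen for illustration only.
VIDEO_SUFFIXES = {".mp4", ".mkv", ".webm"}


def video_selector(path: Path) -> bool:
    """Return True to keep the file, False to reject it."""
    # Compare in lower case so that "clip.MP4" is accepted as well.
    return path.suffix.lower() in VIDEO_SUFFIXES
```

Such a function could then be passed as the selector argument, in the same way as the selector defined in the example at the top of this page.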

follow_symlinks : bool, default=False

Follow symbolic links if set to True.

max_size : int, optional

The maximum number of files contained in the dataset.

decision_depth : int, default=1

The threshold depth beyond which the tree is flattened. If 0, all the files have the same probability of being drawn. If 1, the decision tree has only one root node. If n, the decision tree has a maximum of n levels.
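To make the difference between a flattened tree and a nested one concrete, the sketch below computes the probability of reaching each file in a toy tree. It assumes that a child is chosen uniformly at every node, which is an assumption made for illustration and not a description of how Dataset draws files internally. The helpers leaf_probabilities and flatten, and the toy file names, are hypothetical.

```python
from fractions import Fraction


def leaf_probabilities(tree, weight=Fraction(1)):
    """Map each leaf to its probability, assuming a uniform choice at every node."""
    probs = {}
    for child in tree:
        share = weight / len(tree)  # each child of a node gets an equal share
        if isinstance(child, list):
            probs.update(leaf_probabilities(child, share))
        else:
            probs[child] = share
    return probs


def flatten(tree):
    """Collapse a nested list into a single flat list of leaves."""
    out = []
    for child in tree:
        if isinstance(child, list):
            out.extend(flatten(child))
        else:
            out.append(child)
    return out


# Toy tree: one file at the root and one folder holding three files.
tree = ["a.py", ["b.py", "c.py", "d.py"]]
nested = leaf_probabilities(tree)          # folder structure is kept
flat = leaf_probabilities(flatten(tree))   # every file is a direct child
```

Under this assumption, the flat version gives each of the four files a probability of 1/4, while the nested version gives a.py a probability of 1/2 and each file in the folder 1/6. This is the kind of trade-off that a depth threshold such as decision_depth controls.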

scan(*, _root=None, _depth=0) → list[Path | list]

Rescan the dataset to update the properties.