cutcutcodec.core.nn.dataset.base.Dataset
- class cutcutcodec.core.nn.dataset.base.Dataset(root: Path | str | bytes, selector: Callable[[Path], bool], **kwargs)
Select files from a directory tree while managing the probability with which each file is drawn.
Examples
>>> from cutcutcodec.core.nn.dataset.base import Dataset
>>> from cutcutcodec.utils import get_project_root
>>> def selector(path) -> bool:
...     return path.suffix == ".py"
...
>>> dataset = Dataset(get_project_root(), selector, max_len=128)
>>> len(dataset)
128
>>> dataset[0].relative_to(get_project_root())
PosixPath('__init__.py')
>>> dataset[1].relative_to(get_project_root())
PosixPath('__main__.py')
>>> dataset[2].relative_to(get_project_root())
PosixPath('doc.py')
>>> dataset[3].relative_to(get_project_root())
PosixPath('utils.py')
>>> dataset[4].relative_to(get_project_root())
PosixPath('config/__init__.py')
>>> dataset[5].relative_to(get_project_root())
PosixPath('core/__init__.py')
>>> dataset[6].relative_to(get_project_root())
PosixPath('testing/__init__.py')
Initialise and create the class.
Parameters
- root : pathlike
The root folder containing all the files of the dataset.
- selector : callable
Function that takes a file path as a pathlib.Path and returns True to keep the file or False to reject it.
- follow_symlinks : bool, default=False
Follow symbolic links if set to True.
- max_len : int, optional
The maximum number of files contained in the dataset.
- decision_depth : int, default=1
The depth below which the tree is flattened. If 0, all the files have the same probability of being drawn. If 1, the decision tree has only one root node. If n, the decision tree has a maximum of n levels.
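To give an intuition for decision_depth, here is a minimal standalone sketch of one possible reading of the description above. It is not the cutcutcodec implementation. It assumes that each decision node picks uniformly among its direct children, and that everything below the depth threshold is flattened into a single uniform draw. The function draw_probabilities and the FILES layout are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict
from fractions import Fraction
from pathlib import PurePosixPath

# Hypothetical file layout, relative to an imaginary dataset root.
FILES = ["a.py", "b.py", "core/x.py", "core/nn/z.py", "core/nn/w.py"]


def draw_probabilities(files, decision_depth):
    """Return {file: probability} under the assumed sampling rule.

    Assumption (not taken from the library): a node above the threshold
    chooses uniformly among its direct children; at the threshold, all the
    files below the node are drawn uniformly.
    """
    probs = {}

    def visit(group, level, weight):
        # ``group`` holds (file, remaining path parts) pairs under one node.
        if level >= decision_depth:
            # Threshold reached: flatten, every file below shares the weight.
            for file, _ in group:
                probs[file] = weight / len(group)
            return
        # Group the entries by their next path component.
        children = defaultdict(list)
        for file, parts in group:
            children[parts[0]].append((file, parts[1:]))
        share = weight / len(children)
        for sub in children.values():
            if len(sub) == 1 and not sub[0][1]:
                probs[sub[0][0]] = share  # the child is a file
            else:
                visit(sub, level + 1, share)  # the child is a folder

    visit([(f, PurePosixPath(f).parts) for f in files], 0, Fraction(1))
    return probs


for depth in (0, 1, 2):
    print(depth, draw_probabilities(FILES, depth))
```

Under these assumptions, depth 0 gives each of the five files a probability of 1/5. Depth 1 gives 1/3 to a.py, 1/3 to b.py and 1/9 to each file under core. Depth 2 gives 1/6 to core/x.py and 1/12 to each file under core/nn. A deeper threshold therefore favours files in sparsely populated folders over files in crowded ones.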