pugh_torch.datasets package¶

Subpackages¶

Submodules¶

pugh_torch.datasets.base module¶

Design philosophies/rules:

All datasets in this repo are a child of Dataset.
All paths are pathlib.Path objects.
- If something cannot accept a Path object, cast the path to a string as late as possible.
Whenever possible, require the least amount of effort on the dev’s part to get a dataset downloaded and properly formatted.
Dataset directories are automatically parsed/derived, so no need to prompt the developer on where they want their dataset files.
self.transform is ONLY ever used in the dev’s implementation of self.__getitem__, so it can take whatever form that implementation expects. The albumentations package does a great job, though, so when in doubt, assume this is an albumentations.Compose.
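The transform rule above can be illustrated with a minimal sketch. ToyDataset and flip_horizontal are hypothetical stand-ins (neither is part of pugh_torch or albumentations); the only convention borrowed from albumentations is that a Compose is called with keyword arguments and returns a dict.

```python
class ToyDataset:
    """Hypothetical stand-in dataset; not a pugh_torch class."""

    def __init__(self, images, transform=None):
        self.images = images
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        # The transform is applied here and nowhere else.
        if self.transform is not None:
            # albumentations convention: keyword arguments in, dict out.
            image = self.transform(image=image)["image"]
        return image


def flip_horizontal(image):
    """Hypothetical stand-in for an albumentations.Compose pipeline."""
    return {"image": [row[::-1] for row in image]}


# A single 2x3 "image" stored as nested lists, to keep the sketch dependency-free.
dataset = ToyDataset([[[1, 2, 3], [4, 5, 6]]], transform=flip_horizontal)
flipped = dataset[0]
```

Because the base class never touches the transform itself, swapping in a different callable only requires changing the dataset's own __getitem__.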

To implement your own dataset:

Subclass the pugh_torch.datasets.Dataset class. This class itself is a subclass of torch.utils.data.Dataset.
Implement the download method:

def download(self):
    # The local folder (guaranteed to exist) is self.path.
    ...

This is only called if the downloaded data isn’t available. Whether the download is available is determined by a sentinel “downloaded” file.
Implement the unpack method:

def unpack(self):
    # The local folder (guaranteed to exist) is self.path.
    ...

This is only called if the data hasn’t been unpacked yet. Whether the data has been unpacked is determined by a sentinel “unpacked” file.
Follow the remaining instructions at:
https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset
Registration, path-handling, and all of that other stuff is automatically handled.
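The download/unpack flow described above can be sketched without the library. SentinelDataset below is a hypothetical stand-in for the base class, written only to show how sentinel files could gate the two hooks; it is not the actual pugh_torch implementation, and ToyText and its file names are invented for illustration.

```python
import tempfile
from pathlib import Path


class SentinelDataset:
    """Hypothetical stand-in base class; illustrates the sentinel pattern only."""

    def __init__(self, root):
        self.path = Path(root)
        # The hooks may assume self.path already exists.
        self.path.mkdir(parents=True, exist_ok=True)
        self.downloaded_file = self.path / "downloaded"
        self.unpacked_file = self.path / "unpacked"
        if not self.downloaded_file.exists():
            self.download()
            self.downloaded_file.touch()  # mark the download as complete
        if not self.unpacked_file.exists():
            self.unpack()
            self.unpacked_file.touch()  # mark the unpacking as complete

    def download(self):
        raise NotImplementedError

    def unpack(self):
        raise NotImplementedError


class ToyText(SentinelDataset):
    calls = []  # records which hooks actually ran

    def download(self):
        # A real dataset would fetch a remote payload here.
        (self.path / "payload.txt").write_text("a,b,c")
        self.calls.append("download")

    def unpack(self):
        payload = (self.path / "payload.txt").read_text()
        (self.path / "items.txt").write_text("\n".join(payload.split(",")))
        self.calls.append("unpack")


with tempfile.TemporaryDirectory() as tmp:
    ToyText(tmp)
    ToyText(tmp)  # both sentinels exist now, so neither hook runs again
    calls = list(ToyText.calls)
    items = (Path(tmp) / "items.txt").read_text().splitlines()
```

Writing the sentinel only after a hook returns means an interrupted download or unpack is retried on the next construction.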

class pugh_torch.datasets.base.Dataset(split='train', *, transform=None, **kwargs)[source]¶

Bases: torch.utils.data.dataset.Dataset

Attempts to download data.

Parameters

split (str) – One of {“train”, “val”, “test”}. Which data partition to use. Case insensitive.
transform (obj) – Whatever format you want. Depends on the dataset’s __getitem__ implementation. Defaults to just a ToTensor transform. This attribute is NOT used anywhere except in the dataset-specific __getitem__ implementation, or in other parent classes of the dataset.

download()[source]¶

Function to download data to self.path.

The directories up to self.path have already been created.

Will only be called if data has not been downloaded.

property downloaded¶: We detect if the data has been fully downloaded by a “downloaded” file in the root of the data directory.

property downloaded_file¶

property path¶: pathlib.Path to the root of the stored data

unpack()[source]¶

Post-process the downloaded payload.

Typically this will be something like unpacking a tar file, or possibly re-arranging files.

property unpacked¶: We detect if the data has been fully unpacked by an “unpacked” file in the root of the data directory.

property unpacked_file¶

pugh_torch.datasets.nyuv2 module¶

class pugh_torch.datasets.nyuv2.NYUv2(*args, raw_depth=False, types=['rgb', 'depth'], transform=None, **kwargs)[source]¶

Bases: pugh_torch.datasets.base.Dataset

rgb (np.array, uint8): Images in RGB order
depth (np.array, float32): Depth in meters

Parameters

raw_depth (bool) – Return the depth data before invalid areas were infilled. Defaults to False.
types (list of str) – Data types to return.

DOWNLOAD_URL = 'http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat'¶

K = array([[518.85790117,   0.        , 325.58244941],
       [  0.        , 519.46961112, 253.73616633],
       [  0.        ,   0.        ,   1.        ]])¶

K4 = array([518.85790117, 519.46961112, 325.58244941, 253.73616633])¶

PAYLOAD_NAME = 'nyu_depth_v2_labeled.mat'¶

available_types = {'depth', 'instances', 'labels', 'rgb'}¶

cx = 325.58244941119034¶

cy = 253.73616633400465¶

download()[source]¶

Function to download data to self.path.

The directories up to self.path have already been created.

Will only be called if data has not been downloaded.

fx = 518.8579011745019¶

fy = 519.4696111212749¶

unpack()[source]¶: No unpacking necessary
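The fx, fy, cx, and cy attributes above have the layout of standard pinhole camera intrinsics, with K as the 3x3 matrix form and K4 as (fx, fy, cx, cy). A minimal sketch of how such values are typically used follows, assuming the depth value is the distance along the optical axis in meters; the backproject and project helpers are hypothetical and not part of the NYUv2 class.

```python
# Intrinsics copied from the class attributes listed above.
FX, FY = 518.8579011745019, 519.4696111212749
CX, CY = 325.58244941119034, 253.73616633400465


def backproject(u, v, depth_m):
    """Map pixel (u, v) with depth in meters to a 3D point in the camera frame."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return (x, y, depth_m)


def project(x, y, z):
    """Map a 3D camera-frame point back to pixel coordinates."""
    return (FX * x / z + CX, FY * y / z + CY)


# Round trip: a pixel at 2 m depth goes to 3D and back to the same pixel.
point = backproject(400.0, 300.0, 2.0)
pixel = project(*point)
```

The same arithmetic vectorizes over a whole depth map with NumPy by replacing the scalars u and v with pixel-coordinate grids.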

pugh_torch.datasets.torchvision module¶

Lightly wraps torchvision datasets.

This just allows us greater customization without modifying another repo.

Most notably, this:

Automatically gets the torchvision dataset constructor based on name
Moves the transform responsibility to us
Applies our automatic opinionated pathing rules.

class pugh_torch.datasets.torchvision.TorchVisionDataset(*args, **kwargs)[source]¶

Bases: pugh_torch.datasets.base.Dataset

Attempts to download data.

Parameters

split (str) – One of {“train”, “val”, “test”}. Which data partition to use. Case insensitive.
transform (obj) – Whatever format you want. Depends on the dataset’s __getitem__ implementation. Defaults to just a ToTensor transform. This attribute is NOT used anywhere except in the dataset-specific __getitem__ implementation, or in other parent classes of the dataset.

auto_construct = True¶

property class_to_idx¶

property classes¶

download()[source]¶: Handled by the torchvision dataset

unpack()[source]¶: Handled by the torchvision dataset

Module contents¶

pugh_torch.datasets.__init__

The root dataset path can be set via the environment variable PUGH_TORCH_DATASETS_PATH.

I don’t expose this in code because I think it just clutters the code.
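A minimal sketch of setting the variable from Python rather than from the shell. The directory chosen here is purely illustrative, and the sketch assumes the variable must be set before the package reads it; when exactly that happens is not stated here, so setting it before the import is the conservative choice.

```python
import os
from pathlib import Path

# Hypothetical location; any writable directory would do.
datasets_root = Path.home() / "datasets"

# Environment variables are strings, so the Path is cast at the last moment.
os.environ["PUGH_TORCH_DATASETS_PATH"] = str(datasets_root)

# Import pugh_torch only after this point (assumption: the variable is read
# at import time or when a dataset is constructed).
```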

pugh_torch.datasets.get(*args)[source]¶

Gets dataset constructor from string identifiers

Example: constructor = get("classification", "imagenet")

Parameters: *args (str) – Case-insensitive strings that lead to a dataset, typically in the form (genre, name), where genre is the type of dataset, e.g. “classification”.

pugh_torch.datasets.get_dataset(*args)¶

Gets dataset constructor from string identifiers

Example: constructor = get("classification", "imagenet")

Parameters: *args (str) – Case-insensitive strings that lead to a dataset, typically in the form (genre, name), where genre is the type of dataset, e.g. “classification”.
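To make the case-insensitive lookup concrete, here is a small sketch of one way a (genre, name) registry could work. The register and lookup helpers and the ImageNet placeholder class are hypothetical and written for illustration; they are not the pugh_torch implementation, which handles registration automatically.

```python
_REGISTRY = {}


def register(genre, name, constructor):
    """Store a constructor under a lower-cased (genre, name) key."""
    _REGISTRY[(genre.lower(), name.lower())] = constructor


def lookup(*args):
    """Return the constructor registered under the given identifiers."""
    key = tuple(arg.lower() for arg in args)
    try:
        return _REGISTRY[key]
    except KeyError:
        raise KeyError(f"No dataset registered under {args!r}") from None


class ImageNet:
    """Placeholder standing in for a real dataset class."""


register("classification", "imagenet", ImageNet)

# Mixed case resolves to the same entry because keys are normalized.
constructor = lookup("Classification", "ImageNet")
```

Normalizing the key once at registration and once at lookup keeps the comparison cheap and avoids storing duplicate entries per spelling.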