pugh_torch.datasets package
Submodules
pugh_torch.datasets.base module
- Design philosophies/rules:
  - All datasets in this repo are children of Dataset.
  - All paths are pathlib.Path objects. If something cannot handle a Path object, cast it to a string as late as possible.
  - Whenever possible, require the least amount of effort on the dev’s part to get a dataset downloaded and properly formatted.
  - Dataset directories are automatically parsed/derived, so there is no need to prompt the developer for where they want their dataset files.
  - self.transform is ONLY ever used in the dev’s implementation of self.__getitem__. However, the albumentations package does a great job, so when in doubt, assume this is an albumentations.Compose.
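A minimal sketch of two of these rules, assuming the albumentations calling convention (a Compose is called as transform(image=...) and returns a dict with an "image" key). ToyDataset, fake_compose, and legacy_reader are hypothetical stand-ins so the example runs without torch or albumentations; the point is that self.transform is touched only inside __getitem__, and the Path is cast to str only at the one call that needs a string.

```python
from pathlib import Path


def fake_compose(image):
    # Hypothetical stand-in for albumentations.Compose: called as
    # transform(image=...), returns a dict with an "image" key.
    return {"image": [pixel * 2 for pixel in image]}


def legacy_reader(path_str):
    # Hypothetical third-party API that only accepts str, not pathlib.Path.
    assert isinstance(path_str, str)
    return [1, 2, 3]


class ToyDataset:
    """Hypothetical dataset following the design rules (not pugh_torch code)."""

    def __init__(self, root, transform=None):
        self.path = Path(root)  # paths stay pathlib.Path internally
        self.transform = transform

    def __len__(self):
        return 1

    def __getitem__(self, index):
        file_path = self.path / f"{index:05d}.bin"
        # Cast to str as late as possible: only at the string-only boundary.
        image = legacy_reader(str(file_path))
        # self.transform is used here and nowhere else.
        if self.transform is not None:
            image = self.transform(image=image)["image"]
        return image
```

With transform=fake_compose, ToyDataset("/tmp/data", transform=fake_compose)[0] returns [2, 4, 6].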
- To implement your own dataset:
  - Subclass the pugh_torch.datasets.Dataset class. This class is itself a subclass of torch.utils.data.Dataset.
  - Implement the download method:

        def download(self):
            # The local folder (guaranteed to exist) is self.path.
            ...

    This is only called if the downloaded data isn’t available yet, which is determined by a sentinel “downloaded” file.
  - Implement the unpack method:

        def unpack(self):
            # The local folder (guaranteed to exist) is self.path.
            ...

    This is only called if the data hasn’t been unpacked yet, which is determined by a sentinel “unpacked” file.
  - Follow the remaining instructions at:
    https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset
    (at minimum, implement __getitem__; __len__ is also expected by most samplers and the default DataLoader options).
  Registration, path handling, and the rest of the bookkeeping are handled automatically.
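The download/unpack steps above can be mimicked in plain Python to show how the sentinel files gate each hook. This is a hypothetical sketch, not the pugh_torch source: SentinelDataset and LettersDataset are illustrative names, the sentinel files are assumed to be literally named "downloaded" and "unpacked", and torch is left out so it runs standalone.

```python
import tempfile
from pathlib import Path


class SentinelDataset:
    """Hypothetical mirror of the documented download/unpack flow."""

    def __init__(self, root):
        self.path = Path(root)
        self.path.mkdir(parents=True, exist_ok=True)  # folder guaranteed to exist
        if not self.downloaded:
            self.download()
            self.downloaded_file.touch()  # mark the download as complete
        if not self.unpacked:
            self.unpack()
            self.unpacked_file.touch()  # mark the unpack as complete

    @property
    def downloaded_file(self):
        return self.path / "downloaded"  # assumed sentinel name

    @property
    def downloaded(self):
        return self.downloaded_file.exists()

    @property
    def unpacked_file(self):
        return self.path / "unpacked"  # assumed sentinel name

    @property
    def unpacked(self):
        return self.unpacked_file.exists()

    def download(self):
        raise NotImplementedError

    def unpack(self):
        raise NotImplementedError


class LettersDataset(SentinelDataset):
    """Hypothetical subclass: 'downloads' a text file, 'unpacks' it to upper case."""

    calls = []  # shared log of hook invocations, for demonstration only

    def download(self):
        LettersDataset.calls.append("download")
        (self.path / "payload.txt").write_text("a\nb\nc\n")

    def unpack(self):
        LettersDataset.calls.append("unpack")
        raw = (self.path / "payload.txt").read_text()
        (self.path / "items.txt").write_text(raw.upper())

    def _items(self):
        return (self.path / "items.txt").read_text().split()

    def __len__(self):
        return len(self._items())

    def __getitem__(self, index):
        return self._items()[index]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        LettersDataset(tmp)           # runs download() then unpack()
        second = LettersDataset(tmp)  # sentinels exist, so neither hook runs
        print(LettersDataset.calls, len(second), second[1])
```

Constructing the dataset a second time on the same folder skips both hooks, which is the behavior the sentinel files are there to provide.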
- class pugh_torch.datasets.base.Dataset(split='train', *, transform=None, **kwargs)
  Bases: torch.utils.data.dataset.Dataset
  Attempts to download data.
  - Parameters
    split (str) – One of {“train”, “val”, “test”}. Which data partition to use. Case insensitive.
    transform (obj) – Whatever format you want; depends on the dataset’s __getitem__ implementation. Defaults to just a ToTensor transform. This attribute is NOT used anywhere except in the dataset-specific __getitem__ implementation or other parent classes of the dataset.
  - download()
    Function to download data to self.path. The directories up to self.path have already been created. Will only be called if the data has not been downloaded.
  - property downloaded
    We detect whether the data has been fully downloaded by the presence of a “downloaded” file in the root of the data directory.
  - property downloaded_file
  - property path
    pathlib.Path to the root of the stored data.
  - unpack()
    Post-process the downloaded payload. Typically this is something like unpacking a tar file or re-arranging files.
  - property unpacked
    We detect whether the data has been fully unpacked by the presence of an “unpacked” file in the root of the data directory.
  - property unpacked_file
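For the common case that unpack() mentions, extracting a tar file, a body might look like the sketch below. It assumes download() saved a (possibly compressed) tar archive into self.path; unpack_tarball, make_demo_tarball, and the payload name are hypothetical. The "data" extraction filter is used only where the running Python supports it.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def unpack_tarball(path, payload_name):
    """Hypothetical unpack() body: extract path / payload_name into path."""
    path = Path(path)
    archive = path / payload_name
    # Default read mode transparently handles gzip/bz2/xz compression.
    with tarfile.open(archive) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, "..", device files, etc.
            tar.extractall(path, filter="data")
        else:
            tar.extractall(path)  # older Pythons: only use on trusted archives
    return path


def make_demo_tarball(path, payload_name):
    """Write a tiny .tar.gz containing data/hello.txt, standing in for download()."""
    data = b"hello"
    info = tarfile.TarInfo("data/hello.txt")
    info.size = len(data)
    with tarfile.open(Path(path) / payload_name, "w:gz") as tar:
        tar.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_tarball(tmp, "payload.tar.gz")
        root = unpack_tarball(tmp, "payload.tar.gz")
        print((root / "data" / "hello.txt").read_text())
```

Inside a Dataset subclass this would be called as unpack_tarball(self.path, "<payload name>") from unpack().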
pugh_torch.datasets.nyuv2 module
- class pugh_torch.datasets.nyuv2.NYUv2(*args, raw_depth=False, types=['rgb', 'depth'], transform=None, **kwargs)
  Bases: pugh_torch.datasets.base.Dataset
  - rgb: np.array, uint8
    Images in RGB order.
  - depth: np.array, float32
    Depth in meters.
  - Parameters
    raw_depth (bool) – Return the depth data before invalid areas were infilled. Defaults to False.
    types (list of str) – Data types to return.
  - DOWNLOAD_URL = 'http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat'
  - K = array([[518.85790117,   0.        , 325.58244941],
               [  0.        , 519.46961112, 253.73616633],
               [  0.        ,   0.        ,   1.        ]])
    Camera intrinsic matrix of the form [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
  - K4 = array([518.85790117, 519.46961112, 325.58244941, 253.73616633])
    The same intrinsics as a flat vector (fx, fy, cx, cy).
  - PAYLOAD_NAME = 'nyu_depth_v2_labeled.mat'
  - available_types = {'depth', 'instances', 'labels', 'rgb'}
  - cx = 325.58244941119034
  - cy = 253.73616633400465
  - download()
    Function to download data to self.path. The directories up to self.path have already been created. Will only be called if the data has not been downloaded.
  - fx = 518.8579011745019
  - fy = 519.4696111212749
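Since depth is in meters and fx, fy, cx, cy are exposed, a depth pixel can be lifted to a 3D point with the standard pinhole model. This sketch assumes u indexes columns and v indexes rows (cx and cy sit near the centre of a 640x480 image), no lens distortion, and z-depth along the optical axis; backproject and project are illustrative helpers, not part of the NYUv2 class.

```python
# Constants copied from the NYUv2 class attributes documented above.
FX = 518.8579011745019
FY = 519.4696111212749
CX = 325.58244941119034
CY = 253.73616633400465


def backproject(u, v, depth_m):
    """Pixel (u, v) with metric depth -> camera-frame point (X, Y, Z)."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return x, y, depth_m


def project(x, y, z):
    """Camera-frame point -> pixel (u, v), i.e. (K @ [x, y, z]) / z."""
    return FX * x / z + CX, FY * y / z + CY
```

The arithmetic is element-wise, so the same functions also work on NumPy arrays of pixel coordinates and a full depth map.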
pugh_torch.datasets.torchvision module
Lightly wraps torchvision datasets. This allows greater customization without modifying another repo.
- Most notably, this:
  - Automatically gets the torchvision dataset constructor based on its name.
  - Moves the transform responsibility to us.
  - Applies our automatic, opinionated pathing rules.
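One way to "get the torchvision dataset constructor based on name" is a case-insensitive attribute lookup on a module. The wrapper’s actual lookup is not documented here, so get_constructor is a hypothetical helper, and a stand-in module replaces torchvision.datasets so the sketch runs without torchvision installed.

```python
import types


def get_constructor(module, name):
    """Case-insensitively look up an attribute called ``name`` in ``module``."""
    wanted = name.lower()
    for attr in dir(module):
        if attr.lower() == wanted:
            return getattr(module, attr)
    raise KeyError(f"No dataset named {name!r} in {module.__name__}")


# Stand-in module so this runs without torchvision installed.
fake_datasets = types.ModuleType("fake_datasets")


class CIFAR10:  # placeholder for torchvision.datasets.CIFAR10
    pass


fake_datasets.CIFAR10 = CIFAR10
```

With torchvision installed, get_constructor(torchvision.datasets, "cifar10") would return torchvision.datasets.CIFAR10 in the same way.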
- class pugh_torch.datasets.torchvision.TorchVisionDataset(*args, **kwargs)
  Bases: pugh_torch.datasets.base.Dataset
  Attempts to download data.
  - Parameters
    split (str) – One of {“train”, “val”, “test”}. Which data partition to use. Case insensitive.
    transform (obj) – Whatever format you want; depends on the dataset’s __getitem__ implementation. Defaults to just a ToTensor transform. This attribute is NOT used anywhere except in the dataset-specific __getitem__ implementation or other parent classes of the dataset.
  - auto_construct = True
  - property class_to_idx
  - property classes
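The classes and class_to_idx properties are listed without descriptions. A plausible reading, given that the wrapper "moves the transform responsibility to us", is that it forwards these attributes from the wrapped torchvision dataset and applies its own transform in __getitem__. The sketch below is a guess under that assumption; WrappedDataset and FakeInner are hypothetical, and FakeInner imitates a classification dataset that returns (image, target) pairs.

```python
class WrappedDataset:
    """Hypothetical thin wrapper around a torchvision-style dataset.

    The inner dataset is built WITHOUT a transform; the wrapper applies its
    own transform and forwards ``classes``/``class_to_idx``.
    """

    def __init__(self, inner, transform=None):
        self.inner = inner
        self.transform = transform

    @property
    def classes(self):
        return self.inner.classes

    @property
    def class_to_idx(self):
        return self.inner.class_to_idx

    def __len__(self):
        return len(self.inner)

    def __getitem__(self, index):
        image, target = self.inner[index]
        if self.transform is not None:
            image = self.transform(image)
        return image, target


class FakeInner:
    """Stand-in for a torchvision classification dataset such as CIFAR10."""

    classes = ["cat", "dog"]
    class_to_idx = {"cat": 0, "dog": 1}

    def __init__(self):
        self.samples = [(1, 0), (2, 1)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]
```

Keeping the transform on the wrapper means every dataset, torchvision-backed or not, follows the same rule that self.transform is applied only in __getitem__.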
Module contents
pugh_torch.datasets.__init__
The root dataset path can be set via the environment variable PUGH_TORCH_DATASETS_PATH.
I don’t expose this in code because I think it just clutters the code.
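Resolving the root from PUGH_TORCH_DATASETS_PATH could look like the sketch below. The fallback directory and the <root>/<genre>/<name> layout are assumptions for illustration only; the actual default and directory scheme are not stated here. Set the variable in the shell (e.g. export PUGH_TORCH_DATASETS_PATH=/mnt/datasets) before starting Python.

```python
import os
from pathlib import Path

# Hypothetical fallback; the real default used by pugh_torch is not documented here.
DEFAULT_ROOT = Path.home() / ".pugh_torch" / "datasets"


def datasets_root():
    """Return the root dataset directory as a pathlib.Path."""
    env = os.environ.get("PUGH_TORCH_DATASETS_PATH")
    # An unset or empty variable falls back to the default.
    return Path(env).expanduser() if env else DEFAULT_ROOT


def dataset_path(genre, name):
    """Hypothetical per-dataset directory: <root>/<genre>/<name>, lower-cased."""
    return datasets_root() / genre.lower() / name.lower()
```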
- pugh_torch.datasets.get(*args)
  Gets a dataset constructor from string identifiers.
  - Example:
    constructor = get("classification", "imagenet")
  - Parameters
    *args (str) – Case-insensitive strings that lead to a dataset, typically of the form (genre, name), where genre is the type of dataset, e.g. “classification”.
- pugh_torch.datasets.get_dataset(*args)
  Gets a dataset constructor from string identifiers.
  - Example:
    constructor = get("classification", "imagenet")
  - Parameters
    *args (str) – Case-insensitive strings that lead to a dataset, typically of the form (genre, name), where genre is the type of dataset, e.g. “classification”.
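A lookup by case-insensitive (genre, name) strings maps naturally onto a nested registry. The sketch below is hypothetical: the register decorator stands in for whatever automatic registration the package performs, and get_dataset is treated as an alias of get, as the identical docstrings suggest but do not confirm.

```python
_REGISTRY = {}


def register(genre, name):
    """Hypothetical decorator that files a dataset class under (genre, name)."""
    def decorator(cls):
        _REGISTRY.setdefault(genre.lower(), {})[name.lower()] = cls
        return cls
    return decorator


def get(*args):
    """Walk the nested registry using case-insensitive string keys.

    A partial path such as get("classification") returns the inner dict.
    """
    node = _REGISTRY
    for key in args:
        try:
            node = node[key.lower()]
        except (KeyError, TypeError):
            raise KeyError(f"Unknown dataset path {args!r}") from None
    return node


get_dataset = get  # assumed alias of get


@register("classification", "imagenet")
class ImageNet:  # placeholder dataset class
    pass
```

Usage mirrors the documented example: get("Classification", "ImageNet") returns the ImageNet class.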