DeepHealth Deep Learning Dataset.
#include <support_eddl.h>
Public Member Functions | |
| DLDataset (const filesystem::path &filename, const int batch_size, const DatasetAugmentations &augs, const ColorType ctype=ColorType::RGB, const ColorType ctype_gt=ColorType::GRAY, const unsigned num_workers=1, const double queue_ratio_size=1., const std::vector< bool > &drop_last={}, bool verify=false) | |
| void | ResetBatch (const ecvl::any &split=-1, bool shuffle=false) |
| Reset the batch counter and optionally shuffle the sample indices of the specified split. | |
| void | ResetAllBatches (bool shuffle=false) |
| Reset the batch counter of each split and optionally shuffle the sample indices (within each split). | |
| void | LoadBatch (Tensor *&images, Tensor *&labels) |
| Load a batch into the images and labels tensors. | |
| void | LoadBatch (Tensor *&images) |
| Load a batch into the images tensor. Useful for the test set, when no labels are available. | |
| void | SetBatchSize (int bs) |
| Set a new batch size inside the dataset. | |
| virtual void | ProduceImageLabel (DatasetAugmentations &augs, Sample &elem) |
| Load a sample and its label, and push them to the producers-consumer queue. | |
| void | ThreadFunc (int thread_index) |
| Function called when the threads are spawned. | |
| std::tuple< std::vector< Sample >, std::shared_ptr< Tensor >, std::shared_ptr< Tensor > > | GetBatch () |
| Pop batch_size samples from the queue and copy them into EDDL tensors. | |
| void | Start (int split_index=-1) |
| Spawn num_workers threads. | |
| void | Stop () |
| Join all the threads. | |
| auto | GetQueueSize () const |
| Get the current size of the producers-consumer queue of the dataset. | |
| void | SetAugmentations (const DatasetAugmentations &da) |
| Set the dataset augmentations. | |
| const int | GetNumBatches (const ecvl::any &split=-1) |
| Get the number of batches of the specified split. | |
| void | ToTensorPlane (const std::vector< int > &label, Tensor *&tensor) |
| Convert the sample labels into a one-hot encoded tensor and copy it to the batch tensor. | |
| void | SetWorkers (const unsigned num_workers) |
| Change the number of workers. | |
| void | SetNumChannels (const int n_channels, const int n_channels_gt=1) |
| Change the number of channels of the Image produced by ECVL and update the internal EDDL tensors shape accordingly. Useful for custom data loading. | |
Public Member Functions inherited from ecvl::Dataset | |
| Dataset () | |
| Dataset (const filesystem::path &filename, bool verify=false) | |
| virtual | ~Dataset () |
| std::vector< int > & | GetSplit (const ecvl::any &split=-1) |
| Returns the image indexes of the requested split. | |
| void | SetSplit (const ecvl::any &split) |
| Set the current split. | |
| void | Dump (const filesystem::path &file_path) |
| Dump the Dataset into a YAML file following the DeepHealth Dataset Format. | |
| std::vector< std::vector< filesystem::path > > | GetLocations () const |
| Retrieve the list of all sample locations in the dataset file. | |
Static Public Member Functions | |
| static void | SetSplitSeed (unsigned seed) |
| Set a fixed seed for the randomly generated values. Useful to reproduce experiments with the same shuffling during training. | |
Public Attributes | |
| int | n_channels_ |
| Number of channels of the images. | |
| int | n_channels_gt_ = -1 |
| Number of channels of the ground truth images. | |
| std::vector< int > | resize_dims_ |
| Dimensions (HxW) to which Dataset images must be resized. | |
Public Attributes inherited from ecvl::Dataset | |
| std::string | name_ = "DeepHealth dataset" |
| Name of the Dataset. | |
| std::string | description_ = "This is the DeepHealth example dataset!" |
| Description of the Dataset. | |
| std::vector< std::string > | classes_ |
| Vector with all the classes available in the Dataset. | |
| std::vector< std::string > | features_ |
| Vector with all the features available in the Dataset. | |
| std::vector< Sample > | samples_ |
| Vector containing all the Dataset samples. See Sample. | |
| std::vector< Split > | split_ |
| Splits of the Dataset. See Split. | |
| int | current_split_ = -1 |
| Current split from which images are loaded. | |
| Task | task_ |
| Task of the dataset. | |
Protected Member Functions | |
| void | InitTC (int split_index) |
| Set the indices of the samples managed by each thread. | |
| void | SetTensorsShape () |
| Set internal EDDL tensors shape. | |
Protected Member Functions inherited from ecvl::Dataset | |
| std::vector< ecvl::Split >::iterator | GetSplitIt (ecvl::any split) |
| const int | GetSplitIndex (ecvl::any split) |
Protected Attributes | |
| int | batch_size_ |
| Size of each dataset mini batch. | |
| std::vector< int > | current_batch_ |
| Number of batches already loaded for each split. | |
| ColorType | ctype_ |
| ecvl::ColorType of the Dataset images. | |
| ColorType | ctype_gt_ |
| ecvl::ColorType of the Dataset ground truth images. | |
| DatasetAugmentations | augs_ |
| ecvl::DatasetAugmentations to be applied to the Dataset images (and ground truth, if it exists) for each split. | |
| unsigned | num_workers_ |
| Number of parallel workers. | |
| ProducersConsumerQueue | queue_ |
| Producers-consumer queue of the dataset. | |
| std::pair< std::vector< int >, std::vector< int > > | tensors_shape_ |
| Shape of sample and label tensors. | |
| std::vector< std::vector< ThreadCounters > > | splits_tc_ |
| Each dataset split has its own vector of threads, each of which has its counters: <counter,min,max>. | |
| std::vector< std::thread > | producers_ |
| Vector of threads representing the samples producers. | |
| bool | active_ = false |
| Whether the threads have already been launched or not. | |
| std::mutex | active_mutex_ |
| Mutex for active_ variable. | |
Static Protected Attributes | |
| static std::default_random_engine | re_ |
| Engine used for random number generation. | |
Additional Inherited Members | |
Static Public Attributes inherited from ecvl::Dataset | |
| static const std::regex | url_regex_ |
DeepHealth Deep Learning Dataset.
This class extends the DeepHealth Dataset with Deep Learning specific members.
Definition at line 275 of file support_eddl.h.
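| ecvl::DLDataset::DLDataset | ( | const filesystem::path & | filename, |
| const int | batch_size, |
| const DatasetAugmentations & | augs, |
| const ColorType | ctype = ColorType::RGB, |
| const ColorType | ctype_gt = ColorType::GRAY, |
| const unsigned | num_workers = 1, |
| const double | queue_ratio_size = 1., |
| const std::vector< bool > & | drop_last = {}, |
| bool | verify = false | ||
| ) |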
inline |
| [in] | filename | Path to the Dataset file. |
| [in] | batch_size | Size of each dataset mini batch. |
| [in] | augs | Array with DatasetAugmentations to be applied to the Dataset images (and ground truth, if it exists) for each split. If no augmentation is required, nullptr has to be passed. |
| [in] | ctype | ecvl::ColorType of the Dataset images. Default is RGB. |
| [in] | ctype_gt | ecvl::ColorType of the Dataset ground truth images. Default is GRAY. |
| [in] | num_workers | Number of parallel threads spawned. |
| [in] | queue_ratio_size | The producers-consumer queue will have a maximum size equal to \(batch\_size \times queue\_ratio\_size \times num\_workers\). |
| [in] | drop_last | For each split, whether to drop the last samples that don't fit the batch size or not. The vector dimensions must match the number of splits. |
| [in] | verify | If true, a list of all the images in the Dataset file which don't exist is printed with an ECVL_WARNING_MSG. |
Definition at line 332 of file support_eddl.h.
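As a rough illustration of the queue_ratio_size formula above, the maximum queue size can be evaluated as follows. `MaxQueueSize` is a hypothetical helper, not part of ECVL, and truncating a non-integer product toward zero is an assumption rather than documented behaviour.

```cpp
#include <cstddef>

// Hypothetical helper (not an ECVL function): evaluates
// batch_size * queue_ratio_size * num_workers.
// Assumption: a non-integer product is truncated toward zero.
std::size_t MaxQueueSize(int batch_size, double queue_ratio_size, unsigned num_workers)
{
    return static_cast<std::size_t>(batch_size * queue_ratio_size * num_workers);
}
```

Under these assumptions, a batch size of 8 with 4 workers and the default ratio of 1 allows at most 32 queued samples.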
| std::tuple<std::vector<Sample>, std::shared_ptr<Tensor>, std::shared_ptr<Tensor> > ecvl::DLDataset::GetBatch | ( | ) |
Pop batch_size samples from the queue and copy them into EDDL tensors.
| const int ecvl::DLDataset::GetNumBatches | ( | const ecvl::any & | split = -1 | ) |
Get the number of batches of the specified split.
If no split is provided or an illegal value is provided, the number of batches of the current split is returned.
| [in] | split | index, name or ecvl::SplitType representing the split from which to get the number of batches. |
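The batch count of a split depends on whether the trailing partial batch is dropped (see the drop_last constructor parameter). A minimal sketch of that arithmetic, assuming drop_last discards the incomplete batch and that it is otherwise counted as one extra batch; `NumBatches` is a hypothetical helper, not the library's implementation.

```cpp
#include <cstddef>

// Hypothetical helper (not an ECVL function): number of batches in a split.
// Assumption: with drop_last the trailing partial batch is discarded,
// otherwise it counts as one additional (smaller) batch.
std::size_t NumBatches(std::size_t num_samples, std::size_t batch_size, bool drop_last)
{
    if (batch_size == 0) {
        return 0;  // guard against division by zero
    }
    const std::size_t full_batches = num_samples / batch_size;
    const bool has_partial = (num_samples % batch_size) != 0;
    return full_batches + ((has_partial && !drop_last) ? 1 : 0);
}
```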
| auto ecvl::DLDataset::GetQueueSize | ( | ) | const |
inline |
Get the current size of the producers-consumer queue of the dataset.
Definition at line 483 of file support_eddl.h.
| void ecvl::DLDataset::InitTC | ( | int | split_index | ) |
protected |
Set the indices of the samples managed by each thread.
| [in] | split_index | index of the split to initialize. |
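One plausible way to assign sample indices to threads is a contiguous partition, with each thread keeping a <counter,min,max> triple like the one described for splits_tc_. The sketch below is an assumption about the scheme, not ECVL's code: `ThreadRange` and `PartitionSamples` are hypothetical names, and giving the remainder to the last thread is an arbitrary choice.

```cpp
#include <vector>

// Hypothetical stand-in for the <counter,min,max> triple of one thread.
struct ThreadRange {
    int counter;  // next sample index the thread will load
    int min;      // first index owned by the thread (inclusive)
    int max;      // one past the last index owned by the thread (exclusive)
};

// Splits [0, num_samples) into num_workers contiguous ranges.
// Assumption: the last thread also takes the remainder.
std::vector<ThreadRange> PartitionSamples(int num_samples, int num_workers)
{
    std::vector<ThreadRange> ranges;
    if (num_workers <= 0 || num_samples < 0) {
        return ranges;
    }
    const int per_thread = num_samples / num_workers;
    for (int i = 0; i < num_workers; ++i) {
        const int lo = i * per_thread;
        const int hi = (i == num_workers - 1) ? num_samples : lo + per_thread;
        ranges.push_back({lo, lo, hi});  // counter starts at the range minimum
    }
    return ranges;
}
```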
| void ecvl::DLDataset::LoadBatch | ( | Tensor *& | images, |
| Tensor *& | labels | ||
| ) |
Load a batch into the images and labels tensors.
| [out] | images | tensor which stores the batch of images. |
| [out] | labels | tensor which stores the batch of labels. |
| void ecvl::DLDataset::LoadBatch | ( | Tensor *& | images | ) |
Load a batch into the images tensor. Useful for the test set, when no labels are available.
| [out] | images | tensor which stores the batch of images. |
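| virtual void ecvl::DLDataset::ProduceImageLabel | ( | DatasetAugmentations & | augs, |
| Sample & | elem | ||
| ) |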
virtual |
Load a sample and its label, and push them to the producers-consumer queue.
| [in] | elem | Sample to load and push to the queue. |
| void ecvl::DLDataset::ResetAllBatches | ( | bool | shuffle = false | ) |
Reset the batch counter of each split and optionally shuffle the sample indices (within each split).
| [in] | shuffle | boolean which indicates whether to shuffle the sample indices or not. |
| void ecvl::DLDataset::ResetBatch | ( | const ecvl::any & | split = -1, |
| bool | shuffle = false | ||
| ) |
Reset the batch counter and optionally shuffle the sample indices of the specified split.
If no split is provided or an illegal value is provided, the current split is reset.
| [in] | split | index, name or SplitType of the split to reset. |
| [in] | shuffle | boolean which indicates whether to shuffle the split sample indices or not. |
| void ecvl::DLDataset::SetAugmentations | ( | const DatasetAugmentations & | da | ) |
Set the dataset augmentations.
| [in] | da | DatasetAugmentations to set. |
| void ecvl::DLDataset::SetBatchSize | ( | int | bs | ) |
Set a new batch size inside the dataset.
Note that this does not affect the EDDL network batch size, which has to be changed too.
| [in] | bs | Value to set for the batch size. |
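| void ecvl::DLDataset::SetNumChannels | ( | const int | n_channels, |
| const int | n_channels_gt = 1 | ||
| ) |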
inline |
Change the number of channels of the Image produced by ECVL and update the internal EDDL tensors shape accordingly. Useful for custom data loading.
| [in] | n_channels | Number of channels of input Image. |
| [in] | n_channels_gt | Number of channels of ground truth. |
Definition at line 528 of file support_eddl.h.
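To make the effect of a channel change on the tensor shapes concrete, the sketch below recomputes a pair of shapes like the one stored in tensors_shape_. It assumes an NCHW image layout and a classification task whose labels have shape [batch_size, num_classes]; `BatchShapes` and its `num_classes` parameter are hypothetical and not part of the documented interface.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper (not an ECVL function): shapes of the image and label
// batch tensors.
// Assumptions: images use an NCHW layout, resize_dims holds {height, width},
// and labels are [batch_size, num_classes] (classification).
std::pair<std::vector<int>, std::vector<int>> BatchShapes(
    int batch_size, int n_channels, const std::vector<int>& resize_dims, int num_classes)
{
    std::vector<int> images{batch_size, n_channels, resize_dims.at(0), resize_dims.at(1)};
    std::vector<int> labels{batch_size, num_classes};
    return {images, labels};
}
```

Switching from 3 channels to 1 only changes the second entry of the image shape; the label shape stays the same.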
| static void ecvl::DLDataset::SetSplitSeed | ( | unsigned | seed | ) |
inline static |
Set a fixed seed for the randomly generated values. Useful to reproduce experiments with the same shuffling during training.
| [in] | seed | Value of the seed for the random engine. |
Definition at line 439 of file support_eddl.h.
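The reproducibility that a fixed seed provides can be sketched with the standard library alone. `ShuffledIndices` is a hypothetical helper; it only shows that a `std::default_random_engine` seeded with the same value yields the same permutation on a given platform, and it is not how ECVL shuffles internally.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not an ECVL function): indices 0..n-1 shuffled with
// an engine seeded by `seed`. The same seed gives the same order on the same
// standard library implementation.
std::vector<int> ShuffledIndices(std::size_t n, unsigned seed)
{
    std::vector<int> indices(n);
    std::iota(indices.begin(), indices.end(), 0);  // fill with 0, 1, ..., n-1
    std::default_random_engine engine(seed);
    std::shuffle(indices.begin(), indices.end(), engine);
    return indices;
}
```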
| void ecvl::DLDataset::SetTensorsShape | ( | ) |
inline protected |
Set internal EDDL tensors shape.
Definition at line 300 of file support_eddl.h.
| void ecvl::DLDataset::SetWorkers | ( | const unsigned | num_workers | ) |
inline |
Change the number of workers.
| [in] | num_workers | Number of threads/workers that will be spawned. |
Definition at line 510 of file support_eddl.h.
| void ecvl::DLDataset::Start | ( | int | split_index = -1 | ) |
Spawn num_workers threads.
| [in] | split_index | Index of the split to use in the GetBatch function. If not specified, current split is used. |
| void ecvl::DLDataset::Stop | ( | ) |
Join all the threads.
| void ecvl::DLDataset::ThreadFunc | ( | int | thread_index | ) |
Function called when the threads are spawned.
ProduceImageLabel is called for each sample assigned to the thread.
| [in] | thread_index | index of the thread. |
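The Start / ThreadFunc / Stop life cycle follows the usual bounded producers-consumer pattern. The sketch below models it with the standard library only: `BoundedQueue` and `ProduceAndConsume` are hypothetical names, integers stand in for samples, and none of this is ECVL's ProducersConsumerQueue implementation.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical bounded queue: Push blocks while full, Pop blocks while empty.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t max_size) : max_size_(max_size) {}

    void Push(int sample)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return queue_.size() < max_size_; });
        queue_.push(sample);
        not_empty_.notify_one();
    }

    int Pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        const int sample = queue_.front();
        queue_.pop();
        not_full_.notify_one();
        return sample;
    }

private:
    std::size_t max_size_;
    std::queue<int> queue_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// Spawns the producers, consumes every sample on the calling thread,
// then joins the producers.
std::vector<int> ProduceAndConsume(int num_samples, int num_workers, std::size_t max_queue_size)
{
    std::vector<int> consumed;
    if (num_samples <= 0 || num_workers <= 0 || max_queue_size == 0) {
        return consumed;
    }
    BoundedQueue queue(max_queue_size);
    std::vector<std::thread> producers;
    const int per_thread = num_samples / num_workers;
    for (int t = 0; t < num_workers; ++t) {  // analogous to Start()
        const int lo = t * per_thread;
        const int hi = (t == num_workers - 1) ? num_samples : lo + per_thread;
        // Analogous to ThreadFunc: each thread handles only its own range.
        producers.emplace_back([&queue, lo, hi] {
            for (int i = lo; i < hi; ++i) {
                queue.Push(i);
            }
        });
    }
    for (int i = 0; i < num_samples; ++i) {  // analogous to repeated GetBatch()
        consumed.push_back(queue.Pop());
    }
    for (auto& producer : producers) {  // analogous to Stop()
        producer.join();
    }
    return consumed;
}
```

The order in which samples are consumed depends on thread scheduling, so only the set of consumed samples is deterministic.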
| void ecvl::DLDataset::ToTensorPlane | ( | const std::vector< int > & | label, |
| Tensor *& | tensor | ||
| ) |
Convert the sample labels into a one-hot encoded tensor and copy it to the batch tensor.
| [in] | label | vector of the sample labels |
| [out] | tensor | EDDL Tensor in which to copy the labels (dimensions: [batch_size, num_classes]) |
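The encoding step can be illustrated on a plain row-major buffer of shape [batch_size, num_classes]. `OneHotRow` is a hypothetical helper, not ToTensorPlane itself; it assumes that every class index in `label` sets one entry of the row to 1, so a sample with several labels yields several ones, and that out-of-range indices are ignored.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an ECVL function): encodes the labels of one sample
// into row `row` of a row-major [batch_size, num_classes] buffer.
// Assumptions: each class index in `label` sets one entry to 1, and
// out-of-range indices are silently skipped.
void OneHotRow(const std::vector<int>& label, std::size_t row,
               std::size_t num_classes, std::vector<float>& batch)
{
    float* dst = batch.data() + row * num_classes;
    for (std::size_t c = 0; c < num_classes; ++c) {
        dst[c] = 0.f;  // clear the row first
    }
    for (int cls : label) {
        if (cls >= 0 && static_cast<std::size_t>(cls) < num_classes) {
            dst[cls] = 1.f;
        }
    }
}
```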
| bool ecvl::DLDataset::active_ = false |
protected |
Whether the threads have already been launched or not.
Definition at line 289 of file support_eddl.h.
| std::mutex ecvl::DLDataset::active_mutex_ |
protected |
Mutex for active_ variable.
Definition at line 290 of file support_eddl.h.
| DatasetAugmentations ecvl::DLDataset::augs_ |
protected |
ecvl::DatasetAugmentations to be applied to the Dataset images (and ground truth, if it exists) for each split.
Definition at line 283 of file support_eddl.h.
| int ecvl::DLDataset::batch_size_ |
protected |
Size of each dataset mini batch.
Definition at line 279 of file support_eddl.h.
| ColorType ecvl::DLDataset::ctype_ |
protected |
ecvl::ColorType of the Dataset images.
Definition at line 281 of file support_eddl.h.
| ColorType ecvl::DLDataset::ctype_gt_ |
protected |
ecvl::ColorType of the Dataset ground truth images.
Definition at line 282 of file support_eddl.h.
| std::vector<int> ecvl::DLDataset::current_batch_ |
protected |
Number of batches already loaded for each split.
Definition at line 280 of file support_eddl.h.
| int ecvl::DLDataset::n_channels_ |
Number of channels of the images.
Definition at line 317 of file support_eddl.h.
| int ecvl::DLDataset::n_channels_gt_ = -1 |
Number of channels of the ground truth images.
Definition at line 318 of file support_eddl.h.
| unsigned ecvl::DLDataset::num_workers_ |
protected |
Number of parallel workers.
Definition at line 284 of file support_eddl.h.
| std::vector<std::thread> ecvl::DLDataset::producers_ |
protected |
Vector of threads representing the samples producers.
Definition at line 288 of file support_eddl.h.
| ProducersConsumerQueue ecvl::DLDataset::queue_ |
protected |
Producers-consumer queue of the dataset.
Definition at line 285 of file support_eddl.h.
| std::default_random_engine ecvl::DLDataset::re_ |
static protected |
Engine used for random number generation.
Definition at line 291 of file support_eddl.h.
| std::vector<int> ecvl::DLDataset::resize_dims_ |
Dimensions (HxW) to which Dataset images must be resized.
Definition at line 319 of file support_eddl.h.
| std::vector<std::vector<ThreadCounters> > ecvl::DLDataset::splits_tc_ |
protected |
Each dataset split has its own vector of threads, each of which has its counters: <counter,min,max>.
Definition at line 287 of file support_eddl.h.
| std::pair<std::vector<int>, std::vector<int> > ecvl::DLDataset::tensors_shape_ |
protected |
Shape of sample and label tensors.
Definition at line 286 of file support_eddl.h.