DeepHealth Deep Learning Dataset.
#include <support_eddl.h>
Public Member Functions | |
DLDataset (const filesystem::path &filename, const int batch_size, const DatasetAugmentations &augs, const ColorType ctype=ColorType::RGB, const ColorType ctype_gt=ColorType::GRAY, const unsigned num_workers=1, const double queue_ratio_size=1., const std::vector< bool > &drop_last={}, bool verify=false) | |
void | ResetBatch (const ecvl::any &split=-1, bool shuffle=false) |
Reset the batch counter and optionally shuffle the sample indices of the specified split. | |
void | ResetAllBatches (bool shuffle=false) |
Reset the batch counter of each split and optionally shuffle the sample indices (within each split). | |
void | LoadBatch (Tensor *&images, Tensor *&labels) |
Load a batch into the images and labels tensors. | |
void | LoadBatch (Tensor *&images) |
Load a batch into the images tensor. Useful for test sets when you don't have labels. | |
void | SetBatchSize (int bs) |
Set a new batch size inside the dataset. | |
virtual void | ProduceImageLabel (DatasetAugmentations &augs, Sample &elem) |
Load a sample and its label, and push them to the producers-consumer queue. | |
void | ThreadFunc (int thread_index) |
Function called when the threads are spawned. | |
std::tuple< std::vector< Sample >, std::shared_ptr< Tensor >, std::shared_ptr< Tensor > > | GetBatch () |
Pop batch_size samples from the queue and copy them into EDDL tensors. | |
void | Start (int split_index=-1) |
Spawn num_workers threads. | |
void | Stop () |
Join all the threads. | |
auto | GetQueueSize () const |
Get the current size of the producers-consumer queue of the dataset. | |
void | SetAugmentations (const DatasetAugmentations &da) |
Set the dataset augmentations. | |
const int | GetNumBatches (const ecvl::any &split=-1) |
Get the number of batches of the specified split. | |
void | ToTensorPlane (const std::vector< int > &label, Tensor *&tensor) |
Convert the sample labels into a one-hot encoded tensor and copy it to the batch tensor. | |
void | SetWorkers (const unsigned num_workers) |
Change the number of workers. | |
void | SetNumChannels (const int n_channels, const int n_channels_gt=1) |
Change the number of channels of the Image produced by ECVL and update the internal EDDL tensors shape accordingly. Useful for custom data loading. | |
Public Member Functions inherited from ecvl::Dataset | |
Dataset () | |
Dataset (const filesystem::path &filename, bool verify=false) | |
virtual | ~Dataset () |
std::vector< int > & | GetSplit (const ecvl::any &split=-1) |
Returns the image indexes of the requested split. | |
void | SetSplit (const ecvl::any &split) |
Set the current split. | |
void | Dump (const filesystem::path &file_path) |
Dump the Dataset into a YAML file following the DeepHealth Dataset Format. | |
std::vector< std::vector< filesystem::path > > | GetLocations () const |
Retrieve the list of all samples locations in the dataset file. | |
Static Public Member Functions | |
static void | SetSplitSeed (unsigned seed) |
Set a fixed seed for the randomly generated values. Useful to reproduce experiments with the same shuffling during training. | |
Public Attributes | |
int | n_channels_ |
Number of channels of the images. | |
int | n_channels_gt_ = -1 |
Number of channels of the ground truth images. | |
std::vector< int > | resize_dims_ |
Dimensions (HxW) to which Dataset images must be resized. | |
Public Attributes inherited from ecvl::Dataset | |
std::string | name_ = "DeepHealth dataset" |
Name of the Dataset. | |
std::string | description_ = "This is the DeepHealth example dataset!" |
Description of the Dataset. | |
std::vector< std::string > | classes_ |
Vector with all the classes available in the Dataset. | |
std::vector< std::string > | features_ |
Vector with all the features available in the Dataset. | |
std::vector< Sample > | samples_ |
Vector containing all the Dataset samples. See Sample. | |
std::vector< Split > | split_ |
Splits of the Dataset. See Split. | |
int | current_split_ = -1 |
Current split from which images are loaded. | |
Task | task_ |
Task of the dataset. | |
Protected Member Functions | |
void | InitTC (int split_index) |
Set the indices of the samples managed by each thread. | |
void | SetTensorsShape () |
Set internal EDDL tensors shape. | |
Protected Member Functions inherited from ecvl::Dataset | |
std::vector< ecvl::Split >::iterator | GetSplitIt (ecvl::any split) |
const int | GetSplitIndex (ecvl::any split) |
Protected Attributes | |
int | batch_size_ |
Size of each dataset mini batch. | |
std::vector< int > | current_batch_ |
Number of batches already loaded for each split. | |
ColorType | ctype_ |
ecvl::ColorType of the Dataset images. | |
ColorType | ctype_gt_ |
ecvl::ColorType of the Dataset ground truth images. | |
DatasetAugmentations | augs_ |
ecvl::DatasetAugmentations to be applied to the Dataset images (and ground truth, if it exists) for each split. | |
unsigned | num_workers_ |
Number of parallel workers. | |
ProducersConsumerQueue | queue_ |
Producers-consumer queue of the dataset. | |
std::pair< std::vector< int >, std::vector< int > > | tensors_shape_ |
Shape of sample and label tensors. | |
std::vector< std::vector< ThreadCounters > > | splits_tc_ |
Each dataset split has its own vector of threads, each of which has its counters: <counter,min,max>. | |
std::vector< std::thread > | producers_ |
Vector of threads representing the samples producers. | |
bool | active_ = false |
Whether the threads have already been launched or not. | |
std::mutex | active_mutex_ |
Mutex for active_ variable. | |
Static Protected Attributes | |
static std::default_random_engine | re_ |
Engine used for random number generation. | |
Additional Inherited Members | |
Static Protected Attributes inherited from ecvl::Dataset | |
static const std::regex | url_regex_ |
DeepHealth Deep Learning Dataset.
This class extends the DeepHealth Dataset with Deep Learning specific members.
Definition at line 275 of file support_eddl.h.
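ecvl::DLDataset::DLDataset | ( | const filesystem::path & | filename, |
const int | batch_size, | ||
const DatasetAugmentations & | augs, | ||
const ColorType | ctype = ColorType::RGB, | ||
const ColorType | ctype_gt = ColorType::GRAY, | ||
const unsigned | num_workers = 1, | ||
const double | queue_ratio_size = 1., | ||
const std::vector< bool > & | drop_last = {}, | ||
bool | verify = false | ||
) |
inline |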
[in] | filename | Path to the Dataset file. |
[in] | batch_size | Size of each dataset mini batch. |
[in] | augs | Array with DatasetAugmentations to be applied to the Dataset images (and ground truth, if it exists) for each split. If no augmentation is required, nullptr has to be passed. |
[in] | ctype | ecvl::ColorType of the Dataset images. Default is RGB. |
[in] | ctype_gt | ecvl::ColorType of the Dataset ground truth images. Default is GRAY. |
[in] | num_workers | Number of parallel threads spawned. |
[in] | queue_ratio_size | The producers-consumer queue will have a maximum size equal to \(batch\_size \times queue\_ratio\_size \times num\_workers\). |
[in] | drop_last | For each split, whether to drop the last samples that don't fit the batch size or not. The vector dimensions must match the number of splits. |
[in] | verify | If true, a list of all the images in the Dataset file which don't exist is printed with an ECVL_WARNING_MSG. |
Definition at line 332 of file support_eddl.h.
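The queue bound described for queue_ratio_size can be worked through numerically. The following is a minimal sketch, not ECVL code: MaxQueueSize is a hypothetical helper, and truncating the product to an integer is an assumption, since this page does not state how a fractional result is rounded.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of ECVL): evaluates
// batch_size * queue_ratio_size * num_workers.
// Truncation toward zero is an assumption of this sketch.
std::size_t MaxQueueSize(int batch_size, double queue_ratio_size, unsigned num_workers)
{
    assert(batch_size > 0 && queue_ratio_size > 0. && num_workers > 0);
    return static_cast<std::size_t>(batch_size * queue_ratio_size * num_workers);
}
```

With batch_size = 32, queue_ratio_size = 1. and num_workers = 4, the bound is 128 samples; halving queue_ratio_size halves the bound.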
std::tuple<std::vector<Sample>, std::shared_ptr<Tensor>, std::shared_ptr<Tensor> > ecvl::DLDataset::GetBatch | ( | ) |
Pop batch_size samples from the queue and copy them into EDDL tensors.
const int ecvl::DLDataset::GetNumBatches | ( | const ecvl::any & | split = -1 | ) |
Get the number of batches of the specified split.
If no split is provided or an illegal value is provided, the number of batches of the current split is returned.
[in] | split | index, name or ecvl::SplitType representing the split from which to get the number of batches. |
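How the batch count relates to the drop_last constructor argument can be sketched as follows. This is an illustration under an assumption, not the library implementation: NumBatches is a hypothetical function, and it assumes that a trailing partial batch counts as one extra batch unless drop_last is set for the split.

```cpp
#include <cassert>

// Hypothetical helper (not part of ECVL): number of batches in a split
// holding n_samples samples. Assumes a partial final batch is counted
// only when drop_last is false.
int NumBatches(int n_samples, int batch_size, bool drop_last)
{
    assert(n_samples >= 0 && batch_size > 0);
    const int full = n_samples / batch_size;                  // complete batches
    const bool has_remainder = (n_samples % batch_size) != 0; // leftover samples
    return (drop_last || !has_remainder) ? full : full + 1;
}
```

For example, 10 samples with a batch size of 4 give 2 batches when the last samples are dropped and 3 otherwise.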
auto ecvl::DLDataset::GetQueueSize | ( | ) | const |
inline |
Get the current size of the producers-consumer queue of the dataset.
Definition at line 483 of file support_eddl.h.
void ecvl::DLDataset::InitTC | ( | int | split_index | ) |
protected |
Set the indices of the samples managed by each thread.
[in] | split_index | index of the split to initialize. |
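One way to assign sample indices to threads is to give each worker a contiguous range, tracked by a <counter,min,max> triple like the one mentioned for splits_tc_. The sketch below is an assumption about the general idea only: the Counters struct and PartitionSamples function are hypothetical names, and the actual distribution used by ECVL may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-thread counters <counter,min,max>.
struct Counters {
    int counter; // next sample index this worker will produce
    int min;     // first index owned by the worker (inclusive)
    int max;     // one past the last index owned by the worker
};

// Splits [0, n_samples) into num_workers contiguous, nearly equal ranges.
// The first (n_samples % num_workers) workers get one extra sample.
std::vector<Counters> PartitionSamples(int n_samples, int num_workers)
{
    assert(n_samples >= 0 && num_workers > 0);
    std::vector<Counters> out;
    out.reserve(static_cast<std::size_t>(num_workers));
    const int base = n_samples / num_workers;
    const int extra = n_samples % num_workers;
    int begin = 0;
    for (int i = 0; i < num_workers; ++i) {
        const int len = base + (i < extra ? 1 : 0);
        out.push_back({begin, begin, begin + len});
        begin += len;
    }
    return out;
}
```

With 10 samples and 3 workers this yields the ranges [0,4), [4,7) and [7,10), so every sample is owned by exactly one worker.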
void ecvl::DLDataset::LoadBatch | ( | Tensor *& | images, |
Tensor *& | labels | ||
) |
Load a batch into the images and labels tensors.
[out] | images | tensor which stores the batch of images. |
[out] | labels | tensor which stores the batch of labels. |
void ecvl::DLDataset::LoadBatch | ( | Tensor *& | images | ) |
Load a batch into the images tensor. Useful for test sets when you don't have labels.
[out] | images | tensor which stores the batch of images. |
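virtual void ecvl::DLDataset::ProduceImageLabel | ( | DatasetAugmentations & | augs, |
Sample & | elem | ||
) |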
virtual |
Load a sample and its label, and push them to the producers-consumer queue.
[in] | elem | Sample to load and push to the queue. |
void ecvl::DLDataset::ResetAllBatches | ( | bool | shuffle = false | ) |
Reset the batch counter of each split and optionally shuffle the sample indices (within each split).
[in] | shuffle | boolean which indicates whether to shuffle the sample indices or not. |
void ecvl::DLDataset::ResetBatch | ( | const ecvl::any & | split = -1, |
bool | shuffle = false | ||
) |
Reset the batch counter and optionally shuffle the sample indices of the specified split.
If no split is provided or an illegal value is provided, the current split is reset.
[in] | split | index, name or SplitType of the split to reset. |
[in] | shuffle | boolean which indicates whether to shuffle the split sample indices or not. |
void ecvl::DLDataset::SetAugmentations | ( | const DatasetAugmentations & | da | ) |
Set the dataset augmentations.
[in] | da | DatasetAugmentations to set. |
void ecvl::DLDataset::SetBatchSize | ( | int | bs | ) |
Set a new batch size inside the dataset.
Note that this does not affect the EDDL network batch size, which has to be changed too.
[in] | bs | Value to set for the batch size. |
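void ecvl::DLDataset::SetNumChannels | ( | const int | n_channels, |
const int | n_channels_gt = 1 | ||
) |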
inline |
Change the number of channels of the Image produced by ECVL and update the internal EDDL tensors shape accordingly. Useful for custom data loading.
[in] | n_channels | Number of channels of input Image. |
[in] | n_channels_gt | Number of channels of ground truth. |
Definition at line 528 of file support_eddl.h.
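To make the link between the channel count and the tensor shapes concrete, here is a hedged sketch for a classification task. ClassificationShapes is a hypothetical helper; the [batch, channels, height, width] layout for samples is an assumption of this sketch, while the [batch_size, num_classes] label layout follows the ToTensorPlane description on this page.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Shape = std::vector<int>;

// Hypothetical helper (not part of ECVL): returns {sample shape, label shape}.
// Sample layout [batch, channels, H, W] is assumed; resize_dims is {H, W}.
std::pair<Shape, Shape> ClassificationShapes(int batch_size, int n_channels,
                                             const Shape& resize_dims, int num_classes)
{
    assert(batch_size > 0 && n_channels > 0 && num_classes > 0);
    assert(resize_dims.size() == 2);
    Shape sample{batch_size, n_channels, resize_dims[0], resize_dims[1]};
    Shape label{batch_size, num_classes};
    return {sample, label};
}
```

Changing n_channels from 3 to 1 in this sketch only changes the second entry of the sample shape, which is the kind of update the method performs on the internal shapes.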
static void ecvl::DLDataset::SetSplitSeed | ( | unsigned | seed | ) |
inline static |
Set a fixed seed for the randomly generated values. Useful to reproduce experiments with the same shuffling during training.
[in] | seed | Value of the seed for the random engine. |
Definition at line 439 of file support_eddl.h.
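Why a fixed seed makes shuffling reproducible can be shown with the standard library alone. The sketch below is not the ECVL implementation: ShuffledIndices is a hypothetical function that seeds a std::default_random_engine (the engine type listed for re_) and shuffles a list of sample indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not part of ECVL): returns the indices 0..n-1
// shuffled with an engine seeded by `seed`.
std::vector<int> ShuffledIndices(int n, unsigned seed)
{
    assert(n >= 0);
    std::vector<int> idx(static_cast<std::size_t>(n));
    std::iota(idx.begin(), idx.end(), 0);     // fill with 0, 1, ..., n-1
    std::default_random_engine re(seed);      // same seed, same sequence
    std::shuffle(idx.begin(), idx.end(), re);
    return idx;
}
```

Two calls with the same seed return the same permutation on a given standard library, so two training runs visit the samples in the same order. The exact permutation is implementation-defined and may differ between compilers.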
void ecvl::DLDataset::SetTensorsShape | ( | ) |
inline protected |
Set internal EDDL tensors shape.
Definition at line 300 of file support_eddl.h.
void ecvl::DLDataset::SetWorkers | ( | const unsigned | num_workers | ) |
inline |
Change the number of workers.
[in] | num_workers | Number of threads/workers that will be spawned. |
Definition at line 510 of file support_eddl.h.
void ecvl::DLDataset::Start | ( | int | split_index = -1 | ) |
Spawn num_workers threads.
[in] | split_index | Index of the split to use in the GetBatch function. If not specified, current split is used. |
void ecvl::DLDataset::Stop | ( | ) |
Join all the threads.
void ecvl::DLDataset::ThreadFunc | ( | int | thread_index | ) |
Function called when the threads are spawned.
ProduceImageLabel is called for each sample assigned to the thread.
[in] | thread_index | index of the thread. |
void ecvl::DLDataset::ToTensorPlane | ( | const std::vector< int > & | label, |
Tensor *& | tensor | ||
) |
Convert the sample labels into a one-hot encoded tensor and copy it to the batch tensor.
[in] | label | vector of the sample labels |
[out] | tensor | EDDL Tensor in which to copy the labels (dimensions: [batch_size, num_classes]) |
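The encoding step can be illustrated without EDDL. The sketch below is a simplified assumption: OneHot is a hypothetical function that builds a single row of num_classes values for one sample, whereas the real method writes into a [batch_size, num_classes] EDDL Tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ECVL): encodes the class indices of one
// sample as a row of num_classes floats, 1 for each listed class, 0 elsewhere.
std::vector<float> OneHot(const std::vector<int>& label, int num_classes)
{
    assert(num_classes > 0);
    std::vector<float> row(static_cast<std::size_t>(num_classes), 0.f);
    for (int c : label) {
        assert(c >= 0 && c < num_classes); // class index must be in range
        row[static_cast<std::size_t>(c)] = 1.f;
    }
    return row;
}
```

A sample with label {2} and 4 classes becomes the row 0, 0, 1, 0. Because label is a vector, a sample listing several classes sets several entries to 1 in this sketch.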
bool ecvl::DLDataset::active_ = false |
protected |
Whether the threads have already been launched or not.
Definition at line 289 of file support_eddl.h.
std::mutex ecvl::DLDataset::active_mutex_ |
protected |
Mutex for active_ variable.
Definition at line 290 of file support_eddl.h.
DatasetAugmentations ecvl::DLDataset::augs_ |
protected |
ecvl::DatasetAugmentations to be applied to the Dataset images (and ground truth, if it exists) for each split.
Definition at line 283 of file support_eddl.h.
int ecvl::DLDataset::batch_size_ |
protected |
Size of each dataset mini batch.
Definition at line 279 of file support_eddl.h.
ColorType ecvl::DLDataset::ctype_ |
protected |
ecvl::ColorType of the Dataset images.
Definition at line 281 of file support_eddl.h.
ColorType ecvl::DLDataset::ctype_gt_ |
protected |
ecvl::ColorType of the Dataset ground truth images.
Definition at line 282 of file support_eddl.h.
std::vector<int> ecvl::DLDataset::current_batch_ |
protected |
Number of batches already loaded for each split.
Definition at line 280 of file support_eddl.h.
int ecvl::DLDataset::n_channels_ |
Number of channels of the images.
Definition at line 317 of file support_eddl.h.
int ecvl::DLDataset::n_channels_gt_ = -1 |
Number of channels of the ground truth images.
Definition at line 318 of file support_eddl.h.
unsigned ecvl::DLDataset::num_workers_ |
protected |
Number of parallel workers.
Definition at line 284 of file support_eddl.h.
std::vector<std::thread> ecvl::DLDataset::producers_ |
protected |
Vector of threads representing the samples producers.
Definition at line 288 of file support_eddl.h.
ProducersConsumerQueue ecvl::DLDataset::queue_ |
protected |
Producers-consumer queue of the dataset.
Definition at line 285 of file support_eddl.h.
std::default_random_engine ecvl::DLDataset::re_ |
static protected |
Engine used for random number generation.
Definition at line 291 of file support_eddl.h.
std::vector<int> ecvl::DLDataset::resize_dims_ |
Dimensions (HxW) to which Dataset images must be resized.
Definition at line 319 of file support_eddl.h.
std::vector<std::vector<ThreadCounters> > ecvl::DLDataset::splits_tc_ |
protected |
Each dataset split has its own vector of threads, each of which has its counters: <counter,min,max>.
Definition at line 287 of file support_eddl.h.
std::pair<std::vector<int>, std::vector<int> > ecvl::DLDataset::tensors_shape_ |
protected |
Shape of sample and label tensors.
Definition at line 286 of file support_eddl.h.