ecvl::DLDataset Class Reference

DeepHealth Deep Learning Dataset.

#include <support_eddl.h>

Inheritance diagram for ecvl::DLDataset:
ecvl::Dataset

Public Member Functions

 DLDataset (const filesystem::path &filename, const int batch_size, const DatasetAugmentations &augs, const ColorType ctype=ColorType::RGB, const ColorType ctype_gt=ColorType::GRAY, const unsigned num_workers=1, const double queue_ratio_size=1., const std::vector< bool > &drop_last={}, bool verify=false)
 
void ResetBatch (const ecvl::any &split=-1, bool shuffle=false)
 Reset the batch counter and optionally shuffle the sample indices of the specified split.
 
void ResetAllBatches (bool shuffle=false)
 Reset the batch counter of each split and optionally shuffle the sample indices (within each split).
 
void LoadBatch (Tensor *&images, Tensor *&labels)
 Load a batch into the images and labels tensors.
 
void LoadBatch (Tensor *&images)
 Load a batch into the images tensor. Useful for test sets that have no labels.
 
void SetBatchSize (int bs)
 Set a new batch size inside the dataset.
 
virtual void ProduceImageLabel (DatasetAugmentations &augs, Sample &elem)
 Load a sample and its label, and push them to the producers-consumer queue.
 
void ThreadFunc (int thread_index)
 Function called when the threads are spawned.
 
std::tuple< std::vector< Sample >, std::shared_ptr< Tensor >, std::shared_ptr< Tensor > > GetBatch ()
 Pop batch_size samples from the queue and copy them into EDDL tensors.
 
void Start (int split_index=-1)
 Spawn num_workers threads.
 
void Stop ()
 Join all the threads.
 
auto GetQueueSize () const
 Get the current size of the producers-consumer queue of the dataset.
 
void SetAugmentations (const DatasetAugmentations &da)
 Set the dataset augmentations.
 
const int GetNumBatches (const ecvl::any &split=-1)
 Get the number of batches of the specified split.
 
void ToTensorPlane (const std::vector< int > &label, Tensor *&tensor)
 Convert the sample labels into a one-hot encoded tensor and copy it to the batch tensor.
 
void SetWorkers (const unsigned num_workers)
 Change the number of workers.
 
void SetNumChannels (const int n_channels, const int n_channels_gt=1)
 Change the number of channels of the Image produced by ECVL and update the shape of the internal EDDL tensors accordingly. Useful for custom data loading.
 
- Public Member Functions inherited from ecvl::Dataset
 Dataset ()
 
 Dataset (const filesystem::path &filename, bool verify=false)
 
virtual ~Dataset ()
 
std::vector< int > & GetSplit (const ecvl::any &split=-1)
 Returns the image indexes of the requested split.
 
void SetSplit (const ecvl::any &split)
 Set the current split.
 
void Dump (const filesystem::path &file_path)
 Dump the Dataset into a YAML file following the DeepHealth Dataset Format.
 
std::vector< std::vector< filesystem::path > > GetLocations () const
 Retrieve the list of all sample locations in the dataset file.
 

Static Public Member Functions

static void SetSplitSeed (unsigned seed)
 Set a fixed seed for the randomly generated values. Useful to reproduce experiments with the same shuffling during training.
 

Public Attributes

int n_channels_
 Number of channels of the images.
 
int n_channels_gt_ = -1
 Number of channels of the ground truth images.
 
std::vector< int > resize_dims_
 Dimensions (HxW) to which Dataset images must be resized.
 
- Public Attributes inherited from ecvl::Dataset
std::string name_ = "DeepHealth dataset"
 Name of the Dataset.
 
std::string description_ = "This is the DeepHealth example dataset!"
 Description of the Dataset.
 
std::vector< std::string > classes_
 Vector with all the classes available in the Dataset.
 
std::vector< std::string > features_
 Vector with all the features available in the Dataset.
 
std::vector< Sample > samples_
 Vector containing all the Dataset samples. See Sample.
 
std::vector< Split > split_
 Splits of the Dataset. See Split.
 
int current_split_ = -1
 Current split from which images are loaded.
 
Task task_
 Task of the dataset.
 

Protected Member Functions

void InitTC (int split_index)
 Set the indices of the samples managed by each thread.
 
void SetTensorsShape ()
 Set the shape of the internal EDDL tensors.
 
- Protected Member Functions inherited from ecvl::Dataset
std::vector< ecvl::Split >::iterator GetSplitIt (ecvl::any split)
 
const int GetSplitIndex (ecvl::any split)
 

Protected Attributes

int batch_size_
 Size of each dataset mini batch.
 
std::vector< int > current_batch_
 Number of batches already loaded for each split.
 
ColorType ctype_
 ecvl::ColorType of the Dataset images.
 
ColorType ctype_gt_
 ecvl::ColorType of the Dataset ground truth images.
 
DatasetAugmentations augs_
 ecvl::DatasetAugmentations to be applied to the Dataset images (and ground truth, if it exists) for each split.
 
unsigned num_workers_
 Number of parallel workers.
 
ProducersConsumerQueue queue_
 Producers-consumer queue of the dataset.
 
std::pair< std::vector< int >, std::vector< int > > tensors_shape_
 Shape of sample and label tensors.
 
std::vector< std::vector< ThreadCounters > > splits_tc_
 Each dataset split has its own vector of threads, each of which has its counters: <counter,min,max>.
 
std::vector< std::thread > producers_
 Vector of threads representing the sample producers.
 
bool active_ = false
 Whether the threads have already been launched or not.
 
std::mutex active_mutex_
 Mutex for the active_ variable.
 

Static Protected Attributes

static std::default_random_engine re_
 Engine used for random number generation.
 

Additional Inherited Members

- Static Public Attributes inherited from ecvl::Dataset
static const std::regex url_regex_
 

Detailed Description

DeepHealth Deep Learning Dataset.

This class extends the DeepHealth Dataset with Deep Learning specific members.

Examples
example_ecvl_eddl.cpp.

Definition at line 275 of file support_eddl.h.

Constructor & Destructor Documentation

◆ DLDataset()

ecvl::DLDataset::DLDataset ( const filesystem::path &  filename,
const int  batch_size,
const DatasetAugmentations &  augs,
const ColorType  ctype = ColorType::RGB,
const ColorType  ctype_gt = ColorType::GRAY,
const unsigned  num_workers = 1,
const double  queue_ratio_size = 1.,
const std::vector< bool > &  drop_last = {},
bool  verify = false 
)
inline
Parameters
[in]  filename  Path to the Dataset file.
[in]  batch_size  Size of each dataset mini batch.
[in]  augs  Array of DatasetAugmentations to be applied to the Dataset images (and ground truth, if it exists) for each split. If no augmentation is required, nullptr has to be passed.
[in]  ctype  ecvl::ColorType of the Dataset images. Default is RGB.
[in]  ctype_gt  ecvl::ColorType of the Dataset ground truth images. Default is GRAY.
[in]  num_workers  Number of parallel threads spawned.
[in]  queue_ratio_size  The producers-consumer queue will have a maximum size equal to \(batch\_size \times queue\_ratio\_size \times num\_workers\).
[in]  drop_last  For each split, whether or not to drop the last samples that don't fill a whole batch. The vector size must match the number of splits.
[in]  verify  If true, a list of all the images in the Dataset file that don't exist is printed with an ECVL_WARNING_MSG.

Definition at line 332 of file support_eddl.h.
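
As a rough illustration of the queue_ratio_size formula in the parameter list above, the following sketch computes the maximum queue size. QueueCapacity is a hypothetical helper, not part of ECVL, and truncating the product toward zero is an assumption, since the rounding rule is not stated on this page.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an ECVL function): maximum number of samples the
// producers-consumer queue may hold, following the documented formula
// batch_size * queue_ratio_size * num_workers.
std::size_t QueueCapacity(int batch_size, double queue_ratio_size, unsigned num_workers)
{
    assert(batch_size > 0 && queue_ratio_size > 0. && num_workers > 0);
    // Assumption: the fractional part of the product is discarded.
    return static_cast<std::size_t>(batch_size * queue_ratio_size * num_workers);
}
```

With a batch size of 8, a ratio of 1 and 4 workers, this gives room for 32 samples, that is, one full batch per worker.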

Member Function Documentation

◆ GetBatch()

std::tuple<std::vector<Sample>, std::shared_ptr<Tensor>, std::shared_ptr<Tensor> > ecvl::DLDataset::GetBatch ( )

Pop batch_size samples from the queue and copy them into EDDL tensors.

Returns
A tuple containing the Samples and two EDDL Tensors: the first holds the images and the second holds the labels.

◆ GetNumBatches()

const int ecvl::DLDataset::GetNumBatches ( const ecvl::any &  split = -1)

Get the number of batches of the specified split.

If no split is provided or an illegal value is provided, the number of batches of the current split is returned.

Parameters
[in]  split  Index, name or ecvl::SplitType representing the split from which to get the number of batches.
Returns
number of batches of the specified split.
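
The interaction between the batch size and the drop_last constructor flag can be sketched as follows. NumBatches is a hypothetical standalone function. It assumes that a split with drop_last set discards the trailing partial batch and that otherwise the partial batch counts as one more batch, which is one plausible reading of the drop_last description rather than a statement about the ECVL source.

```cpp
#include <cassert>

// Hypothetical helper (not an ECVL function): number of batches in a split
// holding num_samples samples.
int NumBatches(int num_samples, int batch_size, bool drop_last)
{
    assert(num_samples >= 0 && batch_size > 0);
    if (drop_last) {
        // The incomplete trailing batch is discarded.
        return num_samples / batch_size;
    }
    // The incomplete trailing batch is kept: round up.
    return (num_samples + batch_size - 1) / batch_size;
}
```

For 10 samples and a batch size of 4, this yields 2 batches when the last samples are dropped and 3 otherwise.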

◆ GetQueueSize()

auto ecvl::DLDataset::GetQueueSize ( ) const
inline

Get the current size of the producers-consumer queue of the dataset.

Returns
Size of the producers-consumer queue of the dataset.

Definition at line 483 of file support_eddl.h.

◆ InitTC()

void ecvl::DLDataset::InitTC ( int  split_index)
protected

Set which are the indices of the samples managed by each thread.

Parameters
[in]  split_index  Index of the split to initialize.

◆ LoadBatch() [1/2]

void ecvl::DLDataset::LoadBatch ( Tensor *&  images,
Tensor *&  labels 
)

Load a batch into the images and labels tensors.

Parameters
[out]  images  Tensor which stores the batch of images.
[out]  labels  Tensor which stores the batch of labels.
Examples
example_ecvl_eddl.cpp.

◆ LoadBatch() [2/2]

void ecvl::DLDataset::LoadBatch ( Tensor *&  images)

Load a batch into the images tensor. Useful for test sets that have no labels.

Parameters
[out]  images  Tensor which stores the batch of images.

◆ ProduceImageLabel()

virtual void ecvl::DLDataset::ProduceImageLabel ( DatasetAugmentations &  augs,
Sample &  elem 
)
virtual

Load a sample and its label, and push them to the producers-consumer queue.

Parameters
[in]  elem  Sample to load and push to the queue.

◆ ResetAllBatches()

void ecvl::DLDataset::ResetAllBatches ( bool  shuffle = false)

Reset the batch counter of each split and optionally shuffle the sample indices (within each split).

Parameters
[in]  shuffle  Boolean which indicates whether or not to shuffle the sample indices.

◆ ResetBatch()

void ecvl::DLDataset::ResetBatch ( const ecvl::any &  split = -1,
bool  shuffle = false 
)

Reset the batch counter and optionally shuffle the sample indices of the specified split.

If no split is provided or an illegal value is provided, the current split is reset.

Parameters
[in]  split  Index, name or SplitType of the split to reset.
[in]  shuffle  Boolean which indicates whether or not to shuffle the sample indices of the split.

◆ SetAugmentations()

void ecvl::DLDataset::SetAugmentations ( const DatasetAugmentations &  da)

Set the dataset augmentations.

Parameters
[in]  da  DatasetAugmentations to set.

◆ SetBatchSize()

void ecvl::DLDataset::SetBatchSize ( int  bs)

Set a new batch size inside the dataset.

Note that this does not affect the EDDL network batch size, which has to be changed as well.

Parameters
[in]  bs  Value to set for the batch size.

◆ SetNumChannels()

void ecvl::DLDataset::SetNumChannels ( const int  n_channels,
const int  n_channels_gt = 1 
)
inline

Change the number of channels of the Image produced by ECVL and update the shape of the internal EDDL tensors accordingly. Useful for custom data loading.

Parameters
[in]  n_channels  Number of channels of the input Image.
[in]  n_channels_gt  Number of channels of the ground truth.

Definition at line 528 of file support_eddl.h.

◆ SetSplitSeed()

static void ecvl::DLDataset::SetSplitSeed ( unsigned  seed)
inline static

Set a fixed seed for the randomly generated values. Useful to reproduce experiments with the same shuffling during training.

Parameters
[in]  seed  Value of the seed for the random engine.

Definition at line 439 of file support_eddl.h.
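
Why a fixed seed makes shuffling reproducible can be shown with the standard library alone. The sketch below is not ECVL code: ShuffledIndices is a hypothetical function that seeds a std::default_random_engine (the engine type listed for re_) and shuffles a list of sample indices with std::shuffle. Whether ECVL shuffles in exactly this way is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not an ECVL function): returns the indices 0..n-1 in
// an order determined only by the seed.
std::vector<int> ShuffledIndices(int n, unsigned seed)
{
    assert(n >= 0);
    std::vector<int> indices(static_cast<std::size_t>(n));
    std::iota(indices.begin(), indices.end(), 0);  // 0, 1, ..., n-1
    std::default_random_engine engine(seed);       // same seed, same sequence
    std::shuffle(indices.begin(), indices.end(), engine);
    return indices;
}
```

Two calls with the same seed return the same permutation within one build of the program, which is what makes a training run repeatable.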

◆ SetTensorsShape()

void ecvl::DLDataset::SetTensorsShape ( )
inline protected

Set internal EDDL tensors shape.

Definition at line 300 of file support_eddl.h.

◆ SetWorkers()

void ecvl::DLDataset::SetWorkers ( const unsigned  num_workers)
inline

Change the number of workers.

Parameters
[in]  num_workers  Number of threads/workers that will be spawned.

Definition at line 510 of file support_eddl.h.

◆ Start()

void ecvl::DLDataset::Start ( int  split_index = -1)

Spawn num_workers threads.

Parameters
[in]  split_index  Index of the split to use in the GetBatch function. If not specified, the current split is used.

◆ Stop()

void ecvl::DLDataset::Stop ( )

Join all the threads.
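
The Start / GetBatch / Stop life cycle follows the classic producers-consumer pattern. The following is a minimal standalone sketch of that pattern, not the ECVL implementation: BoundedQueue and RunPipeline are hypothetical names, samples are represented by plain integers, and the interleaved assignment of samples to workers is chosen only for brevity.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical bounded queue; ECVL's ProducersConsumerQueue may differ.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t max_size) : max_size_(max_size) {}

    void Push(int sample)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        // Producers block while the queue is full.
        not_full_.wait(lock, [this] { return items_.size() < max_size_; });
        items_.push(sample);
        not_empty_.notify_one();
    }

    int Pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        // The consumer blocks while the queue is empty.
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        const int sample = items_.front();
        items_.pop();
        not_full_.notify_one();
        return sample;
    }

private:
    std::size_t max_size_;
    std::queue<int> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// Spawns num_workers producers, consumes every sample on the calling thread,
// then joins the producers.
std::vector<int> RunPipeline(int num_samples, unsigned num_workers, std::size_t queue_size)
{
    assert(num_samples >= 0 && num_workers > 0 && queue_size > 0);
    BoundedQueue queue(queue_size);
    std::vector<std::thread> producers;
    // "Start": worker w produces samples w, w + num_workers, ...
    for (unsigned w = 0; w < num_workers; ++w) {
        producers.emplace_back([&queue, w, num_samples, num_workers] {
            for (int i = static_cast<int>(w); i < num_samples; i += static_cast<int>(num_workers)) {
                queue.Push(i);
            }
        });
    }
    // "GetBatch": the consumer pops until every sample has been seen.
    std::vector<int> consumed;
    for (int i = 0; i < num_samples; ++i) {
        consumed.push_back(queue.Pop());
    }
    // "Stop": join all the threads.
    for (std::thread& producer : producers) {
        producer.join();
    }
    return consumed;
}
```

The order in which samples come out depends on thread scheduling, so only the set of consumed samples is deterministic, not their sequence.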

◆ ThreadFunc()

void ecvl::DLDataset::ThreadFunc ( int  thread_index)

Function called when the threads are spawned.

ProduceImageLabel is called for each sample assigned to the thread.

Parameters
[in]  thread_index  Index of the thread.

◆ ToTensorPlane()

void ecvl::DLDataset::ToTensorPlane ( const std::vector< int > &  label,
Tensor *&  tensor 
)

Convert the sample labels into a one-hot encoded tensor and copy it to the batch tensor.

Parameters
[in]  label  Vector of the sample labels.
[out]  tensor  EDDL Tensor in which to copy the labels (dimensions: [batch_size, num_classes]).
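
To make the one-hot encoding concrete, the sketch below builds a single row of the [batch_size, num_classes] label tensor from a vector of class indices. OneHotRow is a hypothetical function that uses std::vector<float> in place of an EDDL Tensor, and it assumes that every entry of label is a class index in the range [0, num_classes).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an ECVL function): one row of the label tensor,
// with 1 at each class index present in `label` and 0 elsewhere.
std::vector<float> OneHotRow(const std::vector<int>& label, int num_classes)
{
    assert(num_classes > 0);
    std::vector<float> row(static_cast<std::size_t>(num_classes), 0.f);
    for (int class_index : label) {
        assert(class_index >= 0 && class_index < num_classes);
        row[static_cast<std::size_t>(class_index)] = 1.f;
    }
    return row;
}
```

A sample labeled with class 2 out of 4 classes becomes {0, 0, 1, 0}; a multi-label sample with classes 0 and 3 becomes {1, 0, 0, 1}.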

Member Data Documentation

◆ active_

bool ecvl::DLDataset::active_ = false
protected

Whether the threads have already been launched or not.

Definition at line 289 of file support_eddl.h.

◆ active_mutex_

std::mutex ecvl::DLDataset::active_mutex_
protected

Mutex for the active_ variable.

Definition at line 290 of file support_eddl.h.

◆ augs_

DatasetAugmentations ecvl::DLDataset::augs_
protected

ecvl::DatasetAugmentations to be applied to the Dataset images (and ground truth, if it exists) for each split.

Definition at line 283 of file support_eddl.h.

◆ batch_size_

int ecvl::DLDataset::batch_size_
protected

Size of each dataset mini batch.

Definition at line 279 of file support_eddl.h.

◆ ctype_

ColorType ecvl::DLDataset::ctype_
protected

ecvl::ColorType of the Dataset images.

Definition at line 281 of file support_eddl.h.

◆ ctype_gt_

ColorType ecvl::DLDataset::ctype_gt_
protected

ecvl::ColorType of the Dataset ground truth images.

Definition at line 282 of file support_eddl.h.

◆ current_batch_

std::vector<int> ecvl::DLDataset::current_batch_
protected

Number of batches already loaded for each split.

Definition at line 280 of file support_eddl.h.
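
How a per-split batch counter can drive batch loading and be reset is sketched below. BatchCursor is a hypothetical class, not part of ECVL: it assumes that batch k covers the sample positions [k * batch_size, (k + 1) * batch_size), clipped to the split size, and that resetting simply sets the counter back to zero.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical class (not an ECVL type): tracks how many batches of one split
// have been loaded so far.
class BatchCursor {
public:
    BatchCursor(int num_samples, int batch_size)
        : num_samples_(num_samples), batch_size_(batch_size)
    {
        assert(num_samples >= 0 && batch_size > 0);
    }

    // Returns the half-open range [first, last) of sample positions of the
    // next batch and advances the counter.
    std::pair<int, int> Next()
    {
        const int first = std::min(current_batch_ * batch_size_, num_samples_);
        const int last = std::min(first + batch_size_, num_samples_);
        ++current_batch_;
        return { first, last };
    }

    // Start again from the first batch.
    void Reset() { current_batch_ = 0; }

private:
    int num_samples_;
    int batch_size_;
    int current_batch_ = 0;
};
```

With 10 samples and a batch size of 4, successive calls to Next() return [0, 4), [4, 8) and [8, 10); after Reset() the sequence starts over.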

◆ n_channels_

int ecvl::DLDataset::n_channels_

Number of channels of the images.

Examples
example_ecvl_eddl.cpp.

Definition at line 317 of file support_eddl.h.

◆ n_channels_gt_

int ecvl::DLDataset::n_channels_gt_ = -1

Number of channels of the ground truth images.

Definition at line 318 of file support_eddl.h.

◆ num_workers_

unsigned ecvl::DLDataset::num_workers_
protected

Number of parallel workers.

Definition at line 284 of file support_eddl.h.

◆ producers_

std::vector<std::thread> ecvl::DLDataset::producers_
protected

Vector of threads representing the sample producers.

Definition at line 288 of file support_eddl.h.

◆ queue_

ProducersConsumerQueue ecvl::DLDataset::queue_
protected

Producers-consumer queue of the dataset.

Definition at line 285 of file support_eddl.h.

◆ re_

std::default_random_engine ecvl::DLDataset::re_
static protected

Engine used for random number generation.

Definition at line 291 of file support_eddl.h.

◆ resize_dims_

std::vector<int> ecvl::DLDataset::resize_dims_

Dimensions (HxW) to which Dataset images must be resized.

Examples
example_ecvl_eddl.cpp.

Definition at line 319 of file support_eddl.h.

◆ splits_tc_

std::vector<std::vector<ThreadCounters> > ecvl::DLDataset::splits_tc_
protected

Each dataset split has its own vector of threads, each of which has its counters: <counter,min,max>.

Definition at line 287 of file support_eddl.h.
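
One way to picture the <counter,min,max> triples is as contiguous slices of a split, one per worker thread. The sketch below is an assumption about the general idea, not a copy of InitTC: Counters and PartitionSamples are hypothetical names, each range is half-open, and the last worker absorbs any remainder.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of the <counter,min,max> triple: the thread handles the
// sample positions [min, max) and counter is the next position to produce.
struct Counters {
    int counter;
    int min;
    int max;
};

// Hypothetical helper (not an ECVL function): splits num_samples positions
// into one contiguous range per worker.
std::vector<Counters> PartitionSamples(int num_samples, unsigned num_workers)
{
    assert(num_samples >= 0 && num_workers > 0);
    const int workers = static_cast<int>(num_workers);
    const int chunk = num_samples / workers;
    std::vector<Counters> counters;
    counters.reserve(static_cast<std::size_t>(workers));
    for (int i = 0; i < workers; ++i) {
        const int first = i * chunk;
        // The last worker also takes the samples left over by the division.
        const int last = (i == workers - 1) ? num_samples : first + chunk;
        counters.push_back({ first, first, last });
    }
    return counters;
}
```

For 10 samples and 3 workers, the ranges are [0, 3), [3, 6) and [6, 10).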

◆ tensors_shape_

std::pair< std::vector<int>, std::vector<int> > ecvl::DLDataset::tensors_shape_
protected

Shape of sample and label tensors.

Definition at line 286 of file support_eddl.h.


The documentation for this class was generated from the following file: