Build a NeurEco Compression model with the command line interface#

To build a NeurEco Compression model, run the following command in the terminal:

neurecoDNN build path/to/build/configuration/file/build.conf

The skeleton of the configuration file required to build a NeurEco Compression model, here build.conf, looks as follows. Its fields should be filled in according to the problem at hand.

{
    "neurecoDNN_build": {
        "DevSettings": {
            "valid_percentage": 33.33,
            "initial_beta_reg": 0.1,
            "validation_indices": "",
            "final_learning": true,
            "disconnect_inputs_if_possible": true
        },
        "input_normalization": {
            "shift_type": "auto",
            "scale_type": "auto",
            "normalize_per_feature": true
        },
        "output_normalization": {
            "shift_type": "none",
            "scale_type": "none",
            "normalize_per_feature": false
        },
        "UserSettings": {
            "gpu_id": 0,
            "use_gpu": false
        },
        "classification": false,
        "exc_filenames": [],
        "output_filenames": [],
        "validation_exc_filenames": [],
        "validation_output_filenames": [],
        "write_model_to": "model.ednn",
        "write_compression_model_to": "CompModel.ednn",
        "write_decompression_model_to": "DecompModel.ednn",
        "minimum_compression_coefficient": 1,
        "compress_tolerance": 0.02,
        "build_compress": true,
        "starting_from_checkpoint_address": "",
        "checkpoint_address": "ckpt.checkpoint",
        "resume": false
    }
}
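
For illustration, a hypothetical configuration for compressing a single tabular data set could look as follows. The data file names are placeholders, and this sketch assumes that fields omitted from the file keep the default values listed in the table below; otherwise, start from the full skeleton above and edit it in place:

{
    "neurecoDNN_build": {
        "exc_filenames": ["x_train.npy"],
        "output_filenames": [],
        "validation_exc_filenames": ["x_valid.npy"],
        "validation_output_filenames": [],
        "classification": false,
        "build_compress": true,
        "compress_tolerance": 0.02,
        "minimum_compression_coefficient": 1,
        "write_model_to": "model.ednn",
        "write_compression_model_to": "CompModel.ednn",
        "write_decompression_model_to": "DecompModel.ednn",
        "checkpoint_address": "ckpt.checkpoint"
    }
}

The build is then launched from the directory containing this file with:

neurecoDNN build build.conf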

Building parameters#

The available building parameters in the configuration file are described in the following table.
NeurEco building parameters in the configuration file#

Name

type

description

valid_percentage

float, min=1.0, max=50.0, default=33.33

defines the percentage of the data that will be used as validation data. NeurEco automatically chooses the best samples for validation, to ensure that the created model fits unseen data as well as possible; modifying this parameter is mainly of interest when the data set is small and a good trade-off between the learning and validation sets has to be found. This parameter is ignored if validation_indices is specified or if validation_exc_filenames and validation_output_filenames are passed.

validation_indices

string, default = “”

path to a csv/npy file on disk containing the indices of the samples to be used for validation

initial_beta_reg

float, default=0.1

the initial regularization coefficient. In NeurEco, the main source of regularization is parsimony; the beta_reg coefficient ensures that, at the beginning of the learning process, if many weight configurations give the same error, the smallest one is chosen. At the end of the learning process the model is parsimonious, this coefficient is no longer needed, and it goes to zero.

final_learning

boolean, default=True

True if this training is final, False if not. If True, NeurEco assumes that every data sample matters and will learn the validation data very carefully at the end of the learning process.

disconnect_inputs_if_possible

boolean, default=True

NeurEco always tries to keep its model as small as possible without losing performance, so if it finds inputs that do not contribute to the overall performance, it will try to remove all links to them. Setting this field to False prevents it from disconnecting inputs.

use_gpu

boolean, default=False

indicates whether or not an NVIDIA GPU card will be used for building the model.

gpu_id

integer, default=0

the id of the GPU card on which the user wants to run the building process (in case many GPU cards are available).

input_normalization: shift_type

string, default “auto”

This is the method used to shift the input data. For more details, see Data normalization for Tabular Compression.

input_normalization: scale_type

string, default “auto”

This is the method used to scale the input data. For more details, see Data normalization for Tabular Compression.

input_normalization: normalize_per_feature

boolean, default True

if True, shifting and scaling are performed on each feature of the inputs separately; if False, all the features are normalized together. For example, if the data is the output of an SVD operation, the scale between the coefficients needs to be maintained, so this field should be False. On the other hand, inputs that represent different physical fields with different scales (for example, a temperature that varies from 260 to 300 degrees and a pressure that varies from 1e5 to 1.1e5 Pascal) should not be scaled together; in this case this field should be True. For more details, see Data normalization for Tabular Compression.

output_normalization: shift_type

string, not taken into account for Compression

This is the method used to shift the target data. For more details, see Data normalization for Tabular Compression.

output_normalization: scale_type

string, not taken into account for Compression

This is the method used to scale the target data. For more details, see Data normalization for Tabular Compression.

output_normalization: normalize_per_feature

boolean, not taken into account for Compression

if True, shifting and scaling are performed on each feature of the outputs separately; if False, all the features are normalized together. For example, if the data is the output of an SVD operation, the scale between the coefficients needs to be maintained, so this field should be False. On the other hand, outputs that represent different physical fields with different scales (for example, a temperature that varies from 260 to 300 degrees and a pressure that varies from 1e5 to 1.1e5 Pascal) should not be scaled together; in this case this field should be True. For more details, see Data normalization for Tabular Compression.

exc_filenames

list of strings, mandatory, default = []

training data: the input data table, given as the paths of all the input data files. The files can be in csv, npy or mat (MATLAB) format.

output_filenames

list of strings, default = []. It must be left empty for Compression.

training data: the target data, given as the paths of all the target data files. The files can be in csv, npy or mat (MATLAB) format.

validation_exc_filenames

list of strings, default = [] (GUI, .conf)

validation data: the validation input data table, given as the paths of all the validation input data files. The files can be in csv, npy or mat (MATLAB) format.

validation_output_filenames

list of strings, default = [], it must be left empty for Compression

validation data: the validation target data, given as the paths of all the validation target data files. The files can be in csv, npy or mat (MATLAB) format.

write_model_to

string, default = “”

the path where the model will be saved.

checkpoint_address

string, default = “”

the path where the checkpoint model will be saved. The checkpoint model is used for resuming the build of a model, or for choosing an intermediate network with fewer topological optimization steps.

resume

boolean, default=False

if True, resume the build from its own checkpoint in checkpoint_address

starting_from_checkpoint_address

string, default = “”

the path from which the checkpoint model is loaded. This option is used when the user wants to continue the build of a model from an existing checkpoint after changing a few settings (additional data, for example); a fragment illustrating this setup is shown after this table. To use this option in a .conf file, make sure that the option resume keeps its default value False.

start_build_from_model_number

int, default=-1

When resuming a build, specifies which intermediate model in the checkpoint is used as the starting point. When set to -1, NeurEco chooses the last model created as the starting point.

freeze_structure

boolean, default=False

When resuming a build, NeurEco will only change the weights (not the network architecture) if this variable is set to True.

build_compress

boolean, has to be set to True for Compression

if True, the model will perform a nonlinear compression.

minimum_compression_coefficients

int, default=1

checked only if build_compress = True, specifies the minimum number of nonlinear coefficients.

compress_tolerance

float (e.g. 0.01, 0.001, …), default=0.02

checked only if build_compress = True, specifies the tolerance of the compressor: the maximum error accepted when performing a compression and a decompression on the validation data.

write_compression_model_to

string, default = “”

checked only if build_compress = True, this is the path where the compression model will be saved.

write_decompression_model_to

string, default = “”

checked only if build_compress = True, this is the path where the decompression model will be saved.

compress_decompress_size_ratio

float, default=1.0

checked only if build_compress = True, specifies the ratio between the sizes of the compression block and the decompression block. This number must be greater than 0 and smaller than or equal to 1. Note that this ratio will be respected within the limits of what NeurEco finds possible.

classification

boolean, has to be set to False for Compression

specifies if the problem is a classification problem.
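
To illustrate the checkpoint-related parameters above (see the starting_from_checkpoint_address entry), a hypothetical fragment of the "neurecoDNN_build" object that continues a previous build from an existing checkpoint, while keeping the network structure fixed, could look as follows; the checkpoint paths are placeholders and the remaining build fields are omitted for brevity:

    "starting_from_checkpoint_address": "previous_ckpt.checkpoint",
    "checkpoint_address": "ckpt_continued.checkpoint",
    "resume": false,
    "start_build_from_model_number": -1,
    "freeze_structure": true

Here resume is left at its default value False, as required when starting_from_checkpoint_address is used, and freeze_structure set to True restricts the continued build to weight updates only.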

Data normalization for Tabular Compression#

A normalization operation for NeurEco is a combination of a \(shift\) and a \(scale\), so that:

\[x_{normalized} = \frac{x-shift}{scale}\]

Allowed shift methods for NeurEco and their corresponding shift values are listed below:

NeurEco Tabular shifting methods#

  • none: \(0\)

  • min: \(min(x)\)

  • min_centered: \(0.5 * (min(x) + max(x))\)

  • mean: \(mean(x)\)

Allowed scale methods for NeurEco Tabular and their corresponding scale values are listed below:

NeurEco Tabular scaling methods#

  • none: \(1\)

  • max: \(max(x) - shift\)

  • max_centered: \(0.5 * (max(x) - min(x))\)

  • std: \(std(x)\)

Normalization with auto options:

  • shift is mean and scale is max if the value of mean is far from 0;

  • shift is none and scale is max if the calculated value of mean is close to 0.

If the normalization is performed per feature and the auto options are chosen, the normalization is performed by groups of features; these groups are created based on the values of mean and std.
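
As a short worked example of the normalization formula above, consider a single temperature feature with samples \(x = (260, 280, 300)\), normalized per feature with the mean shift and max scale methods (since the mean is far from 0, the auto option would make the same choice):

\[shift = mean(x) = 280, \quad scale = max(x) - shift = 20\]

\[x_{normalized} = \frac{x - 280}{20} = (-1, 0, 1)\]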