DataFusionTools.machine_learning package#

Submodules#

DataFusionTools.machine_learning.baseclass module#

class datafusiontools.machine_learning.baseclass.BaseClassMachineLearning(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.array] = None)[source]#

Bases: datafusiontools._core.base_class.BaseClass

classification: bool#
plot_fitted_line(validation_target: numpy.ndarray, output_folder: pathlib.Path = PosixPath('.')) → None[source]#

Plots the fitted line of the prediction against the validation targets

Parameters
  • validation_target – validation target values at the predicted points

  • output_folder – location where the plot is saved

abstract predict()[source]#
prediction: Union[List, None, numpy.array] = None#
target: Union[List, None, numpy.array] = None#
target_label: Union[List, None, numpy.array] = None#
train(data: numpy.ndarray, target: numpy.ndarray) → None[source]#

Trains the model with the data and the multi-class target values, based on the task selected (classification or regression).

Parameters
  • data – data features

  • target – multi-class target values

abstract train_classification()[source]#
abstract train_regression()[source]#
training_data: Union[List, None, numpy.array] = None#
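
Subclasses fill in the abstract hooks. A minimal, hypothetical sketch of the intended pattern (the SklearnModel class and its model attribute are illustrative assumptions, not part of the package):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

from datafusiontools.machine_learning.baseclass import BaseClassMachineLearning


class SklearnModel(BaseClassMachineLearning):
    """Hypothetical subclass wrapping scikit-learn estimators."""

    def train_classification(self) -> None:
        # Invoked for classification tasks (assumption based on the train() docstring)
        self.model = LogisticRegression().fit(self.training_data, self.target)

    def train_regression(self) -> None:
        # Invoked for regression tasks (assumption based on the train() docstring)
        self.model = LinearRegression().fit(self.training_data, self.target)

    def predict(self, data: np.ndarray) -> None:
        # Stores the result on the instance, mirroring the documented API
        self.prediction = self.model.predict(data)
```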

DataFusionTools.machine_learning.convolutional module#

class datafusiontools.machine_learning.convolutional.Convolutional(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.ndarray] = None, history: Union[List, None, numpy.ndarray] = None, encoder_features: Union[List, None, numpy.ndarray] = None, encoder_target: Union[List, None, numpy.ndarray] = None, model: Union[List, None, numpy.ndarray] = None, kl: Union[List, None, numpy.ndarray] = None, weights: Optional[List] = None, nb_hidden_layers: int = 1, activation_fct: datafusiontools.machine_learning.enumeration_classes.ActivationFunctions = ActivationFunctions.sigmoid, optimizer: datafusiontools.machine_learning.enumeration_classes.Optimizer = Optimizer.Adam, loss: datafusiontools.machine_learning.enumeration_classes.LossFunctions = LossFunctions.mean_absolute_error, epochs: int = 500, batch: int = 32, regularisation: int = 0, feature_names: Optional[List] = None, validation_targets: Union[List, None, numpy.ndarray] = None, validation_features: Union[List, None, numpy.ndarray] = None, probabilistic: bool = False, nb_filters: List[int] = <factory>, length_filters: List[int] = <factory>, n_dim: int = 1, strides: int = 1)[source]#

Bases: datafusiontools.machine_learning.neural_networks.NeuralNetwork

length_filters: List[int]#
n_dim: int = 1#
nb_filters: List[int]#
plot_confusion(validation: numpy.ndarray, output_folder: pathlib.Path = PosixPath('.')) → None[source]#

Plots the confusion matrix for the validation dataset

Parameters
  • validation – Validation data at the predicted points

  • output_folder – location where the plot is saved

strides: int = 1#
train_classification() → None[source]#
train_regression() → None[source]#
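
A hedged usage sketch; the data shapes, filter sizes, and epoch count below are illustrative assumptions, not package defaults or requirements:

```python
import numpy as np
from datafusiontools.machine_learning.convolutional import Convolutional

# Illustrative 1D feature windows: 100 samples of 64 steps
# (the network may expect an extra channel axis; reshape if needed)
features = np.random.rand(100, 64)
labels = np.random.randint(0, 3, size=100)

cnn = Convolutional(
    classification=True,
    nb_filters=[16, 32],    # number of filters per convolutional layer
    length_filters=[3, 3],  # kernel length per convolutional layer
    n_dim=1,                # 1D convolutions
    strides=1,
    epochs=50,
)
cnn.train(features, labels)
cnn.predict(features)
```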

DataFusionTools.machine_learning.enumeration_classes module#

class datafusiontools.machine_learning.enumeration_classes.ActivationFunctionSVM(value)[source]#

Bases: enum.Enum

Kernel functions available for the SVM.

linear = 'linear'#
poly = 'poly'#
rbf = 'rbf'#
sigmoid = 'sigmoid'#
class datafusiontools.machine_learning.enumeration_classes.ActivationFunctions(value)[source]#

Bases: enum.Enum

Activation functions available for the neural network layers.

elu = 'elu'#
relu = 'relu'#
selu = 'selu'#
sigmoid = 'sigmoid'#
softmax = 'softmax'#
softplus = 'softplus'#
softsign = 'softsign'#
tanh = 'tanh'#
class datafusiontools.machine_learning.enumeration_classes.GammaList(value)[source]#

Bases: enum.Enum

Options for the SVM kernel coefficient gamma.

auto = 'auto'#
scale = 'scale'#
class datafusiontools.machine_learning.enumeration_classes.LossFunctions(value)[source]#

Bases: enum.Enum

Loss functions available for training the neural networks.

binary_crossentropy = 'binary_crossentropy'#
categorical_crossentropy = 'categorical_crossentropy'#
mean_absolute_error = 'mean_absolute_error'#
mean_squared_error = 'mean_squared_error'#
mean_squared_logarithmic_error = 'mean_squared_logarithmic_error'#
sparse_categorical_crossentropy = 'sparse_categorical_crossentropy'#
class datafusiontools.machine_learning.enumeration_classes.Optimizer(value)[source]#

Bases: enum.Enum

Optimizers available for training the neural networks.

Adadelta = 'Adadelta'#
Adagrad = 'Adagrad'#
Adam = 'Adam'#
Adamax = 'Adamax'#
Ftrl = 'Ftrl'#
Nadam = 'Nadam'#
RMSprop = 'RMSprop'#
SGD = 'SGD'#
class datafusiontools.machine_learning.enumeration_classes.WeightList(value)[source]#

Bases: enum.Enum

Options for the class-weight setting.

Auto = 'Auto'#
NONE = None#
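
The enumeration members wrap the string identifiers expected by the underlying libraries; a short illustrative example of passing them to the model classes documented below:

```python
from datafusiontools.machine_learning.enumeration_classes import (
    ActivationFunctions,
    LossFunctions,
    Optimizer,
)

# Each member carries the backend's string identifier
assert Optimizer.Adam.value == "Adam"
assert LossFunctions.mean_squared_error.value == "mean_squared_error"

# Illustrative keyword settings for a neural network (see NeuralNetwork below)
nn_settings = dict(
    activation_fct=ActivationFunctions.relu,
    optimizer=Optimizer.Adam,
    loss=LossFunctions.mean_squared_error,
)
```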

DataFusionTools.machine_learning.mpl module#

class datafusiontools.machine_learning.mpl.MPL(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.ndarray] = None, history: Union[List, None, numpy.ndarray] = None, encoder_features: Union[List, None, numpy.ndarray] = None, encoder_target: Union[List, None, numpy.ndarray] = None, model: Union[List, None, numpy.ndarray] = None, kl: Union[List, None, numpy.ndarray] = None, weights: Optional[List] = None, nb_hidden_layers: int = 1, activation_fct: datafusiontools.machine_learning.enumeration_classes.ActivationFunctions = ActivationFunctions.sigmoid, optimizer: datafusiontools.machine_learning.enumeration_classes.Optimizer = Optimizer.Adam, loss: datafusiontools.machine_learning.enumeration_classes.LossFunctions = LossFunctions.mean_absolute_error, epochs: int = 500, batch: int = 32, regularisation: int = 0, feature_names: Optional[List] = None, validation_targets: Union[List, None, numpy.ndarray] = None, validation_features: Union[List, None, numpy.ndarray] = None, probabilistic: bool = False, nb_neurons: Union[List, None, numpy.array] = None)[source]#

Bases: datafusiontools.machine_learning.neural_networks.NeuralNetwork

Class that defines the multilayer perceptron NN object and its settings.

Parameters

nb_neurons – Number of neurons in each hidden layer

nb_neurons: Union[List, None, numpy.array] = None#
plot_confusion(validation: numpy.ndarray, output_folder: pathlib.Path = PosixPath('.')) → None[source]#

Plots the confusion matrix for the validation dataset

Parameters
  • validation – Validation data at the predicted points

  • output_folder – location where the plot is saved

train_classification() → None[source]#

Method that trains a NN classification model.

train_regression() → None[source]#

Method that trains a NN regression model.
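
A hedged end-to-end usage sketch for a regression task (data shapes and hyperparameters are illustrative assumptions):

```python
import numpy as np
from datafusiontools.machine_learning.enumeration_classes import (
    ActivationFunctions,
    LossFunctions,
    Optimizer,
)
from datafusiontools.machine_learning.mpl import MPL

features = np.random.rand(200, 5)   # 200 samples, 5 features
targets = np.random.rand(200)

mlp = MPL(
    classification=False,                     # regression task
    nb_hidden_layers=2,
    nb_neurons=[16, 8],                       # neurons per hidden layer
    activation_fct=ActivationFunctions.relu,
    optimizer=Optimizer.Adam,
    loss=LossFunctions.mean_squared_error,
    epochs=100,
)
mlp.train(features, targets)
mlp.predict(features)
print(mlp.prediction)
```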

DataFusionTools.machine_learning.neural_networks module#

class datafusiontools.machine_learning.neural_networks.NeuralNetwork(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.ndarray] = None, history: Union[List, None, numpy.ndarray] = None, encoder_features: Union[List, None, numpy.ndarray] = None, encoder_target: Union[List, None, numpy.ndarray] = None, model: Union[List, None, numpy.ndarray] = None, kl: Union[List, None, numpy.ndarray] = None, weights: Optional[List] = None, nb_hidden_layers: int = 1, activation_fct: datafusiontools.machine_learning.enumeration_classes.ActivationFunctions = ActivationFunctions.sigmoid, optimizer: datafusiontools.machine_learning.enumeration_classes.Optimizer = Optimizer.Adam, loss: datafusiontools.machine_learning.enumeration_classes.LossFunctions = LossFunctions.mean_absolute_error, epochs: int = 500, batch: int = 32, regularisation: int = 0, feature_names: Optional[List] = None, validation_targets: Union[List, None, numpy.ndarray] = None, validation_features: Union[List, None, numpy.ndarray] = None, probabilistic: bool = False)[source]#

Bases: datafusiontools.machine_learning.baseclass.BaseClassMachineLearning

Initialises the NN object and defines the NN settings; by default the NN is defined for regression problems.

Parameters
  • nb_hidden_layers – number of hidden layers

  • activation_fct – (optional: default ‘sigmoid’) Type of activation function

  • optimizer – (optional: default ‘Adam’) Type of minimisation

  • loss – (optional: default ‘mean_absolute_error’) Type of loss function

  • epochs – (optional: default 500) Number of epochs

  • batch – (optional: default 32) Batch size used in each training epoch

  • regularisation – (optional: default 0) Factor for regularisation

  • weights – (optional: default None) Weights for the categories

activation_fct: datafusiontools.machine_learning.enumeration_classes.ActivationFunctions = 'sigmoid'#
batch: int = 32#
compile_model(metrics: List[str], loss=None, optimizer=None)[source]#

Method that compiles the NN with particular metrics

Parameters
  • metrics – list of metrics to be used during training

  • loss – (optional) loss function overriding the instance loss setting

  • optimizer – (optional) optimizer overriding the instance optimizer setting

encoder_features: Union[List, None, numpy.ndarray] = None#
encoder_target: Union[List, None, numpy.ndarray] = None#
epochs: int = 500#
feature_names: Optional[List] = None#
history: Union[List, None, numpy.ndarray] = None#
kl: Union[List, None, numpy.ndarray] = None#
kl_div(y_true, y_pred)[source]#

Compute KL divergence

Parameters
  • y_true – true value

  • y_pred – predicted value with NN
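
A minimal sketch of such a loss in TensorFlow terms (an illustrative assumption, not necessarily the package's exact implementation):

```python
import tensorflow as tf

def kl_div(y_true, y_pred):
    """KL divergence sum(p * log(p / q)) between true and predicted distributions."""
    # Clip to avoid log(0) and division by zero
    y_true = tf.clip_by_value(y_true, 1e-10, 1.0)
    y_pred = tf.clip_by_value(y_pred, 1e-10, 1.0)
    return tf.reduce_sum(y_true * tf.math.log(y_true / y_pred), axis=-1)
```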

loss: datafusiontools.machine_learning.enumeration_classes.LossFunctions = 'mean_absolute_error'#
model: Union[List, None, numpy.ndarray] = None#
nb_hidden_layers: int = 1#
optimizer: datafusiontools.machine_learning.enumeration_classes.Optimizer = 'Adam'#
plot_cost_function(output_folder: pathlib.Path = PosixPath('.')) → None[source]#

Plots the cost function

Parameters

output_folder – location where the plot is saved

plot_feature_importance(input_data: numpy.array, output_folder: pathlib.Path = PosixPath('.'))[source]#

Function that plots the feature importance charts. This is done by using the shap package, which uses Shapley values to explain machine learning models. For more information, see the shap package website.

Parameters
  • output_folder – location where the plot is saved

  • input_data – data to be used for the determination of the Shapley values.
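
A hedged sketch of the typical shap workflow behind such a method (the explainer choice and the trained mlp instance are assumptions for illustration, not confirmed package internals):

```python
import numpy as np
import shap

# Assumes `mlp` is a trained network (see the MPL example above) whose
# fitted Keras model is stored on `mlp.model`.
input_data = np.random.rand(50, 5)
explainer = shap.DeepExplainer(mlp.model, input_data)
shap_values = explainer.shap_values(input_data)
shap.summary_plot(shap_values, input_data, feature_names=mlp.feature_names)
```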

predict(data: numpy.ndarray) → None[source]#

Predict the values at the data points with trained NN

Parameters

data – dataset with features for prediction

prediction: Union[List, None, numpy.ndarray] = None#
probabilistic: bool = False#
regularisation: int = 0#
rescale_training_data()[source]#

Rescales the training data

validation_features: Union[List, None, numpy.ndarray] = None#
validation_targets: Union[List, None, numpy.ndarray] = None#
weights: Optional[List] = None#

DataFusionTools.machine_learning.random_forest module#

class datafusiontools.machine_learning.random_forest.RandomForest(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.array] = None, accuracy: Union[List, None, numpy.ndarray] = None, encoder: Union[List, None, numpy.ndarray] = None, model: Union[List, None, numpy.ndarray] = None, n_estimator: numpy.ndarray = array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30.]), max_depth: numpy.ndarray = array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20.]), feature_names: Optional[List] = None)[source]#

Bases: datafusiontools.machine_learning.baseclass.BaseClassMachineLearning

accuracy: Union[List, None, numpy.ndarray] = None#
check_score()[source]#

Computes the training score

encoder: Union[List, None, numpy.ndarray] = None#
feature_names: Optional[List] = None#
max_depth: numpy.ndarray = array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20.])#
model: Union[List, None, numpy.ndarray] = None#
n_estimator: numpy.ndarray = array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30.])#
plot_confusion(validation: numpy.ndarray, output_folder: pathlib.Path = PosixPath('.')) → None[source]#

Plots the confusion matrix for the validation dataset

Parameters
  • validation – Validation data at the predicted points

  • output_folder – location where the plot is saved

plot_feature_importance(input_data: numpy.array, output_folder: pathlib.Path = PosixPath('.'))[source]#

Function that plots the feature importance charts. This is done by using the shap package, which uses Shapley values to explain machine learning models. For more information, see the shap package website.

Parameters
  • output_folder – location where the plot is saved

  • input_data – data to be used for the determination of the Shapley values.

plot_feature_importance_with_interaction_values(input_data: numpy.array, output_folder: pathlib.Path = PosixPath('.'))[source]#

Function that plots the feature importance charts based on SHAP interaction values. This is done by using the shap package, which uses Shapley values to explain machine learning models. For more information, see the shap package website.

Parameters
  • output_folder – location where the plot is saved

  • input_data – data to be used for the determination of the Shapley values.

predict(data: numpy.ndarray) → None[source]#

Predict the values at the data points

Parameters

data – dataset with features for prediction

train_classification()[source]#
train_regression()[source]#
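
A hedged usage sketch for a classification task (data shapes are illustrative; the n_estimator and max_depth defaults suggest an internal hyperparameter sweep, but that is an assumption):

```python
import numpy as np
from datafusiontools.machine_learning.random_forest import RandomForest

features = np.random.rand(150, 4)   # 150 samples, 4 features
labels = np.random.randint(0, 2, size=150)

rf = RandomForest(classification=True)
rf.train(features, labels)
rf.check_score()
rf.predict(features)
print(rf.prediction)
```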

DataFusionTools.machine_learning.support_vector_machine module#

class datafusiontools.machine_learning.support_vector_machine.SVM(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.array] = None, kernel: datafusiontools.machine_learning.enumeration_classes.ActivationFunctionSVM = ActivationFunctionSVM.rbf, gamma: datafusiontools.machine_learning.enumeration_classes.GammaList = GammaList.scale)[source]#

Bases: datafusiontools.machine_learning.baseclass.BaseClassMachineLearning

Class of the Support Vector Machine.

Parameters
  • kernel – Kernel type to be used in the algorithm

  • gamma – Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’

check_score() → None[source]#

Computes the training score

gamma: datafusiontools.machine_learning.enumeration_classes.GammaList = 'scale'#
kernel: datafusiontools.machine_learning.enumeration_classes.ActivationFunctionSVM = 'rbf'#
plot_confusion(validation: numpy.ndarray, output_folder: str = './') → None[source]#

Plots the confusion matrix for the validation dataset

Parameters
  • validation – Validation data at the predicted points

  • output_folder – location where the plot is saved

predict(data: numpy.ndarray) → None[source]#

Predict the values at the data points

Parameters

data – dataset with features for prediction

train_classification() → None[source]#
train_regression() → None[source]#
train_svm() → None[source]#

Trains the SVM with the data and multiple class target values
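
A hedged usage sketch (data shapes are illustrative assumptions):

```python
import numpy as np
from datafusiontools.machine_learning.enumeration_classes import (
    ActivationFunctionSVM,
    GammaList,
)
from datafusiontools.machine_learning.support_vector_machine import SVM

features = np.random.rand(100, 3)
labels = np.random.randint(0, 2, size=100)

svm = SVM(
    classification=True,
    kernel=ActivationFunctionSVM.rbf,
    gamma=GammaList.scale,
)
svm.train(features, labels)
svm.check_score()
svm.predict(features)
```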

DataFusionTools.machine_learning.bayesian_neural_network module#

class datafusiontools.machine_learning.bayesian_neural_network.BayesianNeuralNetwork(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.ndarray] = None, history: Union[List, None, numpy.ndarray] = None, encoder_features: Union[List, None, numpy.ndarray] = None, encoder_target: Union[List, None, numpy.ndarray] = None, model: Union[List, None, numpy.ndarray] = None, kl: Union[List, None, numpy.ndarray] = None, weights: Optional[List] = None, nb_hidden_layers: int = 1, activation_fct: datafusiontools.machine_learning.enumeration_classes.ActivationFunctions = ActivationFunctions.sigmoid, optimizer: datafusiontools.machine_learning.enumeration_classes.Optimizer = Optimizer.Adam, loss: datafusiontools.machine_learning.enumeration_classes.LossFunctions = LossFunctions.mean_absolute_error, epochs: int = 500, batch: int = 32, regularisation: int = 0, feature_names: Optional[List] = None, validation_targets: Union[List, None, numpy.ndarray] = None, validation_features: Union[List, None, numpy.ndarray] = None, probabilistic: bool = False, nb_neurons: Union[List, None, numpy.array] = None, learning_rate: float = 0.0001)[source]#

Bases: datafusiontools.machine_learning.neural_networks.NeuralNetwork

Class of the Bayesian Neural Network model. The default model is based on the TensorFlow Probability tutorial at https://keras.io/examples/keras_recipes/bayesian_neural_networks/

Parameters
  • nb_neurons – Number of neurons in each hidden layer

  • learning_rate – The learning rate of the default optimization technique

learning_rate: float = 0.0001#
nb_neurons: Union[List, None, numpy.array] = None#
static negative_loglikelihood(targets, estimated_distribution)[source]#

Since the output of the model is a distribution rather than a point estimate, the negative log-likelihood is used as the loss function: it measures how likely the true data (targets) are under the estimated distribution produced by the model.
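
A minimal sketch of such a loss, following the referenced Keras tutorial (assuming the model outputs a TensorFlow Probability distribution object):

```python
def negative_loglikelihood(targets, estimated_distribution):
    # The model's output is a tfp distribution, so log_prob is available
    return -estimated_distribution.log_prob(targets)
```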

plot_confidence_band(targets: numpy.array, x_axis: Optional[numpy.array] = None, output_folder: pathlib.Path = PosixPath('.')) → None[source]#

Plots the prediction together with its confidence band

Parameters
  • targets – true target values at the predicted points

  • x_axis – (optional) values plotted along the x axis

  • output_folder – location where the plot is saved

static posterior(kernel_size, bias_size, dtype=None)[source]#

Defines the variational posterior weight distribution as a multivariate Gaussian. Note that the learnable parameters of this distribution are the means, variances, and covariances.

static prior(kernel_size, bias_size, dtype=None)[source]#

Defines the prior weight distribution as a Normal with mean=0 and stddev=1. Note that, in this example, the prior distribution is not trainable, as its parameters are fixed.
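
A sketch of these two functions as they appear in the referenced TensorFlow Probability tutorial (illustrative; the package implementation may differ in detail):

```python
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow import keras


def prior(kernel_size, bias_size, dtype=None):
    # Fixed standard-normal prior over all weights and biases of a layer
    n = kernel_size + bias_size
    return keras.Sequential([
        tfp.layers.DistributionLambda(
            lambda t: tfp.distributions.MultivariateNormalDiag(
                loc=tf.zeros(n), scale_diag=tf.ones(n)
            )
        )
    ])


def posterior(kernel_size, bias_size, dtype=None):
    # Learnable multivariate Gaussian with full covariance (TriL parameterisation)
    n = kernel_size + bias_size
    return keras.Sequential([
        tfp.layers.VariableLayer(
            tfp.layers.MultivariateNormalTriL.params_size(n), dtype=dtype
        ),
        tfp.layers.MultivariateNormalTriL(n),
    ])
```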

train_classification() → None[source]#

Method that trains a BNN classification model.

train_regression() → None[source]#

Method that trains a BNN regression model.
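
A hedged usage sketch for probabilistic regression (shapes and hyperparameters are illustrative assumptions):

```python
import numpy as np
from datafusiontools.machine_learning.bayesian_neural_network import BayesianNeuralNetwork

features = np.random.rand(200, 5)
targets = np.random.rand(200)

bnn = BayesianNeuralNetwork(
    classification=False,
    nb_hidden_layers=2,
    nb_neurons=[16, 8],
    learning_rate=1e-4,
    epochs=100,
)
bnn.train(features, targets)
bnn.predict(features)
bnn.plot_confidence_band(targets)   # plots the prediction with its confidence band
```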

Module contents#