DataFusionTools.machine_learning package#
Submodules#
DataFusionTools.machine_learning.baseclass module#
- class datafusiontools.machine_learning.baseclass.BaseClassMachineLearning(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.array] = None)[source]#
Bases:
datafusiontools._core.base_class.BaseClass
- classification: bool#
- plot_fitted_line(validation_target: numpy.ndarray, output_folder: pathlib.Path = PosixPath('.')) None [source]#
Plots the fitted line of the prediction.
- Parameters
validation_target – validation target values used for the plot
output_folder – location where the plot is saved
- prediction: Union[List, None, numpy.array] = None#
- target: Union[List, None, numpy.array] = None#
- target_label: Union[List, None, numpy.array] = None#
- train(data: numpy.ndarray, target: numpy.ndarray) None [source]#
Trains the NN with the data and the multi-class target values, based on the selected model (classification or regression).
- Parameters
data – data features
target – multi-class target values
- training_data: Union[List, None, numpy.array] = None#
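The plot produced by plot_fitted_line presumably overlays a least-squares line through the (validation target, prediction) pairs; a perfect model gives slope 1 and intercept 0. The underlying fit can be sketched in plain numpy (an illustrative stand-in, not the package's implementation):

```python
import numpy as np

def fitted_line(prediction: np.ndarray, validation_target: np.ndarray):
    """Least-squares line through (validation_target, prediction) pairs.

    A perfect model gives slope 1 and intercept 0.
    """
    slope, intercept = np.polyfit(validation_target, prediction, deg=1)
    return slope, intercept

# A perfect prediction lies on the y = x line.
target = np.array([1.0, 2.0, 3.0, 4.0])
slope, intercept = fitted_line(target, target)
```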
DataFusionTools.machine_learning.convolutional module#
- class datafusiontools.machine_learning.convolutional.Convolutional(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.ndarray] = None, history: Union[List, None, numpy.ndarray] = None, encoder_features: Union[List, None, numpy.ndarray] = None, encoder_target: Union[List, None, numpy.ndarray] = None, model: Union[List, None, numpy.ndarray] = None, kl: Union[List, None, numpy.ndarray] = None, weights: Optional[List] = None, nb_hidden_layers: int = 1, activation_fct: datafusiontools.machine_learning.enumeration_classes.ActivationFunctions = ActivationFunctions.sigmoid, optimizer: datafusiontools.machine_learning.enumeration_classes.Optimizer = Optimizer.Adam, loss: datafusiontools.machine_learning.enumeration_classes.LossFunctions = LossFunctions.mean_absolute_error, epochs: int = 500, batch: int = 32, regularisation: int = 0, feature_names: Optional[List] = None, validation_targets: Union[List, None, numpy.ndarray] = None, validation_features: Union[List, None, numpy.ndarray] = None, probabilistic: bool = False, nb_filters: List[int] = <factory>, length_filters: List[int] = <factory>, n_dim: int = 1, strides: int = 1)[source]#
Bases:
datafusiontools.machine_learning.neural_networks.NeuralNetwork
- length_filters: List[int]#
- n_dim: int = 1#
- nb_filters: List[int]#
- plot_confusion(validation: numpy.ndarray, output_folder: pathlib.Path = PosixPath('.')) None [source]#
Plots the confusion matrix for the validation dataset
- Parameters
validation – Validation data at the predicted points
output_folder – location where the plot is saved
- strides: int = 1#
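As a rough sketch of how length_filters and strides shape a 1-D convolutional layer's output (hypothetical helper names; the class builds the real layers internally), the "valid" output length is (n − k) // stride + 1 for an input of length n and a filter of length k:

```python
import numpy as np

def conv1d_output_length(n_points: int, filter_length: int, strides: int = 1) -> int:
    """Length of a 1-D 'valid' convolution output for one filter."""
    return (n_points - filter_length) // strides + 1

def conv1d(signal: np.ndarray, kernel: np.ndarray, strides: int = 1) -> np.ndarray:
    """Naive 1-D valid convolution (cross-correlation) with a stride."""
    out_len = conv1d_output_length(len(signal), len(kernel), strides)
    return np.array([signal[i * strides:i * strides + len(kernel)] @ kernel
                     for i in range(out_len)])

signal = np.arange(10, dtype=float)
kernel = np.ones(3) / 3.0          # moving-average filter of length 3
out = conv1d(signal, kernel, strides=2)
```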
DataFusionTools.machine_learning.enumeration_classes module#
- class datafusiontools.machine_learning.enumeration_classes.ActivationFunctionSVM(value)[source]#
Bases:
enum.Enum
An enumeration.
- linear = 'linear'#
- poly = 'poly'#
- rbf = 'rbf'#
- sigmoid = 'sigmoid'#
- class datafusiontools.machine_learning.enumeration_classes.ActivationFunctions(value)[source]#
Bases:
enum.Enum
An enumeration.
- elu = 'elu'#
- relu = 'relu'#
- selu = 'selu'#
- sigmoid = 'sigmoid'#
- softmax = 'softmax'#
- softplus = 'softplus'#
- softsign = 'softsign'#
- tanh = 'tanh'#
- class datafusiontools.machine_learning.enumeration_classes.GammaList(value)[source]#
Bases:
enum.Enum
An enumeration.
- auto = 'auto'#
- scale = 'scale'#
- class datafusiontools.machine_learning.enumeration_classes.LossFunctions(value)[source]#
Bases:
enum.Enum
An enumeration.
- binary_crossentropy = 'binary_crossentropy'#
- categorical_crossentropy = 'categorical_crossentropy'#
- mean_absolute_error = 'mean_absolute_error'#
- mean_squared_error = 'mean_squared_error'#
- mean_squared_logarithmic_error = 'mean_squared_logarithmic_error'#
- sparse_categorical_crossentropy = 'sparse_categorical_crossentropy'#
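These enumerations follow the standard Python Enum pattern, so a setting can be looked up by its string value, e.g. when reading a configuration file. A minimal standalone sketch mirroring part of the documented LossFunctions enumeration:

```python
from enum import Enum

class LossFunctions(Enum):
    """Mirrors a subset of the documented loss-function enumeration."""
    binary_crossentropy = 'binary_crossentropy'
    mean_absolute_error = 'mean_absolute_error'
    mean_squared_error = 'mean_squared_error'

# Look up a member by its string value, e.g. parsed from a config file.
loss = LossFunctions('mean_squared_error')
```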
DataFusionTools.machine_learning.mpl module#
- class datafusiontools.machine_learning.mpl.MPL(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.ndarray] = None, history: Union[List, None, numpy.ndarray] = None, encoder_features: Union[List, None, numpy.ndarray] = None, encoder_target: Union[List, None, numpy.ndarray] = None, model: Union[List, None, numpy.ndarray] = None, kl: Union[List, None, numpy.ndarray] = None, weights: Optional[List] = None, nb_hidden_layers: int = 1, activation_fct: datafusiontools.machine_learning.enumeration_classes.ActivationFunctions = ActivationFunctions.sigmoid, optimizer: datafusiontools.machine_learning.enumeration_classes.Optimizer = Optimizer.Adam, loss: datafusiontools.machine_learning.enumeration_classes.LossFunctions = LossFunctions.mean_absolute_error, epochs: int = 500, batch: int = 32, regularisation: int = 0, feature_names: Optional[List] = None, validation_targets: Union[List, None, numpy.ndarray] = None, validation_features: Union[List, None, numpy.ndarray] = None, probabilistic: bool = False, nb_neurons: Union[List, None, numpy.array] = None)[source]#
Bases:
datafusiontools.machine_learning.neural_networks.NeuralNetwork
Class of the NN object that defines the NN settings.
- Parameters
nb_neurons – Number of neurons in each hidden layer
- nb_neurons: Union[List, None, numpy.array] = None#
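Conceptually, nb_neurons fixes the width of each hidden layer. A forward pass through such a stack can be sketched in plain numpy (illustrative only, with random weights; the class builds the real network internally):

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x: np.ndarray, nb_neurons: list) -> np.ndarray:
    """Forward pass through hidden layers of the given widths (random weights)."""
    rng = np.random.default_rng(0)
    h = x
    for width in nb_neurons:
        w = rng.standard_normal((h.shape[-1], width))
        h = sigmoid(h @ w)  # sigmoid is the documented default activation
    return h

features = np.ones((5, 3))                      # 5 samples, 3 features
hidden = mlp_forward(features, nb_neurons=[8, 4])
```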
DataFusionTools.machine_learning.neural_networks module#
- class datafusiontools.machine_learning.neural_networks.NeuralNetwork(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.ndarray] = None, history: Union[List, None, numpy.ndarray] = None, encoder_features: Union[List, None, numpy.ndarray] = None, encoder_target: Union[List, None, numpy.ndarray] = None, model: Union[List, None, numpy.ndarray] = None, kl: Union[List, None, numpy.ndarray] = None, weights: Optional[List] = None, nb_hidden_layers: int = 1, activation_fct: datafusiontools.machine_learning.enumeration_classes.ActivationFunctions = ActivationFunctions.sigmoid, optimizer: datafusiontools.machine_learning.enumeration_classes.Optimizer = Optimizer.Adam, loss: datafusiontools.machine_learning.enumeration_classes.LossFunctions = LossFunctions.mean_absolute_error, epochs: int = 500, batch: int = 32, regularisation: int = 0, feature_names: Optional[List] = None, validation_targets: Union[List, None, numpy.ndarray] = None, validation_features: Union[List, None, numpy.ndarray] = None, probabilistic: bool = False)[source]#
Bases:
datafusiontools.machine_learning.baseclass.BaseClassMachineLearning
Initialises the NN object and defines the NN settings. By default the NN is defined for regression problems.
- Parameters
nb_hidden_layers – number of hidden layers
activation_fct – (optional: default ‘sigmoid’) Type of activation function
optimizer – (optional: default ‘Adam’) Type of minimisation
loss – (optional: default ‘mean_absolute_error’) Type of loss function
epochs – (optional: default 500) Number of epochs
batch – (optional: default 32) Batch size used in each epoch
regularisation – (optional: default 0) Factor for regularisation
weights – (optional: default None) Weights for the categories
- activation_fct: datafusiontools.machine_learning.enumeration_classes.ActivationFunctions = 'sigmoid'#
- batch: int = 32#
- compile_model(metrics: List[str], loss=None, optimizer=None)[source]#
Compiles the NN with the given metrics.
- Parameters
metrics – List of metrics to be used during training
loss – (optional) loss function that overrides the default
optimizer – (optional) optimizer that overrides the default
- encoder_features: Union[List, None, numpy.ndarray] = None#
- encoder_target: Union[List, None, numpy.ndarray] = None#
- epochs: int = 500#
- feature_names: Optional[List] = None#
- history: Union[List, None, numpy.ndarray] = None#
- kl: Union[List, None, numpy.ndarray] = None#
- kl_div(y_true, y_pred)[source]#
Compute KL divergence
- Parameters
y_true – true value
y_pred – predicted value with NN
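For discrete distributions the KL divergence has the standard form KL(p‖q) = Σ p·log(p/q); a plain-numpy stand-in for kl_div (not the class's actual implementation, which operates on the NN's tensors):

```python
import numpy as np

def kl_divergence(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-12) -> float:
    """KL divergence between two discrete probability distributions."""
    p = np.clip(y_true, eps, 1.0)  # clip to avoid log(0)
    q = np.clip(y_pred, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
```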
- loss: datafusiontools.machine_learning.enumeration_classes.LossFunctions = 'mean_absolute_error'#
- model: Union[List, None, numpy.ndarray] = None#
- optimizer: datafusiontools.machine_learning.enumeration_classes.Optimizer = 'Adam'#
- plot_cost_function(output_folder: pathlib.Path = PosixPath('.')) None [source]#
Plots the cost function
- Parameters
output_folder – location where the plot is saved
- plot_feature_importance(input_data: numpy.array, output_folder: pathlib.Path = PosixPath('.'))[source]#
Function that plots the feature importance charts. This is done by using the shap package, which uses Shapley values to explain machine learning models. For more information, see the shap package website.
- Parameters
output_folder – location where the plot is saved
input_data – data to be used for the determination of the Shapley values.
- predict(data: numpy.ndarray) None [source]#
Predicts the values at the data points with the trained NN
- Parameters
data – dataset with features for prediction
- prediction: Union[List, None, numpy.ndarray] = None#
- probabilistic: bool = False#
- regularisation: int = 0#
- validation_features: Union[List, None, numpy.ndarray] = None#
- validation_targets: Union[List, None, numpy.ndarray] = None#
- weights: Optional[List] = None#
DataFusionTools.machine_learning.random_forest module#
- class datafusiontools.machine_learning.random_forest.RandomForest(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.array] = None, accuracy: Union[List, None, numpy.ndarray] = None, encoder: Union[List, None, numpy.ndarray] = None, model: Union[List, None, numpy.ndarray] = None, n_estimator: numpy.ndarray = array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30.]), max_depth: numpy.ndarray = array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20.]), feature_names: Optional[List] = None)[source]#
Bases:
datafusiontools.machine_learning.baseclass.BaseClassMachineLearning
- accuracy: Union[List, None, numpy.ndarray] = None#
- encoder: Union[List, None, numpy.ndarray] = None#
- feature_names: Optional[List] = None#
- max_depth: numpy.ndarray = array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20.])#
- model: Union[List, None, numpy.ndarray] = None#
- n_estimator: numpy.ndarray = array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30.])#
- plot_confusion(validation: numpy.ndarray, output_folder: pathlib.Path = PosixPath('.')) None [source]#
Plots the confusion matrix for the validation dataset
- Parameters
validation – Validation data at the predicted points
output_folder – location where the plot is saved
- plot_feature_importance(input_data: numpy.array, output_folder: pathlib.Path = PosixPath('.'))[source]#
Function that plots the feature importance charts. This is done by using the shap package, which uses Shapley values to explain machine learning models. For more information, see the shap package website.
- Parameters
output_folder – location where the plot is saved
input_data – data to be used for the determination of the Shapley values.
- plot_feature_importance_with_interaction_values(input_data: numpy.array, output_folder: pathlib.Path = PosixPath('.'))[source]#
Function that plots the feature importance charts. This is done by using the shap package, which uses Shapley values to explain machine learning models. For more information, see the shap package website.
- Parameters
output_folder – location where the plot is saved
input_data – data to be used for the determination of the Shapley values.
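The n_estimator and max_depth arrays suggest a grid over which the forest is tuned. A hedged sketch of such a search with scikit-learn's RandomForestClassifier (the package's own tuning logic may differ; the grids here are small stand-ins):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny, well-separated toy classification problem.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

best_score, best_params = -1.0, None
for n in [5, 10]:                      # stand-in for the n_estimator grid
    for depth in [2, 4]:               # stand-in for the max_depth grid
        model = RandomForestClassifier(n_estimators=n, max_depth=depth,
                                       random_state=0).fit(X, y)
        score = model.score(X, y)      # training accuracy; use CV in practice
        if score > best_score:
            best_score, best_params = score, (n, depth)
```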
DataFusionTools.machine_learning.support_vector_machine module#
- class datafusiontools.machine_learning.support_vector_machine.SVM(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.array] = None, kernel: datafusiontools.machine_learning.enumeration_classes.ActivationFunctionSVM = ActivationFunctionSVM.rbf, gamma: datafusiontools.machine_learning.enumeration_classes.GammaList = GammaList.scale)[source]#
Bases:
datafusiontools.machine_learning.baseclass.BaseClassMachineLearning
Class of the Support Vector Machine.
- Parameters
kernel – Kernel type to be used in the algorithm
gamma – Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’
- gamma: datafusiontools.machine_learning.enumeration_classes.GammaList = 'scale'#
- plot_confusion(validation: numpy.ndarray, output_folder: str = './') None [source]#
Plots the confusion matrix for the validation dataset
- Parameters
validation – Validation data at the predicted points
output_folder – location where the plot is saved
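The kernel and gamma options map directly onto scikit-learn's SVC arguments, which the option names (‘rbf’, ‘poly’, ‘scale’, ‘auto’) strongly suggest this class wraps. A minimal sketch under that assumption:

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated clusters.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])

# kernel and gamma correspond to ActivationFunctionSVM and GammaList values.
model = SVC(kernel='rbf', gamma='scale').fit(X, y)
pred = model.predict([[0.2, 0.1], [5.1, 5.2]])
```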
DataFusionTools.machine_learning.bayesian_neural_network module#
- class datafusiontools.machine_learning.bayesian_neural_network.BayesianNeuralNetwork(classification: bool, training_data: Union[List, None, numpy.array] = None, target: Union[List, None, numpy.array] = None, target_label: Union[List, None, numpy.array] = None, prediction: Union[List, None, numpy.ndarray] = None, history: Union[List, None, numpy.ndarray] = None, encoder_features: Union[List, None, numpy.ndarray] = None, encoder_target: Union[List, None, numpy.ndarray] = None, model: Union[List, None, numpy.ndarray] = None, kl: Union[List, None, numpy.ndarray] = None, weights: Optional[List] = None, nb_hidden_layers: int = 1, activation_fct: datafusiontools.machine_learning.enumeration_classes.ActivationFunctions = ActivationFunctions.sigmoid, optimizer: datafusiontools.machine_learning.enumeration_classes.Optimizer = Optimizer.Adam, loss: datafusiontools.machine_learning.enumeration_classes.LossFunctions = LossFunctions.mean_absolute_error, epochs: int = 500, batch: int = 32, regularisation: int = 0, feature_names: Optional[List] = None, validation_targets: Union[List, None, numpy.ndarray] = None, validation_features: Union[List, None, numpy.ndarray] = None, probabilistic: bool = False, nb_neurons: Union[List, None, numpy.array] = None, learning_rate: float = 0.0001)[source]#
Bases:
datafusiontools.machine_learning.neural_networks.NeuralNetwork
Class of the Bayesian Neural Network model. This default model is based on the TensorFlow Probability tutorial found at https://keras.io/examples/keras_recipes/bayesian_neural_networks/
- Parameters
nb_neurons – Number of neurons in each hidden layer
learning_rate – The learning rate of the default optimization technique
- learning_rate: float = 0.0001#
- nb_neurons: Union[List, None, numpy.array] = None#
- static negative_loglikelihood(targets, estimated_distribution)[source]#
Since the output of the model is a distribution rather than a point estimate, the negative log-likelihood is used as the loss function; it measures how likely the true data (targets) are under the estimated distribution produced by the model.
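For a Gaussian output distribution the negative log-likelihood has the closed form ½·log(2πσ²) + (y − μ)²/(2σ²). A pure-numpy sketch (the class itself evaluates log_prob on a TensorFlow Probability distribution):

```python
import numpy as np

def gaussian_nll(targets: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> float:
    """Mean negative log-likelihood of targets under N(mu, sigma^2)."""
    nll = 0.5 * np.log(2.0 * np.pi * sigma**2) + (targets - mu)**2 / (2.0 * sigma**2)
    return float(np.mean(nll))

targets = np.array([0.0, 1.0])
# The loss is lowest when the predicted mean matches the target.
good = gaussian_nll(targets, mu=targets, sigma=np.ones(2))
bad = gaussian_nll(targets, mu=targets + 2.0, sigma=np.ones(2))
```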
- plot_confidence_band(targets: numpy.array, x_axis: Optional[numpy.array] = None, output_folder: pathlib.Path = PosixPath('.')) None [source]#
Plots the prediction together with its confidence band.
- Parameters
targets – target values to plot against the prediction
x_axis – (optional) values to use for the x-axis
output_folder – location where the plot is saved
- static posterior(kernel_size, bias_size, dtype=None)[source]#
Define variational posterior weight distribution as multivariate Gaussian. Note that the learnable parameters for this distribution are the means, variances, and covariances.
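A full-covariance multivariate Gaussian over n weights needs n means plus the n(n+1)/2 entries of a lower-triangular scale matrix (the variances and covariances). A hypothetical helper for that parameter count, matching the counting used by full-covariance variational posteriors:

```python
def mvn_posterior_param_count(n_weights: int) -> int:
    """Learnable parameters of a full-covariance Gaussian posterior:
    n means plus n*(n+1)/2 lower-triangular scale entries."""
    return n_weights + n_weights * (n_weights + 1) // 2

# e.g. a layer with 3 weights: 3 means + 6 scale entries = 9 parameters.
count = mvn_posterior_param_count(3)
```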