fvgpOptimizer
- class gpcam.gp_optimizer.fvGPOptimizer(x_data=None, y_data=None, init_hyperparameters=None, noise_variances=None, compute_device='cpu', kernel_function=None, kernel_function_grad=None, noise_function=None, noise_function_grad=None, prior_mean_function=None, prior_mean_function_grad=None, gp2Scale=False, gp2Scale_dask_client=None, gp2Scale_batch_size=10000, gp2Scale_linalg_mode=None, calc_inv=False, ram_economy=False, cost_function=None, logging=False, args=None)
This class is an optimization extension of the fvgp package for multi-task (vector-valued) Gaussian Processes. Gaussian Processes can be initialized, trained, and conditioned; the posterior can also be evaluated, used via an acquisition function, and plugged into optimizers to find maxima.
V … number of input points
Di … input space dimensionality
No … number of outputs
N … arbitrary integers (N1, N2, …)
The main logic of fvgp is that any multi-task GP is just a single-task GP over the Cartesian product of the input space and the output (task) space, as long as the kernel is flexible enough, so be prepared to put work into your kernel. This design gives the user maximal control. In several places, for example in the prior-mean, noise, and kernel function definitions, the input x is defined over this combined space. For example, if your input space is a 2d Euclidean space and your outputs are labeled 0 and 1, the input to the mean, kernel, and noise functions might be
x =
[[0.2, 0.3, 0], [0.9, 0.6, 0],
 [0.2, 0.3, 1], [0.9, 0.6, 1]]
This has to be understood and taken into account when customizing gpCAM for multi-task use. The examples will provide deeper insights.
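A minimal NumPy sketch of how a combined input like the one above can be assembled from plain input points and task indices. The helper name expand_to_tasks is hypothetical (it is not part of gpCAM or fvgp), and the ordering (all points for task 0 first, then task 1) simply mirrors the example.

```python
import numpy as np


def expand_to_tasks(x, tasks):
    """Append a task-index column to every input point, once per task.

    x:     (V, Di) array of input positions
    tasks: sequence of task indices, e.g. [0, 1]
    returns a (V * len(tasks), Di + 1) array, grouped by task
    """
    x = np.asarray(x, dtype=float)
    blocks = [np.column_stack([x, np.full(len(x), t, dtype=float)]) for t in tasks]
    return np.vstack(blocks)


x_in = np.array([[0.2, 0.3], [0.9, 0.6]])
x = expand_to_tasks(x_in, [0, 1])
print(x)
```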
- Parameters:
x_data (np.ndarray | list, optional) – The input point positions. Shape (V x Di), where Di is the fvgp.fvGP.input_set_dim. For multi-task GPs, the index set dimension = input space dimension + 1. If dealing with non-Euclidean inputs, x_data should be a list, not a numpy array. In this case, both the index set and the input space dimensions are set to 1.
y_data (np.ndarray) – The values of the data points. Shape (V, No). Not every entry in x_data needs to have all corresponding tasks available; in that case, y_data may have np.nan as the corresponding entries.
init_hyperparameters (np.ndarray, optional) – Vector of hyperparameters used to initiate the GP. The default is an array of ones with the right length for the anisotropic Matern kernel with automatic relevance determination (ARD). The task direction is simply considered a separate dimension. If gp2Scale is enabled, the default kernel changes to the anisotropic Wendland kernel.
noise_variances (np.ndarray, optional) – A numpy array defining the uncertainties/noise in the y_data in the form of point-wise variances. Shape (V, No). If y_data has np.nan entries, the corresponding noise_variances have to be np.nan. Note: if no noise_variances are provided here, the noise_function callable will be used; if the callable is not provided either, the noise variances will be set to abs(np.mean(y_data)) / 100.0. If noise covariances are required (correlated noise), use the noise_function. Provide either a noise function or noise_variances, not both.
compute_device (str, optional) – One of cpu or gpu, determines how linear algebra computations are executed. The default is cpu. For gpu, pytorch or cupy has to be installed manually. For advanced options see args. If gp2Scale is enabled but no kernel is provided, the choice of the compute_device will be particularly important. In that case, the default Wendland kernel will be computed on the cpu or the gpu which will significantly change the compute time depending on the compute architecture.
kernel_function (Callable, optional) – A symmetric positive definite covariance function (a kernel) that calculates the covariance between data points. It is a function of the form k(x1, x2, hyperparameters, [args]). args is optional and is used to make fvgp.gp.args available. The input x1 is an N1 x (Di+1) array of positions, x2 is an N2 x (Di+1) array of positions, and hyperparameters is a 1d array of length N, depending on how many hyperparameters are initialized. The output is an N1 x N2 numpy array. The default is a stationary anisotropic kernel (fvgp.GP.default_kernel) which performs automatic relevance determination (ARD); the task direction is simply considered an additional dimension. The default kernel should only be used for tests and in the simplest of cases.
kernel_function_grad (Callable, optional) – A function that calculates the derivative of the kernel_function with respect to the hyperparameters. If provided, it will be used for local training (optimization) and can speed up the calculations. It accepts as input x1 (an N1 x (Di+1) array of positions), x2 (an N2 x (Di+1) array of positions), and hyperparameters (a 1d array of length Di+2 for the default kernel). The default is a finite-difference calculation. If ram_economy is True, the function's input is x1, x2, hyperparameters (numpy array), and a direction (int), and the output is a numpy array of shape (N1 x N2) holding the derivative with respect to the hyperparameter at index direction. If ram_economy is False, the function's input is x1, x2, and hyperparameters, and the output is a numpy array of shape (len(hyperparameters) x N1 x N2). See ram_economy.
prior_mean_function (Callable, optional) – A function f(x, hyperparameters, [args]) that evaluates the prior mean at a set of input positions. It accepts as input an array of positions (of shape N1 x (Di+1)) and hyperparameters (a 1d array of length Di+2 for the default kernel). Optionally, the third argument args can be defined. The return value is a 1d array of length N1. If None is provided, fvgp.GP._default_mean_function is used, which is the average of the y_data.
prior_mean_function_grad (Callable, optional) – A function that evaluates the gradient of the prior_mean_function at a set of input positions with respect to the hyperparameters. It accepts as input an array of positions (of size N1 x Di+1) and hyperparameters (a 1d array of length Di+2 for the default kernel). The return value is a 2d array of shape (len(hyperparameters) x N1). If None is provided, either zeros are returned since the default mean function does not depend on hyperparameters, or a finite-difference approximation is used if prior_mean_function is provided.
noise_function (Callable, optional) – The noise function is a callable f(x, hyperparameters, [args]) that returns a vector (1d np.ndarray) of length len(x), a matrix of shape (len(x), len(x)), or a sparse matrix of the same shape. The third argument args is optional. The input x is a numpy array of shape (N x (Di+1)). The hyperparameter array is the same one that is communicated to the mean and kernel functions. Provide either a noise function or a noise variance array, not both.
noise_function_grad (Callable, optional) – A function that evaluates the gradient of the noise_function at an input position with respect to the hyperparameters. It accepts as input an array of positions (of size N x (Di+1)) and hyperparameters (a 1d array of length Di+2 for the default kernel). The return value is a 2d np.ndarray of shape (len(hyperparameters) x N) or a 3d np.ndarray of shape (len(hyperparameters) x N x N). If None is provided, either zeros are returned, since the default noise function does not depend on hyperparameters, or, if noise_function is provided but noise_function_grad is not, a finite-difference approximation will be used. The same rules regarding ram_economy as for the kernel definition apply here; that means that with ram_economy=True the function takes an additional direction parameter.
gp2Scale (bool, optional) – Turns on gp2Scale. This will distribute the covariance computations across multiple workers. This is an advanced feature for HPC GPs with up to 10 million data points. If gp2Scale is used, the default kernel is an anisotropic Wendland kernel, which is compactly supported. If no kernel is provided, the compute_device option should be revisited, since the default kernel will use the specified device to compute covariances. The default is False.
gp2Scale_dask_client (dask.distributed.Client, optional) – A dask client for gp2Scale. On HPC architecture, this client is provided by the job script. Please have a look at the examples. A local client is used as the default.
gp2Scale_batch_size (int, optional) – Matrix batch size for distributed computing in gp2Scale. The default is 10000.
gp2Scale_linalg_mode (str, optional) – One of Chol, sparseLU, sparseCG, sparseMINRES, sparseSolve, sparseCGpre (incomplete LU preconditioner), or sparseMINRESpre. The default is None, which amounts to an automatic determination of the mode. For advanced customization, this can also be an iterable of three callables. The first, f(K), computes a factorization object from the covariance matrix K; this object is passed to the second and third callables. The second is the linear solve, f(obj, vec), and the third computes the log-determinant, logdet = f(obj). If a factorization object is not required, the first callable should return the matrix itself (K).
calc_inv (bool, optional) – If True, the algorithm calculates and stores the inverse of the covariance matrix after each training or update of the dataset or hyperparameters, which makes computing the posterior covariance faster (3-10 times). For larger problems (>5000 data points), inversion should be avoided due to numerical instability and computational cost. The default is False. Note that training does not use the inverse, for stability reasons. Storing the inverse is a good option when the dataset is not too large and the posterior covariance is heavily used.
ram_economy (bool, optional) – Only of interest if the gradient and/or Hessian of the log marginal likelihood is/are used for the training. If True, components of the derivative of the log marginal likelihood are calculated sequentially, leading to a slow-down but much lower RAM usage. If the derivative of the kernel (and noise function) with respect to the hyperparameters (kernel_function_grad) is provided, it has to be tailored: for ram_economy=True it should be of the form f(x1, x2, hyperparameters, direction) and return a 2d numpy array of shape len(x1) x len(x2). If ram_economy=False, the function should be of the form f(x1, x2, hyperparameters) and return a numpy array of shape H x len(x1) x len(x2), where H is the number of hyperparameters. CAUTION: This array will be stored and is very large.
cost_function (Callable, optional) – A function encoding the cost of motion through the input space and the cost of a measurement. Its inputs are an origin (np.ndarray of size V x D), x (np.ndarray of size V x D), and the value of cost_func_params; origin is the starting position, and x is the destination position. The return value is a 1d array of length V describing the costs as floats. The ‘score’ from acquisition_function is divided by this returned cost to determine the next measurement point. The default is a no-op.
logging (bool, optional) – If True, logging is enabled. The default is False.
args (dict, optional) –
Advanced options. Recognized keys are:
"random_logdet_lanczos_degree" : int; default = 20
"random_logdet_error_rtol" : float; default = 0.01
"random_logdet_verbose" : True/False; default = False
"random_logdet_print_info" : True/False; default = False
"sparse_minres_tol" : float
"cg_minres_tol" : float
"random_logdet_lanczos_compute_device" : str; default = "cpu"/"gpu"
"Chol_factor_compute_device" : str; default = "cpu"/"gpu"
"update_Chol_factor_compute_device" : str; default = "cpu"/"gpu"
"Chol_solve_compute_device" : str; default = "cpu"/"gpu"
"Chol_logdet_compute_device" : str; default = "cpu"/"gpu"
"GPU_engine" : str; default = "torch"/"cupy"
All other keys will be stored and are available as part of the object instance.
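A sketch of a user-defined kernel_function over the combined input-task space, together with a matching kernel_function_grad in both ram_economy forms described above. The kernel is an assumption chosen for illustration: a squared-exponential kernel over the Di input columns multiplied by a task-correlation factor rho (|rho| < 1) between different tasks, which keeps the product positive semi-definite. The hyperparameters are assumed to be [signal variance, length scale, rho], and the optional args argument is omitted.

```python
import numpy as np


def _parts(x1, x2, hps):
    """Shared pieces: squared input distances, same-task mask, RBF factor, task factor."""
    s2, length, rho = hps
    d2 = np.sum((x1[:, None, :-1] - x2[None, :, :-1]) ** 2, axis=-1)
    same_task = x1[:, None, -1] == x2[None, :, -1]  # last column is the task index
    rbf = np.exp(-d2 / (2.0 * length ** 2))
    task = np.where(same_task, 1.0, rho)
    return d2, same_task, rbf, task


def my_kernel(x1, x2, hps):
    """k = s2 * exp(-|dx|^2 / (2 l^2)) * (1 if same task else rho); shape (N1, N2)."""
    _, _, rbf, task = _parts(x1, x2, hps)
    return hps[0] * rbf * task


def my_kernel_grad_direction(x1, x2, hps, direction):
    """ram_economy=True form: derivative w.r.t. hps[direction], shape (N1, N2)."""
    d2, same_task, rbf, task = _parts(x1, x2, hps)
    s2, length, _ = hps
    if direction == 0:  # d/d(signal variance)
        return rbf * task
    if direction == 1:  # d/d(length scale)
        return s2 * rbf * task * d2 / length ** 3
    if direction == 2:  # d/d(rho): only cross-task entries depend on rho
        return s2 * rbf * (~same_task)
    raise ValueError("direction must be 0, 1, or 2")


def my_kernel_grad_all(x1, x2, hps):
    """ram_economy=False form: all derivatives stacked, shape (len(hps), N1, N2)."""
    return np.stack([my_kernel_grad_direction(x1, x2, hps, i) for i in range(len(hps))])


rng = np.random.default_rng(0)
x = np.column_stack([rng.random((5, 2)), rng.integers(0, 2, 5)])  # 2d inputs + task index
hps = np.array([1.5, 0.7, 0.4])
K = my_kernel(x, x, hps)
G = my_kernel_grad_all(x, x, hps)
```

Following the parameter descriptions above, these would be passed as kernel_function=my_kernel together with kernel_function_grad=my_kernel_grad_direction (ram_economy=True) or my_kernel_grad_all (ram_economy=False).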
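A sketch of a per-task prior_mean_function and noise_function, with gradients in the shapes stated above. Assumptions for illustration only: two tasks, a hyperparameter vector whose last four entries are [mean_task0, mean_task1, noise_task0, noise_task1] (the leading entries would belong to the kernel), and a noise function that returns a vector of variances rather than a matrix. The gradients use the ram_economy=False layout (len(hyperparameters) x N).

```python
import numpy as np


def my_mean(x, hps):
    """Constant prior mean per task; returns shape (N1,)."""
    task = x[:, -1].astype(int)
    return hps[-4:-2][task]


def my_mean_grad(x, hps):
    """Derivative of my_mean w.r.t. every hyperparameter; shape (len(hps), N1)."""
    task = x[:, -1].astype(int)
    grad = np.zeros((len(hps), len(x)))
    grad[len(hps) - 4, task == 0] = 1.0
    grad[len(hps) - 3, task == 1] = 1.0
    return grad


def my_noise(x, hps):
    """Per-task noise variance; returns a vector of length len(x)."""
    task = x[:, -1].astype(int)
    return hps[-2:][task]


def my_noise_grad(x, hps):
    """Derivative of my_noise w.r.t. every hyperparameter; shape (len(hps), N)."""
    task = x[:, -1].astype(int)
    grad = np.zeros((len(hps), len(x)))
    grad[len(hps) - 2, task == 0] = 1.0
    grad[len(hps) - 1, task == 1] = 1.0
    return grad


x = np.array([[0.2, 0.3, 0], [0.9, 0.6, 0], [0.2, 0.3, 1], [0.9, 0.6, 1]])
hps = np.array([1.0, 0.5, 0.3, 2.0, -1.0, 0.01, 0.05])
print(my_mean(x, hps))   # per-point prior mean
print(my_noise(x, hps))  # per-point noise variance
```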
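The custom form of gp2Scale_linalg_mode is an iterable of three callables: factor(K), solve(obj, vec), and logdet(obj). A minimal dense sketch of that calling convention, using a Cholesky factor as the factorization object. A real gp2Scale run works with large sparse matrices, so this only illustrates the interface; the function names are placeholders.

```python
import numpy as np


def chol_factor(K):
    """Factorization step: return the lower Cholesky factor L with K = L @ L.T."""
    return np.linalg.cholesky(K)


def chol_solve(L, vec):
    """Solve K x = vec using the factor from chol_factor."""
    y = np.linalg.solve(L, vec)     # triangular system L y = vec (general solver used here)
    return np.linalg.solve(L.T, y)  # triangular system L.T x = y


def chol_logdet(L):
    """log det K = 2 * sum(log(diag(L)))."""
    return 2.0 * np.sum(np.log(np.diag(L)))


linalg_mode = (chol_factor, chol_solve, chol_logdet)

K = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
obj = linalg_mode[0](K)
sol = linalg_mode[1](obj, b)
logdet = linalg_mode[2](obj)
```

Per the parameter description, such a tuple would be passed as gp2Scale_linalg_mode=linalg_mode.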
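A sketch of a cost_function and of how its output rescales acquisition scores (the score is divided by the cost, as described above). The content of the third argument is an assumption: here a dict with hypothetical keys speed and measurement_cost, since the description only says the function receives the value of cost_func_params.

```python
import numpy as np


def my_cost(origin, x, cost_func_params):
    """Travel time from origin to each row of x plus a fixed measurement cost.

    origin: (D,) or (V, D); x: (V, D). Returns a (V,) array of positive costs.
    """
    distance = np.linalg.norm(np.atleast_2d(x) - np.atleast_2d(origin), axis=1)
    return distance / cost_func_params["speed"] + cost_func_params["measurement_cost"]


origin = np.array([0.0, 0.0])
candidates = np.array([[3.0, 4.0], [0.5, 0.0]])
scores = np.array([10.0, 2.0])  # hypothetical acquisition scores
params = {"speed": 1.0, "measurement_cost": 1.0}
cost = my_cost(origin, candidates, params)  # [6.0, 1.5]
adjusted = scores / cost                    # cost-adjusted scores
best = candidates[np.argmax(adjusted)]
```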
- y_data
Datapoint values
- Type:
np.ndarray
- noise_variances
Datapoint observation variances.
- Type:
np.ndarray
- fvgp_y_data
The data values from the fvgp point of view.
- Type:
np.ndarray
- fvgp_noise_variances
Observation variances from the fvgp point of view.
- Type:
np.ndarray
- hyperparameters
Current hyperparameters in use.
- Type:
np.ndarray
- K
Current prior covariance matrix of the GP.
- Type:
np.ndarray
- m
Current prior mean vector.
- Type:
np.ndarray
- V
The noise covariance matrix or vector.
- Type:
np.ndarray
- ask(input_set, x_out=None, acquisition_function='variance', position=None, n=1, method='global', pop_size=20, max_iter=20, tol=1e-06, constraints=(), x0=None, vectorized=True, info=False, args=None, dask_client=None, batch_size=None)
Given that the acquisition device is at position, this function ask()s for n new optimal points within a given input_set (given as bounds or candidates) using the optimization setup method, acquisition_function, pop_size, max_iter, tol, constraints, and x0. This function can also choose the best candidate of a candidate set for Bayesian optimization on non-Euclidean input spaces.
- Parameters:
input_set (np.ndarray | list) – Either a numpy array of floats of shape D x 2 describing the Euclidean search space or a set of candidates in the form of a list. If a candidate list is provided, ask() will evaluate the acquisition function on each element and return a sorted array of length n. This is usually desirable for non-Euclidean inputs but can be used either way. If candidates are Euclidean, they should be provided as a list of 1d np.ndarrays. In that case vectorized = True will lead to a vectorized acquisition function evaluation. The possibility of a candidate list together with user-defined acquisition functions also means that mixed discrete-continuous spaces can be considered here. The candidates will be directly given to the acquisition function.
x_out (np.ndarray, optional) – The position indicating where in the output space the acquisition function should be evaluated. This array is of shape (No). This is only used in the multi-task setting.
position (np.ndarray, optional) – Current position in the input space. If a cost function is provided this position will be taken into account to guarantee a cost-efficient new suggestion. The default is None.
n (int, optional) – The algorithm will try to return n suggestions for new measurements. This is either done by method = ‘hgdl’, or otherwise by maximizing the collective information gain (default).
acquisition_function (Callable | str, optional) – The acquisition function accepts as input a numpy array of size V x D (such that V is the number of input points, and D is the parameter space dimensionality) and a GPOptimizer object. The return value is a 1d array of length V providing ‘scores’ for each position, such that the highest-scored point will be measured next. In the single-task case (using gpcam.GPOptimizer) the following built-in acquisition functions can be used: ucb, lcb, maximum, minimum, variance, expected improvement, relative information entropy, relative information entropy set, probability of improvement, gradient, total correlation, target probability. In the multi-task case (using gpcam.fvGPOptimizer) the following built-in acquisition functions can be used: variance, relative information entropy, relative information entropy set, total correlation, ucb, lcb, and expected improvement. In the multi-task case, it is highly recommended to deploy a user-defined acquisition function due to the intricate relationship of posterior distributions at different points in the output space. If None, the default function variance, meaning fvgp.GP.posterior_covariance() with variance_only = True, will be used. The acquisition function can be a callable of the form my_func(x, gpcam.GPOptimizer), which will be maximized (!!!), so make sure desirable new measurement points are located at maxima. Explanations of the built-in acquisition functions: variance: simply the posterior variance; relative information entropy: the KL divergence of the prior over predictions and the posterior; relative information entropy set: the KL divergence of the prior defined over predictions and the posterior, computed point-by-point; ucb: upper confidence bound, posterior mean + 3 * std; lcb: lower confidence bound, -(posterior mean - 3 * std); maximum: finds the maximum of the current posterior mean; minimum: finds the minimum of the current posterior mean; gradient: puts focus on high-gradient regions; probability of improvement: as the name would suggest; expected improvement: as the name would suggest; total correlation: extension of mutual information to more than 2 random variables; target probability: probability that the function value lies within a target range. This needs a dictionary args = {‘a’: lower bound, ‘b’: upper bound} to be defined.
method (str, optional) – A string defining the method used to find the maximum of the acquisition function. Choose from global, local, hgdl. HGDL is an in-house hybrid optimizer that is comfortable on HPC hardware. The default is global.
pop_size (int, optional) – An integer defining the number of individuals if global is chosen as method. The default is 20. For hgdl, this will be overwritten by the dask_client definition.
max_iter (int, optional) – The number of iterations before the optimizer is terminated. The default is 20.
tol (float, optional) – Termination criterion for the local optimizer. The default is 1e-6.
x0 (np.ndarray, optional) – A set of points as numpy array of shape N x D, used as starting location(s) for the optimization algorithms. The default is None.
vectorized (bool, optional) – If your acquisition function is vectorized, i.e., it accepts an array of points and returns an array of scores, this option makes the optimization faster when method = ‘global’ is used. The default is True but will be set to False if method is not global.
info (bool, optional) – Print optimization information. The default is False.
constraints (tuple of object instances, optional) – scipy constraint instances, depending on the optimizer used.
args (any, optional) – Arguments that will be passed to the acquisition function as part of the gp_optimizer object. This will overwrite the args set at initialization.
dask_client (distributed.client.Client, optional) – A Dask Distributed Client instance for distributed acquisition_function optimization. If None is provided, a new distributed.client.Client instance is constructed for hgdl.
batch_size (int, optional) – If a candidate set (input set) and a dask client are provided, the acquisition function evaluations will be executed in parallel in batches of this size.
- Returns:
Solution – Found maxima of the acquisition function, the associated function values, and the optimization object, which can be queried for solutions only in the case of method = hgdl.
- Return type:
{‘x’: np.array(maxima), “f_a(x)” : np.array(func_evals), “opt_obj” : opt_obj}
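Several built-in scores listed under acquisition_function reduce to simple formulas on the posterior mean and variance at each point. The sketch restates ucb, lcb (with the factor 3 from the text), and target probability for a Gaussian posterior, and adds a hypothetical rank_candidates helper that mirrors what ask() is described as doing with a candidate list: score every element, optionally in batches of batch_size, and return the n best. Posterior means and variances are passed in directly here; in gpCAM they would come from the GP.

```python
import math

import numpy as np


def ucb(mean, var):
    """Upper confidence bound: posterior mean + 3 * std."""
    return mean + 3.0 * np.sqrt(var)


def lcb(mean, var):
    """Lower confidence bound, negated so that maximizing it seeks minima."""
    return -(mean - 3.0 * np.sqrt(var))


def target_probability(mean, var, a, b):
    """Probability that a Gaussian posterior value lies in [a, b]."""
    std = np.sqrt(var)
    cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
    return cdf((b - mean) / std) - cdf((a - mean) / std)


def rank_candidates(candidates, score_fn, n=1, batch_size=None):
    """Score each candidate (in batches if requested) and return the n best."""
    batch_size = batch_size or len(candidates)
    scores = []
    for start in range(0, len(candidates), batch_size):
        scores.extend(score_fn(candidates[start:start + batch_size]))
    order = np.argsort(scores)[::-1][:n]  # highest scores first
    return [candidates[i] for i in order], np.asarray(scores)[order]


mean = np.array([1.0, 0.0])
var = np.array([4.0, 1.0])
print(ucb(mean, var))                           # [7. 3.]
print(lcb(mean, var))                           # [5. 3.]
print(target_probability(0.0, 1.0, -1.0, 1.0))  # about 0.6827

cands = [np.array([v]) for v in (0.1, 0.5, 0.9)]
best, best_scores = rank_candidates(
    cands, lambda batch: [-(float(c[0]) - 0.5) ** 2 for c in batch], n=2, batch_size=2)
```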
- crps(x_test, y_test)
This function calculates the continuous ranked probability score (CRPS).
- Parameters:
x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.
y_test (np.ndarray) – A numpy array of shape (V x No) in the multi-output case. These are the y data to compare against.
- Returns:
CRPS, standard dev. of CRPS
- Return type:
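For a Gaussian predictive distribution N(mu, sigma^2), CRPS has a standard closed form. This sketch evaluates it per test point and reports its mean and standard deviation, mirroring the two returned quantities. Whether gpCAM uses this closed form and aggregates in the same way is not stated above, so treat it as an illustration of the score itself.

```python
import math

import numpy as np


def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of a normal predictive N(mu, sigma^2) at observations y.

    Lower is better; returns one value per point.
    """
    mu, sigma, y = (np.asarray(a, dtype=float) for a in (mu, sigma, y))
    z = (y - mu) / sigma
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


scores = gaussian_crps([0.0, 1.0], [1.0, 0.5], [0.0, 1.2])
print(scores.mean(), scores.std())
```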
- evaluate_acquisition_function(x, x_out=None, acquisition_function='variance', origin=None, args=None)
Function to evaluate the acquisition function.
- Parameters:
x (np.ndarray | list) – Point positions at which the acquisition function is evaluated. np.ndarray of shape (N x D) or list.
x_out (np.ndarray, optional) – Point positions in the output space.
acquisition_function (Callable, optional) – Acquisition function to execute. Callable with inputs (x,gpcam.gp_optimizer.GPOptimizer), where x is a V x D array of input x_data. The return value is a 1d array of length V. The default is variance.
origin (np.ndarray, optional) – If a cost function is provided this 1d numpy array of length D is used as the origin of motion.
args (any, optional) – Arguments that will be passed to the acquisition function as part of the gp_optimizer object. CAUTION: this will overwrite the args set at initialization.
- Returns:
The acquisition function evaluations at all points x
- Return type:
np.ndarray
- static gaussian_1d(x, mu, sigma)
Evaluates a 1D Gaussian (Normal) distribution at a point x.
- Parameters:
x (np.ndarray) – The points where you want to evaluate the Gaussian.
mu (np.ndarray) – The mean of the Gaussian.
sigma (np.ndarray) – The standard deviation of the Gaussian.
- Returns:
Evaluations of the Gaussian
- Return type:
np.ndarray
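A sketch of the textbook normal density that gaussian_1d evaluates, broadcasting over NumPy arrays. This restates the formula; it is not a copy of the library's implementation.

```python
import numpy as np


def gaussian_1d(x, mu, sigma):
    """Normal probability density N(mu, sigma^2) evaluated element-wise at x."""
    x, mu, sigma = (np.asarray(a, dtype=float) for a in (x, mu, sigma))
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


vals = gaussian_1d(np.array([-1.0, 0.0, 1.0]), 0.0, 1.0)
```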
- get_data()
Function that provides access to the class attributes.
- Returns:
dictionary of class attributes
- Return type:
- get_gp2Scale_exec_time(time_per_worker_execution, number_of_workers)
This function calculates the estimated time gp2Scale takes to calculate the covariance matrix as a function of the number of workers and their speed calculating a block.
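The formula behind this estimate is not given here. A back-of-the-envelope sketch under explicit assumptions: the covariance matrix is tiled into gp2Scale_batch_size x gp2Scale_batch_size blocks, only the upper triangle (including the diagonal) is computed because the matrix is symmetric, and blocks are processed in parallel rounds of number_of_workers. The function name and these assumptions are illustrative, not the library's method.

```python
import math


def estimate_gp2scale_time(n_points, batch_size, time_per_block, n_workers):
    """Rough wall-time estimate for computing a blocked covariance matrix."""
    n_tiles = math.ceil(n_points / batch_size)  # tiles per matrix side
    n_blocks = n_tiles * (n_tiles + 1) // 2     # upper triangle incl. diagonal
    rounds = math.ceil(n_blocks / n_workers)    # parallel rounds of blocks
    return rounds * time_per_block


# one million points, batch size 10000, 2 s per block, 64 workers
print(estimate_gp2scale_time(1_000_000, 10_000, 2.0, 64))
```

Under these assumptions, the time scales roughly with the square of the data size and inversely with the number of workers.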
- get_hyperparameters()
Function to get the current hyperparameters.
- Returns:
hyperparameters
- Return type:
np.ndarray
- get_prior_pdf()
Function to get the current prior covariance matrix.
- Returns:
A dictionary containing information about the GP prior distribution
- Return type:
- gp_entropy(x_pred, x_out=None)
Function to compute the entropy of the GP prior probability distribution.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
- Returns:
Entropy
- Return type:
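Under a GP, the prior over function values at x_pred is multivariate normal, so its entropy depends only on the prior covariance at those points. A sketch of that standard formula, with the covariance supplied directly (in gpCAM it would come from the kernel evaluated at x_pred, combined with x_out in the multi-task case).

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (nats) of a multivariate normal: 0.5 * logdet(2*pi*e*cov)."""
    cov = np.atleast_2d(cov)
    sign, logdet = np.linalg.slogdet(2.0 * np.pi * np.e * cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * logdet


h = gaussian_entropy(np.array([[1.0, 0.5], [0.5, 1.0]]))
```

Correlation between points lowers the joint entropy relative to independent points with the same variances.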
- gp_entropy_grad(x_pred, direction, x_out=None)
Function to compute the gradient of the entropy of the prior in a given direction.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
direction (int) – Direction of the derivative.
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
- Returns:
Entropy gradient in given direction
- Return type:
- gp_kl_div(x_pred, comp_mean, comp_cov, x_out=None)
Function to compute the KL divergence between a posterior at given points and a given normal distribution.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
comp_mean (np.ndarray) – Comparison mean vector for KL divergence. len(comp_mean) = len(x_pred)
comp_cov (np.ndarray) – Comparison covariance matrix for KL divergence. shape(comp_cov) = (len(x_pred),len(x_pred))
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
- Returns:
Solution
- Return type:
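A sketch of the standard KL divergence between two multivariate normals, which is what comparing a Gaussian posterior with comp_mean and comp_cov amounts to. KL divergence is asymmetric, and the argument order used by gp_kl_div (posterior first or comparison first) is not stated above, so check the source before relying on the direction.

```python
import numpy as np


def gaussian_kl(mean0, cov0, mean1, cov1):
    """KL(N0 || N1) between N(mean0, cov0) and N(mean1, cov1)."""
    mean0, mean1 = np.atleast_1d(mean0), np.atleast_1d(mean1)
    cov0, cov1 = np.atleast_2d(cov0), np.atleast_2d(cov1)
    k = len(mean0)
    diff = mean1 - mean0
    trace_term = np.trace(np.linalg.solve(cov1, cov0))
    quad_term = diff @ np.linalg.solve(cov1, diff)
    logdet_term = np.linalg.slogdet(cov1)[1] - np.linalg.slogdet(cov0)[1]
    return 0.5 * (trace_term + quad_term - k + logdet_term)


kl_shift = gaussian_kl(0.0, 1.0, 1.0, 1.0)  # unit mean shift, equal variances
```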
- gp_mutual_information(x_pred, x_out=None, add_noise=False)
Function to calculate the mutual information between the random variables f(x_data) and f(x_pred). The mutual information is always non-negative, as it is a KL divergence, and is bounded from below by 0. The maxima are expected at the data points. Zero is expected far from the data support.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
add_noise (bool, optional) – If True the noise variances will be added to the prior over the prediction points. Default = False.
- Returns:
Solution
- Return type:
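Because f(x_data) and f(x_pred) are jointly Gaussian under the GP, their mutual information has a closed form in terms of the blocks of the joint covariance. A sketch with that joint covariance passed in directly; whether noise variances are included would depend on add_noise.

```python
import numpy as np


def gaussian_mutual_information(joint_cov, n_first):
    """I(A; B) for jointly normal A (first n_first variables) and B (the rest).

    I = 0.5 * (logdet(cov_AA) + logdet(cov_BB) - logdet(joint_cov))
    """
    joint_cov = np.asarray(joint_cov, dtype=float)
    cov_a = joint_cov[:n_first, :n_first]
    cov_b = joint_cov[n_first:, n_first:]

    def logdet(m):
        return np.linalg.slogdet(m)[1]

    return 0.5 * (logdet(cov_a) + logdet(cov_b) - logdet(joint_cov))


rho = 0.8
mi = gaussian_mutual_information([[1.0, rho], [rho, 1.0]], 1)  # equals -0.5 * log(1 - rho**2)
```

The value is zero for independent blocks and grows as the data and prediction points become more correlated, which matches the stated behavior near and far from the data.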
- gp_relative_information_entropy(x_pred, x_out=None, add_noise=False)
Function to compute the KL divergence and therefore the relative information entropy of the prior distribution defined over predicted function values and the posterior distribution. The value is a reflection of how much information is predicted to be gained through observing a set of data points at x_pred.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
add_noise (bool, optional) – If True the noise variances will be added to the posterior covariance. Default = False.
- Returns:
Solution – Relative information entropy of prediction points, as a collective.
- Return type:
- gp_relative_information_entropy_set(x_pred, x_out=None, add_noise=False)
Function to compute the KL divergence and therefore the relative information entropy of the prior distribution over predicted function values and the posterior distribution. The value is a reflection of how much information is predicted to be gained through observing each data point in x_pred separately, not all at once as in gp_relative_information_entropy.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
add_noise (bool, optional) – If True the noise variances will be added to the posterior covariance. Default = False.
- Returns:
Solution – Relative information entropy of prediction points, but not as a collective.
- Return type:
- gp_total_correlation(x_pred, x_out=None, add_noise=False)
Function to calculate the total correlation (also called multi-information) between the random variables f(x_data) and f(x_pred). This is the mutual information of each f(x_pred) with f(x_data). It is best used when several prediction points are supposed to be mutually aware. The total correlation is always non-negative, as it is a KL divergence, and is bounded from below by 0. The maxima are expected at the data points. Zero is expected far from the data support.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
add_noise (bool, optional) – If True the noise variances will be added to the prior over the prediction points. Default = False.
- Returns:
Solution – Total correlation between prediction points, as a collective.
- Return type:
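Total correlation is the sum of the marginal entropies minus the joint entropy; for a multivariate normal it depends only on the covariance. A sketch for a covariance supplied directly. Which covariance gp_total_correlation feeds into this (prior over x_pred, posterior, or joint with the data) is not specified above.

```python
import numpy as np


def gaussian_total_correlation(cov):
    """Total correlation of a multivariate normal, in nats.

    sum_i H(X_i) - H(X) = 0.5 * (sum(log(diag(cov))) - logdet(cov)) >= 0
    """
    cov = np.asarray(cov, dtype=float)
    return 0.5 * (np.sum(np.log(np.diag(cov))) - np.linalg.slogdet(cov)[1])


rho = 0.8
tc = gaussian_total_correlation([[1.0, rho], [rho, 1.0]])
```

For two variables this coincides with their mutual information; for more variables it measures all shared dependence at once.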
- joint_gp_prior(x_pred, x_out=None)
Function to compute the joint prior over f (at measured locations) and f_pred at x_pred.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
- Returns:
Solution
- Return type:
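The joint prior over f (at measured locations) and f_pred is a block covariance over data and prediction locations. This sketch assumes a zero prior mean and a plain squared-exponential kernel, both stand-ins for whatever prior mean and kernel the optimizer is configured with. In the multi-task case, the rows would be points in the combined input-task space.

```python
import numpy as np


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between rows of a (N1, D) and b (N2, D)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * length ** 2))


def joint_prior_cov(x_data, x_pred, kernel=rbf):
    """Covariance of the stacked vector [f(x_data), f(x_pred)] under a zero-mean GP prior."""
    k_dd = kernel(x_data, x_data)
    k_dp = kernel(x_data, x_pred)
    k_pp = kernel(x_pred, x_pred)
    return np.block([[k_dd, k_dp], [k_dp.T, k_pp]])


x_data = np.array([[0.0], [1.0]])
x_pred = np.array([[0.5], [2.0], [3.0]])
S = joint_prior_cov(x_data, x_pred)
```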
- joint_gp_prior_grad(x_pred, direction, x_out=None)
Function to compute the gradient of the data-informed prior.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
direction (int) – Direction of derivative.
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
- Returns:
Solution
- Return type:
- kill_client(opt_obj)
Function to kill an asynchronous training client. This shuts down the associated distributed.client.Client.
- Parameters:
opt_obj (object instance) – Object created by train(asynchronous=True).
- log_likelihood(hyperparameters=None)
Function that computes the marginal log-likelihood.
- Parameters:
hyperparameters (np.ndarray, optional) – Vector of hyperparameters of shape (N). If not provided, the covariance will not be recomputed.
- Returns:
log marginal likelihood of the data
- Return type:
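The quantity computed here is the standard Gaussian log marginal likelihood of the data under the prior mean m and covariance K plus noise V: log p(y) = -0.5 (y - m)^T (K + V)^-1 (y - m) - 0.5 logdet(K + V) - (n/2) log(2 pi). A Cholesky-based sketch with these passed in directly; the library itself may use other solvers (for example under gp2Scale).

```python
import numpy as np


def log_marginal_likelihood(y, mean, K):
    """log p(y) for y ~ N(mean, K), computed via a Cholesky factorization.

    K should already include the noise covariance.
    """
    r = np.asarray(y, dtype=float) - np.asarray(mean, dtype=float)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))  # alpha = K^{-1} r
    return (-0.5 * r @ alpha
            - np.sum(np.log(np.diag(L)))  # equals 0.5 * logdet(K)
            - 0.5 * len(r) * np.log(2.0 * np.pi))


K = np.array([[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]])
y = np.array([0.5, -0.2, 0.1])
lml = log_marginal_likelihood(y, np.zeros(3), K)
```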
- static make_1d_x_pred(b, res=100)
This is a purely convenience-driven function calculating prediction points on a 1d grid.
- Parameters:
b (np.ndarray) – A numpy array of shape (2) defining lower and upper bounds.
res (int, optional) – Resolution. Default = 100
- Returns:
prediction points
- Return type:
np.ndarray
- static make_2d_x_pred(bx, by, resx=100, resy=100)
This is a purely convenience-driven function calculating prediction points on a grid.
- Parameters:
bx (np.ndarray) – A numpy array of shape (2) defining lower and upper bounds in x direction.
by (np.ndarray) – A numpy array of shape (2) defining lower and upper bounds in y direction.
resx (int, optional) – Resolution in x direction. Default = 100.
resy (int, optional) – Resolution in y direction. Default = 100.
- Returns:
prediction points
- Return type:
np.ndarray
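Sketches of what such grid helpers typically produce for make_1d_x_pred and make_2d_x_pred. The function names below are hypothetical, and the output shapes and point ordering are assumptions, so check them against the actual return values before relying on them.

```python
import numpy as np


def make_1d_grid(b, res=100):
    """res evenly spaced points in [b[0], b[1]], returned as a (res, 1) column."""
    return np.linspace(b[0], b[1], res).reshape(-1, 1)


def make_2d_grid(bx, by, resx=100, resy=100):
    """All (x, y) pairs of a resx-by-resy grid, returned as a (resx * resy, 2) array."""
    gx, gy = np.meshgrid(np.linspace(bx[0], bx[1], resx),
                         np.linspace(by[0], by[1], resy), indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel()])


g1 = make_1d_grid(np.array([0.0, 1.0]), res=5)
g2 = make_2d_grid(np.array([0.0, 1.0]), np.array([0.0, 2.0]), resx=3, resy=5)
```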
- neg_log_likelihood_gradient(hyperparameters=None, component=0)
Function that computes the gradient of the negative marginal log-likelihood.
- Parameters:
hyperparameters (np.ndarray, optional) – Vector of hyperparameters of shape (N). If not provided, the covariance will not be recomputed.
component (int, optional) – In case many GPs are computed in parallel, this specifies which one is considered.
- Returns:
Gradient of the negative log marginal likelihood
- Return type:
np.ndarray
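For a prior mean that does not depend on the hyperparameters, the standard identity behind this gradient is d(-log p)/d(theta_i) = 0.5 * tr(K^-1 dK_i) - 0.5 * alpha^T dK_i alpha, with alpha = K^-1 (y - m) and dK_i the kernel derivative (the ram_economy=False layout). A sketch checked against central finite differences; the two-hyperparameter kernel with fixed noise is hypothetical.

```python
import numpy as np


def neg_lml_gradient(y, mean, K, dK):
    """Gradient of -log p(y) w.r.t. hyperparameters; dK has shape (H, N, N)."""
    r = np.asarray(y, dtype=float) - np.asarray(mean, dtype=float)
    K_inv = np.linalg.inv(K)  # fine for a small illustration; prefer solves in practice
    alpha = K_inv @ r
    return np.array([0.5 * np.trace(K_inv @ dKi) - 0.5 * alpha @ dKi @ alpha for dKi in dK])


x = np.array([0.0, 0.5, 1.3])
y = np.array([0.2, -0.1, 0.4])
d2 = (x[:, None] - x[None, :]) ** 2


def cov(theta):
    """Hypothetical kernel: theta = [signal variance, length scale], plus fixed noise 0.1."""
    return theta[0] * np.exp(-d2 / (2.0 * theta[1] ** 2)) + 0.1 * np.eye(len(x))


def cov_grad(theta):
    e = np.exp(-d2 / (2.0 * theta[1] ** 2))
    return np.stack([e, theta[0] * e * d2 / theta[1] ** 3])


def neg_lml(theta):
    K = cov(theta)
    return (0.5 * y @ np.linalg.solve(K, y) + 0.5 * np.linalg.slogdet(K)[1]
            + 0.5 * len(y) * np.log(2.0 * np.pi))


theta = np.array([1.2, 0.8])
grad = neg_lml_gradient(y, np.zeros(3), cov(theta), cov_grad(theta))
h = 1e-6
fd = np.array([(neg_lml(theta + h * e) - neg_lml(theta - h * e)) / (2.0 * h) for e in np.eye(2)])
```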
- nlpd(x_test, y_test)
This function calculates the negative log predictive density (NLPD).
- Parameters:
x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.
y_test (np.ndarray) – A numpy array of shape V or (V x No) in the multi-output case. These are the y data to compare against.
- Returns:
NLPD
- Return type:
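Assuming per-point Gaussian predictive distributions, NLPD is the average negative log density of each test value under its predictive mean and variance. Whether gpCAM averages or sums, or uses the full joint predictive covariance instead, is not stated above.

```python
import numpy as np


def gaussian_nlpd(y, mean, var):
    """Average negative log predictive density of y under independent N(mean, var)."""
    y, mean, var = (np.asarray(a, dtype=float) for a in (y, mean, var))
    return np.mean(0.5 * np.log(2.0 * np.pi * var) + (y - mean) ** 2 / (2.0 * var))


value = gaussian_nlpd([0.0, 1.0], [0.0, 0.0], [1.0, 1.0])
```

Lower is better: the score penalizes both large errors and over-confident (too small) predictive variances.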
- nrmse(x_test, y_test)
This function calculates the normalized root mean squared error. Note that in the multi-task setting the user should perform their input point transformation beforehand.
- Parameters:
x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.
y_test (np.ndarray) – A numpy array of shape V or (V x No) in the multi-output case. These are the y data to compare against.
- Returns:
NRMSE
- Return type:
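Normalization conventions for NRMSE vary (range, standard deviation, or mean of the targets). This sketch normalizes by the range of y_test, which is an assumption rather than gpCAM's documented choice; y_pred stands for the posterior mean at x_test.

```python
import numpy as np


def nrmse_by_range(y_true, y_pred):
    """Root mean squared error divided by the range of the true values."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (np.max(y_true) - np.min(y_true))


score = nrmse_by_range([0.0, 2.0], [1.0, 1.0])  # RMSE 1 over range 2
```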
- optimize(*, func, search_space, x_out=None, hyperparameter_bounds=None, train_at=(10, 20, 50, 100, 200), x0=None, acq_func='lcb', max_iter=100, callback=None, break_condition=None, ask_max_iter=20, ask_pop_size=20, method='global', training_method='global', training_max_iter=20)
This function is a lightweight optimization loop, using tell() and ask() repeatedly to optimize a given function while retraining the GP regularly. For advanced customizations, use tell(), train(), and ask() in a customized loop.
- Parameters:
func (Callable) – The function to be optimized. The callable should be of the form def f(x), where x is an element of your search space. The return is a tuple of scalars or vectors (a,b) where a is a scalar/vector of function evaluations and b is a scalar/vector of noise variances. Scalar here applies when the function to be optimized is a scalar valued function. Vector here applies when the function to be optimized is a vector valued function.
search_space (np.ndarray | list) – In the Euclidean case this should be a 2d np.ndarray of bounds in each direction of the input space. In the non-Euclidean case, this should be a list of all candidates.
x_out (np.ndarray, optional) – The position indicating where in the output space the acquisition function should be evaluated. This array is of shape (No).
hyperparameter_bounds (np.ndarray) – Bound of the hyperparameters for the training. The default will only work for the default kernel. Otherwise, please specify bounds for your hyperparameters.
train_at (tuple, optional) – The data lengths (numbers of collected points) at which to train the GP. Default = (10, 20, 50, 100, 200).
x0 (np.ndarray, optional) – Starting position(s). Corresponding to the search space either elements of the candidate set in form of a list or elements of the Euclidean search space in the form of a 2d np.ndarray.
acq_func (str or Callable, optional) – Default = lcb (lower confidence bound), which minimizes func. The acquisition function should be formulated such that MAXIMIZING it leads to the desired optimization (minimization or maximization) of func. For example, lcb (the default) MAXIMIZES -(mean - 3.0 * standard dev). This is equivalent to minimizing (mean - 3.0 * standard dev), which leads to finding a minimum.
max_iter (int, optional) – The maximum number of iterations. Default = 100.
callback (Callable, optional) – Function to be called in every iteration. Form: f(x_data, y_data)
break_condition (Callable, optional) – Callable f(x_data, y_data) that should return True if run is complete, otherwise False.
ask_max_iter (int, optional) – Default=20. Maximum number of iterations of the global and hybrid optimizers within ask().
ask_pop_size (int, optional) – Default=20. Population size of the global and hybrid optimizers.
method (str, optional) – Default=`global`. Method used to optimize the acquisition function. One of `global`, `local`, `hybrid`.
training_method (str, optional) – Default=`global`. See gpcam.GPOptimizer.train().
training_max_iter (int, optional) – Default=20. See gpcam.GPOptimizer.train().
- Returns:
Full traces of the function values `f(x)` and arguments `x`, plus the last entry, in the form {'trace f(x)': self.y_data, 'trace x': self.x_data, 'f(x)': self.y_data[-1], 'x': self.x_data[-1]}
- Return type:
dict
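The control flow of this ask/tell/train loop can be sketched in plain Python, using a finite candidate list as the search space (the non-Euclidean form of search_space). Everything in the sketch is hypothetical: optimize_sketch, toy_posterior, and the train hook are illustrative names, not gpCAM API. toy_posterior is deliberately NOT a Gaussian process. It uses the value at the nearest observed point as the "mean" and the distance to that point as the "standard deviation", only so that the documented lcb convention, MAXIMIZE -(mean - 3.0 * std), has something to act on.

```python
def lcb(mean, std):
    # Documented default convention: MAXIMIZE -(mean - 3.0 * std)
    return -(mean - 3.0 * std)


def toy_posterior(x, x_data, y_data):
    # HYPOTHETICAL stand-in for the GP posterior (this is NOT a GP):
    # mean = value at the nearest observed point, std = distance to it
    dist, mean = min((abs(x - xd), yd) for xd, yd in zip(x_data, y_data))
    return mean, dist


def optimize_sketch(func, candidates, x0, train_at=(10, 20, 50, 100, 200),
                    max_iter=100, train=None, callback=None, break_condition=None):
    x_data, y_data = [], []
    for x in x0:
        y, _noise = func(x)  # func returns (value, noise variance)
        x_data.append(x)
        y_data.append(y)
    for _ in range(max_iter):
        # ask(): pick the candidate that maximizes the acquisition function
        x_new = max(candidates,
                    key=lambda c: lcb(*toy_posterior(c, x_data, y_data)))
        y, _noise = func(x_new)
        # tell(): append the new observation to the dataset
        x_data.append(x_new)
        y_data.append(y)
        # train(): only at the requested data lengths
        if train is not None and len(x_data) in train_at:
            train(x_data, y_data)
        if callback is not None:
            callback(x_data, y_data)
        if break_condition is not None and break_condition(x_data, y_data):
            break
    return {"trace f(x)": y_data, "trace x": x_data,
            "f(x)": y_data[-1], "x": x_data[-1]}


# Usage: minimize (x - 3)^2 over the integers 0..10, stopping at a zero value
result = optimize_sketch(lambda x: ((x - 3.0) ** 2, 0.0),
                         [float(i) for i in range(11)], [0.0, 10.0],
                         break_condition=lambda xd, yd: min(yd) == 0.0)
print(result["x"], result["f(x)"])
```

The returned dictionary mirrors the documented return form. In a real loop, the surrogate would be the trained GP posterior, and train() would re-fit the hyperparameters.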
- picp(x_test, y_true, interval=0.95)#
Computes the Prediction Interval Coverage Probability (PICP) for a Gaussian Process posterior.
- Parameters:
x_test (array-like, shape (N,dim)) – Input points at which the prediction intervals are evaluated.
y_true (array-like, shape (N,)) – True values of the target variable.
interval (float, optional) – Confidence level of the prediction intervals (default 0.95 for 95% intervals).
- Returns:
picp (float) – Prediction Interval Coverage Probability
lower_bounds (ndarray) – Lower bounds of prediction intervals
upper_bounds (ndarray) – Upper bounds of prediction intervals
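The sketch below shows how a PICP can be computed from Gaussian predictive means and standard deviations. It builds symmetric intervals mean ± z·std, with z taken from the standard normal quantile for the requested level (about 1.96 for 0.95), and returns the fraction of true values inside their interval. It assumes intervals centered on the posterior mean with the posterior standard deviation. Whether gpCAM includes noise in the standard deviation is not confirmed here. The name picp_sketch is hypothetical.

```python
from statistics import NormalDist


def picp_sketch(y_true, mean, std, interval=0.95):
    """Return (coverage, lower_bounds, upper_bounds) for Gaussian intervals."""
    # Two-sided quantile: e.g. inv_cdf(0.975) for a 95% interval
    z = NormalDist().inv_cdf(0.5 + interval / 2.0)
    lower = [m - z * s for m, s in zip(mean, std)]
    upper = [m + z * s for m, s in zip(mean, std)]
    # Fraction of true values falling inside their own interval
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true), lower, upper
```

For a well-calibrated model, the coverage should be close to `interval`.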
- posterior_covariance(x_pred, x_out=None, variance_only=False, add_noise=False)#
Function to compute the posterior covariance.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
variance_only (bool, optional) – If True, the computation of the full posterior covariance matrix is skipped, which can save compute time; in that case only the variances at the input points are returned. Default = False. This is only relevant if calc_inv at initialization is True.
add_noise (bool, optional) – If True the noise variances will be added to the posterior variances. Default = False.
- Returns:
Solution
- Return type:
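For reference, the textbook GP posterior covariance is given below. The fvgp implementation may organize the linear algebra differently (for example via matrix factorizations or gp2Scale), so this is a sketch of the quantity, not of the code. The symbols are introduced here for illustration: X denotes the training inputs, X_* the prediction inputs (x_pred, combined with x_out in the multi-task case), K the kernel matrix, and Σ_n the noise covariance of the training data. With variance_only=True, only the diagonal of Σ_* is needed. With add_noise=True, the noise variances at X_* are added to that diagonal.

```latex
\Sigma_* = K(X_*, X_*) - K(X_*, X)\,\bigl[K(X, X) + \Sigma_n\bigr]^{-1} K(X, X_*)
```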
- posterior_covariance_grad(x_pred, x_out=None, direction=None)#
Function to compute the gradient of the posterior covariance.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
direction (int, optional) – Direction of the derivative. If None (default), the whole gradient is computed.
- Returns:
Solution
- Return type:
- posterior_mean(x_pred, hyperparameters=None, x_out=None)#
This function calculates the posterior mean for a set of input points.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
hyperparameters (np.ndarray, optional) – A numpy array of the correct size depending on the kernel. This is optional in case the posterior mean has to be computed with given hyperparameters, which is, for instance, the case if the posterior mean is a constraint during training. The default is None which means the initialized or trained hyperparameters are used.
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
- Returns:
Solution points and function values
- Return type:
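The class description explains that a multi-task GP is a single-task GP over the Cartesian product of input and output space. The sketch below builds that combined set from a list of input points and a list of output indices, reproducing the ordering of the example in the class description (outer loop over outputs). It is useful when you need to perform the input point transformation yourself, as required for the error metrics in the multi-task setting. Whether posterior_mean orders its rows the same way internally is an assumption. The name expand_inputs is hypothetical.

```python
def expand_inputs(x_pred, x_out):
    """Append each output index to every input point (outputs in the outer loop)."""
    return [list(p) + [o] for o in x_out for p in x_pred]


combined = expand_inputs([[0.2, 0.3], [0.9, 0.6]], [0, 1])
print(combined)
```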
- posterior_mean_grad(x_pred, hyperparameters=None, x_out=None, direction=None, component=0)#
This function calculates the gradient of the posterior mean for a set of input points.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
hyperparameters (np.ndarray, optional) – A numpy array of the correct size depending on the kernel. This is optional in case the posterior mean has to be computed with given hyperparameters, which is, for instance, the case if the posterior mean is a constraint during training. The default is None which means the initialized or trained hyperparameters are used.
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
direction (int, optional) – Direction of the derivative. If None (default), the whole gradient is computed.
component (int, optional) – If y_data is multi-modal and no fvgp.GPOptimizer is used (meaning y_data.shape[1] independent GPs are being executed), this indicates which GP’s gradient is evaluated. The default is 0.
- Returns:
Solution
- Return type:
- posterior_probability(x_pred, comp_mean, comp_cov, x_out=None)#
Function to compute the probability of a probabilistic quantity of interest, given the GP posterior at the given points.
- Parameters:
x_pred (np.ndarray or list) – A numpy array of shape (V x D), interpreted as an array of input point positions, or a list for GPs on non-Euclidean input spaces.
comp_mean (np.ndarray) – A vector of mean values, same length as x_pred.
comp_cov (np.ndarray) – Covariance matrix in R^{len(x_pred) x len(x_pred)}.
x_out (np.ndarray, optional) – Output coordinates in case of multi-task GP use; a numpy array of size (N), where N is the number of evaluation points in the output direction. Usually this is np.array([0,1,2,…]).
- Returns:
Solution – The probability of a probabilistic quantity of interest, given the GP posterior at a given point.
- Return type:
- r2(x_test, y_test)#
This function calculates the R2 prediction score.
- Parameters:
x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.
y_test (np.ndarray) – A numpy array of shape V or (V x No) in the multi-output case. These are the y data to compare against.
- Returns:
R2
- Return type:
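The R2 score is the coefficient of determination, 1 - SS_res / SS_tot, computed between reference values and predictions. The sketch below computes it from flat lists using that standard definition. It is a hypothetical helper, not the gpCAM code path, and assumes the multi-task input transformation (if any) has already been done.

```python
def r2_sketch(y_test, y_pred):
    """Coefficient of determination 1 - SS_res / SS_tot."""
    mean_y = sum(y_test) / len(y_test)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_test)             # total sum of squares
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives 1.0. Predicting the mean of y_test everywhere gives 0.0.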
- rmse(x_test, y_test)#
This function calculates the root mean squared error. Note that in the multi-task setting the user should perform their input point transformation beforehand.
- Parameters:
x_test (np.ndarray) – A numpy array of shape (V x D), interpreted as an array of input point positions.
y_test (np.ndarray) – A numpy array of shape V or (V x No) in the multi-output case. These are the y data to compare against.
- Returns:
RMSE
- Return type:
- set_args(new_args)#
Use this function to change the arguments for the GP.
- Parameters:
new_args (dict) – The new advanced settings.
- set_hyperparameters(hps)#
Function to set hyperparameters.
- Parameters:
hps (np.ndarray) – A 1-d numpy array of hyperparameters.
- stop_training(opt_obj)#
Function to stop an asynchronous hgdl training. This leaves the distributed.client.Client alive.
- Parameters:
opt_obj (object instance) – Object created by train(asynchronous=True).
- tell(x, y, noise_variances=None, append=True, gp_rank_n_update=None)#
This function tells the gp_optimizer instance the data that were collected. The data are used immediately to update the GP.
- Parameters:
x (np.ndarray | list) – Point positions to be communicated to the Gaussian Process; either a np.ndarray of shape (V x D) or a list.
y (np.ndarray) – The values of the data points. Shape (V,No). It is possible that not every entry in x has all corresponding tasks available. In that case y may contain np.nan entries.
noise_variances (np.ndarray, optional) – A numpy array or list defining the uncertainties/noise in the y_data in the form of point-wise variances. Shape (V, No). If y_data has np.nan entries, the corresponding noise_variances have to be np.nan. Note: if no noise_variances are provided here, the noise_function callable will be used; if the callable is not provided, the noise variances will be set to abs(np.mean(y_data)) / 100.0. If noise covariances are required (correlated noise), make use of the noise_function. Only provide a noise function OR noise_variances, not both.
append (bool, optional) – Indication whether to append to or overwrite the existing dataset. Default = True. In the default case, data will be appended.
gp_rank_n_update (bool, optional) – Indicates whether the GP marginal should be rank-n updated or recomputed. The default is gp_rank_n_update=append, meaning if data is only appended, the rank_n_update will be performed.
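Two documented rules for tell() can be illustrated with a short sketch: append=True extends the dataset while append=False replaces it, and, absent both noise_variances and a noise_function, every point receives the noise variance abs(mean(y_data)) / 100. The sketch is limited to scalar, single-task data without np.nan entries and ignores noise_function and the rank-n update. It assumes the fallback uses the mean of the updated dataset. The names merge_data and default_noise_variances are hypothetical.

```python
def merge_data(old_x, old_y, new_x, new_y, append=True):
    """append=True (default) extends the dataset; append=False replaces it."""
    if append:
        return old_x + new_x, old_y + new_y
    return list(new_x), list(new_y)


def default_noise_variances(y_data):
    """Documented fallback: abs(mean(y_data)) / 100 for every data point."""
    value = abs(sum(y_data) / len(y_data)) / 100.0
    return [value] * len(y_data)


x_data, y_data = merge_data([[0.0]], [1.0], [[1.0]], [3.0])
noise = default_noise_variances(y_data)
```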
- test_log_likelihood_gradient(hyperparameters, epsilon=1e-06)#
Function to test your gradient of the log-likelihood and therefore of the kernel function.
- Parameters:
hyperparameters (np.ndarray) – Vector of hyperparameters of shape (N).
epsilon (float, optional) – Step size of the finite-difference approximation. Default = 1e-06.
- Returns:
Analytical and finite-difference gradients for comparison.
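Checking an analytical gradient against a finite-difference approximation can be sketched with central differences and step size epsilon. Whether gpCAM uses central or forward differences internally is not confirmed here. The helper name finite_difference_gradient and the test function are hypothetical. In practice, the function passed in would be the (negative) log marginal likelihood as a function of the hyperparameters.

```python
def finite_difference_gradient(f, x, epsilon=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += epsilon
        x_minus[i] -= epsilon
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * epsilon))
    return grad


# Compare with the analytical gradient [2*h0, 3] of f(h) = h0^2 + 3*h1
approx = finite_difference_gradient(lambda h: h[0] ** 2 + 3.0 * h[1], [1.5, -2.0])
analytical = [2.0 * 1.5, 3.0]
```

Large discrepancies between the two usually point to an error in the user-supplied kernel gradient.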
- train(hyperparameter_bounds=None, objective_function=None, objective_function_gradient=None, objective_function_hessian=None, init_hyperparameters=None, method='mcmc', pop_size=20, tolerance=0.0001, max_iter=10000, local_optimizer='L-BFGS-B', global_optimizer='genetic', constraints=(), dask_client=None, info=False, asynchronous=False)#
This function trains the GP by finding the maximum of the log marginal likelihood (synchronously by default). This can be done on a remote cluster/computer by setting method to hgdl and providing a dask client; hgdl can also be run asynchronously. The GP prior is automatically updated with the new hyperparameters after the training.
- Parameters:
hyperparameter_bounds (np.ndarray, optional) – A 2d numpy array of shape (N x 2), where N is the number of hyperparameters. By default, the bounds are inferred from the communicated dataset. This only works for the default kernel.
objective_function (callable, optional) – The function that will be MINIMIZED for training the GP. The form of the function is f(hyperparameters=hps) and returns a scalar. This function can be used to train via non-standard user-defined objectives. The default is the negative log marginal likelihood.
objective_function_gradient (callable, optional) – The gradient of the function that will be MINIMIZED for training the GP. The form of the function is f(hyperparameters=hps) and returns a vector of len(hps). This function can be used to train via non-standard user-defined objectives. The default is the gradient of the negative log marginal likelihood.
objective_function_hessian (callable, optional) – The Hessian of the function that will be MINIMIZED for training the GP. The form of the function is f(hyperparameters=hps) and returns a matrix of shape(len(hps),len(hps)). This function can be used to train via non-standard user-defined objectives. The default is the Hessian of the negative log marginal likelihood.
init_hyperparameters (np.ndarray, optional) – Initial hyperparameters used as starting location for all optimizers. The default is a random draw from a uniform distribution within the hyperparameter_bounds.
method (str or Callable, optional) – The method used to train the hyperparameters. The options are global (scipy’s differential evolution), local, hgdl, mcmc, and a callable. The callable receives a gp.GP instance and has to return a 1d np.ndarray of hyperparameters. The default is mcmc. If method = mcmc, the attribute fvgp.GP.mcmc_info is updated and contains convergence and distribution information. For hgdl, please provide a distributed.Client().
pop_size (int, optional) – The number of individuals used by any optimizer with a global component. Default = 20.
tolerance (float, optional) – Used as termination criterion for local optimizers. Default = 0.0001.
max_iter (int, optional) – Maximum number of iterations for global and local optimizers. Default = 10000.
local_optimizer (str, optional) – The local optimizer. Default = L-BFGS-B; most scipy.optimize.minimize methods are permissible.
global_optimizer (str, optional) – The global optimizer. Only applicable to method = hgdl. Default = genetic.
constraints (tuple of object instances, optional) – Equality and inequality constraints for the optimization. If the optimizer is hgdl, see hgdl.readthedocs.io. If the optimizer is a scipy optimizer, see the scipy documentation.
dask_client (distributed.client.Client, optional) – A Dask Distributed Client instance for distributed training if hgdl is used.
info (bool, optional) – If True, information reports are provided during the training of the GP. The default is False. If other information is needed, please use the logger as described in the online documentation (separately for HGDL and fvgp if needed).
asynchronous (bool, optional) – Method hgdl allows for asynchronous execution. In that case, an object will be returned that can be queried for an intermediate or final solution.
- Returns:
Optimized hyperparameters (for information only; the GP is already updated).
- Return type:
np.ndarray
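Two documented inputs to train() can be made concrete with a sketch. The first is the default starting point, a uniform random draw inside the (N x 2) hyperparameter_bounds. The second is the expected form of a user-defined objective_function, a callable invoked as f(hyperparameters=hps) that returns a scalar to be MINIMIZED. Both functions are hypothetical illustrations. The sketch uses Python's random module, which is not necessarily how gpCAM draws its initial values. The toy objective is only a placeholder for a real training criterion.

```python
import random


def random_init_hyperparameters(bounds, seed=None):
    """Draw one starting vector uniformly inside (N x 2) bounds [[low, high], ...]."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for low, high in bounds]


def my_objective(hyperparameters):
    """HYPOTHETICAL objective of the documented form: scalar, to be MINIMIZED."""
    return sum((h - 1.0) ** 2 for h in hyperparameters)


bounds = [[0.1, 10.0], [0.01, 1.0]]
hps0 = random_init_hyperparameters(bounds, seed=0)
value = my_objective(hyperparameters=hps0)
```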
- update_gp_data(x_new, y_new, noise_variances_new=None, append=True, gp_rank_n_update=None)#
This function updates the data in the GP object instance. The data are only overwritten if append=False; otherwise, they are appended. This is a change from earlier versions: the default is now not to overwrite the existing data.
- Parameters:
x_new (np.ndarray or list) – The input point positions. Shape (V x Di), where Di is the fvgp.fvGP.input_set_dim. For multi-task GPs, the index set dimension = input space dimension + 1. If dealing with non-Euclidean inputs, x_new should be a list, not a numpy array.
y_new (np.ndarray) – The values of the data points. Shape (V,No). It is possible that not every entry in x_new has all corresponding tasks available. In that case y_new may contain np.nan entries.
noise_variances_new (np.ndarray, optional) – A numpy array or list defining the uncertainties/noise in the y_data in the form of point-wise variances. Shape (V, No). If y_data has np.nan entries, the corresponding noise_variances have to be np.nan. Note: if no noise_variances are provided here, the noise_function callable will be used; if the callable is not provided, the noise variances will be set to abs(np.mean(y_data)) / 100.0. If noise covariances are required (correlated noise), make use of the noise_function. Only provide a noise function OR noise_variances, not both.
append (bool, optional) – Indication whether to append to or overwrite the existing dataset. Default = True. In the default case, data will be appended.
gp_rank_n_update (bool, optional) – Indicates whether the GP marginal should be rank-n updated or recomputed. The default is gp_rank_n_update=append, meaning if data is only appended, the rank_n_update will be performed.
- update_hyperparameters(opt_obj)#
Function to update the Gaussian Process hyperparameters if an asynchronous training is running.
- Parameters:
opt_obj (object instance) – Object created by train(asynchronous=True).
- Returns:
hyperparameters
- Return type:
np.ndarray