gpOptimizer

class gpcam.gp_optimizer.GPOptimizer(x_data, y_data, init_hyperparameters=None, hyperparameter_bounds=None, noise_variances=None, compute_device='cpu', gp_kernel_function=None, gp_kernel_function_grad=None, gp_noise_function=None, gp_noise_function_grad=None, gp_mean_function=None, gp_mean_function_grad=None, gp2Scale=False, gp2Scale_dask_client=None, gp2Scale_batch_size=10000, store_inv=True, ram_economy=False, args=None, info=False, cost_function=None, cost_function_parameters=None, cost_update_function=None)

This class is an optimization wrapper around the fvgp package for single-task (scalar-valued) Gaussian Processes. Gaussian Processes can be initialized, trained, and conditioned; the posterior can then be evaluated, used within acquisition functions, and plugged into optimizers to find its maxima. This class inherits many methods from the fvgp.GP class.

V … number of input points

D … input space dimensionality

N … arbitrary integers (N1, N2,…)

All posterior evaluation functions are inherited from the fvgp.GP class. Check there for a full list of capabilities. In addition, kernel definitions and methods for validation are available. These include, but are not limited to:

fvgp.GP.posterior_mean()

fvgp.GP.posterior_covariance()

fvgp.GP.posterior_mean_grad()

fvgp.GP.posterior_covariance_grad()

fvgp.GP.joint_gp_prior()

fvgp.GP.joint_gp_prior_grad()

fvgp.GP.gp_entropy()

fvgp.GP.gp_entropy_grad()

fvgp.GP.gp_kl_div()

fvgp.GP.gp_kl_div_grad()

fvgp.GP.gp_mutual_information()

fvgp.GP.gp_total_correlation()

fvgp.GP.gp_relative_information_entropy()

fvgp.GP.gp_relative_information_entropy_set()

fvgp.GP.posterior_probability()

fvgp.GP.posterior_probability_grad()

Kernel functions:

fvgp.GP.squared_exponential_kernel()

fvgp.GP.squared_exponential_kernel_robust()

fvgp.GP.exponential_kernel()

fvgp.GP.exponential_kernel_robust()

fvgp.GP.matern_kernel_diff1()

fvgp.GP.matern_kernel_diff1_robust()

fvgp.GP.matern_kernel_diff2()

fvgp.GP.matern_kernel_diff2_robust()

fvgp.GP.sparse_kernel()

fvgp.GP.periodic_kernel()

fvgp.GP.linear_kernel()

fvgp.GP.dot_product_kernel()

fvgp.GP.polynomial_kernel()

fvgp.GP.wendland_anisotropic()

fvgp.GP.non_stat_kernel()

fvgp.GP.non_stat_kernel_gradient()

fvgp.GP.get_distance_matrix()

Other methods:

fvgp.GP.crps()

fvgp.GP.rmse()

fvgp.GP.make_2d_x_pred()

fvgp.GP.make_1d_x_pred()

fvgp.GP.log_likelihood()

fvgp.GP.neg_log_likelihood()

fvgp.GP.neg_log_likelihood_gradient()

fvgp.GP.neg_log_likelihood_hessian()
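
For orientation, a minimal sketch of how the inherited posterior evaluation might look; the exact dictionary keys returned by the posterior methods depend on the installed fvgp version and are therefore only assumptions here:

    import numpy as np
    from gpcam.gp_optimizer import GPOptimizer

    # toy 1d dataset: V = 20 points, D = 1
    x_data = np.random.uniform(0., 10., size=(20, 1))
    y_data = np.sin(x_data[:, 0])

    gp = GPOptimizer(x_data, y_data)
    gp.train()  # maximize the log marginal likelihood with default settings

    x_pred = np.linspace(0., 10., 100).reshape(-1, 1)
    mean = gp.posterior_mean(x_pred)                           # inherited from fvgp.GP
    var = gp.posterior_covariance(x_pred, variance_only=True)  # inherited from fvgp.GP
    # both calls return dictionaries; the key names (e.g. "f(x)", "v(x)")
    # may differ between fvgp versions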

Parameters:
  • x_data (np.ndarray) – The input point positions. Shape (V x D), where D is the input_space_dim.

  • y_data (np.ndarray) – The values of the data points. Shape (V,1) or (V).

  • init_hyperparameters (np.ndarray, optional) – Vector of hyperparameters used by the GP initially. This class provides methods to train hyperparameters. The default is a random draw from a uniform distribution within hyperparameter_bounds, with a shape appropriate for the default kernel (D + 1), which is an anisotropic Matern kernel with automatic relevance determination (ARD). If gp2Scale is enabled, the default kernel changes to the anisotropic Wendland kernel.

  • hyperparameter_bounds (np.ndarray, optional) – A 2d numpy array of shape (N x 2), where N is the number of needed hyperparameters. The default is None, in which case the hyperparameter_bounds are estimated from the domain size and the initial y_data. If the data changes significantly, the hyperparameters and the bounds should be changed/retrained. Initial hyperparameters and bounds can also be set in the train calls. The default only works for the default kernels.

  • noise_variances (np.ndarray, optional) – A numpy array defining the uncertainties/noise in the data y_data in form of a point-wise variance. Shape (len(y_data), 1) or (len(y_data)). Note: if no noise_variances are provided here, the gp_noise_function callable will be used; if that callable is not provided either, the noise variances will be set to abs(np.mean(y_data)) / 100.0. If noise covariances are required, also make use of the gp_noise_function.

  • compute_device (str, optional) – One of “cpu” or “gpu”, determines how linear system solves are run. The default is “cpu”. For “gpu”, pytorch has to be installed manually. If gp2Scale is enabled but no kernel is provided, the choice of the compute_device becomes much more important. In that case, the default kernel will be computed on the cpu or the gpu which will significantly change the compute time depending on the compute architecture.

  • gp_kernel_function (Callable, optional) – A symmetric positive semi-definite covariance function (a kernel) that calculates the covariance between data points. It is a function of the form k(x1, x2, hyperparameters, obj). The input x1 is a N1 x D array of positions, x2 is a N2 x D array of positions, the hyperparameters argument is a 1d array of length D+1 for the default kernel and of a different, user-defined length for other kernels, and obj is an fvgp.GP instance. The default is a stationary anisotropic kernel (fvgp.GP.default_kernel()) which performs automatic relevance determination (ARD). The output is a covariance matrix, an N1 x N2 numpy array. A sketch of a user-defined kernel is given after this parameter list.

  • gp_kernel_function_grad (Callable, optional) – A function that calculates the derivative of the gp_kernel_function with respect to the hyperparameters. If provided, it will be used for local training (optimization) and can speed up the calculations. It accepts as input x1 (a N1 x D array of positions), x2 (a N2 x D array of positions), hyperparameters (a 1d array of length D+1 for the default kernel), and a fvgp.GP instance. The default is a finite-difference calculation. If ram_economy is True, the function’s input is x1, x2, direction (int), hyperparameters (numpy array), and a fvgp.GP instance, and the output is a numpy array of shape (N1 x N2) holding the derivative in the given direction. If ram_economy is False, the function’s input is x1, x2, hyperparameters, and a fvgp.GP instance, and the output is a numpy array of shape (len(hyperparameters) x N1 x N2). See ram_economy.

  • gp_mean_function (Callable, optional) – A function that evaluates the prior mean at a set of input positions. It accepts as input an array of positions (of shape N1 x D), hyperparameters (a 1d array of length D+1 for the default kernel) and a fvgp.GP instance. The return value is a 1d array of length N1. If None is provided, fvgp.GP._default_mean_function() is used.

  • gp_mean_function_grad (Callable, optional) – A function that evaluates the gradient of the gp_mean_function at a set of input positions with respect to the hyperparameters. It accepts as input an array of positions (of size N1 x D), hyperparameters (a 1d array of length D+1 for the default kernel) and a fvgp.GP instance. The return value is a 2d array of shape (len(hyperparameters) x N1). If None is provided, either zeros are returned since the default mean function does not depend on hyperparameters, or a finite-difference approximation is used if gp_mean_function is provided.

  • gp_noise_function (Callable, optional) – The noise function is a callable f(x, hyperparameters, obj) that returns a symmetric positive definite matrix of shape (len(x), len(x)). The input x is a numpy array of shape (N x D). The hyperparameter array is the same that is communicated to the mean and kernel functions. The obj is an fvgp.GP instance.

  • gp_noise_function_grad (Callable, optional) – A function that evaluates the gradient of the gp_noise_function at an input position with respect to the hyperparameters. It accepts as input an array of positions (of size N x D), hyperparameters (a 1d array of length D+1 for the default kernel) and a fvgp.GP instance. The return value is a 3d array of shape (len(hyperparameters) x N x N). If None is provided, zeros are returned since the default noise function does not depend on hyperparameters; if gp_noise_function is provided but no gradient function, a finite-difference approximation will be used. The same rules regarding ram economy as for the kernel definition apply here.

  • gp2Scale (bool, optional) – Turns on gp2Scale. This will distribute the covariance computations across multiple workers. This is an advanced feature for HPC GPs up to 10 million data points. If gp2Scale is used, the default kernel is an anisotropic Wendland kernel which is compactly supported. The noise function will have to return a scipy.sparse matrix instead of a numpy array. There are a few more things to consider (read on); this is an advanced option. If no kernel is provided, the compute_device option should be revisited. The kernel will use the specified device to compute covariances. The default is False.

  • gp2Scale_dask_client (distributed.client.Client, optional) – A dask client for gp2Scale to distribute covariance computations over. Has to contain at least 3 workers. On HPC architecture, this client is provided by the job script. Please have a look at the examples. A local client is used as default.

  • gp2Scale_batch_size (int, optional) – Matrix batch size for distributed computing in gp2Scale. The default is 10000.

  • store_inv (bool, optional) – If True, the algorithm calculates and stores the inverse of the covariance matrix after each training or update of the dataset or hyperparameters, which makes computing the posterior covariance faster. For larger problems (>2000 data points), the use of inversion should be avoided due to computational instability and costs. The default is True. Note, the training will always use Cholesky or LU decomposition instead of the inverse for stability reasons. Storing the inverse is a good option when the dataset is not too large and the posterior covariance is heavily used.

  • ram_economy (bool, optional) – Only of interest if the gradient and/or Hessian of the marginal log-likelihood is/are used for the training. If True, components of the derivative of the marginal log-likelihood are calculated sequentially, leading to a slow-down but much less RAM usage. If the derivative of the kernel (or noise function) with respect to the hyperparameters (gp_kernel_function_grad) is going to be provided, it has to be tailored: for ram_economy=True it should be of the form f(x1[, x2], direction, hyperparameters, obj) and return a 2d numpy array of shape len(x1) x len(x2). If ram_economy=False, the function should be of the form f(x1[, x2,] hyperparameters, obj) and return a numpy array of shape H x len(x1) x len(x2), where H is the number of hyperparameters. CAUTION: This array will be stored and is very large.

  • args (any, optional) – args will be a class attribute and therefore available to kernel, noise and prior mean functions.

  • info (bool, optional) – Provides a way to see the progress of gp2Scale. The default is False.

  • cost_function (Callable, optional) – A function encoding the cost of motion through the input space and the cost of a measurement. Its inputs are an origin (np.ndarray of size V x D), x (np.ndarray of size V x D), and the value of cost_function_parameters; origin is the starting position, and x is the destination position. The return value is a 1d array of length V describing the costs as floats. The ‘score’ from acquisition_function is divided by this returned cost to determine the next measurement point. The default is a no-op.

  • cost_function_parameters (object, optional) – This object is transmitted to the cost function; it can be of any type. The default is None.

  • cost_update_function (Callable, optional) – If provided this function will be used when update_cost_function() is called. The function cost_update_function accepts as input costs (a list of cost values usually determined by instrument_func) and a parameter object. The default is a no-op.
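
As a concrete illustration, here is a minimal, hedged initialization sketch: a small 2d toy dataset and a user-defined anisotropic squared-exponential kernel following the k(x1, x2, hyperparameters, obj) convention described above. The hyperparameter layout (signal variance plus one length scale per dimension) is an assumption for this example only:

    import numpy as np
    from gpcam.gp_optimizer import GPOptimizer

    def my_kernel(x1, x2, hps, obj):
        # anisotropic squared-exponential kernel;
        # hps[0]: signal variance, hps[1:]: per-dimension length scales
        d2 = np.zeros((len(x1), len(x2)))
        for i in range(x1.shape[1]):
            d2 += ((x1[:, i].reshape(-1, 1) - x2[:, i].reshape(1, -1)) / hps[1 + i]) ** 2
        return hps[0] * np.exp(-0.5 * d2)

    x_data = np.random.uniform(0., 1., size=(50, 2))            # V = 50, D = 2
    y_data = np.sin(x_data[:, 0]) + np.cos(x_data[:, 1])
    noise = np.full(len(y_data), 0.01)                          # point-wise variances

    my_gpo = GPOptimizer(
        x_data, y_data,
        init_hyperparameters=np.ones(3),                        # D + 1 for this kernel
        hyperparameter_bounds=np.array([[0.001, 100.]] * 3),
        noise_variances=noise,
        gp_kernel_function=my_kernel,
    )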

x_data

Datapoint positions

Type:

np.ndarray

y_data

Datapoint values

Type:

np.ndarray

noise_variances

Datapoint observation (co)variances

Type:

np.ndarray

hyperparameters

Current hyperparameters in use.

Type:

np.ndarray

K

Current prior covariance matrix of the GP

Type:

np.ndarray

KVinv

If store_inv is enabled, the inverse of the prior covariance + noise matrix, inv(K + V)

Type:

np.ndarray

KVlogdet

logdet(K+V)

Type:

float

V

The noise covariance matrix

Type:

np.ndarray

ask(bounds=None, candidates=None, position=None, n=1, acquisition_function='variance', method='global', pop_size=20, max_iter=20, tol=1e-06, constraints=(), x0=None, vectorized=True, info=False, dask_client=None)

Given that the acquisition device is at “position”, this function `ask()`s for “n” new optimal points within certain “bounds” and using the optimization setup: “method”, “pop_size”, “max_iter”, “tol”, “constraints”, and “x0”. This function can also choose the best candidate of a candidate set for Bayesian optimization on non-Euclidean input spaces.

Parameters:
  • bounds (np.ndarray, optional) – A numpy array of floats of shape D x 2 describing the search range. While this is optional, bounds or a candidate set has to be provided.

  • candidates (list or np.ndarray, optional) – If provided, ask will statistically choose the best candidate from the set. This is usually desirable for non-Euclidean inputs but can be used either way. If candidates are Euclidean, they should be provided as a 2d numpy array. Either bounds or candidates have to be specified, not both. If N optimal solutions are requested (n=N), then a maximum of 100*N randomly chosen candidates are considered. If fewer candidates are provided, all will be considered.

  • position (np.ndarray, optional) – Current position in the input space. If a cost function is provided this position will be taken into account to guarantee a cost-efficient new suggestion. The default is None.

  • n (int, optional) – The algorithm will try to return n suggestions for new measurements. This is either done by method = ‘hgdl’, or otherwise by maximizing the collective information gain (default).

  • acquisition_function (Callable, optional) – The acquisition function accepts as input a numpy array of size V x D (such that V is the number of input points, and D is the parameter space dimensionality) and a GPOptimizer object. The return value is a 1d array of length V providing ‘scores’ for each position, such that the highest-scoring point will be measured next. Built-in functions can be used via one of the following keys: ‘ucb’, ‘lcb’, ‘maximum’, ‘minimum’, ‘variance’, ‘expected_improvement’, ‘relative information entropy’, ‘relative information entropy set’, ‘probability of improvement’, ‘gradient’, ‘total correlation’, ‘target probability’. If None, the default function ‘variance’, meaning fvgp.GP.posterior_covariance() with variance_only = True, will be used. The acquisition function can also be a callable of the form my_func(x, gpcam.GPOptimizer) which will be maximized (!!!), so make sure desirable new measurement points are located at maxima; a sketch of such a callable is given under evaluate_acquisition_function() below. Explanations of the built-in acquisition functions:
      - variance: the posterior variance
      - relative information entropy: the KL divergence between the prior over predictions and the posterior
      - relative information entropy set: the KL divergence between the prior over predictions and the posterior, point by point
      - ucb: upper confidence bound, posterior mean + 3 * std
      - lcb: lower confidence bound, -(posterior mean - 3 * std)
      - maximum: finds the maximum of the current posterior mean
      - minimum: finds the minimum of the current posterior mean
      - gradient: puts focus on high-gradient regions
      - probability of improvement: as the name suggests
      - expected improvement: as the name suggests
      - total correlation: extension of mutual information to more than two random variables
      - target probability: probability of hitting a target range; needs a dictionary GPOptimizer.args = {‘a’: lower bound, ‘b’: upper bound} to be defined.

  • method (str, optional) – A string defining the method used to find the maximum of the acquisition function. Choose from global, local, hgdl. HGDL is an in-house hybrid optimizer that is comfortable on HPC hardware. The default is global.

  • pop_size (int, optional) – An integer defining the number of individuals if global is chosen as method. The default is 20. For hgdl this will be overwritten by the dask_client definition.

  • max_iter (int, optional) – This number defines the maximum number of iterations before the optimizer is terminated. The default is 20.

  • tol (float, optional) – Termination criterion for the local optimizer. The default is 1e-6.

  • x0 (np.ndarray, optional) – A set of points as numpy array of shape N x D, used as starting location(s) for the optimization algorithms. The default is None.

  • vectorized (bool, optional) – If your acquisition function is vectorized, i.e., it accepts an array of inquiries and returns an array of scores, this option makes the optimization faster when method = ‘global’ is used. The default is True, but it will be set to False if method is not ‘global’.

  • info (bool, optional) – Print optimization information. The default is False.

  • constraints (tuple of object instances, optional) – scipy constraints instances, depending on the used optimizer.

  • dask_client (distributed.client.Client, optional) – A Dask Distributed Client instance for distributed acquisition_function optimization. If None is provided, a new distributed.client.Client instance is constructed for hgdl.

Returns:

Solution – The found maxima of the acquisition function, the associated function values, and an optimization object that, only in the case of method = ‘hgdl’, can be queried for solutions.

Return type:

{‘x’: np.array(maxima), “f(x)” : np.array(func_evals), “opt_obj” : opt_obj}
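
A sketch of typical ask() calls, assuming the 2d my_gpo instance from the initialization example above; the dictionary keys follow the return description:

    import numpy as np

    bounds = np.array([[0., 1.], [0., 1.]])                  # D x 2 search range

    # one suggestion, default 'variance' acquisition, global optimization
    res = my_gpo.ask(bounds=bounds, n=1)
    next_point = res["x"]                                    # suggested position(s)
    score = res["f(x)"]                                      # acquisition value(s)

    # alternatively, pick the best point out of a discrete candidate set
    candidates = np.random.uniform(0., 1., size=(500, 2))
    res_c = my_gpo.ask(candidates=candidates, acquisition_function="ucb")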

evaluate_acquisition_function(x, acquisition_function='variance', origin=None)

Function to evaluate the acquisition function.

Parameters:
  • x (np.ndarray) – Point positions at which the acquisition function is evaluated. Shape (N x D).

  • acquisition_function (Callable, optional) – Acquisition function to execute. Callable with inputs (x, gpcam.GPOptimizer), where x is a V x D array of input positions. The return value is a 1d array of length V. The default is variance.

  • origin (np.ndarray, optional) – If a cost function is provided this 1d numpy array of length D is used as the origin of motion.

Returns:

Evaluation – The acquisition function evaluations at all points x.

Return type:

np.ndarray
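
A hedged sketch of a user-defined acquisition function evaluated on a set of points, assuming the my_gpo instance from above. The callable signature and the maximization convention follow the ask() description; the 2-sigma weighting and the dictionary keys used to read the posterior are assumptions for illustration:

    import numpy as np

    def my_ucb(x, gpopt):
        # upper-confidence-bound style score; higher means "measure here next"
        # the keys "f(x)" and "v(x)" may differ between fvgp versions
        mean = gpopt.posterior_mean(x)["f(x)"]
        var = gpopt.posterior_covariance(x, variance_only=True)["v(x)"]
        return mean + 2.0 * np.sqrt(var)

    x_grid = np.random.uniform(0., 1., size=(100, 2))
    scores = my_gpo.evaluate_acquisition_function(x_grid, acquisition_function=my_ucb)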

get_data()

Function that provides a way to access the class attributes.

Returns:

dictionary of class attributes

Return type:

dict

kill_training(opt_obj)

Function to kill an asynchronous training. This shuts down the associated distributed.client.Client.

Parameters:

opt_obj (object instance) – Object created by train_async().

stop_training(opt_obj)

Function to stop an asynchronous training. This leaves the distributed.client.Client alive.

Parameters:

opt_obj (object instance) – Object created by train_async().

tell(x, y, noise_variances=None, overwrite=True)

This function can tell() the gp_optimizer class the data that was collected. The data will instantly be used to update the gp data. IMPORTANT: This call does not append data. The entire dataset, including the updates, has to be provided.

Parameters:
  • x (np.ndarray) – Point positions (of shape U x D) to be communicated to the Gaussian Process.

  • y (np.ndarray) – Point values (of shape U x 1 or U) to be communicated to the Gaussian Process.

  • noise_variances (np.ndarray, optional) – Point value variances (of shape U x 1 or U) to be communicated to the Gaussian Process. If not provided, the GP will use 1% of the y values as variances.

  • overwrite (bool, optional) – The default is True. Indicates if all previous data should be overwritten.
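
Since tell() does not append, the full dataset (old plus new points) has to be passed again; a sketch using the my_gpo instance from above:

    import numpy as np

    # hypothetical new measurements
    x_new = np.random.uniform(0., 1., size=(5, 2))
    y_new = np.sin(x_new[:, 0]) + np.cos(x_new[:, 1])

    # rebuild the complete dataset and hand it over in one call
    x_all = np.vstack([my_gpo.x_data, x_new])
    y_all = np.concatenate([my_gpo.y_data.reshape(-1), y_new])
    my_gpo.tell(x_all, y_all, overwrite=True)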

train(objective_function=None, objective_function_gradient=None, objective_function_hessian=None, hyperparameter_bounds=None, init_hyperparameters=None, method='global', pop_size=20, tolerance=0.0001, max_iter=120, local_optimizer='L-BFGS-B', global_optimizer='genetic', constraints=(), dask_client=None)

This function finds the maximum of the log marginal likelihood and therefore trains the GP (synchronously). This can be done on a remote cluster/computer by specifying the method to be ‘hgdl’ and providing a dask client. However, in that case gpcam.GPOptimizer.train_async() is preferred. The GP prior will automatically be updated with the new hyperparameters after the training.

Parameters:
  • objective_function (callable, optional) – The function that will be MINIMIZED for training the GP. The form of the function is f(hyperparameters=hps) and returns a scalar. This function can be used to train via non-standard user-defined objectives. The default is the negative log marginal likelihood.

  • objective_function_gradient (callable, optional) – The gradient of the function that will be MINIMIZED for training the GP. The form of the function is f(hyperparameters=hps) and returns a vector of len(hps). This function can be used to train via non-standard user-defined objectives. The default is the gradient of the negative log marginal likelihood.

  • objective_function_hessian (callable, optional) – The hessian of the function that will be MINIMIZED for training the GP. The form of the function is f(hyperparameters=hps) and returns a matrix of shape(len(hps),len(hps)). This function can be used to train via non-standard user-defined objectives. The default is the hessian of the negative log marginal likelihood.

  • hyperparameter_bounds (np.ndarray, optional) – A numpy array of shape (D x 2), defining the bounds for the optimization. The default is an array of bounds of the length of the initial hyperparameters with all bounds defined practically as [0.00001, inf]. The initial hyperparameters are either defined by the user at initialization, or in this function call, or are defined as np.ones((input_space_dim + 1)). This choice is only recommended in very basic scenarios and can lead to suboptimal results. It is better to provide hyperparameter bounds.

  • init_hyperparameters (np.ndarray, optional) – Initial hyperparameters used as starting location for all optimizers with local component. The default is a random draw from a uniform distribution within the bounds.

  • method (str or Callable, optional) – The method used to train the hyperparameters. The options are global, local, hgdl, mcmc, and a callable. The callable gets a fvgp.GP instance and has to return a 1d np.ndarray of hyperparameters. The default is global (scipy’s differential evolution). If method = “mcmc”, the attribute gpcam.GPOptimizer.mcmc_info is updated and contains convergence and distribution information.

  • pop_size (int, optional) – A number of individuals used for any optimizer with a global component. Default = 20.

  • tolerance (float, optional) – Used as termination criterion for local optimizers. Default = 0.0001.

  • max_iter (int, optional) – Maximum number of iterations for global and local optimizers. Default = 120.

  • local_optimizer (str, optional) – Defines the local optimizer. Default = “L-BFGS-B”; most scipy.optimize.minimize methods are permissible.

  • global_optimizer (str, optional) – Defining the global optimizer. Only applicable to method = hgdl. Default = genetic

  • constraints (tuple of object instances, optional) – Equality and inequality constraints for the optimization. If the optimizer is hgdl see the [hgdl documentation](hgdl.readthedocs.io). If the optimizer is a scipy optimizer, see the scipy documentation.

  • dask_client (distributed.client.Client, optional) – A Dask Distributed Client instance for distributed training if HGDL is used. If None is provided, a new distributed.client.Client instance is constructed.

Returns:

hyperparameters – The optimized hyperparameters. Note that the GP is automatically updated with them.

Return type:

np.ndarray
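
A sketch of a synchronous training call with explicit bounds, assuming the three-hyperparameter kernel from the initialization example above:

    import numpy as np

    hps_bounds = np.array([[0.001, 100.],    # signal variance
                           [0.01,  10.],     # length scale, dimension 0
                           [0.01,  10.]])    # length scale, dimension 1

    hps = my_gpo.train(hyperparameter_bounds=hps_bounds,
                       method="global",      # scipy differential evolution
                       max_iter=200)
    # the GP prior is updated automatically; hps is returned for inspection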

train_async(objective_function=None, objective_function_gradient=None, objective_function_hessian=None, hyperparameter_bounds=None, init_hyperparameters=None, max_iter=10000, local_optimizer='L-BFGS-B', global_optimizer='genetic', constraints=(), dask_client=None)

This function asynchronously finds the maximum of the log marginal likelihood and therefore trains the GP. This can be done on a remote cluster/computer by providing a dask client. This function submits the training and returns an object which can be given to gpcam.GPOptimizer.update_hyperparameters(), which will automatically update the GP prior with the new hyperparameters.

Parameters:
  • objective_function (callable, optional) – The function that will be MINIMIZED for training the GP. The form of the function is f(hyperparameters=hps) and returns a scalar. This function can be used to train via non-standard user-defined objectives. The default is the negative log marginal likelihood.

  • objective_function_gradient (callable, optional) – The gradient of the function that will be MINIMIZED for training the GP. The form of the function is f(hyperparameters=hps) and returns a vector of len(hps). This function can be used to train via non-standard user-defined objectives. The default is the gradient of the negative log marginal likelihood.

  • objective_function_hessian (callable, optional) – The hessian of the function that will be MINIMIZED for training the GP. The form of the function is f(hyperparameters=hps) and returns a matrix of shape(len(hps),len(hps)). This function can be used to train via non-standard user-defined objectives. The default is the hessian of the negative log marginal likelihood.

  • hyperparameter_bounds (np.ndarray, optional) – A numpy array of shape (D x 2), defining the bounds for the optimization. The default is an array of bounds for the default kernel D = input_space_dim + 1 with all bounds defined practically as [0.00001, inf]. This choice is only recommended in very basic scenarios.

  • init_hyperparameters (np.ndarray, optional) – Initial hyperparameters used as starting location for all optimizers with local component. The default is a random draw from a uniform distribution within the bounds.

  • max_iter (int, optional) – Maximum number of epochs for HGDL. Default = 10000.

  • local_optimizer (str, optional) – Defines the local optimizer. Default = “L-BFGS-B”; most scipy.optimize.minimize methods are permissible.

  • global_optimizer (str, optional) – Defining the global optimizer. Only applicable to method = ‘hgdl’. Default = genetic

  • constraints (tuple of hgdl.NonLinearConstraint instances, optional) – Equality and inequality constraints for the optimization. See hgdl.

  • dask_client (distributed.client.Client, optional) – A Dask Distributed Client instance for distributed training if HGDL is used. If None is provided, a new distributed.client.Client instance is constructed.

Returns:

opt_obj – Optimization object that can be given to gpcam.GPOptimizer.update_hyperparameters() to update the prior GP.

Return type:

object instance
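
A sketch of the asynchronous workflow; HGDL and dask have to be installed, and the default (local) dask client is assumed:

    # submit the training; the call returns immediately
    opt_obj = my_gpo.train_async()

    # ... run experiments or other work, then pull in the latest hyperparameters
    hps = my_gpo.update_hyperparameters(opt_obj)

    my_gpo.stop_training(opt_obj)    # stop the optimization, keep the dask client alive
    # my_gpo.kill_training(opt_obj)  # or shut the client down entirely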

update_cost_function(measurement_costs)

This function updates the parameters of the user-defined cost function. It essentially calls the user-given cost_update_function, which should return the new parameters.

Parameters:

measurement_costs (object) – An arbitrary object that describes the costs when moving in the parameter space. It can be arbitrary because the cost function using the parameters and the cost_update_function updating the parameters are both user-defined and this object has to be in accordance with those definitions.
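
A hedged sketch of a simple motion cost and its update function; the distance-based cost, the parameter dictionary layout, and the update rule are illustrative assumptions only (x_data and y_data are taken from the initialization example above):

    import numpy as np
    from gpcam.gp_optimizer import GPOptimizer

    def my_cost(origin, x, params):
        # cost of moving from 'origin' to each destination in 'x';
        # one positive float per destination
        return 1.0 + params["weight"] * np.linalg.norm(x - origin, axis=1)

    def my_cost_update(measurement_costs, params):
        # adapt the weight from observed costs; purely illustrative
        params["weight"] = float(np.mean(measurement_costs))
        return params

    my_gpo_c = GPOptimizer(
        x_data, y_data,
        cost_function=my_cost,
        cost_function_parameters={"weight": 1.0},
        cost_update_function=my_cost_update,
    )
    # later, after the costs of some measurements have been observed:
    my_gpo_c.update_cost_function([0.5, 0.7, 0.6])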

update_hyperparameters(opt_obj)

Function to update the Gaussian Process hyperparameters if an asynchronous training is running.

Parameters:

opt_obj (object instance) – Object created by train_async().

Returns:

hyperparameters

Return type:

np.ndarray