3. Bayesian Search¶
A module for Bayesian optimisation search for hyperparameter tuning of NEORL algorithms, based upon scikit-optimize.
Original paper: https://arxiv.org/abs/1012.2599
Bayesian search, in contrast to grid and random searches, keeps track of past evaluation results. It uses these past evaluations to form a probabilistic model that maps hyperparameters to a probability of a score on the objective function (e.g. max/min fitness). Bayesian optimisation excels when the objective function is expensive to evaluate, when we do not have access to derivatives, or when the problem at hand is non-convex.
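To make this concrete, below is a minimal, self-contained sketch with scikit-optimize's gp_minimize (the routine underlying this module, see the Acknowledgment section): a Gaussian-process surrogate is fitted to all past (input, score) pairs and proposes the next candidate to evaluate. The objective function and bounds here are illustrative only.
from skopt import gp_minimize
#expensive black-box objective (illustrative); no derivatives are needed
def objective(x):
    return (x[0]-2.0)**2 + 1.0
#a Gaussian-process surrogate models past evaluations and proposes
#the next point to try; 20 objective evaluations in total
res=gp_minimize(objective,        #function to minimize
                [(-5.0, 5.0)],    #bounds of the single input variable
                n_calls=20,       #total number of objective evaluations
                random_state=0)   #seed for reproducibility
print(res.x, res.fun)   #best point found and its objective value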
3.1. What can you use?¶
Multiprocessing: ✔️ (multithreading on a single processor)
Discrete/Continuous/Mixed spaces: ✔️
Reinforcement Learning Algorithms: ✔️
Evolutionary Algorithms: ✔️
Hybrid Neuroevolution Algorithms: ✔️
3.2. Parameters¶
class neorl.tune.bayestune.BAYESTUNE(param_grid, fit, mode='min', ncases=50, seed=None)[source]¶
A module for Bayesian search for hyperparameter tuning
- Parameters
param_grid – (dict) the type and range of each hyperparameter in a dictionary form (types are int/discrete or float/continuous or grid/categorical). Example: {'x1': [[40, 50, 60, 100], 'grid'], 'x2': [[0.2, 0.8], 'float'], 'x3': [['blend', 'cx2point'], 'grid'], 'x4': [[20, 80], 'int']}
fit – (function) the self-defined fitness function that includes the hyperparameters as input and the algorithm score as output
mode – (str) problem type, either “min” for minimization problem or “max” for maximization. Default: Bayesian tuner is set to minimize an objective
ncases – (int) number of random hyperparameter cases to generate per core, ncases >= 11 (see Notes for an important remark)
seed – (int) random seed for sampling reproducibility
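For concreteness, the constructor can be called with the example dictionary above; a minimal sketch, where the fitness function my_fit is a hypothetical stand-in (a real one would run and score a NEORL algorithm, as in the example of Section 3.3):
from neorl.tune import BAYESTUNE
#hypothetical stand-in fitness: takes the hyperparameters in the same
#order as param_grid and returns a score to be minimized
def my_fit(x1, x2, x3, x4):
    return x1*x2    #placeholder score for illustration only
#param_grid follows the example format documented above
param_grid={'x1': [[40, 50, 60, 100], 'grid'],
            'x2': [[0.2, 0.8], 'float'],
            'x3': [['blend', 'cx2point'], 'grid'],
            'x4': [[20, 80], 'int']}
btune=BAYESTUNE(param_grid=param_grid, fit=my_fit, mode='min', ncases=15, seed=1)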
tune(ncores=1, csvname=None, verbose=True)[source]¶
This function starts the tuning process with the specified number of processors
- Parameters
ncores – (int) number of parallel threads (see the Notes section below for an important note about parallel execution)
csvname – (str) the name of the csv file to save the tuning results (useful for expensive cases, as the csv file is updated directly after each case is done)
verbose – (bool) whether to print updates to the screen or not
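Continuing the sketch above, a hedged example of launching the tuner and inspecting the csv file; since the file is updated after every finished case, it can be monitored while an expensive run is still in progress. The exact column layout depends on param_grid, so the pandas inspection is illustrative only:
import pandas as pd
#start the tuning process; results are appended to mytune.csv after each case
tuneres=btune.tune(ncores=1, csvname='mytune.csv', verbose=True)
#illustrative inspection of the (partial or final) results
df=pd.read_csv('mytune.csv')
print(df.tail())    #most recent hyperparameter cases and their scores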
3.3. Example¶
from neorl.tune import BAYESTUNE
from neorl import ES
#**********************************************************
# Part I: Original Problem Settings
#**********************************************************
#Define the fitness function (for original optimisation)
def sphere(individual):
    y=sum(x**2 for x in individual)
    return y
#*************************************************************
# Part II: Define fitness function for hyperparameter tuning
#*************************************************************
def tune_fit(cxpb, mu, alpha, cxmode):
    #--setup the parameter space
    nx=5
    BOUNDS={}
    for i in range(1,nx+1):
        BOUNDS['x'+str(i)]=['float', -100, 100]
    #--setup the ES algorithm
    es=ES(mode='min', bounds=BOUNDS, fit=sphere, lambda_=80, mu=mu, mutpb=0.1, alpha=alpha,
          cxmode=cxmode, cxpb=cxpb, ncores=1, seed=1)
    #--evolute the ES object and obtain y_best
    #--turn off verbose for less algorithm print-out when tuning
    x_best, y_best, es_hist=es.evolute(ngen=100, verbose=0)
    return y_best #returns the best score
#*************************************************************
# Part III: Tuning
#*************************************************************
#Setup the parameter space
#VERY IMPORTANT: The order of these parameters MUST match their order in tune_fit
#see tune_fit
param_grid={
#def tune_fit(cxpb, mu, alpha, cxmode):
'cxpb': ['float', 0.1, 0.9],             #cxpb is first (low=0.1, high=0.9, type=float/continuous)
'mu': ['int', 30, 60], #mu is second (low=30, high=60, type=int/discrete)
'alpha':['grid', [0.1, 0.2, 0.3, 0.4]], #alpha is third (grid with fixed values, type=grid/categorical)
'cxmode':['grid', ['blend', 'cx2point']]} #cxmode is fourth (grid with fixed values, type=grid/categorical)
#setup a bayesian tune object
btune=BAYESTUNE(mode='min', param_grid=param_grid, fit=tune_fit, ncases=30)
#tune the parameters with method .tune
bayesres=btune.tune(ncores=1, csvname='bayestune.csv', verbose=True)
print(bayesres)
btune.plot_results(pngname='bayes_conv')
3.4. Notes¶
- We allow a weak parallelization of Bayesian search via multithreading. The user can start independent Bayesian searches with different seeds by increasing ncores. However, all threads will be executed on a single processor, which will slow down every Bayesian sequence. Therefore, this option is recommended when each hyperparameter case is fast to evaluate and does not require intensive CPU power.
- If the user sets ncores=4 and ncases=15, a total of 60 hyperparameter cases are evaluated, where each thread uses 25% of the CPU power (a parallel call is sketched after this list). The extension to real multiprocessing/multi-core capability is planned for the future.
- Keep ncases >= 11. If ncases < 11, the optimiser resets ncases=11. It is good to start with ncases=30, check the optimiser convergence, and increase as needed.
- Relying on grid/categorical variables can accelerate the search by a wide margin. Therefore, if the user is aware of certain values of the int/discrete or float/continuous hyperparameters, it is good to convert them to grid/categorical (a conversion is sketched after this list).
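Two of the notes above can be made concrete with a short sketch that reuses param_grid and tune_fit from the example of Section 3.3: converting the continuous cxpb to a grid of plausible values to shrink the search space, and multiplying the workload with threads (the 4 x 15 = 60 case arithmetic follows the note above):
#continuous 'cxpb' converted to a small grid of plausible values,
#which can accelerate the search (see the note above)
param_grid['cxpb']=['grid', [0.1, 0.5, 0.9]]
btune=BAYESTUNE(mode='min', param_grid=param_grid, fit=tune_fit, ncases=15)
#4 threads x 15 cases/thread = 60 cases in total, all on one processor,
#so each thread gets about 25% of the CPU (see the note above)
bayesres=btune.tune(ncores=4, csvname='bayestune_mt.csv', verbose=True)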
3.5. Acknowledgment¶
Thanks to our fellows in scikit-optimize, as we used their gp_minimize implementation to power the Bayesian search module in our framework.
Head, Tim, MechCoder, Gilles Louppe, and Iaroslav Shcherbatyi. "scikit-optimize/scikit-optimize: v0.7.1" (2020).