3. Bayesian Search¶
A module for Bayesian optimisation search for hyperparameter tuning of NEORL algorithms, based upon scikit-optimize.
Original paper: https://arxiv.org/abs/1012.2599
Bayesian search, in contrast to grid and random searches, keeps track of past evaluation results. It uses these past evaluations to form a probabilistic model that maps hyperparameters to a probability of a score on the objective function (e.g. max/min fitness). Bayesian optimisation excels when the objective function is expensive to evaluate, when we do not have access to derivatives, or when the problem at hand is non-convex.
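To make this concrete, below is a minimal, self-contained sketch with scikit-optimize's gp_minimize (the routine underlying this module, see the Acknowledgment section): a Gaussian-process surrogate is fitted to all past (input, score) pairs and proposes the next candidate to evaluate. The objective function and bounds here are illustrative only.
from skopt import gp_minimize
#expensive black-box objective (illustrative); no derivatives are needed
def objective(x):
    return (x[0]-2.0)**2 + 1.0
#a Gaussian-process surrogate models past evaluations and proposes
#the next point to try; 20 objective evaluations in total
res=gp_minimize(objective,        #function to minimize
                [(-5.0, 5.0)],    #bounds of the single input variable
                n_calls=20,       #total number of objective evaluations
                random_state=0)   #seed for reproducibility
print(res.x, res.fun)   #best point found and its objective value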
3.1. What can you use?¶
Multiprocessing: ✔️ (multithreading on a single processor)
Discrete/Continuous/Mixed spaces: ✔️
Reinforcement Learning Algorithms: ✔️
Evolutionary Algorithms: ✔️
Hybrid Neuroevolution Algorithms: ✔️
3.2. Parameters¶
class neorl.tune.bayestune.BAYESTUNE(param_grid, fit, mode='min', ncases=50, seed=None)[source]¶
A module for Bayesian search for hyperparameter tuning
- Parameters
param_grid – (dict) the type and range of each hyperparameter in a dictionary form (types are int/discrete or float/continuous or grid/categorical). Example: {'x1': [[40, 50, 60, 100], 'grid'], 'x2': [[0.2, 0.8], 'float'], 'x3': [['blend', 'cx2point'], 'grid'], 'x4': [[20, 80], 'int']}
fit – (function) the self-defined fitness function that includes the hyperparameters as input and the algorithm score as output
mode – (str) problem type, either “min” for minimization problem or “max” for maximization. Default: Bayesian tuner is set to minimize an objective
ncases – (int) number of random hyperparameter cases to generate per core, ncases >= 11 (see Notes for an important remark)
seed – (int) random seed for sampling reproducibility
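For concreteness, the constructor can be called with the example dictionary above; a minimal sketch, where the fitness function my_fit is a hypothetical stand-in (a real one would run and score a NEORL algorithm, as in the example of Section 3.3):
from neorl.tune import BAYESTUNE
#hypothetical stand-in fitness: takes the hyperparameters in the same
#order as param_grid and returns a score to be minimized
def my_fit(x1, x2, x3, x4):
    return x1*x2    #placeholder score for illustration only
#param_grid follows the example format documented above
param_grid={'x1': [[40, 50, 60, 100], 'grid'],
            'x2': [[0.2, 0.8], 'float'],
            'x3': [['blend', 'cx2point'], 'grid'],
            'x4': [[20, 80], 'int']}
btune=BAYESTUNE(param_grid=param_grid, fit=my_fit, mode='min', ncases=15, seed=1)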
tune(ncores=1, csvname=None, verbose=True)[source]¶
This function starts the tuning process with the specified number of processors
- Parameters
ncores – (int) number of parallel threads (see the Notes section below for an important note about parallel execution)
csvname – (str) the name of the csv file to save the tuning results (useful for expensive cases, as the csv file is updated directly after each case is done)
verbose – (bool) whether to print updates to the screen or not
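Continuing the sketch above, a hedged example of launching the tuner and inspecting the csv file; since the file is updated after every finished case, it can be monitored while an expensive run is still in progress. The exact column layout depends on param_grid, so the pandas inspection is illustrative only:
import pandas as pd
#start the tuning process; results are appended to mytune.csv after each case
tuneres=btune.tune(ncores=1, csvname='mytune.csv', verbose=True)
#illustrative inspection of the (partial or final) results
df=pd.read_csv('mytune.csv')
print(df.tail())    #most recent hyperparameter cases and their scores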
3.3. Example¶
from neorl.tune import BAYESTUNE
from neorl import ES
#**********************************************************
# Part I: Original Problem Settings
#**********************************************************
#Define the fitness function (for original optimisation)
def sphere(individual):
    y=sum(x**2 for x in individual)
    return y
#*************************************************************
# Part II: Define fitness function for hyperparameter tuning
#*************************************************************
def tune_fit(cxpb, mu, alpha, cxmode):
    #--setup the parameter space
    nx=5
    BOUNDS={}
    for i in range(1,nx+1):
        BOUNDS['x'+str(i)]=['float', -100, 100]
    #--setup the ES algorithm
    es=ES(mode='min', bounds=BOUNDS, fit=sphere, lambda_=80, mu=mu, mutpb=0.1, alpha=alpha,
          cxmode=cxmode, cxpb=cxpb, ncores=1, seed=1)
    #--evolute the ES object and obtain y_best
    #--turn off verbose for less algorithm print-out when tuning
    x_best, y_best, es_hist=es.evolute(ngen=100, verbose=0)
    return y_best #returns the best score
#*************************************************************
# Part III: Tuning
#*************************************************************
#Setup the parameter space
#VERY IMPORTANT: The order of these parameters MUST match their order in tune_fit
#see tune_fit
param_grid={
#def tune_fit(cxpb, mu, alpha, cxmode):
'cxpb': ['float', 0.1, 0.9],             #cxpb is first (low=0.1, high=0.9, type=float/continuous)
'mu': ['int', 30, 60], #mu is second (low=30, high=60, type=int/discrete)
'alpha':['grid', [0.1, 0.2, 0.3, 0.4]], #alpha is third (grid with fixed values, type=grid/categorical)
'cxmode':['grid', ['blend', 'cx2point']]} #cxmode is fourth (grid with fixed values, type=grid/categorical)
#setup a bayesian tune object
btune=BAYESTUNE(mode='min', param_grid=param_grid, fit=tune_fit, ncases=30)
#tune the parameters with method .tune
bayesres=btune.tune(ncores=1, csvname='bayestune.csv', verbose=True)
print(bayesres)
btune.plot_results(pngname='bayes_conv')
3.4. Notes¶
- We allow a weak parallelization of Bayesian search via multithreading. The user can start independent Bayesian searches with different seeds by increasing ncores. However, all threads will be executed on a single processor, which will slow down every Bayesian sequence. Therefore, this option is recommended when each hyperparameter case is fast to evaluate and does not require intensive CPU power.
- If the user sets ncores=4 and ncases=15, a total of 60 hyperparameter cases are evaluated, where each thread uses 25% of the CPU power (a parallel call is sketched after this list). The extension to real multiprocessing/multi-core capability is planned for the future.
- Keep ncases >= 11. If ncases < 11, the optimiser resets ncases=11. It is good to start with ncases=30, check the optimiser convergence, and increase as needed.
- Relying on grid/categorical variables can accelerate the search by a wide margin. Therefore, if the user is aware of certain values of the int/discrete or float/continuous hyperparameters, it is good to convert them to grid/categorical (a conversion is sketched after this list).
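Two of the notes above can be made concrete with a short sketch that reuses param_grid and tune_fit from the example of Section 3.3: converting the continuous cxpb to a grid of plausible values to shrink the search space, and multiplying the workload with threads (the 4 x 15 = 60 case arithmetic follows the note above):
#continuous 'cxpb' converted to a small grid of plausible values,
#which can accelerate the search (see the note above)
param_grid['cxpb']=['grid', [0.1, 0.5, 0.9]]
btune=BAYESTUNE(mode='min', param_grid=param_grid, fit=tune_fit, ncases=15)
#4 threads x 15 cases/thread = 60 cases in total, all on one processor,
#so each thread gets about 25% of the CPU (see the note above)
bayesres=btune.tune(ncores=4, csvname='bayestune_mt.csv', verbose=True)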
3.5. Acknowledgment¶
Thanks to our fellows in scikit-optimize, as we used their gp_minimize implementation to power the Bayesian search module in our framework.
Head, Tim, MechCoder, Gilles Louppe, and Iaroslav Shcherbatyi. "scikit-optimize/scikit-optimize: v0.7.1" (2020).