Prioritized replay Evolutionary and Swarm Algorithm (PESA)¶
A module for the parallel hybrid PESA algorithm with prioritized experience replay from reinforcement learning. This is the classical PESA that hybridizes PSO, ES, and SA modules within NEORL.
Original paper: Radaideh, M. I., & Shirvan, K. (2022). PESA: Prioritized experience replay for parallel hybrid evolutionary and swarm algorithms-Application to nuclear fuel. Nuclear Engineering and Technology.
https://doi.org/10.1016/j.net.2022.05.001
What can you use?¶
Multi processing: ✔️
Discrete spaces: ✔️
Continuous spaces: ✔️
Mixed Discrete/Continuous spaces: ✔️
Parameters¶
class neorl.hybrid.pesa.PESA(mode, bounds, fit, npop, mu=None, memory_size=None, alpha_init=0.1, alpha_end=1, alpha_backdoor=0.1, Tmax=10000, chi=0.1, cxpb=0.7, mutpb=0.1, c1=2.05, c2=2.05, speed_mech='constric', ncores=1, seed=None)[source]¶
PESA Major Parameters
- Parameters
mode – (str) problem type, either “min” for a minimization problem or “max” for a maximization problem
bounds – (dict) input parameter type and lower/upper bounds in dictionary form. Example: bounds={'x1': ['int', 1, 4], 'x2': ['float', 0.1, 0.8], 'x3': ['float', 2.2, 6.2]}
fit – (function) the fitness function
npop – (int) total number of individuals in each group, so for ES, PSO, and SA together, the full population is npop*3
mu – (int) number of individuals to survive to the next generation; mu also equals the number of individuals to sample from the memory. If None, mu=int(npop/2), so 1/2 of the PESA population comes from the previous generation and 1/2 comes from the replay memory (see Notes below for more info)
memory_size – (int) max size of the replay memory (if None, memory_size is built to accommodate all samples during the search)
alpha_init – (float) initial value of the prioritized replay coefficient (see Notes below)
alpha_end – (float) final value of the prioritized replay coefficient (see Notes below)
alpha_backdoor – (float) backdoor greedy replay rate/probability to sample from the memory for SA instead of the random-walk (see Notes below)
PESA Auxiliary Parameters (for the internal algorithms)
- Parameters
cxpb – (float) for ES, population crossover probability between [0,1]
mutpb – (float) for ES, population mutation probability between [0,1]
c1 – (float) for PSO, cognitive speed constant
c2 – (float) for PSO, social speed constant
speed_mech – (str) for PSO, type of speed mechanism to update particle velocity; choose between constric, timew, and globw
Tmax – (float) for SA, initial/max temperature to start the annealing process
chi – (float) for SA, probability to perturb an attribute during SA annealing (occurs when rand(0,1) < chi)
PESA Misc. Parameters
- Parameters
ncores – (int) number of parallel processors
seed – (int) random seed for sampling
evolute(ngen, x0=None, warmup=100, verbose=0)[source]¶
This function evolutes the PESA algorithm for a number of generations.
- Parameters
ngen – (int) number of generations to evolute
x0 – (list of lists) initial samples to start the replay memory (len(x0) must be equal to or greater than npop)
warmup – (int) number of random warmup samples to initialize the replay memory; must be equal to or greater than npop (only used if x0=None)
verbose – (int) print statistics to screen, 0: no print, 1: PESA print, 2: detailed print
- Returns
(tuple) (best individual, best fitness, and a list of fitness history)
Example¶
from neorl import PESA

#Define the fitness function
def FIT(individual):
    """Sphere test objective function.
       F(x) = sum_{i=1}^d xi^2
       d=1,2,3,...
       Range: [-100,100]
       Minima: 0
    """
    y = sum(x**2 for x in individual)
    return y

#Setup the parameter space (d=5)
nx = 5
BOUNDS = {}
for i in range(1, nx+1):
    BOUNDS['x'+str(i)] = ['float', -100, 100]

npop = 60
pesa = PESA(mode='min', bounds=BOUNDS, fit=FIT, npop=npop, mu=40, alpha_init=0.2,
            alpha_end=1.0, alpha_backdoor=0.1, ncores=1)
x0 = [[50, 50, 50, 50, 50] for i in range(npop)]   #initial guess
x_best, y_best, pesa_hist = pesa.evolute(ngen=50, x0=x0, verbose=1)
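The returned history can be post-processed for convergence checks. The sketch below assumes, per the Returns description above, that the history behaves like a plain list of per-generation fitness values; the dummy list stands in for a real run:

```python
# Illustrative post-processing of a fitness history, assuming it is a plain
# list of per-generation fitness values (the dummy data below stands in for
# the pesa_hist returned by evolute)
pesa_hist = [900.0, 350.0, 120.0, 45.0, 9.0, 1.2]   #hypothetical minimization run

best_fitness = min(pesa_hist)               #best value over all generations
best_gen = pesa_hist.index(best_fitness)    #generation index where it occurred

#running best, useful for a convergence plot
running_best = []
current = float('inf')
for f in pesa_hist:
    current = min(current, f)
    running_best.append(current)
```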
Notes¶
- PESA is symmetric, meaning the population size is equal between PSO, ES, and SA, which helps ensure that all algorithms can update the memory with similar computing time. For example, if the user sets npop=60, then in every generation the PSO swarm has 60 particles, the ES population has 60 individuals, and the SA chain has a length of 60.
- mu defines the number of individuals from npop to survive to the next generation, and also the number of samples to replay from the memory. This applies to PSO and ES alone, as SA does not have the concept of a population. For example, by setting mu=40 and npop=60, after every generation the top 40 individuals in PSO and ES survive. The replay memory then feeds 40 individuals to ES, which form a new pool of 80 individuals that goes through the offspring processes to again produce npop=60. For PSO, the replay memory provides 60-40=20 particles to form a new swarm of npop=60.
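The mu/npop bookkeeping described above reduces to simple arithmetic. A minimal sketch (plain Python, not part of the NEORL API):

```python
# Illustrative arithmetic for the mu/npop split (not NEORL API code)
npop, mu = 60, 40

# ES: top-mu survivors plus mu replayed samples form the parent pool,
# and the offspring processes reduce this pool back to npop
es_parent_pool = mu + mu        # 80 individuals

# PSO: the replay memory tops the swarm back up to npop
pso_replayed = npop - mu        # 20 particles come from the memory
new_swarm = mu + pso_replayed   # 60 particles again
```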
- For complex problems and limited memory, we recommend setting memory_size ~ 5000. When the memory gets full, old samples are overwritten by new ones. Allowing a large memory for complex problems may slow down PESA, since handling large memories is more computationally expensive. If memory_size=None, the memory size is set to the maximum value of ngen*npop*3.
- For parallel computing of PESA, pick ncores divisible by 3 (e.g. 6, 18, 30) to ensure equal computing power across the internal algorithms. If ncores=1, a serial calculation of PESA is executed.
- Example of how to assign computing resources for PESA: assume a generation of npop=60 individuals and ncores=30. Then icores=int(ncores/3), or icores=10, cores are assigned to each algorithm of PSO, ES, and SA. In this case, the ES population has size npop=60 and is evaluated in parallel with 10 cores. The PSO swarm also has npop=60 particles evaluated with 10 cores. SA releases 10 parallel chains, and each chain evaluates 6 individuals.
- Check the sections of PSO, ES, and SA for notes on the internal algorithms and the auxiliary parameters of PESA.
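The resource assignment in the example above can be sketched as integer arithmetic (an illustration of the stated split, not NEORL internals):

```python
# Illustrative core allocation for parallel PESA (not NEORL internals)
ncores, npop = 30, 60
assert ncores % 3 == 0, "pick ncores divisible by 3 for a balanced split"

icores = ncores // 3                 # 10 cores per internal algorithm
sa_chains = icores                   # SA releases one chain per assigned core
evals_per_chain = npop // sa_chains  # each SA chain evaluates 6 individuals
```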
- Start the prioritized replay with a small fraction, alpha_init < 0.1, to increase randomness early in the search and improve PESA exploration. Choose a high fraction, alpha_end > 0.9, to increase exploitation by the end of the evolution.
- The rate alpha_backdoor replaces the regular random-walk sample of SA with the best individual in the replay memory to keep the SA chain up-to-date. For example, alpha_backdoor=0.1 implies that out of 10 individuals in the SA chain, 1 comes from the memory and the other 9 come from the classical random-walk. Keep the value of alpha_backdoor small, e.g. alpha_backdoor < 0.2, to avoid SA divergence.
- Look for an optimal balance between npop and ngen; it is recommended to minimize the population size to allow for more generations.
- The total number of cost evaluations for PESA is ngen*npop*3 + warmup.
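Using the Example section's settings (npop=60, ngen=50) and the default warmup=100 (counted only when x0=None), the budget formula works out as below; the per-generation backdoor count is a hypothetical illustration of the alpha_backdoor=0.1 rate:

```python
# Evaluation-budget arithmetic from the notes above (illustrative only)
ngen, npop, warmup = 50, 60, 100

total_evals = ngen * npop * 3 + warmup   # 50*60*3 + 100 = 9100
default_memory_cap = ngen * npop * 3     # memory_size=None accommodates 9000 samples

# with alpha_backdoor=0.1, roughly 1 in 10 SA samples per generation
# comes from the replay memory instead of the random-walk
alpha_backdoor = 0.1
sa_from_memory_per_gen = round(alpha_backdoor * npop)   # about 6 of the 60 SA samples
```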