Prioritized replay Evolutionary and Swarm Algorithm (PESA)¶
A module for the parallel hybrid PESA algorithm with prioritized experience replay from reinforcement learning. This is the classical PESA that hybridizes PSO, ES, and SA modules within NEORL.
Original paper: Radaideh, M. I., & Shirvan, K. (2022). PESA: Prioritized experience replay for parallel hybrid evolutionary and swarm algorithms-Application to nuclear fuel. Nuclear Engineering and Technology.
https://doi.org/10.1016/j.net.2022.05.001
What can you use?¶
Multi processing: ✔️
Discrete spaces: ✔️
Continuous spaces: ✔️
Mixed Discrete/Continuous spaces: ✔️
Parameters¶
class neorl.hybrid.pesa.PESA(mode, bounds, fit, npop, mu=None, memory_size=None, alpha_init=0.1, alpha_end=1, alpha_backdoor=0.1, Tmax=10000, chi=0.1, cxpb=0.7, mutpb=0.1, c1=2.05, c2=2.05, speed_mech='constric', ncores=1, seed=None)[source]¶
PESA Major Parameters
- Parameters
mode – (str) problem type, either “min” for a minimization problem or “max” for a maximization problem
bounds – (dict) input parameter type and lower/upper bounds in dictionary form. Example: bounds={'x1': ['int', 1, 4], 'x2': ['float', 0.1, 0.8], 'x3': ['float', 2.2, 6.2]}
fit – (function) the fitness function
npop – (int) total number of individuals in each group, so for ES, PSO, and SA together, the full population is npop*3
mu – (int) number of individuals to survive to the next generation; mu also equals the number of individuals to sample from the memory. If None, mu=int(npop/2), so 1/2 of the PESA population comes from the previous generation and 1/2 comes from the replay memory (see Notes below for more info)
memory_size – (int) max size of the replay memory (if None, memory_size is built to accommodate all samples during the search)
alpha_init – (float) initial value of the prioritized replay coefficient (see Notes below)
alpha_end – (float) final value of the prioritized replay coefficient (see Notes below)
alpha_backdoor – (float) backdoor greedy replay rate/probability to sample from the memory for SA instead of the random-walk (see Notes below)
PESA Auxiliary Parameters (for the internal algorithms)
- Parameters
cxpb – (float) for ES, population crossover probability between [0,1]
mutpb – (float) for ES, population mutation probability between [0,1]
c1 – (float) for PSO, cognitive speed constant
c2 – (float) for PSO, social speed constant
speed_mech – (str) for PSO, type of speed mechanism to update particle velocity; choose between constric, timew, and globw
Tmax – (float) for SA, initial/max temperature to start the annealing process
chi – (float) for SA, probability to perturb an attribute during SA annealing (occurs when rand(0,1) < chi)
PESA Misc. Parameters
- Parameters
ncores – (int) number of parallel processors
seed – (int) random seed for sampling
evolute(ngen, x0=None, warmup=100, verbose=0)[source]¶
This function evolutes the PESA algorithm for a number of generations.
- Parameters
ngen – (int) number of generations to evolute
x0 – (list of lists) initial samples to start the replay memory (len(x0) must be equal to or greater than npop)
warmup – (int) number of random warmup samples to initialize the replay memory; must be equal to or greater than npop (only used if x0=None)
verbose – (int) print statistics to screen, 0: no print, 1: PESA print, 2: detailed print
- Returns
(tuple) (best individual, best fitness, and a list of fitness history)
Example¶
from neorl import PESA

#Define the fitness function
def FIT(individual):
    """Sphere test objective function.
       F(x) = sum_{i=1}^d xi^2
       d=1,2,3,...
       Range: [-100,100]
       Minima: 0
    """
    y = sum(x**2 for x in individual)
    return y

#Setup the parameter space (d=5)
nx = 5
BOUNDS = {}
for i in range(1, nx+1):
    BOUNDS['x'+str(i)] = ['float', -100, 100]

npop = 60
pesa = PESA(mode='min', bounds=BOUNDS, fit=FIT, npop=npop, mu=40, alpha_init=0.2,
            alpha_end=1.0, alpha_backdoor=0.1, ncores=1)
x0 = [[50, 50, 50, 50, 50] for i in range(npop)]   #initial guess
x_best, y_best, pesa_hist = pesa.evolute(ngen=50, x0=x0, verbose=1)
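The returned history can be post-processed for convergence checks. The sketch below assumes, per the Returns description above, that the history behaves like a plain list of per-generation fitness values; the dummy list stands in for a real run:

```python
# Illustrative post-processing of a fitness history, assuming it is a plain
# list of per-generation fitness values (the dummy data below stands in for
# the pesa_hist returned by evolute)
pesa_hist = [900.0, 350.0, 120.0, 45.0, 9.0, 1.2]   #hypothetical minimization run

best_fitness = min(pesa_hist)               #best value over all generations
best_gen = pesa_hist.index(best_fitness)    #generation index where it occurred

#running best, useful for a convergence plot
running_best = []
current = float('inf')
for f in pesa_hist:
    current = min(current, f)
    running_best.append(current)
```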
Notes¶
- PESA is symmetric, meaning the population size is equal between PSO, ES, and SA, which helps ensure that all algorithms can update the memory with similar computing time. For example, if the user sets npop=60, then in every generation the PSO swarm has 60 particles, the ES population has 60 individuals, and the SA chain has a length of 60.
- mu defines the number of individuals from npop to survive to the next generation, and also the number of samples to replay from the memory. This applies to PSO and ES alone, as SA does not have the concept of a population. For example, by setting mu=40 and npop=60, after every generation the top 40 individuals in PSO and ES survive. The replay memory then feeds 40 individuals to ES, which form a new pool of 80 individuals that goes through the offspring processes to again produce npop=60. For PSO, the replay memory provides 60-40=20 particles to form a new swarm of npop=60.
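The mu/npop bookkeeping described above reduces to simple arithmetic. A minimal sketch (plain Python, not part of the NEORL API):

```python
# Illustrative arithmetic for the mu/npop split (not NEORL API code)
npop, mu = 60, 40

# ES: top-mu survivors plus mu replayed samples form the parent pool,
# and the offspring processes reduce this pool back to npop
es_parent_pool = mu + mu        # 80 individuals

# PSO: the replay memory tops the swarm back up to npop
pso_replayed = npop - mu        # 20 particles come from the memory
new_swarm = mu + pso_replayed   # 60 particles again
```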
- For complex problems and limited memory, we recommend setting memory_size ~ 5000. When the memory gets full, old samples are overwritten by new ones. Allowing a large memory for complex problems may slow down PESA, since handling large memories is more computationally expensive. If memory_size=None, the memory size is set to the maximum value of ngen*npop*3.
- For parallel computing of PESA, pick ncores divisible by 3 (e.g. 6, 18, 30) to ensure equal computing power across the internal algorithms. If ncores=1, a serial calculation of PESA is executed.
- Example of how to assign computing resources for PESA: assume a generation of npop=60 individuals and ncores=30. Then icores=int(ncores/3), or icores=10, cores are assigned to each algorithm of PSO, ES, and SA. In this case, the ES population has size npop=60 and is evaluated in parallel with 10 cores. The PSO swarm also has npop=60 particles evaluated with 10 cores. SA releases 10 parallel chains, and each chain evaluates 6 individuals.
- Check the sections of PSO, ES, and SA for notes on the internal algorithms and the auxiliary parameters of PESA.
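The resource assignment in the example above can be sketched as integer arithmetic (an illustration of the stated split, not NEORL internals):

```python
# Illustrative core allocation for parallel PESA (not NEORL internals)
ncores, npop = 30, 60
assert ncores % 3 == 0, "pick ncores divisible by 3 for a balanced split"

icores = ncores // 3                 # 10 cores per internal algorithm
sa_chains = icores                   # SA releases one chain per assigned core
evals_per_chain = npop // sa_chains  # each SA chain evaluates 6 individuals
```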
- Start the prioritized replay with a small fraction, alpha_init < 0.1, to increase randomness early in the search and improve PESA exploration. Choose a high fraction, alpha_end > 0.9, to increase exploitation by the end of the evolution.
- The rate alpha_backdoor replaces the regular random-walk sample of SA with the best individual in the replay memory to keep the SA chain up-to-date. For example, alpha_backdoor=0.1 implies that out of 10 individuals in the SA chain, 1 comes from the memory and the other 9 come from the classical random-walk. Keep the value of alpha_backdoor small, e.g. alpha_backdoor < 0.2, to avoid SA divergence.
- Look for an optimal balance between npop and ngen; it is recommended to minimize the population size to allow for more generations.
- The total number of cost evaluations for PESA is ngen*npop*3 + warmup.
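Using the Example section's settings (npop=60, ngen=50) and the default warmup=100 (counted only when x0=None), the budget formula works out as below; the per-generation backdoor count is a hypothetical illustration of the alpha_backdoor=0.1 rate:

```python
# Evaluation-budget arithmetic from the notes above (illustrative only)
ngen, npop, warmup = 50, 60, 100

total_evals = ngen * npop * 3 + warmup   # 50*60*3 + 100 = 9100
default_memory_cap = ngen * npop * 3     # memory_size=None accommodates 9000 samples

# with alpha_backdoor=0.1, roughly 1 in 10 SA samples per generation
# comes from the replay memory instead of the random-walk
alpha_backdoor = 0.1
sa_from_memory_per_gen = round(alpha_backdoor * npop)   # about 6 of the 60 SA samples
```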