RL-informed Differential Evolution (ACKTR-DE)

The Actor Critic using Kronecker-Factored Trust Region (ACKTR) algorithm starts the search, collecting individuals by evaluating a fitness function through an RL environment. In the second step, the best ACKTR individuals are used to guide differential evolution (DE): RL individuals are randomly introduced into the DE population, replacing the worst DE individuals to enrich population diversity. The user first runs the ACKTR search followed by DE, and the best results of both stages are reported.
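The RL-to-DE handoff can be pictured as a simple replacement step. The sketch below is illustrative only (not NEORL's internal code, and `inject_rl_individuals` is a hypothetical name): the `npop_rl` worst members of the DE population are overwritten by RL individuals drawn at random from the ACKTR pool.

```python
import random

def inject_rl_individuals(de_pop, de_fit, rl_pool, npop_rl, mode='min'):
    """Illustrative sketch only: replace the npop_rl worst DE individuals
    with RL individuals drawn at random from rl_pool."""
    # Rank the DE population so the worst individuals come first:
    # for minimization, worst = largest fitness, hence the descending sort.
    order = sorted(range(len(de_pop)), key=lambda i: de_fit[i],
                   reverse=(mode == 'min'))
    for idx in order[:npop_rl]:
        de_pop[idx] = random.choice(rl_pool)
    return de_pop
```

For minimization, "worst" means largest fitness; for maximization, smallest.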

Original papers:

  • Radaideh, M. I., & Shirvan, K. (2021). Rule-based reinforcement learning methodology to inform evolutionary algorithms for constrained optimization of engineering applications. Knowledge-Based Systems, 217, 106836.

What can you use?

  • Multi processing: ✔️

  • Discrete spaces: ✔️

  • Continuous spaces: ✔️

  • Mixed Discrete/Continuous spaces: ✔️

Parameters

class neorl.hybrid.ackde.ACKDE(mode, fit, env, bounds, npop=60, npop_rl=6, init_pop_rl=True, hyperparam={}, seed=None)[source]

An ACKTR-informed DE neuroevolution module

Parameters
  • mode – (str) problem type, either min for a minimization problem or max for a maximization problem

  • fit – (function) the fitness function to be used with DE

  • env – (NEORL environment or Gym environment) The environment to learn with ACKTR, either use NEORL method CreateEnvironment (see below) or construct your custom Gym environment.

  • bounds – (dict) input parameter type and lower/upper bounds in dictionary form. Example: bounds={'x1': ['int', 1, 4], 'x2': ['float', 0.1, 0.8], 'x3': ['float', 2.2, 6.2]}

  • npop – (int) population size of DE

  • npop_rl – (int) number of RL/ACKTR individuals to use in the DE population (npop_rl < npop)

  • init_pop_rl – (bool) flag to initialize DE population with ACKTR individuals

  • hyperparam – (dict) dictionary of DE hyperparameters (F, CR) and ACKTR hyperparameters (n_steps, gamma, learning_rate, ent_coef, vf_coef, vf_fisher_coef, kfac_clip, max_grad_norm, lr_schedule)

  • seed – (int) random seed for sampling

evolute(ngen, ncores=1, verbose=False)[source]

This function evolutes the DE algorithm for a number of generations with guidance from the RL individuals.

Parameters
  • ngen – (int) number of generations to evolute

  • ncores – (int) number of parallel processors to use with DE

  • verbose – (bool) print statistics to screen

Returns

(tuple) (best individual, best fitness, and a list of fitness history)

learn(total_timesteps, rl_filter=100, verbose=False)[source]

This function starts the learning of the ACKTR algorithm for a number of timesteps to create individuals for the evolutionary search.

Parameters
  • total_timesteps – (int) number of timesteps to run

  • rl_filter – (int) number of top individuals to keep from the full RL search

  • verbose – (bool) print statistics to screen

Returns

(dataframe) dataframe of individuals/fitness sorted from best to worst
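Because learn returns its surviving individuals as a dataframe sorted from best to worst, standard pandas slicing applies. The snippet below mocks that output rather than running NEORL, and the column names are assumptions for illustration; check the columns of the real dataframe before indexing.

```python
import pandas as pd

# Mocked stand-in for learn()'s output: individuals and fitness sorted
# best-to-worst for a minimization problem (column names are assumptions).
rl_df = pd.DataFrame({'x1': [0.2, 1.5, 3.0],
                      'x2': [0.1, 0.4, 2.2],
                      'fitness': [0.05, 2.41, 13.84]})

top2 = rl_df.head(2)  # keep only the two best RL individuals
```

This is the same kind of truncation that `rl_filter` performs internally on the full RL search history.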

class neorl.rl.make_env.CreateEnvironment(method, fit, bounds, ncores=1, mode='max', episode_length=50)[source]

A module to construct a fitness environment for algorithms that follow the reinforcement learning approach to optimization

Parameters
  • method – (str) the algorithm to construct the environment for, choose one of: dqn, ppo, acktr, acer, a2c.

  • fit – (function) the fitness function

  • bounds – (dict) input parameter type and lower/upper bounds in dictionary form. Example: bounds={'x1': ['int', 1, 4], 'x2': ['float', 0.1, 0.8], 'x3': ['float', 2.2, 6.2]}

  • ncores – (int) number of parallel processors

  • mode – (str) problem type, either min for a minimization problem or max for a maximization problem (RL defaults to max)

  • episode_length – (int) number of individuals to evaluate before resetting the environment to a random initial guess.

Example

Train an ACKTR-DE agent to optimize the 5-D sphere function

from neorl import ACKDE
from neorl import CreateEnvironment

def Sphere(individual):
    """Sphere test objective function.
       F(x) = sum_{i=1}^d xi^2
       d = 1, 2, 3, ...
       Range: [-100, 100]
       Minima: 0
    """
    return sum(x**2 for x in individual)


#Setup the parameter space (d=5)
nx=5
BOUNDS={}
for i in range(1,nx+1):
    BOUNDS['x'+str(i)]=['float', -100, 100]

if __name__=='__main__':  #use this block for parallel ACKTR!
    #create an environment class for RL/ACKTR
    env=CreateEnvironment(method='acktr', fit=Sphere, ncores=1,  
                          bounds=BOUNDS, mode='min', episode_length=50)
    
    #change hyperparameters of ACKTR/DE if you like (defaults should be good to start with)
    h={'F': 0.5,
       'CR': 0.3,
       'n_steps': 20,
       'learning_rate': 0.001}
    
    #Important: `mode` in CreateEnvironment and `mode` in ACKDE must be consistent
    #fit must be passed again for DE, and it must be the same function used in env
    ackde=ACKDE(mode='min', fit=Sphere, npop=60,
                env=env, npop_rl=6, init_pop_rl=False, 
                bounds=BOUNDS, hyperparam=h, seed=1)
    #first run RL for some timesteps
    rl=ackde.learn(total_timesteps=2000, verbose=True)
    #second run DE, which will use RL data for guidance
    ackde_x, ackde_y, ackde_hist=ackde.evolute(ngen=100, ncores=1, verbose=True) #ncores for DE