Meta-Algorithms
MAML-Algorithm (Interface)
class meta_policy_search.meta_algos.MAMLAlgo(policy, inner_lr=0.1, meta_batch_size=20, num_inner_grad_steps=1, trainable_inner_step_size=False)

Bases: meta_policy_search.meta_algos.base.MetaAlgo

Provides implementations shared by all MAML algorithms.
Parameters:
- policy (Policy) – policy object
- inner_lr (float) – gradient step size used for the inner step
- meta_batch_size (int) – number of meta-learning tasks
- num_inner_grad_steps (int) – number of gradient updates taken per MAML iteration (see the sketch after this list)
- trainable_inner_step_size (bool) – whether to make the inner step size a trainable variable
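These constructor arguments parameterize the MAML inner loop: for each task, the policy parameters are adapted by num_inner_grad_steps gradient steps of size inner_lr before the meta-update. A minimal NumPy sketch of that adaptation rule, with function and variable names that are illustrative rather than part of the library API:

    import numpy as np

    def inner_adapt(theta, task_grad_fn, inner_lr=0.1, num_inner_grad_steps=1):
        """Illustrative MAML inner loop: a few task-specific gradient steps."""
        theta_prime = np.array(theta, dtype=float)
        for _ in range(num_inner_grad_steps):
            # task_grad_fn(theta_prime) is assumed to return the gradient of
            # the task objective at theta_prime
            theta_prime -= inner_lr * task_grad_fn(theta_prime)
        return theta_prime

With trainable_inner_step_size=True, the scalar inner_lr would itself be a learned variable rather than a fixed constant.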
build_graph()

Creates the meta-learning computation graph.

Pseudocode:

    for task in meta_batch_size:
        make_vars
        init_dist_info_sym
    for step in num_grad_steps:
        for task in meta_batch_size:
            make_vars
            update_dist_info_sym
    set objectives for optimizer
make_vars(prefix='')

Parameters: prefix (str) – a string to prepend to the name of each variable
Returns: a tuple containing lists of placeholders for each input type and meta task (see the sketch below)
Return type: tuple
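In TensorFlow 1.x terms, this amounts to one placeholder per input type and per meta task. A hedged sketch of such a factory; the concrete input types shown here (observations, actions, advantages) are an assumption for illustration, not the library's exact list:

    import tensorflow as tf

    def make_vars(meta_batch_size, obs_dim, action_dim, prefix=''):
        """Illustrative: one (obs, action, advantage) placeholder per meta task."""
        obs_phs, action_phs, adv_phs = [], [], []
        for i in range(meta_batch_size):
            obs_phs.append(tf.placeholder(tf.float32, [None, obs_dim], name='%sobs_%d' % (prefix, i)))
            action_phs.append(tf.placeholder(tf.float32, [None, action_dim], name='%saction_%d' % (prefix, i)))
            adv_phs.append(tf.placeholder(tf.float32, [None], name='%sadv_%d' % (prefix, i)))
        return obs_phs, action_phs, adv_phs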
optimize_policy(all_samples_data, log=True)

Performs the MAML outer step for each task.

Parameters:
- all_samples_data (list) – list of lists of lists of samples (each a dict), split by gradient update and meta task (schematic below)
- log (bool) – whether to log statistics
Returns: None
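The nesting of all_samples_data is: gradient update first, then meta task, with a samples dict at each leaf; counting the pre-update samples, that gives num_inner_grad_steps + 1 update slots. A schematic, with dict keys that are illustrative rather than the library's exact field names:

    # all_samples_data[step][task] -> samples dict for that (inner step, meta task)
    all_samples_data = [
        [  # step 0: pre-update samples, one dict per meta task
            {'observations': ..., 'actions': ..., 'advantages': ...},
            # ... one dict per meta task
        ],
        # ... one inner list per gradient update, num_inner_grad_steps + 1 in total
    ]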
ProMP-Algorithm

class meta_policy_search.meta_algos.ProMP(*args, name='ppo_maml', learning_rate=0.001, num_ppo_steps=5, num_minibatches=1, clip_eps=0.2, target_inner_step=0.01, init_inner_kl_penalty=0.01, adaptive_inner_kl_penalty=True, anneal_factor=1.0, **kwargs)

Bases: meta_policy_search.meta_algos.base.MAMLAlgo

ProMP (Proximal Meta-Policy Search) algorithm.
Parameters:
- policy (Policy) – policy object
- name (str) – tf variable scope
- learning_rate (float) – learning rate for optimizing the meta-objective
- num_ppo_steps (int) – number of ProMP steps (without re-sampling)
- num_minibatches (int) – number of minibatches for computing the PPO gradient steps
- clip_eps (float) – PPO clip range (see the sketch after this parameter list)
- target_inner_step (float) – target inner KL divergence; only used when adaptive_inner_kl_penalty is True
- init_inner_kl_penalty (float) – initial penalty for the inner KL
- adaptive_inner_kl_penalty (bool) – whether to use a fixed or an adaptive KL penalty on the inner gradient update
- anneal_factor (float) – multiplicative factor for annealing clip_eps; if anneal_factor < 1, clip_eps <- anneal_factor * clip_eps at each iteration
- inner_lr (float) – gradient step size used for the inner step
- meta_batch_size (int) – number of meta-learning tasks
- num_inner_grad_steps (int) – number of gradient updates taken per MAML iteration
- trainable_inner_step_size (bool) – whether to make the inner step size a trainable variable
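The parameters above control a clipped surrogate objective at the meta level plus a KL penalty on the inner updates. A hedged NumPy sketch of these two mechanisms; the adaptive-penalty thresholds and growth factors below are assumptions, and the library's exact rule may differ:

    import numpy as np

    def clipped_surrogate(ratio, adv, clip_eps=0.2):
        """PPO-style clipped objective (to be maximized)."""
        return np.mean(np.minimum(ratio * adv,
                                  np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv))

    def adapt_inner_kl_penalty(penalty, inner_kl, target_inner_step=0.01):
        """Common adaptive-KL rule: grow/shrink the penalty toward the target KL.

        The 1.5x thresholds and 2x factors are illustrative assumptions."""
        if inner_kl > 1.5 * target_inner_step:
            return penalty * 2.0
        if inner_kl < target_inner_step / 1.5:
            return penalty / 2.0
        return penalty

On top of this, anneal_factor shrinks clip_eps multiplicatively once per iteration, exactly as described in the parameter list.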
make_vars(prefix='')

Parameters: prefix (str) – a string to prepend to the name of each variable
Returns: a tuple containing lists of placeholders for each input type and meta task
Return type: tuple
TRPO-MAML-Algorithm

class meta_policy_search.meta_algos.TRPOMAML(*args, name='trpo_maml', step_size=0.01, inner_type='likelihood_ratio', exploration=False, **kwargs)

Bases: meta_policy_search.meta_algos.base.MAMLAlgo

Algorithm for TRPO-MAML.
Parameters:
- policy (Policy) – policy object
- name (str) – tf variable scope
- step_size (float) – trust region size for the meta-policy optimization through TRPO
- inner_type (str) – which inner update to use; one of 'log_likelihood', 'likelihood_ratio', or 'dice' (see the sketch after this parameter list)
- exploration (bool) – whether to use E-MAML or plain MAML
- inner_lr (float) – gradient step size used for the inner step
- meta_batch_size (int) – number of meta-learning tasks
- num_inner_grad_steps (int) – number of gradient updates taken per MAML iteration
- trainable_inner_step_size (bool) – whether to make the inner step size a trainable variable
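inner_type selects which surrogate is differentiated for the inner adaptation step. A hedged sketch of the 'log_likelihood' and 'likelihood_ratio' variants (the 'dice' variant, based on the DiCE estimator, is omitted here; all function and variable names are illustrative):

    import numpy as np

    def inner_objective(inner_type, log_prob, old_log_prob, adv):
        """Illustrative inner surrogates selected by inner_type."""
        if inner_type == 'log_likelihood':
            # REINFORCE-style surrogate: E[log pi(a|s) * A]
            return np.mean(log_prob * adv)
        if inner_type == 'likelihood_ratio':
            # importance-weighted surrogate: E[(pi / pi_old) * A]
            return np.mean(np.exp(log_prob - old_log_prob) * adv)
        raise NotImplementedError(inner_type)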
make_vars(prefix='')

Parameters: prefix (str) – a string to prepend to the name of each variable
Returns: a tuple containing lists of placeholders for each input type and meta task
Return type: tuple
VPG-MAML-Algorithm

class meta_policy_search.meta_algos.VPGMAML(*args, name='vpg_maml', learning_rate=0.001, inner_type='likelihood_ratio', exploration=False, **kwargs)

Bases: meta_policy_search.meta_algos.base.MAMLAlgo

Algorithm for VPG-MAML (MAML with a vanilla policy gradient meta-step).
Parameters:
- policy (Policy) – policy object
- name (str) – tf variable scope
- learning_rate (float) – learning rate for the meta-objective
- exploration (bool) – whether to use the exploration / pre-update sampling term (E-MAML term)
- inner_type (str) – inner optimization objective; either 'log_likelihood' or 'likelihood_ratio' (a construction example follows this list)
- inner_lr (float) – gradient step size used for the inner step
- meta_batch_size (int) – number of meta-learning tasks
- num_inner_grad_steps (int) – number of gradient updates taken per MAML iteration
- trainable_inner_step_size (bool) – whether to make the inner step size a trainable variable
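Combining the VPG-specific arguments with the shared MAMLAlgo ones, a construction call might look as follows; the policy object is assumed to have been built elsewhere, and the values shown are simply the documented defaults:

    from meta_policy_search.meta_algos import VPGMAML

    # `policy` is assumed to be an already-constructed Policy instance
    algo = VPGMAML(
        policy=policy,
        name='vpg_maml',
        learning_rate=0.001,
        inner_type='likelihood_ratio',
        inner_lr=0.1,
        meta_batch_size=20,
        num_inner_grad_steps=1,
    )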
make_vars(prefix='')

Parameters: prefix (str) – a string to prepend to the name of each variable
Returns: a tuple containing lists of placeholders for each input type and meta task
Return type: tuple