Model-Predictive Path Integral Control

Implements Model-Predictive Path Integral Control (MPPI), a stochastic, sampling-based model predictive control method. For further information, see the following papers:

struct MPPI{DT<:AbstractFloat, nu, Covar<:AbstractArray{DT<:AbstractFloat,2}, Value, Env, Init, Obs, State}
MPPI{DT<:AbstractFloat}(args...; kwargs...) -> MPPI
MPPI(args...; kwargs...) -> MPPI

Construct an instance of MPPI with args and kwargs, where DT <: AbstractFloat is the element type used for pre-allocated buffers, which defaults to Float32.

In the following explanation of the MPPI constructor, we use the following notation:

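  • U::Matrix: the canonical control vector $(u_{1}, u_{2}, \dots, u_{H})$, stored as a matrix whose t-th column is $u_{t}$, so that size(U) == (length(actionspace(env)), H).

The constructor accepts the following parameters:
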
  • env_tconstructor: a function with signature env_tconstructor(n) that returns n instances of T, where T <: AbstractEnvironment.
  • H::Integer: Length of sampled trajectories.
  • K::Integer: Number of trajectories to sample.
  • covar::AbstractMatrix: The covariance matrix of the Normal distribution from which control perturbations are sampled.
  • gamma::Real: Reward discount, applied as gamma^(t - 1) * reward[t].
  • lambda::Real: Temperature parameter for the exponential reweighting of sampled trajectories. In the limit that lambda approaches 0, U is set to the highest-reward trajectory. Conversely, as lambda approaches infinity, U is computed as the unweighted average of the sampled trajectories.
  • value: a function mapping observations to scalar rewards, with the signature value(obs::AbstractVector) --> reward::Real
  • initfn!: A function with the signature initfn!(U::Matrix) used for re-initializing U after shifting it. Defaults to setting the last element of U to 0.
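The roles of gamma and lambda can be illustrated with a small, self-contained sketch. This is not LyceumAI's implementation: the helpers discounted_returns and trajectory_weights are hypothetical names, and the reweighting is assumed to take the form of a softmax over discounted returns with temperature lambda, which reproduces the two limits described above.

```julia
# Hypothetical helpers for illustration only; not part of LyceumAI's API.

# Discounted return of each trajectory. `rewards` is an H x K matrix whose
# column k holds the per-step rewards of sampled trajectory k.
function discounted_returns(rewards::AbstractMatrix, gamma::Real)
    H, K = size(rewards)
    returns = zeros(float(eltype(rewards)), K)
    for k in 1:K, t in 1:H
        returns[k] += gamma^(t - 1) * rewards[t, k]
    end
    return returns
end

# Exponential reweighting (assumed softmax form) with temperature `lambda`.
# Subtracting the maximum return keeps `exp` from overflowing.
function trajectory_weights(returns::AbstractVector, lambda::Real)
    w = exp.((returns .- maximum(returns)) ./ lambda)
    return w ./ sum(w)
end

rewards = [1.0 0.0 2.0;
           1.0 3.0 0.0]                 # H = 2 steps, K = 3 trajectories
R = discounted_returns(rewards, 0.5)    # discounted returns: 1.5, 1.5, 2.0

w_cold = trajectory_weights(R, 1e-3)    # small lambda: weight concentrates on trajectory 3
w_hot  = trajectory_weights(R, 1e6)     # large lambda: weights are close to uniform
```

With a small lambda almost all of the weight falls on the best trajectory, while a large lambda approaches a plain average over the K samples.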

getaction!(action, state, m; nthreads)

Starting from the environment state given by state, perform one step of the MPPI algorithm and store the resulting action in action. The trajectory-sampling portion of MPPI runs in parallel on nthreads threads.
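As a rough illustration of what one such step involves, the sketch below runs an MPPI-style update on a toy scalar problem. Everything in it is an assumption made for the example: toy_step, toy_reward, mppi_step! and run_demo are hypothetical names, the dynamics are x' = x + u, and the rollouts are parallelized with Threads.@threads over all available threads rather than a configurable nthreads. It also returns the action instead of writing it into a preallocated buffer.

```julia
using Random

# Hypothetical toy problem: scalar state x, scalar control u, x' = x + u,
# with reward -x'^2 so that the best behavior is to drive the state to zero.
toy_step(x, u) = x + u
toy_reward(x) = -x^2

# One MPPI-style update. `U` is the 1 x H canonical control sequence and is
# modified in place; the first control of the updated plan is returned.
function mppi_step!(U::Matrix{Float64}, x0::Float64;
                    K = 256, sigma = 0.5, gamma = 1.0, lambda = 0.1,
                    rng = Random.default_rng())
    H = size(U, 2)
    noise = sigma .* randn(rng, H, K)   # control perturbations, one column per trajectory
    returns = zeros(K)

    # Each iteration writes only to returns[k], so the loop is race-free.
    Threads.@threads for k in 1:K
        x = x0
        for t in 1:H
            x = toy_step(x, U[1, t] + noise[t, k])
            returns[k] += gamma^(t - 1) * toy_reward(x)
        end
    end

    # Exponentially reweight the trajectories and apply the weighted
    # average perturbation to the canonical control sequence.
    w = exp.((returns .- maximum(returns)) ./ lambda)
    w ./= sum(w)
    U[1, :] .+= noise * w

    action = U[1, 1]
    # Shift the plan one step forward and re-initialize the final control,
    # mirroring the documented default of zeroing the last element of U.
    U[1, 1:H-1] .= U[1, 2:H]
    U[1, H] = 0.0
    return action
end

# Closed-loop usage: replan at every step and apply only the first control.
function run_demo(; steps = 30, H = 10, rng = MersenneTwister(0))
    U = zeros(1, H)
    x = 2.0
    for _ in 1:steps
        a = mppi_step!(U, x; rng = rng)
        x = toy_step(x, a)
    end
    return x
end

x_final = run_demo()
```

The perturbations are drawn before the threaded loop so that the random number generator is never shared between threads; exact values of x_final depend on the seed and the Julia version.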

reset!(m::LyceumAI.MPPI) -> LyceumAI.MPPI

Resets the canonical control vector to zeros.