# Model-Predictive Path Integral Control

Implements Model-Predictive Path Integral Control (MPPI), a stochastic, sampling-based model-predictive control method. For further information, see the following papers:

- Information Theoretic MPC for Model-Based Reinforcement Learning
- Aggressive Driving with Model Predictive Path Integral Control

`LyceumAI.MPPI` — Type

`struct MPPI{DT<:AbstractFloat, nu, Covar<:AbstractArray{DT,2}, Value, Env, Init, Obs, State}`

```
MPPI{DT<:AbstractFloat}(args...; kwargs...) -> MPPI
MPPI(args...; kwargs...) -> MPPI
```

Construct an instance of `MPPI` with `args` and `kwargs`, where `DT <: AbstractFloat` is the element type used for pre-allocated buffers, which defaults to `Float32`.

In the following explanation of the `MPPI` constructor, we use the following notation:

- `U::Matrix`: the canonical control vector $(u_{1}, u_{2}, \dots, u_{H})$, where `size(U) == (length(actionspace(env)), H)`.

**Keywords**

- `env_tconstructor`: a function with signature `env_tconstructor(n)` that returns `n` instances of `T`, where `T <: AbstractEnvironment`.
- `H::Integer`: length of sampled trajectories.
- `K::Integer`: number of trajectories to sample.
- `covar::AbstractMatrix`: the covariance matrix of the Normal distribution from which control perturbations are sampled.
- `gamma::Real`: reward discount, applied as `gamma^(t - 1) * reward[t]`.
- `lambda::Real`: temperature parameter for the exponential reweighting of sampled trajectories. In the limit that `lambda` approaches 0, `U` is set to the highest-reward trajectory. Conversely, as `lambda` approaches infinity, `U` is computed as the unweighted average of the sampled trajectories.
- `value`: a function mapping observations to scalar rewards, with the signature `value(obs::AbstractVector) --> reward::Real`.
- `initfn!`: a function with the signature `initfn!(U::Matrix)` used for re-initializing `U` after shifting it. Defaults to setting the last element of `U` to 0.
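The effect of `lambda` can be illustrated with a standalone sketch of the exponential reweighting step. This is not LyceumAI internals — the names `rewards`, `lambda`, and `w` below are purely illustrative:

```
# Illustrative sketch (not LyceumAI code): exponential reweighting of K
# sampled trajectories by their total rewards, controlled by lambda.
rewards = [1.0, 2.0, 5.0]   # total reward of each sampled trajectory
lambda = 0.5                # temperature

# Subtract the maximum reward for numerical stability before exponentiating.
w = exp.((rewards .- maximum(rewards)) ./ lambda)
w ./= sum(w)                # normalized per-trajectory weights

# Small lambda concentrates nearly all weight on the best trajectory;
# large lambda flattens the weights toward a uniform average.
```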

`LyceumBase.getaction!` — Method

```
getaction!(action, state, m; nthreads)
```

Starting from the environment's `state`, perform one step of the MPPI algorithm and store the resulting action in `action`. The trajectory sampling portion of MPPI is done in parallel using `nthreads` threads.
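A typical receding-horizon loop calls `getaction!` once per control step. The sketch below is hedged: `env_tconstructor` is the user-supplied factory from the constructor keywords, and the environment methods used (`getstate`, `setaction!`, `step!`, `actionspace`) are assumed to follow the LyceumBase interface — verify them against the actual API before relying on this:

```
using LinearAlgebra  # for the identity-based covariance below

# Assumed: `env_tconstructor(n)` returns n LyceumBase-style environments.
env = env_tconstructor(1)[1]
nu = length(actionspace(env))

# Keyword names follow the constructor documentation above;
# the specific values here are illustrative, not recommendations.
m = MPPI(;
    env_tconstructor = env_tconstructor,
    H = 32, K = 64,
    covar = Matrix(0.1I, nu, nu),  # isotropic perturbation covariance
    gamma = 0.99, lambda = 1.0,
    value = obs -> 0.0,            # zero terminal value
)

action = zeros(nu)
for t in 1:200
    state = getstate(env)                                 # assumed env method
    getaction!(action, state, m; nthreads = Threads.nthreads())
    setaction!(env, action)                               # assumed env method
    step!(env)                                            # assumed env method
end
```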

`LyceumBase.reset!` — Method

```
reset!(m::LyceumAI.MPPI) -> LyceumAI.MPPI
```

Resets the canonical control vector to zeros.
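For example, when running several episodes in sequence, the warm-started control sequence from one episode can be cleared before the next (the episode loop and the environment `reset!` call are illustrative):

```
# Illustrative: clear MPPI's warm-started control sequence between episodes.
for episode in 1:10
    reset!(env)  # assumed LyceumBase environment reset
    reset!(m)    # zero the canonical control vector U
    # ... run the per-step control loop for this episode ...
end
```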