# Policies

## DiagGaussianPolicy

`LyceumAI.DiagGaussianPolicy` - Type

`struct DiagGaussianPolicy{Mean, Logstd<:(AbstractArray{T,1} where T)}`

A `DiagGaussianPolicy` represents a stochastic control policy as a multivariate Gaussian distribution of the form:

$$
\pi_\theta(a \mid o) = \mathcal{N}\left(\mu_{\theta_1}(o), \Sigma_{\theta_2}\right)
$$

where $\mu_{\theta_1}$ is a neural network, parameterized by $\theta_1$, that maps an observation to a mean action, and $\Sigma_{\theta_2}$ is a diagonal covariance matrix parameterized by $\theta_2$, the diagonal entries of the matrix. Rather than tracking $\Sigma_{\theta_2}$ directly, we track the log standard deviations, which are easier to learn because they are unconstrained, whereas variances must stay positive. Note that $\mu_{\theta_1}$ is a *state-dependent* mean while $\Sigma_{\theta_2}$ is a *global* covariance.
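For reference, the log-density implied by this parameterization can be written out explicitly. This is the standard diagonal Gaussian formula rather than something taken from the LyceumAI source. Here $\pi_\theta(a \mid o)$ denotes the density of action $a$ given observation $o$, and $d$ (the action dimension), $\ell_i$ (the $i$-th log standard deviation), and $\sigma_i = \exp(\ell_i)$ are notation introduced only for this sketch:

$$
\log \pi_\theta(a \mid o) = -\frac{1}{2}\sum_{i=1}^{d}\left(\frac{a_i - \mu_{\theta_1}(o)_i}{\sigma_i}\right)^2 - \sum_{i=1}^{d}\ell_i - \frac{d}{2}\log(2\pi)
$$

Because $\log \sigma_i = \ell_i$, the normalization term is linear in the tracked parameters, which is one practical convenience of storing log standard deviations.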

`LyceumAI.DiagGaussianPolicy` - Method

```
DiagGaussianPolicy(meanNN, logstd; fixedlogstd)
```

Construct a `DiagGaussianPolicy` with a state-dependent mean `meanNN` and initial log standard deviation `logstd`. If `fixedlogstd` is true, `logstd` will be treated as a constant. `meanNN` should be an object that is compatible with Flux.jl and has the following signatures:

- `meanNN(obs::AbstractVector)` -> `action::AbstractVector`
- `meanNN(obs::AbstractMatrix)` -> `action::AbstractMatrix`
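The two signatures above amount to a simple contract: a vector in gives a vector out, and a matrix in (one observation per column) gives a matrix out. The sketch below illustrates that contract with a plain callable struct. `LinearMean` is a hypothetical stand-in written for this example, not part of LyceumAI or Flux; in practice a Flux `Chain` of `Dense` layers is the more typical choice and typically accepts both forms.

```julia
# Hypothetical stand-in for a Flux model: a plain linear map W * obs + b.
struct LinearMean{M<:AbstractMatrix,V<:AbstractVector}
    W::M
    b::V
end

# Single observation: vector in, vector out.
(m::LinearMean)(obs::AbstractVector) = m.W * obs + m.b

# Batch of observations: one observation per column, one action per column.
(m::LinearMean)(obs::AbstractMatrix) = m.W * obs .+ m.b

obs_dim, act_dim = 3, 2
mean_fn = LinearMean(ones(act_dim, obs_dim), zeros(act_dim))

single = mean_fn(ones(obs_dim))      # vector of length act_dim
batch = mean_fn(ones(obs_dim, 5))    # act_dim x 5 matrix
```

Whether such an object can also be trained depends on how Flux collects its parameters, which this sketch does not address.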

`LyceumBase.Tools.sample!` - Method

`sample!([rng = GLOBAL_RNG, ]action, policy, feature)`

Treating `policy` as a stochastic policy, sample an action from `policy`, conditioned on `feature`, and store it in `action`.
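For a diagonal Gaussian, sampling reduces to shifting and scaling standard normal noise. The following is a minimal sketch of that idea under stated assumptions, not the LyceumAI implementation: `sample_diag_gaussian!` is a hypothetical helper, and it takes the mean action `mu` directly instead of computing it from a policy and a feature.

```julia
using Random

# Hypothetical helper, not the LyceumAI implementation: draw
# action = mu + exp.(logstd) .* noise, with standard normal noise.
function sample_diag_gaussian!(rng::AbstractRNG, action::AbstractVector,
                               mu::AbstractVector, logstd::AbstractVector)
    randn!(rng, action)                     # fill action with N(0, 1) noise
    action .= mu .+ exp.(logstd) .* action  # scale and shift in place
    return action
end

rng = MersenneTwister(1234)
mu = [0.5, -1.0]
logstd = [-20.0, -20.0]   # tiny standard deviation, so the sample sits near mu
action = zeros(2)
sample_diag_gaussian!(rng, action, mu, logstd)
```

Writing into a preallocated `action` mirrors the in-place convention signaled by the `!` suffix and avoids allocating a new array on every call.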

`LyceumBase.getaction!` - Method

```
getaction!(action, policy, feature)
```

Treating `policy` as a deterministic policy, compute the mean action of `policy`, conditioned on `feature`, and store it in `action`.

`LyceumAI.loglikelihood` - Function

```
loglikelihood(policy, action, feature)
```

Return the log-likelihood of `action` conditioned on `feature` for `policy`.

```
loglikelihood(policy, actions, features)
```

Treating each column of `actions` and `features` as a single action/feature, return a vector of the log-likelihoods of `actions` conditioned on `features` for `policy`.
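The single-sample and column-wise forms can be illustrated with a small standalone sketch. It is not the LyceumAI implementation: `diag_gaussian_loglik` is a hypothetical helper that applies the standard diagonal Gaussian log-density, and it takes the mean actions directly, whereas the documented methods take a policy and features.

```julia
# Hypothetical helper, not the LyceumAI implementation: log-density of a
# diagonal Gaussian with mean `mu` and log standard deviations `logstd`.
function diag_gaussian_loglik(action::AbstractVector, mu::AbstractVector,
                              logstd::AbstractVector)
    z = (action .- mu) ./ exp.(logstd)   # standardized residuals
    return -0.5 * sum(abs2, z) - sum(logstd) - 0.5 * length(mu) * log(2 * pi)
end

# Batched form: each column of `actions` and `mus` is one sample.
function diag_gaussian_loglik(actions::AbstractMatrix, mus::AbstractMatrix,
                              logstd::AbstractVector)
    return [diag_gaussian_loglik(view(actions, :, j), view(mus, :, j), logstd)
            for j in axes(actions, 2)]
end

logstd = zeros(2)             # unit standard deviation in both dimensions
mus = [0.0 1.0; 0.0 1.0]      # two mean actions, one per column
actions = [0.0 1.0; 0.0 3.0]  # two actions, one per column
ll = diag_gaussian_loglik(actions, mus, logstd)
```

The first column matches its mean exactly, so its log-likelihood is just the normalization constant; the second column deviates by two standard deviations in one dimension and scores lower.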