struct DiagGaussianPolicy{Mean, Logstd<:(AbstractArray{T,1} where T)}

A DiagGaussianPolicy represents a stochastic control policy as a multivariate Gaussian distribution of the form:

\[\pi_{\theta}(a | o) = \mathcal{N}(\mu_{\theta_1}(o), \Sigma_{\theta_2})\]

where $\mu_{\theta_1}$ is a neural network, parameterized by $\theta_1$, that maps an observation to a mean action and $\Sigma_{\theta_2}$ is a diagonal covariance matrix parameterized by $\theta_2$, the diagonal entries of the matrix. Rather than tracking $\Sigma_{\theta_2}$ directly, we track the log standard deviations, which are easier to learn. Note that $\mu_{\theta_1}$ is a state-dependent mean while $\Sigma_{\theta_2}$ is a global covariance.
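To make the log-standard-deviation parameterization concrete, here is a minimal, self-contained sketch of the log-density of such a diagonal Gaussian. The function name is illustrative, not part of the package API:

```julia
# Hypothetical helper (not the package API): log-density of a diagonal
# Gaussian N(mu, diag(exp.(logstd).^2)), parameterized by the mean vector
# and the log standard deviations, as DiagGaussianPolicy tracks them.
function diaggaussian_logpdf(action, mu, logstd)
    sigma = exp.(logstd)          # recover per-dimension standard deviations
    k = length(action)            # action dimensionality
    -0.5 * sum(abs2, (action .- mu) ./ sigma) - sum(logstd) - 0.5 * k * log(2pi)
end
```

Because logstd is unconstrained (any real value maps to a positive standard deviation), it can be optimized by gradient descent without a positivity constraint.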

DiagGaussianPolicy(meanNN, logstd; fixedlogstd)

Construct a DiagGaussianPolicy with a state-dependent mean meanNN and initial log-standard deviation logstd. If fixedlogstd is true, logstd will be treated as a constant. meanNN should be an object that is compatible with Flux.jl and supports the following signatures:

  • meanNN(obs::AbstractVector) -> action::AbstractVector
  • meanNN(obs::AbstractMatrix) -> action::AbstractMatrix

sample!([rng = GLOBAL_RNG, ]action, policy, feature)

Treating policy as a stochastic policy, sample an action from policy, conditioned on feature, and store it in action.
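Under the distribution above, sampling reduces to the reparameterized draw $a = \mu(o) + \exp(\mathrm{logstd}) \odot z$ with $z \sim \mathcal{N}(0, I)$. The following is a self-contained sketch of that computation; the function and argument names are illustrative, not the package API:

```julia
using Random

# Hypothetical sketch (not the package API): draw one action from a
# diagonal Gaussian policy with mean function `meanfn` and log standard
# deviations `logstd`, conditioned on observation `obs`.
function sample_action(rng::AbstractRNG, meanfn, logstd, obs)
    # mean action plus elementwise-scaled standard normal noise
    meanfn(obs) .+ exp.(logstd) .* randn(rng, length(logstd))
end
```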

getaction!(action, policy, feature)

Treating policy as a deterministic policy, compute the mean action of policy, conditioned on feature, and store it in action.

loglikelihood(policy, action, feature)

Return the loglikelihood of action conditioned on feature for policy.

loglikelihood(policy, actions, features)

Treating each column of actions and features as a single action/feature, return a vector of the loglikelihoods of actions conditioned on features for policy.
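The batched method is equivalent to applying the per-sample loglikelihood to matching columns. A self-contained sketch, with illustrative names that are not the package API, and taking the per-column means directly rather than a mean network:

```julia
# Hypothetical sketch (not the package API): per-column loglikelihoods of a
# diagonal Gaussian, where column j of `actions` is scored against column j
# of `mus` under shared log standard deviations `logstd`.
function columnwise_loglikelihoods(actions::AbstractMatrix, mus::AbstractMatrix,
                                   logstd::AbstractVector)
    k = length(logstd)
    # normalization term shared by every column
    c = -sum(logstd) - 0.5 * k * log(2pi)
    [c - 0.5 * sum(abs2, (actions[:, j] .- mus[:, j]) ./ exp.(logstd))
     for j in axes(actions, 2)]
end
```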