Policies

DiagGaussianPolicy

LyceumAI.DiagGaussianPolicy (Type)
struct DiagGaussianPolicy{Mean, Logstd<:(AbstractArray{T,1} where T)}

DiagGaussianPolicy represents a stochastic control policy as a multivariate Gaussian distribution of the form:

\[\pi_{\theta}(a | o) = \mathcal{N}(\mu_{\theta_1}(o), \Sigma_{\theta_2})\]

where $\mu_{\theta_1}$ is a neural network, parameterized by $\theta_1$, that maps an observation to a mean action, and $\Sigma_{\theta_2}$ is a diagonal covariance matrix whose diagonal entries are determined by $\theta_2$. Rather than tracking the variances in $\Sigma_{\theta_2}$ directly, we track the log standard deviations $\theta_2$, so that $\Sigma_{\theta_2} = \mathrm{diag}(\exp(2\theta_2))$. This keeps every variance positive without constraining $\theta_2$, which makes the parameters easier to learn. Note that $\mu_{\theta_1}$ is a state-dependent mean, while $\Sigma_{\theta_2}$ is a global covariance that does not depend on the observation.
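As a rough illustration of this parameterization, the Julia sketch below stores a mean function together with a vector of log standard deviations and recovers $\Sigma_{\theta_2}$ from them. The type `DiagGaussianSketch` and the function `covariance` are hypothetical names introduced only for this example; they are not part of LyceumAI's API, and the real `DiagGaussianPolicy` may store its fields differently.

```julia
using LinearAlgebra

# Hypothetical stand-in for the policy's parameters. This is NOT LyceumAI's
# DiagGaussianPolicy, only an illustration of the parameterization above.
struct DiagGaussianSketch{M, L<:AbstractVector}
    mean::M      # callable: observation -> mean action (mu_theta1)
    logstd::L    # theta2: one log standard deviation per action dimension
end

# Covariance implied by the log standard deviations:
# Sigma = diag(sigma_i^2) = diag(exp(2 * logstd_i)), positive for any real logstd.
covariance(p::DiagGaussianSketch) = Diagonal(exp.(2 .* p.logstd))

p = DiagGaussianSketch(o -> 0.5 .* o, [0.0, log(2.0)])
S = covariance(p)   # diagonal entries 1.0 and (approximately) 4.0
```

Because the diagonal is built with `exp`, any real-valued `logstd` yields a valid covariance, which is why an optimizer can update it freely.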

LyceumAI.DiagGaussianPolicy (Method)
DiagGaussianPolicy(meanNN, logstd; fixedlogstd)

Construct a DiagGaussianPolicy with state-dependent mean meanNN and initial log standard deviations logstd. If fixedlogstd is true, logstd is treated as a constant rather than as a learnable parameter. meanNN should be an object that is compatible with Flux.jl and supports the following call signatures:

  • meanNN(obs::AbstractVector) -> action::AbstractVector
  • meanNN(obs::AbstractMatrix) -> action::AbstractMatrix
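To make the two required call signatures concrete, here is a minimal Julia sketch of a single callable that handles both a single observation (vector) and a batch of observations stored as matrix columns. `LinearMean` is a hypothetical type used in place of a Flux.jl model; in real use meanNN would be a Flux.jl model so its parameters can be trained, and this plain struct only illustrates the input and output shapes.

```julia
# Hypothetical linear "mean network" standing in for a Flux.jl model.
struct LinearMean{TW<:AbstractMatrix, Tb<:AbstractVector}
    W::TW   # action_dim x obs_dim weights
    b::Tb   # action_dim bias
end

# W * obs works for one observation (vector) and for a batch whose columns
# are observations (matrix); `.+ b` broadcasts the bias across columns.
(m::LinearMean)(obs::AbstractVecOrMat) = m.W * obs .+ m.b

m = LinearMean([1.0 0.0 2.0; 0.0 1.0 -1.0], [0.1, -0.1])  # 3-dim obs -> 2-dim action
a = m([1.0, 2.0, 3.0])   # AbstractVector -> AbstractVector of length 2
X = rand(3, 5)           # 5 observations stored as columns
A = m(X)                 # AbstractMatrix -> 2 x 5 AbstractMatrix
```

Supporting the matrix form lets a whole batch of observations be pushed through the network in one call, with column `j` of the output corresponding to column `j` of the input.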

LyceumBase.Tools.sample! (Method)
sample!([rng = GLOBAL_RNG, ]action, policy, feature)

Treating policy as a stochastic policy, sample an action from policy, conditioned on feature, and store it in action.
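For a diagonal Gaussian, sampling reduces to scaling standard normal noise by $\sigma = \exp(\theta_2)$ and shifting it by the mean. The Julia sketch below follows the described semantics of `sample!` (optional RNG, result written in place into `action`), but `sample_sketch!` and its argument list (a mean function and a `logstd` vector instead of a policy object) are hypothetical; this is not LyceumBase's implementation.

```julia
using Random

# Hypothetical helper: draw a ~ N(mu(feature), diag(exp.(logstd).^2))
# and write it into `action` in place.
function sample_sketch!(rng::AbstractRNG, action::AbstractVector, meanfn,
                        logstd::AbstractVector, feature)
    randn!(rng, action)          # action .= eps, with eps ~ N(0, I)
    action .*= exp.(logstd)      # scale by sigma = exp(logstd)
    action .+= meanfn(feature)   # shift by the state-dependent mean
    return action
end

# Fallback that uses the default RNG, echoing the optional `rng` argument.
sample_sketch!(action::AbstractVector, meanfn, logstd::AbstractVector, feature) =
    sample_sketch!(Random.default_rng(), action, meanfn, logstd, feature)

meanfn = o -> 2 .* o
action = zeros(2)
sample_sketch!(MersenneTwister(1), action, meanfn, [-1.0, -1.0], [1.0, -1.0])
```

Passing an explicit RNG makes rollouts reproducible: the same seed produces the same sampled actions.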

LyceumBase.getaction! (Method)
getaction!(action, policy, feature)

Treating policy as a deterministic policy, compute the mean action of policy, conditioned on feature, and store it in action.
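In the deterministic case the noise is dropped and the action is simply the mean $\mu_{\theta_1}(o)$. A minimal Julia sketch of that behavior follows; `getaction_sketch!` and its arguments are hypothetical and only mirror the description above, not LyceumBase's code.

```julia
# Hypothetical helper: write the mean action mu(feature) into `action`.
function getaction_sketch!(action::AbstractVector, meanfn, feature)
    action .= meanfn(feature)   # deterministic: no RNG is involved
    return action
end

meanfn = o -> [o[1] + o[2], o[1] - o[2]]
getaction_sketch!(zeros(2), meanfn, [3.0, 1.0])   # returns [4.0, 2.0]
```

Writing into a preallocated `action` follows the Julia convention for functions whose names end in `!` and avoids allocating a new vector at every step of a rollout.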

LyceumAI.loglikelihood (Function)
loglikelihood(policy, action, feature)

Return the log-likelihood of action under policy, conditioned on feature.
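For a diagonal Gaussian the log-density factorizes over action dimensions: $\log \pi_\theta(a \mid o) = \sum_i \left[ -\tfrac{1}{2}\left(\tfrac{a_i - \mu_i}{\sigma_i}\right)^2 - \log \sigma_i - \tfrac{1}{2}\log 2\pi \right]$ with $\mu = \mu_{\theta_1}(o)$ and $\log \sigma_i = \theta_{2,i}$. The Julia sketch below evaluates this formula; `loglikelihood_sketch` and its argument list are hypothetical and are not LyceumAI's `loglikelihood`.

```julia
# Hypothetical sketch of the diagonal-Gaussian log-density (not LyceumAI's code).
# log pi(a | o) = sum_i [ -0.5 * ((a_i - mu_i) / sigma_i)^2 - logstd_i - 0.5 * log(2pi) ]
function loglikelihood_sketch(meanfn, logstd::AbstractVector,
                              action::AbstractVector, feature)
    mu = meanfn(feature)
    z = (action .- mu) ./ exp.(logstd)   # standardized residuals
    return -0.5 * sum(abs2, z) - sum(logstd) - 0.5 * length(action) * log(2π)
end

# A 2-D standard normal evaluated at its mean gives -log(2π), about -1.8379.
loglikelihood_sketch(o -> zeros(2), zeros(2), zeros(2), [0.0])
```

Working with `logstd` directly means the term $-\log \sigma_i$ needs no extra `log` call, which is one practical benefit of the log parameterization.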

loglikelihood(policy, actions, features)

Treating each column of actions and features as a single action/feature pair, return a vector containing the log-likelihood of each action under policy, conditioned on the corresponding feature.
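The batched form applies the same formula column by column. The Julia sketch below assumes, as the AbstractMatrix signature for meanNN suggests, that the mean function maps a matrix of features to a matrix of mean actions in a single call. `loglikelihood_batch_sketch` is a hypothetical name, not LyceumAI's method.

```julia
# Hypothetical batched sketch: column j of `actions` and `features` form one pair.
function loglikelihood_batch_sketch(meanfn, logstd::AbstractVector,
                                    actions::AbstractMatrix, features::AbstractMatrix)
    mus = meanfn(features)                         # d x N mean actions
    z = (actions .- mus) ./ exp.(logstd)           # logstd (length d) broadcasts over columns
    d = size(actions, 1)
    const_term = -sum(logstd) - 0.5 * d * log(2π)  # identical for every column
    return vec(-0.5 .* sum(z .^ 2; dims=1)) .+ const_term   # length-N vector
end

meanfn = X -> 2 .* X                     # hypothetical mean that works column-wise
logstd = [0.0, log(0.5)]
features = [1.0 0.0 -1.0; 2.0 1.0 0.0]   # 3 features stored as columns
actions = [2.5 0.0 -2.0; 4.0 1.5 0.3]    # 3 actions stored as columns
loglikelihood_batch_sketch(meanfn, logstd, actions, features)   # length-3 vector
```

Because $\Sigma_{\theta_2}$ is global, the normalization term is shared across the batch and only the squared residuals differ from column to column.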