Executes critic-free reinforcement learning. Generates a cohort of responses, evaluates against the target reward condition, and updates parameters via Z-score normalized advantages with an approximate KL divergence constraint.
Master weights residing in Javascript Float32Arrays are serialized into binary formats. Local DB logic utilizes native ArrayBuffer storage for robustness, strictly eliminating Base64 call stack limits. Momentum buffers from Muon are strategically discarded to halve the storage footprint.