Critic-free reinforcement learning. Generates cohort responses, evaluates against reward condition, updates via Z-score normalized advantages with approximate KL divergence constraint. GRPO stability at micro-scale is protective — limited parameters prevent degenerate reward satisfaction.
Master weights residing in Javascript Float32Arrays are serialized into binary formats. Local DB logic utilizes native ArrayBuffer storage, strictly eliminating Base64 call stack limits. Momentum buffers from Muon are discarded to halve the storage footprint. EvaROSA v1: sensorium adapter weights and endospace state persist alongside core model.