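Critic-free reinforcement learning (GRPO). Generates a cohort of responses, scores each against the reward condition, and updates the policy using Z-score-normalized advantages (each reward minus the cohort mean, divided by the cohort standard deviation) under an approximate KL divergence constraint. At micro-scale, GRPO stability is protective: the limited parameter count leaves little capacity for degenerate reward satisfaction (reward hacking).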
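A minimal sketch of the update quantities described above, under stated assumptions. The function names, the `eps` guard, the `beta` default, and the choice of the "k3" estimator (`exp(d) - d - 1`) for the approximate KL term are all illustrative; the source does not say which KL approximation is used. The surrogate loss here omits ratio clipping and works on sequence-level log-probabilities for brevity.

```javascript
// Z-score each reward within its cohort (group-relative advantage).
// eps is a hypothetical guard against a zero standard deviation.
function groupAdvantages(rewards, eps = 1e-8) {
  const n = rewards.length;
  const mean = rewards.reduce((a, b) => a + b, 0) / n;
  const variance = rewards.reduce((a, r) => a + (r - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);
  return rewards.map((r) => (r - mean) / (std + eps));
}

// Approximate KL between policy and reference from log-probabilities.
// Assumption: the non-negative "k3" estimator, exp(d) - d - 1,
// with d = logpRef - logpPolicy.
function approxKl(logpPolicy, logpRef) {
  const d = logpRef - logpPolicy;
  return Math.exp(d) - d - 1;
}

// Simplified critic-free surrogate loss over one cohort:
// mean of (-advantage * logp + beta * KL). beta is a hypothetical weight.
function grpoLoss(logps, refLogps, rewards, beta = 0.04) {
  const adv = groupAdvantages(rewards);
  let total = 0;
  for (let i = 0; i < logps.length; i++) {
    total += -adv[i] * logps[i] + beta * approxKl(logps[i], refLogps[i]);
  }
  return total / logps.length;
}
```

When every response in a cohort receives the same reward, all advantages are zero, so the cohort contributes no policy-gradient signal and only the KL term remains.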
Master weights held in JavaScript Float32Arrays are serialized into a binary format. Local DB logic stores native ArrayBuffers directly, so no Base64 encoding is involved and the call stack limits that Base64 conversion of large arrays can hit are avoided. Muon momentum buffers are discarded on save, halving the storage footprint. Simulacra: sensorium adapter weights, ROSA states, and self_embed persist alongside core model weights. Export format: .simpip (Simulacra Pip). Not compatible with .piprosa or .evapip.