Text Input
G2P Output — Phoneme Sequence
waiting for input...
Mel Spectrogram · 80 bands · 24kHz
FRAMES —
DURATION —
VOCODER NEURAL · HIFI-GAN
Voice
Synth
★ Studio
Model
Log
Voice Space · 10-dim
measured baritone, warm resonance
Synthesis Parameters
RATE
1.00x
PITCH SHIFT
±0
GL ITERS
24
MVC LAYERS
2
★ Voice Capture
ready — mic not yet requested
Analyze + Train
waiting for recording...
Training adapts the voice timbre and speaking tempo to your recording.
The duration model learns your natural phoneme timing, which is exported in the .pop2 file.
Additional recordings improve generalisation.
Voice Identity (.pop2)
no voice trained yet
Model Status
Backbone MVC Bidirectional Mamba
Vocoder HiFi-GAN BitLinear
Weights BitLinear ternary
D_model 128
N_layers 2
Voc stages 3
N_mels 80
Sample rate 24000 Hz
Hop size 256
N_fft 1024
GL fallback standby
Pop2 not loaded
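The sample rate, hop size, and FFT size listed above determine how mel frames map to audio time. The following is a minimal sketch of that arithmetic, not the app's own code. The function names are hypothetical, and it assumes each frame advances by exactly one hop with edge padding ignored.

```javascript
// Values copied from the Model Status readout.
const SAMPLE_RATE = 24000; // Hz
const HOP_SIZE = 256;      // samples between consecutive mel frames
const N_FFT = 1024;        // FFT window length in samples

// Mel frames produced per second of audio.
function framesPerSecond(sampleRate = SAMPLE_RATE, hop = HOP_SIZE) {
  return sampleRate / hop;
}

// Approximate audio duration in seconds for a given number of mel frames.
// Assumption: one hop per frame; padding at the edges is ignored.
function framesToSeconds(frames, sampleRate = SAMPLE_RATE, hop = HOP_SIZE) {
  return (frames * hop) / sampleRate;
}

// Number of linear-frequency bins in a one-sided FFT of length nFft.
function fftBins(nFft = N_FFT) {
  return nFft / 2 + 1;
}
```

With these values, one second of audio corresponds to 93.75 mel frames, so each frame advances the audio by roughly 10.7 ms.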
postMessage API
// embed in iframe, control from host:
speak · stop · getStatus
getVoiceVec · setVoiceVec
loadPop2 · loadPop2Url
exportPop2 · reset
// example:
iframe.contentWindow.postMessage({
  target: 'omnivocal',
  action: 'speak',
  text: 'hello'
}, '*');
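A host page that sends several of these actions may find a small wrapper convenient. The sketch below is hypothetical: only the action names and the `{ target, action, text }` message shape come from the listing above. The helper names `buildMessage` and `send` are invented for illustration, and the payload fields for actions other than `speak` are not documented here, so they are passed through unchanged.

```javascript
// Hypothetical host-side helper; not part of the embedded app.
// Action names are taken from the postMessage API listing.
const ACTIONS = new Set([
  'speak', 'stop', 'getStatus',
  'getVoiceVec', 'setVoiceVec',
  'loadPop2', 'loadPop2Url',
  'exportPop2', 'reset',
]);

// Build a message object in the shape used by the example above.
function buildMessage(action, fields = {}) {
  if (!ACTIONS.has(action)) {
    throw new Error(`unknown action: ${action}`);
  }
  // target and action are written last so extra fields cannot override them.
  return { ...fields, target: 'omnivocal', action };
}

// win is anything with a postMessage method, e.g. iframe.contentWindow.
// targetOrigin '*' skips the origin check; pass the embed's origin when known.
function send(win, action, fields = {}, targetOrigin = '*') {
  const message = buildMessage(action, fields);
  win.postMessage(message, targetOrigin);
  return message;
}
```

For example, `send(iframe.contentWindow, 'speak', { text: 'hello' })` posts the same message as the snippet above. Passing the embed's actual origin as the fourth argument, instead of `'*'`, restricts delivery to that origin.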
System Log