What you have created here is very impressive.
Hi, Gary.
Thanks!
I was interested in Vocaloids a few years ago, but the software was very expensive and all of the voicebanks were Japanese, so English pronunciation was a problem.
Vocaloids tend to have accents, and the resynthesis process only magnifies that. The Dex and Daina voicebanks focus on a more American English accent, and I toyed with the idea of buying the full editor, but I wasn't terribly impressed by the demos. Besides, the cost of the full Vocaloid editor always priced me out of the market... even when I had that much money to spend.
UTAU is cool, especially since you can build your own voicebanks. But working with voices requires a lot of low-level tweaking. One nice thing about Synthesizer V is that you have sliders for each phoneme at the syllable level. That makes it a lot easier to tune a word, and even work around errors in the voicebank.
You may be aware that Synthesizer V actually grew from Kanru Hua working on tools for UTAU. He developed his own resampler (resynthesis engine) for UTAU called Moresampler, which is different from most resamplers available for UTAU. Most UTAU resamplers use crossfades and standard pitch transposition techniques to reassemble the phonemes from the source audio files. Moresampler performs spectral analysis on the harmonic and inharmonic audio, as well as the vocal pulse. The resynthesis doesn't need to refer back to the original audio. As a result, Moresampler is a lot more flexible in resynthesis, both in changing the pitch and in changing the duration. It also doesn't suffer from as many of the artifacts you get from a lot of pitch shift/time stretch algorithms.
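To make the "crossfade" idea a bit more concrete, here's a rough Python sketch of joining two audio segments with a linear crossfade. To be clear, this is only an illustration of the general technique and not code from Moresampler or any actual UTAU resampler; the function name `crossfade_concat` and the toy sample values are made up for the example.

```python
def crossfade_concat(a, b, overlap):
    """Join two lists of samples, blending the last `overlap` samples
    of `a` with the first `overlap` samples of `b` (linear crossfade).

    Hypothetical helper for illustration only.
    """
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must fit inside both segments")
    if overlap == 0:
        return list(a) + list(b)

    head = list(a[:-overlap])   # part of `a` that is left untouched
    tail = list(b[overlap:])    # part of `b` that is left untouched

    blended = []
    start = len(a) - overlap
    for i in range(overlap):
        # Weight for `b` rises steadily while the weight for `a` falls.
        w = (i + 1) / (overlap + 1)
        blended.append((1.0 - w) * a[start + i] + w * b[i])

    return head + blended + tail


# Toy "segments": a constant 1.0 fading into a constant 0.0.
print(crossfade_concat([1.0] * 4, [0.0] * 4, 3))
# -> [1.0, 0.75, 0.5, 0.25, 0.0]
```

Note that the output is always built directly from the original samples, which is why this kind of approach depends on the source audio, unlike a resynthesis that works from an analyzed model of the voice.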
Kanru also came up with his own reclist (phoneme recording list) method, which he called "Arpasing" because it uses the ARPAbet phoneme set instead of the SAMPA phonemes that Vocaloid and UTAU tend to use. (The word "Arpasing" confusingly refers to "ARPA" in "ARPAbet", not to "parsing".) Because his reclist uses English words, the resulting samples tend to be more realistic than if the phonemes were recorded in isolation. On the other hand, Synthesizer V sounds a bit more "spoken" than "sung".
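If you haven't run into the two phoneme notations before, here's a small Python sketch showing how the same sounds are written in ARPAbet versus X-SAMPA-style symbols. The table covers only a handful of symbols, and the names `ARPABET_TO_XSAMPA` and `arpabet_to_xsampa` are just made up for this example; the exact symbol sets that individual Vocaloid and UTAU voicebanks use can differ from this.

```python
# Partial ARPAbet -> X-SAMPA table (a few symbols only, for illustration).
ARPABET_TO_XSAMPA = {
    "IY": "i", "IH": "I", "AE": "{", "AA": "A", "UW": "u",
    "SH": "S", "TH": "T", "DH": "D", "NG": "N", "CH": "tS", "JH": "dZ",
    "P": "p", "T": "t", "K": "k", "S": "s",
}


def arpabet_to_xsampa(phonemes):
    """Convert a list of ARPAbet symbols to X-SAMPA symbols.

    Stress digits (0, 1, 2) on vowels, as used in ARPAbet dictionaries,
    are dropped. Symbols missing from the partial table raise KeyError.
    """
    result = []
    for symbol in phonemes:
        base = symbol.rstrip("012")  # "IY1" -> "IY"
        result.append(ARPABET_TO_XSAMPA[base])
    return result


print(arpabet_to_xsampa(["SH", "IY1", "P"]))   # "sheep"
print(arpabet_to_xsampa(["TH", "IH1", "NG"]))  # "thing"
print(arpabet_to_xsampa(["K", "AE1", "T"]))    # "cat"
```

The ARPAbet side sticks to plain uppercase letters, which is a big part of why it is easier to read and type than the punctuation-heavy SAMPA symbols.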
Anyway, at some point Kanru rightly decided that it would be better to branch away from UTAU and create his own application from scratch. If nothing else, Synthesizer V is a lot simpler to install than UTAU.