Originally Posted By: Gary Weder
What you have created here is very impressive.

Hi, Gary.

Thanks! :)

Quote:
I was interested in Vocaloids a few years ago but the software was very expensive and all of the voicebanks were Japanese so English pronunciation was a problem

Vocaloids tend to have accents, and the resynthesis process only magnifies that. The Dex and Daina voicebanks focus on a more American English accent. I toyed with the idea of buying the full editor, but I wasn't terribly impressed by the demos, and the cost of the full Vocaloid editor always priced me out of the market... even when I had that much money to spend.

UTAU is cool, especially since you can build your own voicebanks. But working with voices requires a lot of low-level tweaking. One nice thing about Synthesizer V is that you have sliders for each phoneme at the syllable level. That makes it a lot easier to tune a word, and even work around errors in the voicebank.

You may be aware that Synthesizer V actually grew out of Kanru Hua's work on tools for UTAU. He developed his own resampler (resynthesis engine) for UTAU called Moresampler, which is different from most resamplers available for UTAU. Most UTAU resamplers use crossfades and standard pitch transposition techniques to reassemble the phonemes from the source audio files.
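To give a rough idea of what "crossfade" means here, this is a minimal Python sketch of joining two audio snippets with a linear crossfade. It is only an illustration of the general technique, not code from any actual UTAU resampler. The function name `crossfade_join` and the linear fade curve are my own assumptions, and samples are just plain lists of floats.

```python
def crossfade_join(a, b, overlap):
    """Join two sample lists, blending the last `overlap` samples of `a`
    with the first `overlap` samples of `b` using a linear fade.

    Hypothetical helper for illustration; real resamplers are more involved.
    """
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must fit inside both snippets")
    if overlap == 0:
        return list(a) + list(b)

    head = list(a[:len(a) - overlap])   # part of `a` left untouched
    tail = list(b[overlap:])            # part of `b` left untouched

    mixed = []
    for i in range(overlap):
        # Weight of `b` rises from just above 0 to just below 1.
        w = (i + 1) / (overlap + 1)
        mixed.append(a[len(a) - overlap + i] * (1.0 - w) + b[i] * w)

    return head + mixed + tail
```

Calling `crossfade_join(snippet_a, snippet_b, 64)` would return one list that is 64 samples shorter than the two snippets laid end to end, because the overlapping region is shared.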

Moresampler performs spectral analysis on the harmonic and inharmonic audio, as well as the vocal pulse. The resynthesis doesn't need to refer back to the original audio. As a result, Moresampler is a lot more flexible in resynthesis, both in changing the pitch and in changing the duration. It also doesn't suffer from as many of the artifacts you get from a lot of pitch shift/time stretch algorithms.

Kanru also came up with his own reclist (phoneme recording list) method, which he called "Arpasing" because he used the ARPAbet phoneme set instead of the SAMPA phonemes that Vocaloid and UTAU tend to use. (The word "Arpasing" confusingly refers to the "ARPA" in "ARPAbet", not to "parsing".)

Because his reclist uses English words, the resulting samples tend to be more realistic than if the phonemes were recorded in isolation. On the other hand, Synthesizer V sounds a bit more "spoken" than "sung".
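As a rough illustration of why recording whole words helps, here is a small Python sketch that lists the phoneme-to-phoneme transitions a few words would capture. The tiny dictionary is hand-written for this example (ARPAbet symbols without stress marks), and using "-" to mark silence at word edges is my own assumption. This is not the actual Arpasing reclist or its file format.

```python
# Hypothetical mini-dictionary: word -> ARPAbet phonemes (no stress marks).
TOY_DICT = {
    "hello": ["HH", "AH", "L", "OW"],
    "sing": ["S", "IH", "NG"],
    "voice": ["V", "OY", "S"],
}

SILENCE = "-"  # assumed marker for the silence before and after a word


def transitions(word):
    """Return the adjacent phoneme pairs heard when `word` is spoken."""
    phones = [SILENCE] + TOY_DICT[word] + [SILENCE]
    return list(zip(phones, phones[1:]))


def coverage(words):
    """Return the set of distinct transitions captured by a list of words."""
    seen = set()
    for word in words:
        seen.update(transitions(word))
    return seen
```

For example, `transitions("sing")` gives the pairs from silence into S, S into IH, IH into NG, and NG into silence. Phonemes recorded in isolation would only ever give you the silence-to-phoneme and phoneme-to-silence pairs.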

Anyway, at some point Kanru rightly decided that it would be better to branch away from UTAU and create his own application from scratch. If nothing else, Synthesizer V is a lot simpler to install than UTAU. ;)


-- David Cuny
My virtual singer development blog

Vocal control, you say. Never heard of it. Is that some kind of ProTools thing?