Most voice synthesis programs use training databases to model the transitions between frames (short slices of sound). This captures all sorts of prosody information, such as pitch and stress. The approach is generally referred to as HMM (Hidden Markov Model) synthesis.

Sinsy uses the same sort of HMM approach as Ivona Reader does. The fact that she's got a heavy accent actually demonstrates the strength of the approach, since it captures the speaker accurately.

But to use this same approach for singing, you would need a huge database to capture all the transition sounds at all the different pitch intervals, and it's not really practical to build a database that covers them all.
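
To give a feel for how quickly that database grows, here is a back-of-the-envelope count. The numbers are my own illustrative assumptions (roughly 40 phonemes, and a two-octave range of 25 semitones), and `transition_count` is just a made-up helper, not anything taken from Sinsy or Ivona.

```python
# Rough count of the combinations a database would have to cover if every
# phoneme-to-phoneme transition were recorded at every pitch interval.
# All figures are illustrative assumptions, not measurements.

def transition_count(phonemes: int, pitches: int) -> int:
    """Number of (phoneme pair, pitch pair) combinations."""
    phoneme_pairs = phonemes * phonemes   # ordered pairs, repeats included
    pitch_pairs = pitches * pitches       # start pitch and end pitch
    return phoneme_pairs * pitch_pairs

one_pitch = transition_count(phonemes=40, pitches=1)    # a single fixed pitch
two_octaves = transition_count(phonemes=40, pitches=25)  # 25 semitones

print(one_pitch)    # 1600
print(two_octaves)  # 1000000
```

Even with these modest guesses, adding pitch multiplies the number of cases by 625, and that is before counting different note lengths or dynamics.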

Additionally, there are timing constraints in singing - something that ordinary speech synthesis doesn't deal with at all.
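
As a rough sketch of what a timing constraint means here: in a song the score fixes how long each note lasts, so whatever durations the synthesizer would naturally pick for the sounds have to be made to fit. The function names and the numbers below are hypothetical, and stretching every sound by the same factor is only the simplest possible way to fit them.

```python
# Sketch of a timing constraint: the score decides how long a syllable lasts,
# so the "natural" phoneme durations must be rescaled to fill the note.
# Function names and sample values are hypothetical.

def note_seconds(beats, tempo_bpm):
    """Length of a note in seconds, given its length in beats and the tempo."""
    return beats * 60.0 / tempo_bpm

def fit_to_note(phoneme_seconds, target_seconds):
    """Scale every phoneme duration by the same factor to fill the note."""
    scale = target_seconds / sum(phoneme_seconds)
    return [d * scale for d in phoneme_seconds]

# A half note (2 beats) at 120 beats per minute lasts one second.
target = note_seconds(beats=2, tempo_bpm=120)

# Assumed spoken durations for a consonant, a vowel and a consonant.
fitted = fit_to_note([0.08, 0.15, 0.07], target)
```

Uniform scaling stretches the consonants as much as the vowel, which is not what a singer does, so this only shows the constraint and not a good way to satisfy it.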

So getting synthetic singing to work with "state of the art" vocal synthesis turns out to be problematic, to say the least.


-- David Cuny
My virtual singer development blog

Vocal control, you say. Never heard of it. Is that some kind of ProTools thing?