When BiaB generates a MusicXML file for Sinsy, it makes the note and rest durations match BiaB's playback. By default, that means BiaB generates a rest after every non-tied note.

After. Every. Single. Note.

This is wrong for singing synthesis for a number of reasons.

Consider the case of a hyphenated word, such as "HAP-PY". Instead of writing:

HAP- PY

BiaB writes:

HAP- [rest] PY
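If you're stuck with BiaB's output for now, the stray rests inside a word can at least be spotted. MusicXML marks each lyric syllable with a `<syllabic>` value (`single`, `begin`, `middle`, or `end`), so any rest that follows a `begin` or `middle` syllable is splitting a word. Here's a rough Python sketch using only the standard library. It assumes a single-part `score-partwise` file without XML namespaces, and `rests_inside_words` is just a name I made up for illustration.

```python
import xml.etree.ElementTree as ET

def rests_inside_words(musicxml_text):
    """Return the indices (in document order, counting every <note>) of rests
    that sit between two syllables of the same word. Hypothetical helper."""
    root = ET.fromstring(musicxml_text)
    flagged = []
    in_word = False  # True after a 'begin' or 'middle' syllable
    for i, note in enumerate(root.iter("note")):
        if note.find("rest") is not None:
            if in_word:
                flagged.append(i)  # this rest splits a word, e.g. HAP- [rest] PY
            continue
        syllabic = note.findtext("lyric/syllabic")
        # Notes without a lyric (melisma continuations) keep the current state.
        if syllabic is not None:
            in_word = syllabic in ("begin", "middle")
    return flagged

SAMPLE = """<score-partwise><part id="P1"><measure number="1">
<note><pitch><step>C</step><octave>4</octave></pitch><duration>3</duration>
<lyric><syllabic>begin</syllabic><text>HAP</text></lyric></note>
<note><rest/><duration>1</duration></note>
<note><pitch><step>D</step><octave>4</octave></pitch><duration>3</duration>
<lyric><syllabic>end</syllabic><text>PY</text></lyric></note>
<note><rest/><duration>1</duration></note>
</measure></part></score-partwise>"""

print(rests_inside_words(SAMPLE))
```

On the sample, only the rest between HAP and PY is flagged; the rest after PY is left alone, since it follows the end of the word.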

This is obviously wrong. But even putting a rest after each word is wrong:

THIS [rest] AND [rest] THAT

This is partly because there's a lot of co-articulation in singing: the end of one word blends into the start of the next. If there are rests after every word, it's not going to sound right.

But there's a more subtle problem here. A note doesn't indicate where the syllable starts, but where the vowel of that syllable starts. That means if you have something like this:

[rest] THIS AND THAT

The consonants actually fall on the prior note's timespan:

[rest]+TH IS AND+TH AT
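To make the timing concrete, here's a small Python sketch of the idea. The vowel of each syllable lands on the written onset, and the leading consonants are carved out of the tail of whatever comes before. The `Event` class, the times in seconds, and the consonant lengths are all assumptions for illustration. This isn't how Sinsy computes it internally.

```python
from dataclasses import dataclass

@dataclass
class Event:
    label: str               # lyric syllable, or "rest"
    duration: float          # written duration in seconds
    consonants: float = 0.0  # hypothetical: length of leading consonants ("TH" in THAT)

def vowel_aligned_timeline(events):
    """Return (label, start, end) for each event, where start is the written
    onset (the vowel onset for syllables) and end is pulled earlier to make
    room for the next syllable's leading consonants."""
    timeline = []
    onset = 0.0
    for i, ev in enumerate(events):
        next_consonants = events[i + 1].consonants if i + 1 < len(events) else 0.0
        # Borrow the next syllable's consonants from this event's tail,
        # but never more than the event actually lasts.
        borrowed = min(next_consonants, ev.duration)
        timeline.append((ev.label, onset, onset + ev.duration - borrowed))
        onset += ev.duration
    return timeline

EVENTS = [Event("rest", 1.0), Event("THIS", 0.5, 0.1),
          Event("AND", 0.5), Event("THAT", 0.5, 0.1)]
for label, start, end in vowel_aligned_timeline(EVENTS):
    print(f"{label:5} {start:.2f} -> {end:.2f}")
```

With a 0.1 s "TH", the opening rest ends at 0.9 s and AND ends at 1.9 s, while THIS keeps its full length because AND starts on a vowel.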

This works great, even if the prior event is a rest (as it is above). But to fit the consonants onto the prior note's timespan, that note (or rest) has to be shortened. For a sung note, this means shortening its vowel.

But what if the prior event was a very, very short rest? Whatever doesn't fit is going to be truncated:

[rest]+TH IS [rest] AND [rest]+(truncated TH) AT
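The same kind of toy model can flag the truncation case: compare each syllable's leading consonants against the length of the event in front of it. Again, `Event`, `truncated_consonants`, and the numbers are hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass

@dataclass
class Event:
    label: str               # lyric syllable, or "rest"
    duration: float          # written duration in seconds
    consonants: float = 0.0  # hypothetical: length of leading consonants

def truncated_consonants(events):
    """Return (label, seconds_lost) for each syllable whose leading consonants
    don't fit inside the previous event's duration."""
    lost = []
    for prev, ev in zip(events, events[1:]):
        shortfall = ev.consonants - prev.duration
        if shortfall > 0:
            lost.append((ev.label, shortfall))
    return lost

# A BiaB-style sequence: a short 0.05 s playback rest after every note.
EVENTS = [Event("rest", 1.0), Event("THIS", 0.5, 0.1), Event("rest", 0.05),
          Event("AND", 0.5), Event("rest", 0.05), Event("THAT", 0.5, 0.1)]
print(truncated_consonants(EVENTS))
```

Only THAT is reported: its 0.1 s "TH" has to squeeze into a 0.05 s rest, so about 0.05 s of it is cut off.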

So when generating MusicXML for vocal synthesis programs to process, BiaB should export the notation as it appears, not as it is played.
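Until BiaB exports the notated durations directly, a post-processing pass could approximate them by folding each short rest that directly follows a note back into that note. This is a rough heuristic, not a real fix: the `max_gap` threshold, the tick values, and the `absorb_playback_rests` name are assumptions, and it can't tell a deliberately written short rest from a playback gap.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Event:
    label: str  # lyric syllable, or "rest"
    ticks: int  # duration in hypothetical ticks (e.g. 480 per quarter note)

def absorb_playback_rests(events, max_gap=120):
    """Fold each rest of at most max_gap ticks that directly follows a note
    back into that note, approximating the written durations."""
    out = []
    for ev in events:
        if (ev.label == "rest" and out and out[-1].label != "rest"
                and ev.ticks <= max_gap):
            # Extend the preceding note instead of keeping the playback gap.
            out[-1] = replace(out[-1], ticks=out[-1].ticks + ev.ticks)
        else:
            out.append(ev)
    return out

PLAYED = [Event("HAP", 420), Event("rest", 60), Event("PY", 420),
          Event("rest", 60), Event("rest", 480)]
print(absorb_playback_rests(PLAYED))
```

Here HAP (420) + rest (60) + PY (420) + rest (60) + rest (480) comes back as HAP (480), PY (480), rest (480): the playback gaps are gone, but the longer rest survives.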


-- David Cuny
My virtual singer development blog

Vocal control, you say. Never heard of it. Is that some kind of ProTools thing?