Posted By: dcuny Free web-based synthetic singer - 03/06/13 08:55 AM
I ran across Sinsy the other day, and thought it might be a good fit for some forum members who are unable to sing. It's a synthetic singer capable of doing a fairly good job with English.

There are a couple features of Sinsy that make it stand out from other programs. Unlike Vocaloid, it's free. Unlike Utau, it's usable without having to deal with Japanese. It takes MusicXML files as input, and pays attention to various musical markings. It has a fair bit of musical intelligence, and does a good job with putting phrases together in a very musical way.

On the downside, it's got a pretty distinct Japanese accent. It would be nice if they modeled it using a native English speaker, but what can you expect for free?

I put together a demo using BiaB: Words Like Smoke

And because I can't leave well enough alone, here's a remix: Words Like Smoke (Mix 2)

The music (including the melody) was generated by BiaB using the MIDI PNOMOOD1 style with a couple of added RealTrack guitars, although I tweaked the register of the melody in a couple places. I exported the MIDI file from BiaB and edited the lyrics in Finale NotePad. Sinsy wasn't happy with getting multiple staffs, and I couldn't figure out how to remove staffs in NotePad, so I edited it in Notion and did a bit more tweaking of the lyrics.

If BiaB exported MusicXML, it would have been a lot simpler!

The remix was done in Reaper - I started out fixing the guitar strums (they needed to be moved forward) and ended up messing around with the vocal track and Reaper's pitch correction tool. Fun stuff.

Anyway, breaths are automatically added after long phrases - a nice touch. Vibrato is added automatically on long notes, and I'm impressed how well it handled tied notes on the final cadence.

It also works with MuseScore - here's Ave Maria done in MuseScore.

Actually, with the "ze" instead of "the", Sinsy sounds a bit like Natasha Fatale.
Posted By: Ryszard Re: Free web-based synthetic singer - 03/06/13 10:23 AM
That is astonishing--and good. The expressiveness from the Schubert score is especially (ahem) noteworthy. It is unsurprising that it has a bit of an accent, given that it was taking its cues from Kanji (simplified Japanese ideograms). It is hard to tell from your short sample whether it does better with English text. Even the slightly synthetic quality, something like a cross between a human voice and a Theremin, is somewhat appealing, at least to this electronic composer. Excellent work on your part, too. Thanks for sharing.

Richard
Posted By: Noel96 Re: Free web-based synthetic singer - 03/06/13 11:55 AM
Hi David,

Thank you for passing on this information. I'm astounded! It makes me wonder where music will be in 5 years' time!

Regards,
Noel
Posted By: Westside Steve Re: Free web-based synthetic singer - 03/06/13 11:57 AM
http://m.youtube.com/#/channel/UC4PQ5V_6...nLsbgdtR1NS3TjQ
Posted By: MountainSide Re: Free web-based synthetic singer - 03/06/13 01:07 PM
Wow...that's not bad...perhaps could be better used for background vocals huh? Does it also have male vocals? Can it handle multiple vocal parts for harmony and choir?
Posted By: Mac Re: Free web-based synthetic singer - 03/06/13 02:01 PM
Sounds artificial, artifacts and all that.

Bears watching, though, to see if improvements are forthcoming.


--Mac
Posted By: filkertom Re: Free web-based synthetic singer - 03/06/13 02:42 PM
Huh. Okay, that Ave Maria was rather frickin' good.
Posted By: Don Gaynor Re: Free web-based synthetic singer - 03/06/13 02:57 PM
Mac, the Ave Maria link convinced me. Imagine composing full choral arrangements in the very near future using BIAB and nothing else. It's uncharacteristic of you to poo-poo technological advancements no matter how infantile the current state-of-the-art. Have you had your coffee yet? At least my SLP (Speech Language Pathologist) was fascinated and is sharing this link with her SLP friends, and I can only imagine that she has access to all the latest advancements in synthesized speech. Have a cuppa and revisit the link, my friend.

https://www.youtube.com/watch?v=u7K0-ttUBng

It has fewer phonyms (sp?) than the sample of Beatles' Yesterday (on same page) but the advancements in technology in just three years is phenomenal.
Posted By: Ryszard Re: Free web-based synthetic singer - 03/06/13 03:01 PM
Quote:

. . . it has fewer phonyms (sp?) than the sample of Beatles' Yesterday . . .




Phonemes (said the professional linguist).

Richard
Posted By: PgFantastic Re: Free web-based synthetic singer - 03/06/13 05:56 PM
Makes me want to take singing lessons!
Posted By: dcuny Re: Free web-based synthetic singer - 03/06/13 06:14 PM
As Richard pointed out, a larger phoneme set would make this program phenomenal for English speakers. But it's "good enough" right now, and BiaB users can start working with this program right away.

Having MusicXML export (of the vocal track only) would be ideal, and I'm hoping MusicXML export gets added to BiaB in the near future.

In lieu of that, if you haven't got a music notation program and don't want to download any software, Noteflight is a web-based program that will import MIDI and export MusicXML. So you could get a free account, upload the MIDI into Noteflight, add lyrics, and export MusicXML into Sinsy.

Most synthetic singer programs use a piano roll sort of interface. Because Sinsy generates a good performance by default from the score, it's less cumbersome than other programs. Plus, most of this software needs additional drivers for Japanese language support, even if you're using it in English.

There are a number of reasons that this software has taken off in Japan - the limited phoneme set certainly helps. Vocaloid was initially the leading software for synthetic singing, but the free Utau program has started to surpass it. An interesting feature of Utau is that users are able to create their own voice banks instead of being locked down to using commercially released voices.

Instead of just recording single phonemes, voice banks consist of recordings of consonant-vowel pairs (CV) - "sa", "so", "see", and so on. More sophisticated voice banks have more complex combinations - VCV and CVVC. So there are very smooth transitions between phonemes.

Unfortunately, creating equivalent voice banks for English is a much larger undertaking, because there are a lot more combinations. But it's clearly doable.
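
To put rough numbers on "a lot more combinations", here is a small sketch that counts how many CV and VCV units a voice bank would have to cover. The inventory sizes are round, assumed figures chosen only for illustration (they are not measured data, and real counts vary by dialect and by how you analyze the language). The function name is hypothetical, and English consonant clusters would push the real total higher still.

```python
# Illustrative inventory sizes only (assumed round numbers, not measured data).
INVENTORIES = {
    "Japanese": {"consonants": 15, "vowels": 5},
    "English": {"consonants": 24, "vowels": 15},
}


def unit_counts(consonants, vowels):
    """Count the distinct CV and VCV units a voice bank would need to cover."""
    cv = consonants * vowels            # every consonant followed by every vowel
    vcv = vowels * consonants * vowels  # every vowel-consonant-vowel transition
    return {"CV": cv, "VCV": vcv}


for language, sizes in INVENTORIES.items():
    print(language, unit_counts(**sizes))
```

With these assumed sizes the VCV count goes from a few hundred recordings to several thousand, which is the scaling problem described above.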

In any event, it plays nice with BiaB, creates musical results and is free. That's very cool.

Don, interested in using it for any songs?
Posted By: PeterGannon Re: Free web-based synthetic singer - 03/06/13 07:37 PM
Very cool! Thanks for pointing it out David.
Posted By: Jim Fogle Re: Free web-based synthetic singer - 03/06/13 11:51 PM
I hear voice synthesis daily listening to NOAA weather radio. An article about the evolution of the voices used is here: http://www.nws.noaa.gov/nwr/newvoice.htm
Posted By: MountainSide Re: Free web-based synthetic singer - 03/07/13 04:45 PM
Hmmmm....apparently Yamaha is behind Vocaloid and a new English version, albeit by a Japanese vocalist, is now available: http://www.vocaloid.com/en/ and here: http://www.ssw.co.jp/en/products/vocal3/megpoid/index.html .

Wouldn't it be great if PG partnered with these guys to develop a US-based linguistic algorithm to create the ultimate Band in a Box!
Posted By: Don Gaynor Re: Free web-based synthetic singer - 03/07/13 06:48 PM
Quote:

Don, interested in using it for any songs?



David, not at the moment, I'm steeped in Songwriting Class, but afterwards, I'll prove Mac true: I'll be "full of it!"

I think we've gotten the good Doctor's attention and that's what we had hoped to accomplish. "Take it, Peter!"

I am so chuffed by all you've done in researching this subject, David. You've acted "above and beyond the call..." Thank you so sincerely.

With Dr Peter carrying the puck to the goal, I'll rest comfortably now.

@David I'll take a look at my current compositions but, unfortunately, I routinely delete my .sgu and .mgu files after rendering to .wav or .mp3. Color me stupid, but please try to stay within the lines.
Posted By: PeterGannon Re: Free web-based synthetic singer - 03/07/13 06:54 PM
Hi David,

What controls do you need for a note, other than the time, duration, note number, and the lyric?
For example, at bar 1, beat 1, you might want it to say "The" and last for one beat.

How many other parameters can you enter, for example, is there a strength setting (and is it 0-9?) and vibrato information?

Can you be more precise for the timing, for example, to start at bar 1, beat 1, tick 23 (out of 120 PPQ), and last for 87 ticks (120 PPQ)?
Posted By: dcuny Re: Free web-based synthetic singer - 03/07/13 06:59 PM
There's no doubt that Vocaloid can be quite good. Here's a comparison of all three programs doing "Yesterday":

Yesterday (Vocaloid, "Oliver" voice)
Yesterday (Sinsy)
Yesterday (UTAU, "Camila Melodia" voice)

Vocaloid is a commercial product. The editor runs around $100US, and doesn't include any voice libraries. Voice libraries run about $130US. There are a number of characters that target English, although they all seem to have accents of some sort.

The results can be very good, but it requires spending money and learning a new software program.


Jinriki Vocaloid ("manual Vocaloid") consists of manually cutting and pasting together phonemes of a singer, and creating a new song. (It helps when the language has a limited phoneme set).

UTAU is a free program that started out automating that manual process, and eventually grew into a much more powerful program. Voice banks are created by users, and quality varies dramatically. There are some UTAUoids that rival the quality of commercial Vocaloid voice banks.

But... Getting good English results from UTAU can be a challenge, because it requires running a Japanese program with limited translation, and finding a good English voice bank.


Sinsy apparently came out of a research project. I wouldn't be surprised to find that it becomes commercial at some point. There is currently only one English voice for it, and it appears to be built from a Japanese voice, so it's got a pretty strong accent.

Using Sinsy only requires uploading a MusicXML file of the vocals, and just about every music notation program (except for BiaB, hint, hint) does that. It's really easy.


There are other options. For example, MelodyAssistant has a VirtualSinger. It sounds like they're using the free Festival voice synthesizer. Listening to the VirtualSinger demos gives a good comparison to what's changed in the technology.


I think this technology is still very much in development, so adding it to BiaB is an iffy proposition. But adding features (like MusicXML) that make interfacing with tools like Sinsy easier makes a lot of sense.
Posted By: Don Gaynor Re: Free web-based synthetic singer - 03/07/13 07:34 PM
I communicate with the R&D folks at Dynavox and, just yesterday, sent them the link to Ave Maria but haven't heard back yet.

In all fairness to Dynavox, their target market is Speech Synthesis, not singing/music. With no one to keep a fire under them and cattle prod them frequently, the project will remain on the proverbial back burner indefinitely. Therefore, I need someone who can get the project moved OVER the burner. I can provide names and email addresses if anyone wants to accept the challenge.

David? You get things accomplished. Would you consider accepting the task?

Thanks.
Posted By: dcuny Re: Free web-based synthetic singer - 03/07/13 07:34 PM
Quote:

What controls do you need for a note, other than the time, duration, note number, and the lyric?



It depends which program you're talking about.

Sinsy takes well-formed MusicXML. There are a lot of elements in MusicXML that aren't required, and it looks like it'll accept a pretty minimal MusicXML file. As I mentioned, just about any music notation program will generate MusicXML, and while it's a bit verbose, it's not that hard to generate.

Off tangent for a moment...

I haven't got a Vocaloid, but I know that it's got its own VSQ file format. It imports MIDI data, and I suspect that's how most people interface with it.

I won't even touch UTAU... There's way too much fiddling to get the Japanese localization working on my machine, and it's only half-translated anyway.

The free Festival system had an old XML file format that looked a lot like what you're describing - a note list of phonemes with pitch and timing. You can see an example of it here. I haven't kept up with Festival, and the document is pretty old (in computer years) - back in 2002.

End of tangent

Quote:

How many other parameters can you enter, for example, is there a strength setting (and is it 0-9?) and vibrato information?



The Sinsy documentation (on the main page) says:

The following musical symbols are supported: tie, slur, staccato, accent, dynamics, crescendo, decrescendo, breath mark.

Quote:

Can you be more precise for the timing, for example, to start at bar 1, beat 1, tick 23 (out of 120 PPQ), and last for 87 ticks (120 PPQ)?



Yes, MusicXML will let you get down to that resolution. There are two portions to MusicXML: the displayed notation, and the actual performance. Like MIDI, you specify your base tick value, and you can give the timing relative to that tick. Here's an example:
Quote:

<measure number="6">
  <attributes/>
  <note>
    <pitch>
      <step>B</step>
      <octave>5</octave>
    </pitch>
    <duration>192</duration>
    <voice>1</voice>
    <type>half</type>
    <dot/>
    <lyric number="1">
      <syllabic>single</syllabic>
      <text>this</text>
    </lyric>
  </note>
</measure>



Really, the simplest thing to do is export a MIDI melody from BiaB, import it into just about any music notation program, add lyrics and export the MusicXML.
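
For anyone who would rather generate the file directly, here is a rough sketch of the tick-level case from the question above: a note that starts 23 ticks into bar 1 and lasts 87 ticks at 120 ticks per quarter note. It uses Python's standard xml.etree.ElementTree module. The helper names (note_element, build_score) are made up for this example, the leading and trailing rests are just one way to position the note, and whether Sinsy accepts a file this stripped-down has not been tested here.

```python
import xml.etree.ElementTree as ET


def note_element(step, octave, duration, text=None):
    """Build one <note>; a step of None produces a rest."""
    note = ET.Element("note")
    if step is None:
        ET.SubElement(note, "rest")
    else:
        pitch = ET.SubElement(note, "pitch")
        ET.SubElement(pitch, "step").text = step
        ET.SubElement(pitch, "octave").text = str(octave)
    # Duration is counted in "divisions", the MusicXML equivalent of MIDI ticks.
    ET.SubElement(note, "duration").text = str(duration)
    ET.SubElement(note, "voice").text = "1"
    if text is not None:
        lyric = ET.SubElement(note, "lyric", number="1")
        ET.SubElement(lyric, "syllabic").text = "single"
        ET.SubElement(lyric, "text").text = text
    return note


def build_score(divisions, events):
    """Build a one-measure, one-part score in 4/4 from (step, octave, duration, text) tuples."""
    score = ET.Element("score-partwise", version="3.0")
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Vocal"
    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")
    attributes = ET.SubElement(measure, "attributes")
    # <divisions> sets the tick resolution: ticks per quarter note.
    ET.SubElement(attributes, "divisions").text = str(divisions)
    time = ET.SubElement(attributes, "time")
    ET.SubElement(time, "beats").text = "4"
    ET.SubElement(time, "beat-type").text = "4"
    for event in events:
        measure.append(note_element(*event))
    return score


# 120 ticks per quarter: rest 23 ticks, sing "The" for 87 ticks, rest out the bar.
score = build_score(120, [
    (None, None, 23, None),
    ("B", 4, 87, "The"),
    (None, None, 4 * 120 - 23 - 87, None),
])
print(ET.tostring(score, encoding="unicode"))
```

The notation side (note types, dots, beams) is left out entirely, so a notation program may display this oddly even though the performance timing is spelled out.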

While Sinsy is really cool in that it automatically does this stuff, I don't know what their licence is, and I don't know that there's any guarantee that the web site will stay up, and how accessible it will be.
Posted By: dcuny Re: Free web-based synthetic singer - 03/07/13 08:20 PM
Quote:

In all fairness to Dynavox, their target market is Speech Synthesis, not singing/music.



I think this is a key point. Music requires a lot of things that "ordinary" speech synthesis doesn't. Here is a paper that does a good job explaining it.

The frequency envelope of the voice needs to take into account a number of additional factors:
  • Portamento: Notes that belong to the same word or phrase should be smoothly connected, instead of jumping from note to note.
  • Preparation: Before moving to the next note, the pitch may move in the opposite direction first in preparation of the change.
  • Overshoot: Before hitting the target pitch, the pitch may overshoot the target.
  • Vibrato: Sustained notes will typically have an added vibrato.
  • Fluctuation: Holding the pitch perfectly is unnatural, so some low-level fluctuation needs to be added.
All of these are easily observed when looking at the frequency match line from a pitch correction program. (I left off "undershoot" and "scooping", which I see on my own vocals way too much!)

The paper cited above looks like it's got enough information to implement these features. I'd want logic to prevent vibrato on notes less than a particular duration.
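
As a rough illustration of what such a pitch contour could look like in code, here is a minimal sketch. It is not the paper's method: the curve shapes (a cosine glide, a half-sine overshoot bump, sine vibrato, Gaussian jitter) and every parameter value are assumptions picked for readability, and "preparation" is left out. It does include the duration check, so vibrato is skipped on notes shorter than vib_min_dur.

```python
import math
import random

FRAME_RATE = 200  # contour frames per second (assumed value)


def midi_to_hz(note):
    """Convert a (possibly fractional) MIDI note number to frequency in Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def pitch_contour(notes, glide=0.06, overshoot=0.1, vib_rate=5.5, vib_depth=0.3,
                  vib_min_dur=0.5, vib_delay=0.25, jitter=0.03, seed=0):
    """notes: list of (midi_note, seconds). Returns one frequency in Hz per frame."""
    rng = random.Random(seed)
    contour = []
    prev = None
    for midi, dur in notes:
        step = 0.0 if prev is None else midi - prev  # interval from previous note
        for i in range(int(round(dur * FRAME_RATE))):
            t = i / FRAME_RATE
            pitch = float(midi)
            if t < glide:
                # Portamento: ease from the previous pitch into the new one.
                ease = 0.5 - 0.5 * math.cos(math.pi * t / glide)
                pitch = midi - step * (1.0 - ease)
            elif t < 2 * glide:
                # Overshoot: briefly pass the target, then settle back onto it.
                pitch += overshoot * step * math.sin(math.pi * (t - glide) / glide)
            if dur >= vib_min_dur and t >= vib_delay:
                # Vibrato: only on long notes, and only after a short delay.
                pitch += vib_depth * math.sin(2 * math.pi * vib_rate * (t - vib_delay))
            # Fluctuation: small random wobble, in semitones.
            pitch += rng.gauss(0.0, jitter)
            contour.append(midi_to_hz(pitch))
        prev = midi
    return contour


# A short C4 followed by a one-second E4; only the second note gets vibrato.
contour = pitch_contour([(60, 0.2), (64, 1.0)])
print(len(contour), "frames")
```

The per-frame jitter here is plain white noise, which is cruder than the slow drift a real voice shows, so treat it as a placeholder.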

There's also the question of how well their voices will translate to singing. I assume the DynaVox model automatically handles formant preservation since they're synthesizing the voice in the first place. The paper cites the addition of a "singing formant" at about 3 kHz, along with amplitude modulation based on vibrato (volume changes along with the vibrato).

Quote:

With no one to keep a fire under them and cattle prod them frequently, the project will remain on the proverbial back burner indefinitely. Therefore, I need someone who can get the project moved OVER the burner. I can provide names and email addresses if anyone wants to accept the challenge.



If the DynaVox people are truly interested in this, I could prod them.

I don't know how difficult this would be to add to their product, but the basic ideas are pretty straightforward. Basically, it's a matter of creating a dynamic frequency envelope. That'll get you a long way to a more realistic singing voice.

There was a project like this for the free Festival system, but it seems to be mostly dead links now. While Festival is intelligible, it's not really that pleasant to listen to.
Posted By: Don Gaynor Re: Free web-based synthetic singer - 03/07/13 09:18 PM
Friends and friendesses, Dynavox took the bait. Their Tech Services Manager asked for full contact information so it could be forwarded to the Engineers in R&D, and I've accepted David's kind offer to "prod" them.

I hope that adapting this technology in the relatively few Dynavox devices out in the marketplace does not dampen PG Music's zeal for the concept. However, all Dynavox users would certainly be prime sales prospects for BIAB.
Posted By: dcuny Re: Free web-based synthetic singer - 03/07/13 10:40 PM
If they've got low-level control of the synthesized voice frequency, it shouldn't be terribly difficult for Dynavox to create a musical pitch contour to drive the voice.

Sinsy appears to be owned by the HTS (HMM-based Speech Synthesis System) working group. It looks like they license their tools under a Modified BSD License, which is very cool.

There are a number of interesting text-to-speech tools that use HMMs (Hidden Markov Models), but singing creates an additional complexity of creating an adequately large corpus of training data. The only paper I've been able to find on Sinsy is 99% Japanese, and Google's translation is a bit of a mess, but it notes that because parts of Sinsy use the HTK license, commercial use is prohibited. So I guess we don't need to worry about it being bought out by a company.

That also explains why the paper I referenced focused on converting speech into singing.
Posted By: boehm Re: Free web-based synthetic singer - 03/08/13 01:52 AM
Hi David,

what a great impulse!
Now I've tasted blood.
I will pursue that matter.
A vast field for experiments.
Your "words like smoke" didn't sound so bad. (especially mix 2).
Thank you very much for the input.

Guenter
Posted By: klkl Re: Free web-based synthetic singer - 11/27/14 01:20 PM
Here is a Christmas song in English that makes use of Sinsy, with video:

http://youtu.be/FezfuU6n_Sw

Not as good as the Ave Maria, I admit. But still interesting.
© PG Music Forums