Nari Labs: Dia Examples

A comparison of Dia-1.6B (ours), ElevenLabs Studio, and Sesame CSM-1B, plus fun examples (including audio prompt use).

Standard Usage

Input script

[S1] Dia is an open weights text to dialogue model. 
[S2] You get full control over scripts and voices. 
[S1] Wow. Amazing. (laughs) 
[S2] Try it now on Git hub or Hugging Face.

Note that the ElevenLabs and Sesame models cannot render laughter tags as speech, so for those models we replaced (laughs) with haha. Also, Dia is not fine-tuned on a specific voice: it generates a random voice on each run unless you add an audio prompt or fix the seed.
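The tag substitution described in the note can be sketched in a few lines of Python. This is only an illustration: the helper name `replace_nonverbal_tags` and the `TAG_FALLBACKS` table are hypothetical, and the only mapping taken from this page is (laughs) to haha.

```python
# Hypothetical table of nonverbal tags and their plain-text stand-ins.
# Only the (laughs) -> haha entry comes from the note above.
TAG_FALLBACKS = {"(laughs)": "haha"}


def replace_nonverbal_tags(script: str, fallbacks: dict = TAG_FALLBACKS) -> str:
    """Return the script with each known nonverbal tag replaced by its fallback text."""
    for tag, text in fallbacks.items():
        script = script.replace(tag, text)
    return script


if __name__ == "__main__":
    line = "[S1] Wow. Amazing. (laughs)"
    # Speaker tags such as [S1] are left untouched; only the listed tags change.
    print(replace_nonverbal_tags(line))
```

A script prepared this way would be the input for a model without laughter tag support, while Dia would receive the original script unchanged.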

Input script

[S1] Hey. how are you doing?  
[S2] Pretty good. Pretty good. What about you? 
[S1] I'm great. So happy to be speaking to you.  
[S2] Me too. This is some cool stuff. Huh?  
[S1] Yeah. I have been reading more about speech generation. 
[S2] Yeah. 
[S1] And it really seems like context is important. 
[S2] Definitely.

[Audio samples, first set: Dia-1.6B (ours), ElevenLabs Studio, Sesame CSM-1B]

[Audio samples, second set: Dia-1.6B (ours), ElevenLabs Studio, Sesame Website Example]