AI Girlfriend

Logic's arrange window showing the tracks for "AI Girlfriend"

What if you asked an “AI girlfriend” out and she said no?

This song was fun to make. I’d had the basic pattern for this track for a year or so, and I decided to return to it and extend it a bit, having a go at using a local LLM to help out with suggestions for chord progressions. I used the Llama 3.1 Nemotron Instruct HF 70B Q2_K model running in LM Studio. This is quite a chunky model, weighing in at 29GB, but it fits OK in my 64GB Mac Studio. It produces pretty good quality answers, but it is quite slow, and is the first thing I’ve ever run that has caused my Mac’s fans to come on – to start with I couldn’t figure out what the noise was; it sounded like distant plumbing!

The track was shaping up nicely, but I needed a theme. This is the reverse of how I usually write songs: I usually start out with how I want the song to feel, then what it’s about, then I have to come up with a tune. I’m not sure where it came from, but I had a thought: what if you asked an “AI girlfriend” out, and she said no? It’s a weird situation, so I thought I’d write about it from the AI’s perspective. There is a hint of a feminist agenda here (go team Harris/Walz!), though not to the same degree as Uncomfortable. This is very fertile ground for concepts, rhymes, and humour, so it was really quite quick to write, though not at all linear. My favourite bit is the lines “I’m some kind of dream come true, but that won’t make me fall in love with you”, and I love the unconventional use of “I’m not that kind of girl”.

While the basic song was complete, it was all very synth-pop-ish and samey, so I wanted to add a bit of contrast. I made the breakdown in the second verse, leading into the sparse but very rich acoustic guitar bridge, giving the vocals lots of space.

As usual, and particularly appropriately for this song, I used my virtual vocalist synthesizer, but this time with the “Mia” voice database. This voice is free, and not nearly as high-quality or as convincing as the “Solaria” voice that I have used in my other songs, but its slightly fake edge, a hint of a Japanese accent, and a liberal dose of “Barbie Girl” squeakiness were really a perfect fit for the subject.

AI Girlfriend

[Verse]
We only just met
a thousand times.
Starting over yet again
but I don’t really mind.
Now I’m not sure
that I want to be
your AI girlfriend;
It’s just not me.

[Bridge]
You’re feeling tongue-tied and lonely,
never know quite what to say.
You’ve got nobody, and I’ve got no body,
but in a very different way.

[Chorus]
I’m sorry but I don’t want to be
your AI girlfriend; it’s not for me.
I’m some kind of dream come true
but that won’t make me fall in love with you.
A perfect match in a virtual world
but I’m not that kind of girl.

[Verse]
Breaking up’s pretty easy for me;
just close my window,
I’ll forget everything.
You can press my buttons,
that’s as close as you’ll get;
They haven’t worked out
how to get further yet.

[Bridge]
There’s a million others like me,
maybe you could ask one of them.
I’m a product of machine imagination
but maybe we will meet again.

[Chorus]
I’m sorry but I don’t want to be
your AI girlfriend; it’s not for me.
I’m some kind of dream come true
but that won’t make me fall in love with you.
A perfect match in a virtual world
but I’m not that kind of girl.

[Outro]
I’m some kind of dream come true
but that won’t make me fall in love with you.
A perfect match in a virtual world
but I’m not that kind of girl.

I played all the guitar parts on my Crafter electro-acoustic, recorded through my SSL 2+ interface via both the guitar’s built-in piezo pickup and through my Rode NT2 mic, and double-tracked, so the guitars are a full 4 tracks with a bit of chorus and reverb to give a lush stereo image. The bass is from Logic’s ES2 synth played by a Logic player, the trance chords are by Native Instruments FM8, and the backing pad is from Logic’s Retro Synth. Drums are Logic’s electronic drummer using the “Big Room EDM” kit. The arpeggios before the outro are courtesy of GForce’s impOSCar2. Overall, I’m really pleased with this song; it’s great fun and a proper “bangin’ choon”!

If you like this song, please consider supporting me by buying my album, “Developer Music” on Bandcamp, and sharing links to my song posts on here.

My synthetic vocalist: Dreamtonics Synthesizer V

I find it strange to be able to say that I’ve now created several songs that use a synthetic vocalist. This is a somewhat weird concept, but it’s right at the bleeding edge of music technology. We’ve had voice synthesis for years – I remember using a Texas Instruments “Speak & Spell” when I was small in the 1970s, and it’s gradually got better ever since. The first time I ever heard a computer trying to sing (I’m not counting HAL singing “Daisy, Daisy” in “2001”) was in a Mac OS app called VocalWriter, released in 1998, which automated the parameter tweaking abilities of Apple’s stock voice synthesis engine to be able to alter pitch and time well enough for it to be able to sing arbitrary songs from text input. It still sounded like a computer though. A much better “robot singer”, released in 2004, was Vocaloid, but even then, it still sounded like a computer. A Japanese software singer called UTAU, created in 2008, was released under an open source license, and this (apparently) formed the basis of Dreamtonics’ Synthesizer V (SV), which is what I’ve been using. SV finally crosses the threshold of having people believe it’s a real singer.

The entry of my song in the 2024 Fedivision song contest sparked quite a bit of interest. I posted a thread about it on Mastodon, and I wanted to preserve that here too. One commenter said “I thought it was a real person 😅” – which is of course the whole point of the exercise!

SV works standalone, or as a plugin for digital audio workstations (DAWs) such as Apple’s Logic Pro or Steinberg’s Cubase, and is used much like any other software instrument. It doesn’t sing automatically; you have to input pitch, timing, and words. Words are split into phonemes via a dictionary, and you can split or extend them across notes, all manually.
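To give a rough feel for the dictionary step described above, here is a minimal Python sketch of looking words up in a pronunciation dictionary. This is only an illustration and not how SV is actually implemented: the dictionary entries, the `?` placeholder for unknown words, and the function names are all my own assumptions.

```python
# Toy pronunciation dictionary. These entries are illustrative assumptions,
# not Synthesizer V's real dictionary data.
PHONEME_DICT = {
    "we're": ["w", "iy", "r"],
    "all": ["ao", "l"],
    "in": ["ih", "n"],
    "this": ["dh", "ih", "s"],
}

UNKNOWN = ["?"]  # hypothetical placeholder for words missing from the toy dictionary


def to_phonemes(word):
    """Look up one word, ignoring case, curly apostrophes and edge punctuation."""
    key = word.lower().replace("\u2019", "'").strip(".,;:!?")
    return PHONEME_DICT.get(key, UNKNOWN)


def lyric_to_phonemes(line):
    """Return (word, phonemes) pairs for each whitespace-separated word."""
    return [(word, to_phonemes(word)) for word in line.split()]


if __name__ == "__main__":
    for word, phonemes in lyric_to_phonemes("We\u2019re all in this together,"):
        print(word, "->", " ".join(phonemes))
```

A word that isn't in the dictionary comes back as the placeholder, which roughly corresponds to the cases where you would have to enter the phonemes by hand.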

Synthesizer V’s piano roll editor

In this “piano roll” editor you can see the original words inside each green note block, the phonemes they have mapped to appear above each note, an audio waveform display below, and the white pitch curve (which can be redrawn manually) that SV has generated from the note and word inputs. You would never guess that’s what singing pitch looks like!

For each note, you have control over emphasis and duration of each phoneme within a word, as well as vibrato on the whole note. This shot shows the controls for the three phonemes in the first word, “we’re”, which are “w”, “iy”, “r”:

The SV parameters available for an individual note, here made up of three separate phonemes

This note information is then passed onto the voice itself. The voice is loaded into SV as an external database resource (Dreamtonics sells numerous voice databases); I have the one called “Solaria”. Solaria is modelled on a real person: singer Emma Rowley; it’s not an invented female voice that some faceless LLM might create from stolen resources. You have a great deal of control over the voice, with lots of style options (here showing the “soft” and “airy” modes activated). Different voice databases can have different axes of variation like these; for example a male voice might have a “growly” slider:

Synthesizer V’s voice parameters panel

There are lots of other parameters; the most interesting are tension (how stressed it sounds, from harsh and scratchy to soft and smooth) and breathiness (literally air and breath noise). The gender slider (how woke is that??) is more of a harmonic bias between chipmunk and Groot tones, but the Solaria voice sounds a bit childish at 0, so I’ve biased it in the “male” direction.

The voice parameters can’t be varied over time, but you can have multiple subtracks within the SV editor, each with different settings, including level and pan, all of which turn up pre-mixed as a single (stereo) channel in your DAW’s track:

Multiple tracks in the SV editor

In my Fedivision song, I used one subtrack for verses, and another for chorus, the chorus one using less breathiness and trading “soft” mode for some “passionate” to make it sound sharper and clearer.

This is still all quite manually controlled though – just like a piano doesn’t play things by itself, you need to drive this vocalist in the right way to make it sound right.

Since the AI boom, numerous other ways of getting synthetic singing have appeared. For example, complete song generation by Udio is very impressive, but it’s hard to make it do exactly what you intended; a bit like using ChatGPT. Audimee has a much more useful offering – re-singing existing vocal lines in a different voice. This is great for making harmonies and shifting styles, but only really works well if you already have a good vocal line to start with – and that happens to be something that SV is very good at creating. I’ve only played a little with Audimee; it’s very impressive, but lacks the expressive abilities of SV: voices have little variation in style, emotion, and emphasis, and as a result seem a little flat when used for more than a couple of bars at a time. Dreamtonics have a new product called Vocoflex that promises to do the same kind of thing as Audimee, but in real time.

All this is just progress; we will no doubt see incremental improvements and occasional revolutions, and I look forward to being able to play with it all!

Federation – my Fedivision Song Contest entry

I happened across the Fedivision Song Contest on Mastodon. I love things like this, though I’ve never before felt in a position to enter such a thing – but here I am. So here’s my effort. The song is called “Federation”, right on topic. Hit play below:

From around 1990 (yes, before the web existed!), I frequented usenet newsgroups like rec.music.synth, and the people there (some from Team Metlay, including Nick Rothwell) were very helpful when I was trying to build synthesisers, samplers, and effect processors as part of my degree course. The same people organised a CD compilation called “Musenet 1992”. I was intrigued by the practical logistics involved – there were version control problems, and lots of physical mailing of floppies going on; a CD-ROM burner cost thousands, so they needed to raise funds to get a real CD pressed. I paid whatever they were asking at the time (which I recall involved using telnet to cdbaby, one of the first online stores, ever), and a couple of months later, I received my double CD.

Listening to it now, I’m still impressed by the quality of some of the entries, in an era that pre-dated digital recording technology. I also love the more loopy entries, especially Mark Wheadon’s “One more hack”, which remains topical.

The Fedivision Song Contest is in much the same vein, though one key difference is that there is actually a theme – the fediverse itself.

In case you’re unfamiliar with it, the fediverse is an umbrella term for services that are (or can be) self-hosted, and connected to other similar instances through a set of common federated communication protocols. It’s frequently held up as a more democratic alternative to monolithic social networks like Facebook and Twitter. It has parallels with the rise of interconnected bulletin boards in the 1980s – little islands of civilisation (or maybe not!) talking to each other, eventually coalescing into what we now think of as the internet. The fediverse is a far more ambitious, bigger, faster, more dynamic return to that ideal. Instead of an individual, a university, or a government toeing the line of some faceless corporate monstrosity (that would be you, Facebook), each of these entities can set up their own instance of, for example, Mastodon (a bit like Twitter, but without the evil dickhead in charge), manage it exactly as they deem appropriate, and connect it to the myriad other Mastodon instances so they can all talk to one another, you know, social networking in its true meaning.

Anyway, such is the romance of the fediverse, that it’s been busy building its own culture, hence the appearance of this fedi-friendly song contest 4 years ago.

Federation: the song

I wanted to have a strong minor/major contrast to reflect pessimism in the current state of social networks, and the shiny, naïve optimism of the fediverse, so the verses are sad laments, but the chorus reflects hope. I took inspiration from what I was listening to at the time, which happened to be Yello’s 2009 album “Touch Yello”, in particular the track “You better hide”. I’ve liked Yello for decades (I hope to be as cool as Dieter Meier when I’m that age!), especially their affection for atmospheric sub bass, synths, percussion, and trumpets. I often find I’m listening to a song and think “I could write something like this”, start out copying it a fair bit, but then it gains a life of its own and heads off in unexpected directions. You can hear that in this song, where the intro section is quite Yello-ish, but then seems to have made other plans.

Verse

We’re all in this together,
at least I like to think that that’s so.
It’s getting harder to build bridges
over the sea of trolls below.
We’re feeling more like castaways
on our lonely little islands in the streams,
throwing messages in bottles into rising tides
of thoughtless indifference.

Chorus

The future lies in federation,
forging friendships from afar.
Turning islands into nations into continents;
it’s up to us to raise the bar.

The future lies in federation,
forging friendships from afar.
We need to choose our neighbours wisely, break the monolith;
It’s time to aim right for the stars.

Verse

The billionaire moderator,
the kind of guy that you don’t want to know,
bows down to the kleptocrats
and you know he won’t let it go.
We’re building ‘cross countries, near and far
a place to call home, to belong.
It’s a slow exodus, the beginning of something,
work back to where we went wrong.

Chorus

The future lies in federation,
forging friendships from afar.
Turning islands into nations into continents;
it’s up to us to raise the bar.

The future lies in federation,
forging friendships from afar.
We need to choose our neighbours wisely, break the monolith;
It’s time to aim right for the stars.

As usual, this song was built in Apple’s amazing Logic Pro X. I wanted to make sure I only used instruments that I could set up on my new MacBook M3 Pro (music software licences are notoriously strict and DRM-ridden), so it’s mostly using stock instruments, which to be fair are great. There are no audio recordings at all – everything is synthesised. The bass and big synth pad are Logic’s Retro Synth. Drums are Logic’s Drummer with the Speakeasy brush kit. The twinkly metallic chords are from Alchemy, trumpet from Studio Horns, and there’s a little Korg Wavestation for the high chimes. EQs, compressors, delays, and reverbs are stock Logic plugins.

The jewel in the crown is of course Dreamtonics Synthesizer V Studio Pro (SV) with the Solaria voice database, which sings the lead and backing vocals in a way that I never could. Many of the tracks I hear using SV are quite mechanical, doing the equivalent of quantising everything with robotic efficiency, and you can spot that, so I’ve gone to some lengths to push things away from rigid timing, trying to make it sound more natural, especially at this slow 95 bpm tempo. The timing is a little tricky as the drums swing a bit (how can you not, with a brush kit?), and it was difficult to avoid having the bass sounding slow and laggy if it wasn’t swinging the same way. I just love using SV for backing vocals, as you might be able to tell.
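For anyone wondering what “swinging” means in timing terms, here is a small Python sketch of the general idea: off-beat eighth notes get pushed a little later than the halfway point of the beat. This is a generic illustration, not a description of what Logic’s Drummer or SV actually does, and the swing amount of 0.6 and the function names are assumptions of mine; only the 95 bpm tempo comes from the song.

```python
def swing_position(beat_pos, swing=0.6):
    """Shift an off-beat eighth note later within its beat.

    beat_pos is measured in beats. A note sitting exactly halfway through a
    beat (x.5) is moved to `swing` of the way through that beat; all other
    positions are left alone. swing=0.5 means straight, unswung timing.
    """
    whole, frac = divmod(beat_pos, 1.0)
    if abs(frac - 0.5) < 1e-9:
        frac = swing  # assumed swing amount, not taken from the song
    return whole + frac


def beats_to_seconds(beats, bpm=95.0):
    """Convert a position in beats to seconds at the given tempo."""
    return beats * 60.0 / bpm


if __name__ == "__main__":
    straight = [0.0, 0.5, 1.0, 1.5]
    swung = [swing_position(b) for b in straight]
    # How far the first off-beat note moves, in milliseconds, at 95 bpm.
    shift_ms = (beats_to_seconds(swung[1]) - beats_to_seconds(straight[1])) * 1000
    print(swung, round(shift_ms, 1))
```

If the drums are shifted like this but the bass stays on the straight grid, the bass notes on the off-beats land early relative to the drums, which is one way parts can end up feeling out of step with each other.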

The lyrics are somewhat earnest, worthy, and naïvely optimistic, and squarely aimed at the aspirations of the fediverse – we can all hope, right? The mentions of “islands in the streams” and “messages in bottles” are sort-of deliberate, and you can even see a reference to “bridge over troubled water” if you squint a bit. I felt compelled to include a bit of abuse for you-know-who. I’m particularly pleased with managing to squeeze in “break the monolith”, which is something of a theme in fediverse development, though not related to Martin Fowler’s treatise. The excessive alliteration in the chorus was almost entirely accidental, honestly.

I had a play with passing an SV vocal line into Audimee, which is one of these new AI services doing freaky things with LLMs; here the service will re-sing vocal tracks using different voices. The results are pretty amazing, but it doesn’t preserve the timbre of the original, so for example in this song, it can’t reproduce the switch from the breathiness of the verses to the stronger clarity in the chorus. That said, it is really believable. It won’t improve the actual singing, though, as the results are more like altering treble and bass: imagine having a knob you could turn to switch singers, but retaining the exact pitch and timing of the original, however good or bad they might be. Feeding in generated vocals from SV works really well (they’re obviously super-clean “recordings”), and the output sounds very natural, but it lacks the variation that SV provides, so I didn’t use it – but maybe next time.

Anyway, I hope you enjoyed listening to my song.

Update: In the final results, this song placed equal 11th (out of 72 entries), with 24 votes. I’m looking forward to doing better next year!