Developing SongSplit AI
I wish I could say it was straightforward developing this app. I really, really wish I could.
But I wasn’t that lucky.
Finding the Right Model
After testing different options for splitting audio, I settled on Demucs because of the quality. HTDemucs uses a hybrid transformer architecture—it processes audio through both time-domain and frequency-domain branches simultaneously, then combines them with cross-attention. The result is remarkably clean separation.
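To make the frequency-domain branch concrete: it operates on a spectrogram, a time-to-frequency view of the audio computed with a short-time Fourier transform. Here is a minimal STFT sketch in NumPy — the window size and hop are arbitrary illustrative values, and Demucs has its own custom STFT handling, so this is just to show the kind of representation that branch consumes:

```python
import numpy as np

def stft(signal, n_fft=4096, hop=1024):
    """Naive short-time Fourier transform: slice the signal into
    overlapping windowed frames and FFT each one. Illustrative
    only; Demucs implements its own STFT handling."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * window
        frames.append(np.fft.rfft(frame))
    # shape: (num_frames, n_fft // 2 + 1) complex spectrogram
    return np.array(frames)

# One second of a 440 Hz tone at 44.1 kHz
t = np.arange(44100) / 44100.0
spec = stft(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (40, 2049)
```

The time-domain branch, by contrast, sees the raw waveform directly; combining both views is what makes the hybrid architecture work so well.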
I really wanted the app to work on my iPad, so I opened Xcode and got started.
The Python Problem
Demucs runs on Python. You can’t ship Python apps to the App Store.
So I looked for ways to run Demucs natively on iPad.
I couldn’t get it working.
The Cloud Detour
I built a web service that took a request from the iPad, processed the file on a Google Cloud Run GPU instance, and returned the separated tracks. It worked well, but left me with three concerns:
Privacy. I respect your privacy. I don’t put ads or analytics in my apps. I don’t want copies of your files on my servers, even if they’re only there temporarily.
Cost. I didn’t want to require a subscription for this product. I hate paying for subscriptions for things that, engineered well, don’t really need one. I couldn’t charge a one-time fee while running cloud GPUs for processing. It would’ve gotten too expensive.
Availability. This isn’t an app that should need an internet connection. If you’re a paying customer, you should be able to use your software whether your internet is working or not. Whether cloud services are up or not. It’s annoying when AWS goes down and suddenly desktop applications stop working.
Back to Square One
I went back to finding a way to get Demucs running natively on iPad.
After weeks of frustration, I gave up.
The Breakthrough
Then Anthropic released Claude Opus 4.5.
With its help, I was able to convert Demucs to work with Apple’s MLX framework. The model has over 500 tensors and a complex dual-branch architecture with custom STFT handling—not exactly a simple port. But we got it working.
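Much of a port like this is mechanical: mapping PyTorch tensors into the layouts MLX expects. For instance, PyTorch stores Conv1d weights as (out_channels, in_channels, kernel_size), while MLX's channels-last convention puts the kernel axis in the middle, so conv weights need their axes swapped. A simplified sketch of that step, using NumPy and made-up key names (the real checkpoint has 500+ entries and more cases to handle):

```python
import numpy as np

def convert_conv_weights(state_dict):
    """Transpose 3-D conv weights from PyTorch's (out, in, k)
    layout to MLX's (out, k, in) layout; pass biases and other
    tensors through unchanged. Illustrative sketch only."""
    converted = {}
    for name, tensor in state_dict.items():
        if name.endswith(".weight") and tensor.ndim == 3:
            converted[name] = np.transpose(tensor, (0, 2, 1))
        else:
            converted[name] = tensor
    return converted

# Hypothetical checkpoint entry: 48 output channels,
# 4 input channels, kernel size 8
fake = {"encoder.0.conv.weight": np.zeros((48, 4, 8)),
        "encoder.0.conv.bias": np.zeros(48)}
out = convert_conv_weights(fake)
print(out["encoder.0.conv.weight"].shape)  # (48, 8, 4)
```

Get one of those layouts wrong and the model produces garbage without erroring, which is exactly the kind of debugging where an AI pair is worth its weight.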
That means the application now runs natively on all devices with Apple Silicon. The M-series chips in Macs and iPads, and the recent A-series chips in iPhones, can all run the neural network directly on the GPU.
It leaves some users out. But it solves all three problems that prevented me from shipping:
- Your audio never leaves your device
- No recurring costs, no subscription needed
- Works offline, forever
Now I have Demucs running in MLX, and my app working on macOS and iOS. The three quality modes—Fastest, Balanced, and Best—give you control over the tradeoff between speed and separation quality.
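Demucs exposes knobs like `shifts` (average predictions over several randomly time-shifted passes) and `overlap` (how much consecutive audio chunks overlap), and trading those against each other is one plausible way to implement such modes. A hypothetical mapping — these are illustrative values, not the app's actual settings:

```python
# Hypothetical quality presets: more shifts and more overlap
# mean slower processing but cleaner separation. `shifts` and
# `overlap` are real Demucs parameters; the values below are
# made up for illustration.
PRESETS = {
    "fastest":  {"shifts": 1, "overlap": 0.10},
    "balanced": {"shifts": 2, "overlap": 0.25},
    "best":     {"shifts": 5, "overlap": 0.50},
}

def settings_for(mode):
    """Look up separation settings for a quality mode name."""
    return PRESETS[mode.lower()]

print(settings_for("Balanced"))  # {'shifts': 2, 'overlap': 0.25}
```

The nice part of running on-device is that this tradeoff belongs entirely to the user: no server bill changes based on which mode they pick.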
Was it the path I expected? Not at all. But sometimes the hard way turns out to be the right way.