The artificial intelligence (AI) company Stability AI has unveiled its latest creation, Stable Audio, which represents a significant advance in AI's ability to generate high-quality music.
Stable Audio utilizes a cutting-edge latent diffusion architecture to create original music compositions from simple text prompts in just seconds. The model was trained on over 800,000 audio samples and can generate 45-90 second tracks with rich instrumentation and production value.
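The latent diffusion process described above can be sketched in a few lines. This is an illustrative toy, not Stability AI's actual code: the text encoder and denoiser below are hypothetical stand-ins for the learned networks, but the overall shape (encode the prompt, start from random noise in a compact latent space, iteratively denoise conditioned on the prompt, then decode the latent to audio) is the general pattern latent diffusion models follow.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_text_encoder(prompt: str) -> np.ndarray:
    # Hypothetical stand-in: map the prompt deterministically to a
    # fixed-size conditioning vector (a real model uses a trained encoder).
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).standard_normal(16)

def toy_denoiser(latent: np.ndarray, cond: np.ndarray, t: int) -> np.ndarray:
    # Hypothetical stand-in for the learned noise predictor eps(x_t, t, cond).
    return 0.1 * latent + 0.01 * cond.mean()

def sample_latent(prompt: str, steps: int = 50, latent_dim: int = 64) -> np.ndarray:
    cond = toy_text_encoder(prompt)
    latent = rng.standard_normal(latent_dim)   # start from pure noise
    for t in range(steps, 0, -1):
        eps = toy_denoiser(latent, cond, t)
        latent = latent - eps / steps          # one small denoising step
    return latent  # a real system would decode this latent into a waveform

latent = sample_latent("uplifting trance, 126 BPM, driving bassline")
print(latent.shape)  # (64,)
```

Working in a compressed latent space rather than on raw waveforms is what makes generating 45-90 seconds of 44.1kHz audio tractable in seconds.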
Early audio samples from Stability AI demonstrate Stable Audio's capacity to produce stylistically diverse music, including trance, ambient soundscapes, and drum solos, based on descriptive prompts. The quality and coherence of these AI-generated tracks mark a clear step beyond previous text-to-audio models such as Google's MusicLM and Meta's AudioCraft.
By producing music at the CD-quality 44.1kHz sampling rate, Stable Audio opens up possibilities for AI music generation to be used commercially. Its fast inference time also makes it feasible to use Stable Audio tracks as background audio for AI-generated videos, games, and other multimedia projects.
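To put "CD-quality 44.1kHz" in concrete terms, a quick calculation shows what a maximum-length 90-second Stable Audio clip amounts to in raw PCM form (assuming standard CD parameters: stereo, 16-bit samples; the per-clip figures are arithmetic, not numbers published by Stability AI):

```python
SAMPLE_RATE = 44_100   # samples per second per channel (CD standard)
CHANNELS = 2           # stereo
BIT_DEPTH = 16         # bits per sample (CD standard)
SECONDS = 90           # Pro-tier maximum clip length

samples_per_channel = SAMPLE_RATE * SECONDS
raw_bytes = samples_per_channel * CHANNELS * BIT_DEPTH // 8

print(samples_per_channel)        # 3969000 samples per channel
print(round(raw_bytes / 1e6, 1))  # 15.9 (MB of uncompressed PCM)
```

Generating nearly four million coherent samples per channel is what distinguishes this from earlier models that worked at lower sampling rates.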
While the free version of Stable Audio limits users to 45-second clips, the Pro subscription enables generating 90 seconds of downloadable audio for commercial use. This freemium model mirrors Stability AI's approach to monetizing Stable Diffusion image generation through its DreamStudio service.
The launch of Stable Audio accelerates the democratization of music composition through AI. As these models continue to rapidly improve, AI-generated music may come to dominate many commercial applications. Stable Audio represents a breakthrough that brings us significantly closer to that future.