Google Research has launched MusicLM, a text-to-music AI tool that can generate music in any genre from text prompts. It can also transform hummed or whistled melodies into performances on different instruments. The tool interprets the text prompt to determine the length and complexity of the composition it generates. At present, the AI tool is not available for personal use; to showcase its capabilities, the company has published several examples of its music output.
The abstract of Google's research paper reads as follows:
We introduce MusicLM, a model generating high-fidelity music from text descriptions such as “a calming violin melody backed by a distorted guitar riff”. MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes. Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description. Moreover, we demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption. To support future research, we publicly release MusicCaps, a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts.
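The abstract's "hierarchical sequence-to-sequence" framing can be pictured as a pipeline: a text caption is first mapped to coarse "semantic" tokens capturing long-range structure, which are then expanded into finer "acoustic" tokens that a neural codec decodes into a 24 kHz waveform. The sketch below is purely conceptual: the function names, token vocabularies, and token rates are invented for illustration and are not MusicLM's actual values or API.

```python
import random

SAMPLE_RATE = 24_000   # MusicLM outputs 24 kHz audio (per the abstract)
SEMANTIC_RATE = 25     # semantic tokens per second -- illustrative value only
ACOUSTIC_RATE = 75     # acoustic tokens per second -- illustrative value only

def text_to_semantic_tokens(caption: str, seconds: float) -> list[int]:
    """Stage 1 (hypothetical): map a caption to coarse tokens describing
    long-range structure such as melody, rhythm, and genre."""
    # Deterministic stand-in for a trained autoregressive model.
    random.seed(sum(map(ord, caption)))
    return [random.randrange(1024) for _ in range(int(seconds * SEMANTIC_RATE))]

def semantic_to_acoustic_tokens(semantic: list[int]) -> list[int]:
    """Stage 2 (hypothetical): expand each semantic token into finer acoustic
    tokens that a neural audio codec could decode into a waveform."""
    per_step = ACOUSTIC_RATE // SEMANTIC_RATE
    return [s * per_step + i for s in semantic for i in range(per_step)]

def generate(caption: str, seconds: float = 10.0) -> list[int]:
    """Run both stages; a real system would decode the result to audio."""
    return semantic_to_acoustic_tokens(text_to_semantic_tokens(caption, seconds))

tokens = generate("a calming violin melody backed by a distorted guitar riff")
print(len(tokens))  # 10 s * 75 acoustic tokens/s = 750
```

The two-stage split is what lets a model like this stay consistent over minutes: the cheap semantic stage plans structure over long horizons, while the acoustic stage only has to fill in local detail.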
The demo site for MusicLM displays its capabilities through examples such as 10-second clips of instruments, 8-second clips of specific genres, music suitable for a prison escape scene, and comparisons between beginner and advanced piano players. It also interprets phrases like “futuristic club” and “accordion death metal”. While it can simulate human vocals, the output lacks natural quality and sounds grainy.
MusicLM can also create a musical "story" or narrative from a sequence of written descriptors such as "time to meditate" and "time to wake up". It can respond to a picture-and-caption combination or produce audio played by a specific instrument in a game. MusicLM joins earlier attempts at AI music generation such as Google's own AudioLM, OpenAI's Jukebox, and Riffusion. However, MusicLM may be the first to produce high-fidelity, complex compositions, overcoming the technical limitations and scarcity of training data that held back earlier systems.
The authors of the research paper state that they have no plans to release the AI models, citing potential issues of plagiarism and cultural appropriation, though the technology may appear in future musical experiments by Google. For now, the research is most useful to those building their own musical AI systems: Google is publicly releasing a dataset of 5,500 music-text pairs to assist in training and evaluating them.
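A dataset of music-text pairs like MusicCaps is typically distributed as a table of clip identifiers and captions. The sketch below shows how such a file might be parsed; the column names (`ytid`, `start_s`, `end_s`, `caption`) are assumptions about the released layout, and the sample rows are invented for illustration.

```python
import csv
import io

# Hypothetical two-row sample in a MusicCaps-style CSV layout.
# Column names and row contents are assumed, not taken from the real file.
SAMPLE = '''ytid,start_s,end_s,caption
abc123,30,40,"a calming violin melody backed by a distorted guitar riff"
def456,0,10,"a lo-fi beat with mellow synths and soft percussion"
'''

def load_pairs(csv_text: str) -> list[tuple[str, str]]:
    """Return (clip id, caption) pairs, the supervision signal a
    text-to-music model would be trained and evaluated on."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["ytid"], row["caption"]) for row in reader]

pairs = load_pairs(SAMPLE)
print(len(pairs))  # 2
```

Rich human-written captions like these are exactly what lets researchers measure "adherence to the text description", the second axis on which the paper reports improvements.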