Music that "feels right" for a given video is selected instantly from a target music library of millions of songs, with no limitation on musical style.
In addition to “video→music” search, “music→video” search is also available, improving the cross-modal content search experience.
Combined with Qosmo’s other search algorithms, it lets you build a wide range of search services, including similar-song suggestions.
Stock music/video services conventionally rely on tags and keywords to aid search, but this requires users to know what they are looking for. Video2Music solves this problem directly by letting users ask, “find music that fits this video.”
Song selection has depended on expert librarians at record companies and elsewhere, who frequently receive requests for matching music. Video2Music allows those less familiar with the library to narrow the candidates down to a few selections.
Many movie editing software products offer a music library for users to choose from. It would greatly improve the user experience if the product could recommend a timely selection of music that fits the content under production.
You can see detailed Video2Music output on this page. For a number of videos, three candidate songs are listed. If you would like to evaluate it with your own videos, please feel free to contact us.
Video2Music detailed output
Video2Music Demo
Video2Music uses the deep-learning architecture called Transformer to convert video and music inputs into mutually comparable latent feature vectors. The model is trained on a large amount of movie content available online*, using contrastive learning, which makes it possible to quantify how well a video and a song, two distinct forms of media, fit each other. The pre-trained model provided with the product license already supports a wide range of input videos and music styles, but it can be re-trained with additional data to improve accuracy for specific applications.

*Training machine learning models on copyrighted materials is permitted under the copyright law of Japan.
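
To illustrate how mutually comparable latent vectors can be used for search, here is a minimal sketch of the matching step only. It assumes that the video and every song in the library have already been converted to vectors of the same length, and it uses cosine similarity as the fitness score; the score Video2Music actually uses is not stated on this page. The function names, song IDs, and three-dimensional vectors are all hypothetical placeholders, not part of the product's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(query_vec, library, top_k=3):
    """Return the top_k (item_id, score) pairs, best match first."""
    scored = [
        (item_id, cosine_similarity(query_vec, vec))
        for item_id, vec in library.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical pre-computed song embeddings (toy values, 3 dimensions).
# A real model would produce much larger vectors.
song_index = {
    "song_a": [0.9, 0.1, 0.0],
    "song_b": [0.1, 0.9, 0.2],
    "song_c": [0.7, 0.3, 0.1],
    "song_d": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the input video.
video_vec = [1.0, 0.2, 0.0]

# "video -> music" search: list the three best-fitting songs.
results = rank_candidates(video_vec, song_index, top_k=3)
for song_id, score in results:
    print(f"{song_id}: {score:.3f}")
```

Because both media share one vector space in this sketch, the reverse "music→video" search is the same call with a song vector as the query and an index of video vectors as the library. Scanning every entry is fine for a toy index; a library of millions of songs would normally use an approximate nearest-neighbor index instead.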
Initial fee (initial library indexing, system integration, etc.)
Monthly fee (fixed rate up to a specified number of API calls)
Input: Video (30 seconds or longer)
Output: Song candidates (can reverse input and output)
Cloud: REST API
On-premise: Linux-GPU environment
Indexing: < 3 seconds (per song)
Matching: < 1 second (per search)
Get in touch with us here!
CONTACT