PRODUCTS

Qosmo Music & Sound AI

Video2Music

AI selects songs that “feel right” for the video

Music that “feels right” for a given video is selected instantly from a target music library of millions of songs, with no limitation on musical style
In addition to “video→music” search, “music→video” search is also available, improving the cross-modal content search experience
Combined with Qosmo’s other search algorithms, you can build a wide range of search services, including similar-song suggestions

Boost purchases at stock music/video services

Stock music/video services conventionally rely on tags and keywords for search, which requires users to already know what they are looking for. Video2Music addresses this problem directly by letting users ask, “Find music that fits this video.”

Music selection requests from customers in video production

Song selection has depended on expert librarians at record companies and elsewhere, who frequently receive requests for matching music. Video2Music allows those less familiar with the library to narrow the candidates down to a few selections.

Feature integration with movie editing software

Many movie editing software products offer a music library for users to choose from. The user experience would improve considerably if the product could recommend, at the right moment, music that fits the content under production.

You can see detailed Video2Music output on this page: for each of a number of videos, three candidate songs are listed. If you would like to evaluate it with your own videos, please feel free to contact us.

Video2Music detailed output

Video2Music Demo

Video2Music uses a deep-learning architecture called the Transformer to convert video and music inputs into mutually comparable latent feature vectors. The model is trained on a large amount of movie content available online* using contrastive learning, which makes it possible to quantify how well a video and a song, two distinct forms of media, fit together. The pre-trained model provided with the product license already supports a wide range of input videos and music styles, and it can be retrained with additional data to improve accuracy for specific applications.
*Training machine learning models on copyrighted materials is permitted under the copyright law of Japan.
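
As an illustration of how matching in a shared latent space can work, here is a minimal sketch in Python. It assumes the video and every song have already been converted into feature vectors of the same length and that cosine similarity is used as the fitness score. Both are assumptions made for illustration, as are the names `cosine_similarity`, `rank_songs`, and `song_index` and the toy three-dimensional vectors; none of this is the actual Video2Music implementation or API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_songs(video_vec, song_index, top_k=3):
    """Return the top_k (song_id, score) pairs, best match first.

    song_index is a hypothetical dict mapping song IDs to latent vectors
    that were computed once, at indexing time.
    """
    scored = [
        (song_id, cosine_similarity(video_vec, vec))
        for song_id, vec in song_index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data: real latent vectors would have many more dimensions.
song_index = {
    "song_a": [0.9, 0.1, 0.0],
    "song_b": [0.0, 1.0, 0.0],
    "song_c": [0.7, 0.7, 0.0],
    "song_d": [-1.0, 0.0, 0.0],
}
video_vec = [1.0, 0.0, 0.0]

top = rank_songs(video_vec, song_index)
for song_id, score in top:
    print(f"{song_id}: {score:.3f}")
```

Because both media types live in the same vector space, the reverse “music→video” search is the same computation with the roles swapped: a song vector is compared against an index of video vectors. A linear scan like this one is only practical for small libraries; a library with millions of songs would normally call for an approximate nearest-neighbor index.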

Pricing

Initial fee (initial library indexing, system integration, etc.)

Monthly fee (fixed rate up to a specified number of API calls)

Input/Output

Input: Video (30 seconds or longer)

Output: Song candidates (input and output can be reversed)

Operating Environment

Cloud: REST API

On-premise: Linux-GPU environment

Processing speed

Indexing: < 3 seconds (per song)

Matching: < 1 second (per search)