ML-Based Indexing of Media Libraries for Insights and Search
Date & Time
Tuesday, October 25, 2022, 5:30 PM - 6:00 PM
Rob Gonsalves

Recent advances in Machine Learning (ML) have produced a new form of semantic indexing that improves search and gives users new insights into their media libraries. Unlike typical search systems that rely on extracted metadata, semantic indexing lets users find relevant material without first tagging the media with selections from a predefined taxonomy. With semantic search, users simply enter unstructured text, and the system finds the best-matching media clips. The paper extends the same technology to gather analytics on the data, which can then be correlated further to generate various insights.

This new form of media indexing can be performed with the CLIP model from OpenAI. The model encodes images and text into embeddings in a shared vector space, so a text query can be matched to the media whose embeddings are semantically closest to it, with results enhanced by the cultural knowledge the model has learned. This type of indexing can be made practical using a database like Elasticsearch. The system can find media based on keywords, synonyms, and summaries. The same system can also be used for analytics and insights, such as clustering, shot detection, and creating a two-dimensional map to display correlations.
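As a rough illustration of the search step described above, the following Python sketch ranks clips by cosine similarity between a query embedding and stored clip embeddings, and then builds an Elasticsearch `script_score` request body for the same comparison. It is a minimal sketch under stated assumptions, not the system presented in the paper: the function names, the `embedding` field name, the clip file names, and the toy 3-dimensional vectors are all hypothetical, and real embeddings would come from an encoder such as CLIP and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_clips(query_embedding, clip_embeddings, top_k=3):
    """Return up to top_k (clip_id, score) pairs, best match first."""
    scored = [
        (clip_id, cosine_similarity(query_embedding, embedding))
        for clip_id, embedding in clip_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def build_script_score_query(query_embedding, field="embedding", size=3):
    """Build an Elasticsearch request body that scores documents by cosine
    similarity against a dense_vector field (field name is an assumption)."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # Adding 1.0 keeps the score non-negative, which
                    # Elasticsearch requires of script scores.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": list(query_embedding)},
                },
            }
        },
    }


# Toy vectors standing in for real text and image embeddings (hypothetical).
clips = {
    "beach_sunset.mp4": [0.9, 0.1, 0.0],
    "city_traffic.mp4": [0.1, 0.9, 0.2],
    "ocean_waves.mp4": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # stands in for the embedding of a text query

results = rank_clips(query, clips, top_k=2)
body = build_script_score_query(query)
```

Here `results` lists the two clips whose vectors point most nearly in the same direction as the query, and `body` could be passed as the body of a search request against an index whose mapping declares the assumed `embedding` field as a `dense_vector`.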

The paper also presents extensions to the semantic search system. Based on a study of multiple existing models, these extensions provide new capabilities: handling many media types, supporting additional languages, searching for spoken phrases in audio files, finding both verbatim and semantically similar phrases, extracting semantic information that leverages multiple video frames, and searching for ambient sounds.

Location Name
Salon 1
Take-Aways from this Presentation
• AI can be used for a fast and efficient semantic search of media without the need to extract explicit metadata
• This approach can work in multiple languages to search and gain insights for collections of images, text, video, and audio with speech and ambient sounds
• Elasticsearch can be used for fast media search and retrieval using embeddings