In the ever-evolving landscape of e-commerce, technology continues to push boundaries, making online shopping experiences more intuitive and engaging for users. One such advancement is the introduction of the multi-modal search feature, which leverages artificial intelligence (AI) to enhance the way customers search for products. Gone are the days of traditional text-based searches, as users can now combine images and text prompts to find exactly what they're looking for. In this article, learn how Flipkart is integrating this feature to give its customers exactly what they want.
Not long ago, searching for information meant typing a few keywords into a search engine and sifting through pages of results. In other words, we were confined to expressing our desires and inquiries within the limits of written language and literacy, typing out our thoughts one keystroke at a time.
But what if there was a better way to search for information?
Multi-modal search seamlessly integrates various modalities – text, images, audio, or video – providing precise access to the information or content one seeks.
Unveiling the Vision
Let’s go back a bit, to 2023. The buzzword is generative AI. A few months earlier, in November 2022, Sam Altman-led OpenAI had launched ChatGPT, an artificial intelligence (AI) chatbot built on top of OpenAI’s foundational large language models (LLMs). In the blink of an eye, ChatGPT took over the world of technology, enabling personalized access to information like never before. While powerful language models existed earlier, ChatGPT let anyone interact with them directly, and that direct engagement revealed the technology’s immense potential and foreshadowed its future significance.
Around this time, Flipkart was preparing for its yearly hackathon – a 2-day dynamic event where Flipsters come together to unbox innovative ideas aimed at enhancing the shopping experience for every Flipkart customer. Themed GenAI Innovation Days, the 2023 hackathon centered on harnessing the power of generative AI to provide Flipkart users with personalized shopping journeys.
Senior Data Scientist Surbhi Mathur and her team, inspired by the recent revolution of AI and its potential, embarked on a quest to redefine the online shopping experience using AI technology.
“During the hackathon, my team and I brainstormed ideas to enhance the shopping experience for our users. We developed the idea of introducing multimodal search to allow users to progressively build their shopping journey using both visual and text inputs simultaneously. We modeled this as an image retrieval problem, where the query consists of a seed image accompanied by a relative text input describing the desired changes from the seed image. As the query was composed of both image and text, we chose a multimodal model architecture like Contrastive Language–Image Pre-training (CLIP) from OpenAI, which embeds image and text in the same space. We demonstrated a working prototype of this idea during the hackathon and won first place. Building on this success, we fine-tuned the model on e-commerce distribution to create FK-CLIP, significantly improving the concept understanding,” says Surbhi.
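The retrieval setup Surbhi describes can be sketched with precomputed embeddings. The snippet below is a minimal, hypothetical illustration of composed image-plus-text retrieval in a shared CLIP-style embedding space: toy NumPy vectors stand in for FK-CLIP outputs, and the fusion rule (adding the seed-image and relative-text embeddings before re-normalizing) is a common baseline for composed retrieval, not Flipkart's actual method.

```python
import numpy as np

def normalize(v):
    # Project onto the unit sphere so dot products equal cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def composed_query(image_emb, text_emb):
    # Fuse the seed-image embedding with the relative-text embedding.
    # Plain addition is a simple baseline; production systems typically
    # learn this fusion end to end.
    return normalize(image_emb + text_emb)

def retrieve(query_emb, catalog_embs, k=3):
    # Rank catalog items by cosine similarity to the composed query.
    scores = catalog_embs @ query_emb
    return np.argsort(-scores)[:k]

# Toy 4-dimensional "embedding space" (stand-ins for real model outputs).
# Dimensions, loosely: [dress-ness, red-ness, long-sleeve-ness, polka-dot-ness]
catalog = normalize(np.array([
    [1.0, 1.0, 0.0, 0.0],   # red dress, short sleeves
    [1.0, 1.0, 1.0, 0.0],   # red dress, long sleeves
    [1.0, 0.0, 1.0, 0.0],   # blue dress, long sleeves
]))

seed_image = normalize(np.array([1.0, 1.0, 0.0, 0.0]))  # the red dress the user saw
text_delta = normalize(np.array([0.0, 0.0, 1.0, 0.0]))  # "with long sleeves"

query = composed_query(seed_image, text_delta)
print(retrieve(query, catalog, k=1))  # → [1], the red dress with long sleeves
```

With real embeddings the catalog would hold millions of product vectors served from an approximate-nearest-neighbor index, but the query shape is the same: one seed image, one relative text, one fused vector.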
The result? Flipkart’s own AI-powered search feature Immerse that allows a seamless fusion of text and image search capabilities.
“Put simply, imagine how an Indian shopper shops at stores. They start with something they like, then keep refining it. Like if they see a red dress but want it with long sleeves or in polka dots. They tell the shopkeeper, who helps them find what they want. This is precisely what Immerse aims to replicate in the realm of online shopping,” says Aradhya Saxena, Senior Product Manager, Flipkart.
Immerse won the hearts of the hackathon judges and, after meticulous testing and viability assessments, was integrated into the Flipkart platform in no time. “We won the hackathon and a beta version of the feature went live on the Flipkart app before The Big Billion Days Sale in 2023. From what we learned in the user studies, people really liked this feature. They get what it does and see how it’s different from basic filters,” adds Surbhi.
Engineering the Backbone
Krishna Azad Tripathi, Software Development Engineer, has been working with Flipkart’s search semantics team since 2018. Over the course of 6 years, Krishna has closely witnessed the development of the search feature at Flipkart and how user needs shaped its trajectory.
“In the early stages of our project, we relied on a lexical-based approach for search, focusing on exact matches between user queries and our product catalog. As we evolved, we transitioned towards a click-based model, leveraging clickstream data to better understand user intent,” explains Krishna.
From the click-based model, the search team soon adopted a demand-based approach, using data from high-demand products and categories. When faced with the limitations of this model (it could not handle low-demand queries), the team pivoted to a hybrid approach combining text-based and click-based data. When the idea of Immerse burst onto the scene, however, the team found a potential game changer.
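The evolution Krishna outlines can be illustrated with a toy scoring function. The sketch below is purely hypothetical (the product names, click counts, and the `alpha` blending weight are invented for illustration): it contrasts a lexical exact-match score with a hybrid score that blends lexical relevance with a clickstream popularity signal, which is the general shape of the text-plus-click approach described above.

```python
from collections import Counter

def lexical_score(query, title):
    # Lexical approach: fraction of query tokens appearing exactly in the title.
    q = query.lower().split()
    t = set(title.lower().split())
    return sum(tok in t for tok in q) / len(q)

def hybrid_score(query, title, clicks, alpha=0.7):
    # Hybrid approach: blend exact-match relevance with a clickstream
    # popularity prior. alpha is an illustrative knob, not a real parameter.
    lex = lexical_score(query, title)
    pop = clicks[title] / max(sum(clicks.values()), 1)
    return alpha * lex + (1 - alpha) * pop

# Invented clickstream data: both titles match "red dress" lexically,
# so click data breaks the tie in favor of the more popular product.
clicks = Counter({"red cotton dress": 80, "red silk dress": 20})
ranked = sorted(clicks, key=lambda t: hybrid_score("red dress", t, clicks),
                reverse=True)
print(ranked[0])  # → "red cotton dress"
```

The lexical signal alone cannot separate the two products here; the click signal can, which is why clickstream data improved intent understanding. Its weakness, as noted above, is queries and products with little demand data, which is where a multimodal model like FK-CLIP adds a content-based signal.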
“If we fail to showcase good products, users won’t find what they’re looking for, impacting our search performance. To tackle this, we’re prioritizing the development of Immerse. The aim is to improve search quality and to enhance our understanding of user intent and deliver better results in both the short and long term,” adds Krishna.
Flipkart’s journey into generative AI technology began with the introduction of Flippi, its AI-powered shopping chat assistant. Also built on AI and machine learning (ML) models, Flippi offers product discovery through a conversational experience, accompanied by intuitive nudges. From there, Flippi aids decision-making through intelligent recommendations, providing a holistic shopping process for users.
The Way Forward
Currently, Immerse runs on predefined prompts. Surbhi explains, “For example, on top of a product image, you can apply prompts like “yellow” if the product is gray. You can apply prompts to change the color, pattern, and more. These prompts we’re showing you are predefined and limited in variety and number.” The team understands the need for expansion and is working on solving this challenge to help users experience more visual variety while browsing products.
Immerse represents more than just a search tool – it signifies a revolutionary leap in the way users navigate online shopping. By seamlessly integrating text and image search capabilities, Immerse empowers users to explore products intuitively, just as they would in a physical store. As a pioneer in adopting multi-modal search in the Indian e-commerce space, Flipkart sets a new industry standard, ushering in a future where informed buying decisions are effortless and every search is an adventure in discovery.