This year’s Google I/O, the company’s annual developer conference, has officially kicked off with a headlong push into the future of generative AI. At the conference, Google CEO Sundar Pichai unveiled a wave of advancements powered by Google’s AI technology, Gemini.
A core theme of the event was generative AI, with Google showcasing significant updates to its Gemini family of models and introducing new tools for creators. Here are the highlights of the upgrades:
● Faster and more capable Gemini models: Google unveiled the new 1.5 Flash model, its fastest Gemini model yet, excelling at tasks like summarization, chat applications, and data extraction. Additionally, the improved 1.5 Pro model boasts an enhanced ability to follow complex instructions and control response styles.
● Gemini Nano expands and Gemma gets an upgrade: Gemini Nano, designed for on-device tasks, now supports image inputs, starting with Pixel phones. Google also announced Gemma 2, the next generation of its open models for responsible AI development, as well as PaliGemma, a vision-language model inspired by PaLI-3.
● High-fidelity video and image generation: Google is introducing Veo, a groundbreaking video generation model capable of producing high-quality, cinematic-style videos exceeding a minute in length. Also joining the field is Imagen 3, the company’s most advanced text-to-image model to date.
● Music AI Sandbox: In collaboration with YouTube, Google introduced a suite of music AI tools designed to empower creators, including the ability to generate original instrumental sections.
Greater Gemini integration across Google products
With the Gemini model establishing itself in the months since its release, Google announced plans to integrate it more deeply into its core products, including:
● Enhanced Android UX: New features leverage on-device AI to enhance the user experience. For instance, Circle to Search allows students to use their phones or tablets to get step-by-step tutoring on math and physics problems. Additionally, Gemini integration enables features like dragging and dropping generated images into messages and “Ask this video” for retrieving information from YouTube videos.
● Better Search functionality: Search will soon leverage a custom-built Gemini model to answer entirely new types of questions. Users will be able to interact with AI Overviews, adjust the level of detail displayed, and explore AI-organized results pages with categorized content.
● Improved Google Photos organization: Ask Photos, a new feature powered by Gemini, allows users to search their photo libraries in a more natural way, such as by requesting photos from specific locations or based on thematic details. Ask Photos can also curate photo highlights and suggest captions for social media sharing.
● A more connected Google Workspace: Gemini for Workspace features are getting an upgrade, including access to the 1.5 Pro model within the side panel of Gmail, Docs, Drive, Slides, and Sheets. This enables users to ask a wider range of questions and receive more insightful responses directly within these applications.
These announcements mark a significant step forward in Google’s AI strategy, placing generative AI at the forefront of user experiences across its products.
Learn more about the new AI features by catching the replay of I/O 2024 here and reading the highlights here.