
Amazon announces Nova: A New Generation of Multimodal AI Models

At its re:Invent conference on Tuesday, Amazon Web Services (AWS) introduced Nova, a new family of multimodal generative AI models.

Nova includes four text-generating models: Micro, Lite, Pro, and Premier. Starting today, AWS customers can access Micro, Lite, and Pro. Premier will become available in early 2025, according to Amazon CEO Andy Jassy, who made the announcement on stage.

The Nova lineup also features Nova Canvas for image generation and Nova Reel for video generation. Both models launched on AWS this morning.

“We’ve been focused on advancing our own frontier models,” Jassy said. “Over the past four to five months, these models have made remarkable progress. If we’re finding value in them, we believe you will too.”

Micro, Lite, Pro, and Premier

The Nova text-generating models support 15 languages, with English as the primary focus, and come in a range of sizes and capabilities to meet diverse needs.

Micro handles only text inputs and outputs but delivers the fastest, lowest-latency responses in the family. Lite processes text, image, and video inputs efficiently, while Pro balances accuracy, speed, and cost for various tasks. Premier stands out as the most advanced, tackling complex workloads and serving as a “teacher” model for creating custom, fine-tuned solutions. Lite, Pro, and Premier can analyze and summarize text, images, and videos, making them ideal for tasks like digesting documents or summarizing charts, meetings, and diagrams.

Micro offers a 128,000-token context window, processing up to 100,000 words. Lite and Pro expand this capability to 300,000 tokens, supporting about 225,000 words, 15,000 lines of code, or 30 minutes of video. By early 2025, AWS plans to enhance certain Nova models with context windows exceeding 2 million tokens.
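Those figures imply a ratio of roughly 1.28 tokens per English word. Under that assumption, a quick back-of-envelope check shows whether a document fits a given model’s window (a rough sketch only; real counts depend on the model’s tokenizer):

```python
# Rough capacity check using the token-to-word ratio implied above
# (128,000 tokens ~ 100,000 words, i.e. ~1.28 tokens per word).
# Actual token counts depend on the tokenizer; this is an estimate.
TOKENS_PER_WORD = 128_000 / 100_000  # ~1.28

CONTEXT_WINDOWS = {
    "micro": 128_000,
    "lite": 300_000,
    "pro": 300_000,
}

def fits_in_context(word_count: int, model: str) -> bool:
    """Estimate whether a text of `word_count` words fits the model's window."""
    estimated_tokens = word_count * TOKENS_PER_WORD
    return estimated_tokens <= CONTEXT_WINDOWS[model]

print(fits_in_context(150_000, "micro"))  # False: ~192,000 tokens > 128,000
print(fits_in_context(150_000, "pro"))    # True: ~192,000 tokens <= 300,000
```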

Jassy highlighted the models’ speed and cost-effectiveness, calling them among the fastest and most affordable in their class. Available on Amazon Bedrock, Nova models can be fine-tuned for text, images, and video or distilled for greater efficiency and speed.

“We’ve optimized these models to integrate seamlessly with proprietary systems and APIs,” Jassy said. “This allows you to perform complex, orchestrated tasks—what we call agent behavior—much more easily. These models are incredibly compelling.”
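For developers reaching these models through Bedrock, a minimal sketch of a request might look like the following. The model ID (`amazon.nova-lite-v1:0`) and inference parameters are assumptions based on Bedrock’s Converse API conventions, not details from the announcement:

```python
import json

# Hypothetical sketch of calling a Nova model through Amazon Bedrock's
# Converse API via boto3. Model ID and parameters are assumptions.
def build_converse_request(model_id: str, prompt: str, max_tokens: int = 512) -> dict:
    """Build keyword arguments for bedrock-runtime's converse() call."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.7},
    }

request = build_converse_request(
    "amazon.nova-lite-v1:0",  # assumed model identifier
    "Summarize the key decisions from this meeting transcript: ...",
)
print(json.dumps(request, indent=2))

# With AWS credentials configured, the actual call would be:
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   response = client.converse(**request)
#   print(response["output"]["message"]["content"][0]["text"])
```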

Image: Nova. Image Credits: Frederic Lardinois/TechCrunch

Canvas and Reel

Canvas and Reel represent AWS’s most robust move into generative media so far. Canvas enables users to generate and edit images with prompts, such as removing backgrounds, while offering precise controls for color schemes and layouts. Reel goes further, creating six-second videos from prompts or reference images; users can customize camera motions, adding pans, 360-degree rotations, and zoom effects.

Currently, Reel is limited to six-second videos, which take about three minutes to generate. However, AWS plans to release a version soon that can produce videos up to two minutes long. Jassy emphasized that both Canvas and Reel include built-in controls for responsible use, such as watermarking and content moderation. “We’re focused on limiting the generation of harmful content,” Jassy noted.

In a blog post, AWS elaborated on Nova’s safety measures, which aim to combat risks like misinformation, child exploitation, and hazardous materials. However, details on how these safeguards work remain unclear.

AWS also remains tight-lipped about the data used to train its generative models, citing a mix of proprietary and licensed sources. Like many vendors, AWS views training data as a competitive edge and closely guards this information. Concerns over intellectual property lawsuits further discourage transparency. To address risks, AWS offers an indemnification policy protecting customers if a model outputs potentially copyrighted material.

The Final Say

Looking ahead, Jassy revealed plans for new Nova capabilities: a speech-to-speech model set for Q1 2025, followed by an “any-to-any” model expected around mid-2025.

Amazon says the speech-to-speech model will interpret verbal and nonverbal cues like tone and cadence, delivering natural, human-like voices. The any-to-any model, meanwhile, promises to transform applications such as translators, content editors, and AI assistants.

“You’ll be able to input text, speech, images, or video and output text, speech, images, or video,” Jassy explained. “This is the future of how frontier models will be built and used.”

Of course, these innovations depend on smooth development without setbacks.
