At its re:Invent conference on Tuesday, Amazon Web Services (AWS) introduced Nova, a new family of multimodal generative AI models.
Nova includes four text-generating models: Micro, Lite, Pro, and Premier. Starting today, AWS customers can access Micro, Lite, and Pro. Premier will become available in early 2025, according to Amazon CEO Andy Jassy, who made the announcement on stage.
The Nova lineup also features Nova Canvas for image generation and Nova Reel for video generation. Both models launched on AWS this morning.
“We’ve been focused on advancing our own frontier models,” Jassy said. “Over the past four to five months, these models have made remarkable progress. If we’re finding value in them, we believe you will too.”
Micro, Lite, Pro, and Premier
The Nova text-generating models support 15 languages, with English as the primary focus, and come in a range of sizes and capability tiers to suit different needs.
Micro handles only text inputs and outputs but delivers the fastest response times in the lineup. Lite processes text, image, and video inputs efficiently, while Pro balances accuracy, speed, and cost across a range of tasks. Premier is the most advanced, built for complex workloads and for serving as a “teacher” model when creating custom, fine-tuned variants. Lite, Pro, and Premier can analyze and summarize text, images, and video, making them well suited to tasks like digesting documents or summarizing charts, meetings, and diagrams.
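For a concrete sense of how that multimodal input works, here is a minimal sketch of a chart-summarization call through Amazon Bedrock’s Converse API. The region, file name, and inference settings are illustrative assumptions; the model ID reflects the identifier AWS published for Nova Lite at launch.

```python
import boto3

# Bedrock runtime client; the region is an assumption -- pick one where Nova is offered.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Read a local chart image (hypothetical file) to send alongside the text prompt.
with open("quarterly_sales.png", "rb") as f:
    chart_bytes = f.read()

response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # Nova Lite's launch model ID
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": chart_bytes}}},
            {"text": "Summarize the key trends shown in this chart."},
        ],
    }],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

# The Converse API returns the assistant's reply as a list of content blocks.
print(response["output"]["message"]["content"][0]["text"])
```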
Micro offers a 128,000-token context window, processing up to 100,000 words. Lite and Pro expand this capability to 300,000 tokens, supporting about 225,000 words, 15,000 lines of code, or 30 minutes of video. By early 2025, AWS plans to enhance certain Nova models with context windows exceeding 2 million tokens.
Jassy highlighted the models’ speed and cost-effectiveness, calling them among the fastest and most affordable in their class. Available through Amazon Bedrock, Nova models can be fine-tuned on text, images, and video, or distilled for greater efficiency and speed.
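As a sketch of what that customization path looks like, Bedrock exposes model customization jobs through its control-plane API. Everything below beyond the API call itself is a placeholder: the job names, role ARN, S3 paths, and hyperparameter values are assumptions, not confirmed Nova settings.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Start a fine-tuning job against a Nova base model.
# The names, role ARN, and S3 URIs are hypothetical placeholders.
job = bedrock.create_model_customization_job(
    jobName="nova-lite-support-tuning",
    customModelName="nova-lite-support-v1",
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",
    baseModelIdentifier="amazon.nova-lite-v1:0",
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={"epochCount": "2", "learningRate": "0.00001"},  # illustrative only
)

print(job["jobArn"])  # track the job, then deploy the custom model once it completes
```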
“We’ve optimized these models to integrate seamlessly with proprietary systems and APIs,” Jassy said. “This allows you to perform complex, orchestrated tasks—what we call agent behavior—much more easily. These models are incredibly compelling.”
Canvas and Reel
Canvas and Reel represent AWS’s most robust move into generative media so far. Canvas lets users generate and edit images from text prompts, including removing backgrounds, and offers precise controls over color schemes and layouts.
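A minimal sketch of a Canvas text-to-image request via Bedrock’s InvokeModel API, assuming the model ID and request schema AWS documented at launch; the prompt and output file are hypothetical:

```python
import base64
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Text-to-image request body; schema assumed from AWS's launch documentation.
body = json.dumps({
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {"text": "A watercolor lighthouse at dawn, muted palette"},
    "imageGenerationConfig": {"numberOfImages": 1, "width": 1024, "height": 1024},
})

response = client.invoke_model(modelId="amazon.nova-canvas-v1:0", body=body)
payload = json.loads(response["body"].read())

# Canvas returns base64-encoded images; decode and save the first one.
with open("lighthouse.png", "wb") as f:
    f.write(base64.b64decode(payload["images"][0]))
```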
Reel goes further, creating six-second videos from prompts or reference images, with customizable camera motion such as pans, 360-degree rotations, and zoom effects. Clips currently take about three minutes to generate, though AWS plans to release a version soon that can produce videos up to two minutes long.
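Because each clip takes minutes to render, video generation runs as an asynchronous job rather than a blocking request. A hedged sketch, assuming Reel’s launch model ID and a request schema along these lines (the S3 bucket is a placeholder):

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Kick off an asynchronous video-generation job; the finished clip lands in S3.
job = client.start_async_invoke(
    modelId="amazon.nova-reel-v1:0",  # assumed launch model ID
    modelInput={
        "taskType": "TEXT_VIDEO",
        "textToVideoParams": {"text": "Slow 360-degree pan around a mountain cabin at sunset"},
        "videoGenerationConfig": {"durationSeconds": 6, "fps": 24, "dimension": "1280x720"},
    },
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/nova-reel/"}},
)

# Poll the job until it finishes (typically a few minutes per six-second clip).
status = client.get_async_invoke(invocationArn=job["invocationArn"])
print(status["status"])  # e.g. "InProgress" or "Completed"
```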
Jassy emphasized that both Canvas and Reel include built-in controls for responsible use, such as watermarking and content moderation. “We’re focused on limiting the generation of harmful content,” he noted. In a blog post, AWS elaborated on Nova’s safety measures, which aim to combat risks like misinformation, child exploitation, and hazardous materials. However, details on how these safeguards work remain unclear.
AWS also remains tight-lipped about the data used to train its generative models, saying only that it draws on a mix of proprietary and licensed sources. Like many vendors, AWS views training data as a competitive edge and closely guards this information; concerns over intellectual-property lawsuits further discourage transparency. To address that risk, AWS offers an indemnification policy protecting customers if a model outputs potentially copyrighted material.
The Final Say
Looking ahead, Jassy revealed plans for new Nova capabilities: a speech-to-speech model slated for the first quarter of 2025, followed by an “any-to-any” model expected around mid-2025.
Amazon says the speech-to-speech model will interpret verbal and nonverbal cues like tone and cadence, delivering natural, human-like voices. The any-to-any model, meanwhile, promises to transform applications such as translators, content editors, and AI assistants.
“You’ll be able to input text, speech, images, or video and output text, speech, images, or video,” Jassy explained. “This is the future of how frontier models will be built and used.”
Of course, those timelines assume development proceeds without setbacks.