Anthropic Introduces Claude 3.5 Sonnet, Setting New AI Benchmarks

Gábor Bíró 21 juin 2024
3 min de lecture

Anthropic's new artificial intelligence model, Claude 3.5 Sonnet, sets new industry standards in reasoning, knowledge, and coding capabilities. Operating at twice the speed of its predecessor, the model excels in complex tasks and enhances collaboration with the new Artifacts feature.

Anthropic Introduces Claude 3.5 Sonnet, Setting New AI Benchmarks
Source: Anthropic

Anthropic has unveiled its latest and most advanced artificial intelligence model, Claude 3.5 Sonnet, showcasing significant improvements in performance and capabilities compared to previous models. Key advancements of the new model include:

  • Outperforming competitor models like OpenAI's GPT-4o, Google's Gemini 1.5 Pro, and Meta's Llama 3 400B on 7 out of 9 overall benchmarks and 4 out of 5 vision benchmarks.
  • Setting new industry benchmarks for graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval).
  • Operating at twice the speed of Anthropic's previous top model, Claude 3 Opus.
  • Excelling at writing and translating code, handling multi-step workflows, and interpreting charts and graphs.
  • Demonstrating a better grasp of nuance, humor, and complex instructions.
  • Generating high-quality content with a natural, relatable tone.
  • Solving 64% of problems in internal agentic coding tests, compared to 38% for Claude 3 Opus.
  • Surpassing Claude 3 Opus on standard vision benchmarks, showing improved visual reasoning and text transcription from imperfect images.

These enhancements make Claude 3.5 Sonnet a powerful tool for complex tasks such as context-aware customer support and orchestrating multi-step workflows.

Alongside the new model, Anthropic introduced the Artifacts feature, designed to improve collaboration and productivity. This innovative function allows users to view, edit, and build upon AI-generated content—like code snippets and text documents—in real-time within the chat interface. Artifacts transforms Claude into a dynamic collaborative workspace, enabling teams to seamlessly integrate AI-generated content into their projects and workflows. For example, design and UX teams can use Artifacts to collaboratively create, iterate on, and refine UI prototypes, leveraging Claude's understanding of design principles and ability to generate visual elements.

Anthropic emphasizes its commitment to safety and privacy with Claude 3.5 Sonnet. The model underwent rigorous testing and was trained to reduce misuse, involving external experts like the UK's Artificial Intelligence Safety Institute (UK AISI). Anthropic also incorporated feedback from child safety experts to update classifiers and fine-tune models. The company reaffirms its stance on data privacy, stating that user-submitted data is not used to train its generative models without explicit permission. These measures reflect Anthropic's efforts to address potential risks and maintain user trust in its AI technology.

The new AI model is available for free on Claude.ai and the Claude iOS app, with higher rate limits for Claude Pro and Team subscribers. Users can also access Claude 3.5 Sonnet via the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. Anthropic plans to complete the Claude 3.5 model family later this year with the release of Claude 3.5 Haiku and Claude 3.5 Opus. The company is also developing new features and integrations, including a Memory feature that will allow Claude to remember user preferences and interaction history.

Gábor Bíró 21 juin 2024