© 2025 NEWSLINKER - Powered by LK SOFTWARE
AI

What Makes Poro 34B Exceptional?

Highlights

  • 34-billion-parameter AI model Poro 34B unveiled.

  • Trained on 1 trillion tokens, excels in Finnish.

  • Outperforms dedicated translation models.

Kaan Demirel
Last updated: 6 April, 2024 - 2:17 am

The recent development of Poro 34B, a 34-billion-parameter AI model, marks a significant stride in the realm of language processing, particularly for Finnish, English, and programming languages. Trained on a whopping 1 trillion tokens, Poro 34B integrates 8 billion tokens of Finnish-English translation pairs, showcasing an impressive capacity for understanding and generating text in these languages.

Contents

  • How Was Poro 34B Trained?
  • What Distinguishes Poro 34B’s Tokenization?
  • How Does Poro 34B Perform?
  • Helpful points

Language models have historically been constrained by the availability of large text datasets, particularly for less commonly spoken languages. The creation of models like Poro 34B has been preceded by ongoing debate and research into the efficiency of multilingual training. Despite previous concerns regarding the so-called “curse of multilingualism,” the current trend indicates that multilingual models can indeed offer competitive, if not superior, performance in tasks involving underrepresented languages.

How Was Poro 34B Trained?

To train Poro 34B, researchers carried out extensive preprocessing to eliminate redundant or low-quality content, ensuring a high-caliber training dataset. The corpus included data harvested from diverse sources, with a significant focus on Finnish literature and web content. English data and programming languages were also incorporated, with a custom tokenizer designed to handle the linguistic nuances of the model’s trilingual focus.
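As a rough illustration of the preprocessing step described above, the sketch below combines exact deduplication with a crude length-based quality filter. This is a minimal, hypothetical example; the actual Poro 34B pipeline is more sophisticated and its exact filters are not detailed in this article.

```python
import hashlib

def preprocess(docs, min_words=5):
    """Drop exact duplicates and very short (likely low-quality) documents."""
    seen = set()
    kept = []
    for doc in docs:
        text = doc.strip()
        if len(text.split()) < min_words:
            continue  # discard fragments too short to be useful training data
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # discard exact duplicates already in the corpus
        seen.add(digest)
        kept.append(text)
    return kept
```

Real pipelines typically add near-duplicate detection (e.g. MinHash) and language identification on top of filters like these.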

What Distinguishes Poro 34B’s Tokenization?

Poro 34B’s tokenization process was tailored with a specialized byte-level BPE tokenizer, which maintains low fertility (the average number of tokens produced per word) across Finnish, English, and programming languages. The model was pretrained on 1 trillion tokens, a feat that underscores both its expansive learning capacity and the advanced computational strategies employed, including a training configuration customized for AMD GPUs.
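Tokenizer fertility can be measured with a short script like the one below. The toy tokenizer here, which simply splits words into four-character chunks, is purely illustrative and bears no relation to Poro 34B’s actual BPE vocabulary; only the fertility metric itself is the point.

```python
def fertility(tokenize, texts):
    """Average tokens produced per whitespace-delimited word; lower is better."""
    total_tokens = sum(len(tokenize(t)) for t in texts)
    total_words = sum(len(t.split()) for t in texts)
    return total_tokens / total_words

def toy_tokenize(text):
    # Stand-in for a real tokenizer: split each word into <=4-char chunks.
    return [w[i:i + 4] for w in text.split() for i in range(0, len(w), 4)]
```

A high fertility on a given language means the tokenizer fragments its words heavily, which wastes context length; keeping fertility low across all three target languages was the motivation for the custom tokenizer.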

How Does Poro 34B Perform?

Evaluations of Poro 34B have demonstrated its remarkable prowess, with the model excelling across various benchmarks. In tasks involving Finnish text generation, Poro 34B outperformed previous models, delivering outputs with high coherence and grammatical accuracy. Notably, its translation capabilities have been highlighted as surpassing those of even dedicated translation systems and commercial offerings such as Google Translate.
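For translation tasks like those evaluated above, a base causal language model is typically prompted with a simple instruction template. The template below is an assumption for illustration, not the format the Poro 34B evaluations actually used; the generated prompt would then be fed to the model for completion.

```python
def build_translation_prompt(source_lang, target_lang, text):
    """Build a plain instruction-style prompt for translation.

    The exact template Poro 34B expects is an assumption here; real
    evaluations may use few-shot examples or a different format.
    """
    return (
        f"Translate from {source_lang} to {target_lang}:\n"
        f"{text}\n"
        f"Translation:"
    )
```

Usage: `build_translation_prompt("Finnish", "English", "Kissa istuu matolla")` yields a prompt whose completion by the model is taken as the translation.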

A study titled “Poro 34B: A 34B Parameter AI Model Trained for 1T Tokens of Finnish, English, and Programming languages, Including 8B Tokens of Finnish-English Translation Pairs” provided further insights into the model. Despite the challenges of training at this scale, the research team produced a tool with significant improvements in language processing for Finnish, validating the multilingual training approach. The study detailed not only the model’s development but also its environmental considerations, evaluating the compute cost in terms of energy consumption.

Helpful points

– Poro 34B’s multilingual training approach could serve as a blueprint for developing models for other less-represented languages.
– The research underscores the need for benchmarks that better reflect the nuanced capabilities of multilingual models.
– Future research should systematically explore multilingual training’s effects on various language tasks.

In conclusion, Poro 34B represents a groundbreaking achievement in language model development. Its creation not only advances the field of natural language processing but also opens new avenues for research into multilingual models and their applications. With Poro 34B demonstrating unprecedented proficiency in Finnish and maintaining competitive performance in English and programming languages, it serves as a beacon for future efforts aimed at overcoming the data scarcity challenge for smaller languages. The model’s success suggests that the benefits of multilingual training can indeed outweigh the limitations, paving the way for more inclusive and effective language technologies.

© 2025 NEWSLINKER - Powered by LK SOFTWARE