Sony Research and AI Singapore (AISG) have announced a strategic partnership to advance the SEA-LION family of large language models (LLMs). The initiative focuses on improving AI’s ability to process languages spoken in Southeast Asia. The region is home to more than a thousand languages, so accurate and capable language models are crucial. The collaboration targets improvements to these models, particularly in handling Tamil, a language spoken by millions globally.
Earlier initiatives focused primarily on widely spoken global languages and often neglected the complex linguistic landscape of Southeast Asia. The new collaboration aims to fill this gap by creating more inclusive and representative AI models. Previous projects lacked the specificity needed for languages such as Tamil; the partnership seeks to address this by drawing on Sony Research’s expertise in Indian languages and its research in related fields.
Focus on Linguistic Diversity
SEA-LION, which stands for Southeast Asian Languages In One Network, aims to address the linguistic needs of the Southeast Asian population. Hiroaki Kitano, President of Sony Research, emphasized the importance of diversity and localization in AI development.
“As a global company, diversity and localisation are vital forces,” said Kitano. “In Southeast Asia specifically, there are more than a thousand different languages spoken by the citizens of the region. This linguistic diversity underscores the importance of ensuring AI models and tools are designed to support the needs of all populations around the world.”
Emphasizing Tamil Language
The collaboration will prioritize improving the SEA-LION model’s ability to process Tamil. With an estimated 60 to 85 million speakers worldwide, Tamil is a significant focus. Sony Research plans to apply its expertise in Indian languages, including Tamil, along with its research in speech generation, content analysis, and recognition, to improve the model.
“Access to LLMs that address the global landscape of language and culture has been a barrier to driving research and developing new technologies that are representative and equitable for the global populations we serve,” Kitano added.
The partnership benefits from Kitano’s established connections within Singapore’s technology sector, where he holds advisory roles with several councils and boards. These include the Advisory Council on the Ethical Use of AI and Data, the Infocomm Media Development Authority (IMDA), the Singapore Economic Development Board (EDB), and the National Research Foundation, Singapore (NRF).
“The integration of the SEA-LION model, with its Tamil language capabilities, holds great potential to boost the performance of new solutions. We are particularly eager to contribute to the testing and refinement of the SEA-LION models for Tamil and other Southeast Asian languages, while also sharing our expertise and best practices in LLM development,” stated Leslie Teo, Senior Director of AI Products at AISG.
The collaboration is a step towards AI technologies that better reflect Southeast Asia’s linguistic diversity, with Tamil and other regional languages as the immediate focus. By combining the capabilities of Sony Research and AISG, the partnership aims to break down barriers and drive innovation in multilingual AI.