Maritaca AI - Research and Innovation in LLMs for Portuguese

The interplay between domain specialization and model size: a case study in the legal domain


This paper attempts to answer the following question: "As the availability of compute continues to grow, will the future be dominated by a few models that know everything, or by a society of agents, each specialized in a particular domain?"

To address this question, we investigate the scaling laws that govern continued pretraining of models of various sizes on either general or domain-specific data.

To our surprise, we found that, in terms of training compute, larger models benefit more from domain specialization than smaller ones. This conclusion follows from the curves in Figure 1a, which diverge as compute increases. Had the curves converged instead, that is, intersected at some larger compute budget, generalist models would eventually have become more efficient than specialized ones.
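To make the diverging-versus-converging criterion concrete, here is a minimal sketch. It assumes that each model family's loss follows a power law in training compute, L(C) = a * C^(-b), fitted by least squares in log-log space. The functions `fit_power_law`, `log_gap`, and `crossover_compute`, along with the synthetic numbers, are hypothetical illustrations. They are not the paper's fitting procedure or data. Under this assumption, the gap in log-loss is log(a_g / a_s) + (b_s - b_g) * log(C). If the specialized exponent b_s is larger, the gap widens with compute and the curves diverge. If it is smaller, the curves eventually intersect, which is the scenario in which generalist models would win.

```python
import math

def fit_power_law(compute, loss):
    """Fit loss = a * compute**(-b) by least squares on log(loss) vs log(compute).

    Hypothetical helper for illustration; returns the pair (a, b).
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def log_gap(general, specialized, compute):
    """log(generalist loss) - log(specialist loss) predicted at a compute budget."""
    (a_g, b_g), (a_s, b_s) = general, specialized
    return math.log(a_g / a_s) + (b_s - b_g) * math.log(compute)


def crossover_compute(general, specialized):
    """Compute budget at which the two fitted curves intersect (None if parallel)."""
    (a_g, b_g), (a_s, b_s) = general, specialized
    if b_s == b_g:
        return None
    return math.exp(math.log(a_s / a_g) / (b_s - b_g))


# Synthetic points generated from known power laws (hypothetical values, not the paper's data).
budgets = [1e18, 1e19, 1e20, 1e21]
general = fit_power_law(budgets, [10.0 * c ** -0.05 for c in budgets])
specialized = fit_power_law(budgets, [9.0 * c ** -0.07 for c in budgets])

# A gap that widens with compute means the curves diverge, so specialization pays off more at scale.
print(log_gap(general, specialized, 1e22) > log_gap(general, specialized, 1e18))  # True
# The fitted curves meet only far below the observed budgets, so there is no crossover at scale.
print(crossover_compute(general, specialized) < budgets[0])  # True
```

Fitting in log-log space turns each power law into a straight line, so the divergence question reduces to comparing two slopes. A real analysis would typically also include an irreducible-loss term and fit on noisy measurements, which this sketch deliberately omits.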

Sabiá-3: Technical Report

In this technical report, we evaluate the capabilities of the Sabiá-3 model on several benchmarks, including 73 Brazilian national exams (ENEM, ENADE, OAB, Revalida, etc.), function calling, agentic tasks, instruction following, and long-context tasks.

Sabiá-2: A New Generation of Portuguese Large Language Models

In this technical report, we introduce Sabiá-2, the new generation of Maritaca models, and present the most complete analysis to date of current LLMs on Portuguese tasks, covering 64 Brazilian exams.

Sabiá: Portuguese Large Language Models


In this study, we show that a modest amount of continued pretraining on Portuguese text brings significant improvements on few-shot tasks. Our best model, Sabiá-65B, outperforms ChatGPT-3.5 on average across 14 Portuguese tasks. The work was published at the BRACIS 2023 conference.

The Sabiá-7B model is available on Hugging Face

Juru: Legal Brazilian Large Language Model from Reputable Sources

Juru is the first LLM trained on Brazilian legal data. In this study, we show that training Sabiá-2 Small on law-related documents from reputable websites, such as the CNPq library, improves performance on the ENADE and OAB law exams.

GPT-3.5 and GPT-4 evaluated on Brazilian Universities Entrance Exams

In this study, we evaluated GPT-3.5 and GPT-4 on ENEM, the Brazilian national high school exam, and showed that Chain-of-Thought (CoT) prompting significantly improves GPT-4's performance.

A follow-up article examines GPT-4 Vision's ability to "see" images.

BLUEX: A multimodal benchmark based on USP and UNICAMP entrance exams


BLUEX is a dataset composed of entrance exams from two of Brazil's leading universities, USP and UNICAMP. Its main purpose is to serve as a benchmark for evaluating current and future AI models, including multimodal (image + text) models. The work was published at the BRACIS 2023 conference.

Scientific Publications

Discover the research we've been working on lately.

API Credits for Teaching and Research

If you are a student or researcher who wants to use the Sabiá models (LLMs specialized in Portuguese), Maritaca AI offers an API credits program to support your projects.
