
This paper attempts to answer the following question: "As the availability of compute continues to grow, will the future be dominated by a few models that know everything, or a society of agents, each specialized in a particular domain?"
To address this question, we investigate the laws governing the continuous pretraining of models of various sizes when trained on either general or specialized data.
To our surprise, we found that, in terms of training compute, larger models benefit more from domain specialization than smaller ones. This conclusion is drawn from the curves in Figure 1a, which diverge as compute grows. If these curves instead converged, that is, intersected at some point as compute increased, then generalist models would be more efficient than specialized ones.
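To make this convergence test concrete, here is a minimal sketch: it assumes the two curves follow power laws of the form L(C) = a·C^(-b) and solves for the compute at which they would cross. The functional form and every coefficient below are illustrative assumptions, not the paper's actual fits.

```python
# Minimal sketch of the convergence test. The power-law form L(C) = a * C**(-b)
# and every coefficient below are illustrative assumptions, not the paper's fits.

a_gen, b_gen = 12.0, 0.05    # hypothetical fit: continued pretraining on general data
a_spec, b_spec = 10.0, 0.06  # hypothetical fit: continued pretraining on domain data

def loss(c: float, a: float, b: float) -> float:
    """Power-law loss as a function of training compute c (in FLOPs)."""
    return a * c ** (-b)

# Two power laws cross where a_gen * C**-b_gen == a_spec * C**-b_spec, i.e.
# C* = (a_gen / a_spec) ** (1 / (b_gen - b_spec))  (assuming b_gen != b_spec).
c_star = (a_gen / a_spec) ** (1.0 / (b_gen - b_spec))
print(f"Curves cross at C* = {c_star:.2e} FLOPs")  # here: far below any real budget

# With these coefficients the specialist's relative advantage keeps growing with
# compute, so within any realistic budget the curves diverge rather than converge.
for c in (1e20, 1e22, 1e24):
    ratio = loss(c, a_gen, b_gen) / loss(c, a_spec, b_spec)
    print(f"C = {c:.0e} FLOPs: generalist/specialist loss ratio = {ratio:.2f}")
```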

In this technical report, we evaluate the capabilities of the Sabiá-3 model on several benchmarks, including 73 national exams (Enem, ENADE, OAB, Revalida, etc.), function calling, tasks that require agentic capabilities, instruction following, and long-context handling.

In this technical report, we introduce Sabiá-2, the new generation of Maritaca models, and present the most complete analysis to date of current LLMs on Portuguese tasks, covering 64 Brazilian exams.

In this study, we show that a modest amount of in-domain training brings significant improvements on few-shot tasks. Our best model, Sabiá-65B, outperforms ChatGPT-3.5 on average across 14 Portuguese tasks. This work was published at the BRACIS 2023 conference.

Juru is the first LLM trained on Brazilian legal data. In this study, we show that training Sabiá-2 Small on law-related documents from reliable websites, such as the CNPQ library, yields gains on the Enade and OAB law exams.

In this study, we evaluated GPT-3.5 and GPT-4 on the ENEM exam and showed that using the Chain-of-Thought technique significantly improves GPT-4's performance.
A subsequent article examines GPT-4 Vision's ability to "see" images.
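To illustrate the Chain-of-Thought technique mentioned above, here is a minimal sketch of the idea: instead of asking only for the final answer, the prompt asks the model to write out its reasoning first. The example question and prompt wording below are invented for illustration and are not taken from the paper.

```python
# Minimal illustration of Chain-of-Thought (CoT) prompting on a multiple-choice
# question. The question and prompt wording are invented, not taken from the paper.

question = (
    "Uma loja oferece 20% de desconto sobre um produto de R$ 250,00. "
    "Qual é o preço final?\n"
    "(A) R$ 180,00  (B) R$ 200,00  (C) R$ 210,00  (D) R$ 230,00"
)

# Direct prompting: ask only for the letter of the correct alternative.
direct_prompt = f"{question}\n\nResponda apenas com a letra da alternativa correta."

# CoT prompting: ask the model to reason step by step before answering.
cot_prompt = (
    f"{question}\n\n"
    "Pense passo a passo, explicando seu raciocínio, e só então "
    "indique a letra da alternativa correta."
)

# Both prompts are sent to the model in exactly the same way; only the final
# instruction differs. With CoT, the model first writes intermediate steps
# (250 * 0.20 = 50; 250 - 50 = 200) and only then outputs the letter "B".
print(direct_prompt)
print("---")
print(cot_prompt)
```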

BLUEX is a dataset composed of entrance exams from two of Brazil's leading universities: USP and UNICAMP. Its main purpose is to serve as a reference for evaluating current and future AI models, including those with multimodal (image+text) capabilities. This work was published at the BRACIS 2023 conference.