In this technical report, we evaluate the Sabiá-3 model on several benchmarks, including 73 national exams (ENEM, ENADE, OAB, Revalida, etc.), function calling, agentic tasks, instruction following, and long-context handling.
In this technical report, we introduce Sabiá-2, the new generation of Maritaca models, and present the most complete analysis of current LLMs on Portuguese tasks, covering 64 Brazilian exams.
In this study, we show that a modest amount of in-domain training brings significant improvements on few-shot tasks. Our best model, Sabiá-65B, outperforms ChatGPT-3.5 on average across 14 Portuguese tasks. The work was published at the BRACIS 2023 conference.
Juru is the first LLM trained on Brazilian legal data. In this study, we show that training Sabiá-2 Small on law-related documents from reliable websites, such as the CNPQ library, brings gains on the ENADE and OAB law tests.
In this study, we evaluated GPT-3.5 and GPT-4 on the ENEM and showed that using the Chain-of-Thought technique significantly improves GPT-4's performance.
Follow-up article on GPT-4 Vision's ability to "see" images.
BLUEX is a dataset composed of entrance exams from two of Brazil's leading universities, USP and UNICAMP. Its main purpose is to serve as a reference for evaluating current and future AI models, including those with multimodal capabilities (image + text). The work was published at the BRACIS 2023 conference.
Scientific Publications
Discover the research we've been working on lately.