Chasing Compute: The Role of Computing Resources in Publishing Foundation Model Research

Yuexing Hao1,2†, Yue Huang3†, Haoran Zhang1, Chenyang Zhao4, Zhenwen Liang3, Paul Pu Liang1, Yue Zhao6, Lichao Sun5, Saleh Kalantari2, Xiangliang Zhang3, Marzyeh Ghassemi1*

1 EECS, MIT; 2 Cornell University; 3 CSE, University of Notre Dame; 4 Computer Science Department, UCLA; 5 CS, Lehigh University; 6 School of Advanced Computing, USC

* Corresponding author: mghassem@mit.edu; † Equal contribution

Exploring the relationship between computing resources and scientific advancement in foundation models (FM).

Research Overview

Cutting-edge research in Artificial Intelligence (AI) requires considerable resources, including Graphics Processing Units (GPUs), data, and human labor. In this paper, we evaluate the relationship between these resources and the scientific advancement of foundation models (FMs). We reviewed 6,517 FM papers published between 2022 and 2024 and surveyed 229 first authors to understand the impact of computing resources on scientific output. We find that increased computing is correlated with individual paper acceptance rates and national funding allocations, but not with research environment (academic or industrial), domain, or study methodology.

We analyze 34,828 papers accepted between 2022 and 2024, identify 5,889 FM papers among them, and examine GPU access and TFLOPs alongside their correlation with research outcomes. We further incorporate survey insights from 229 authors, covering 312 papers, on resource usage and impact.

Study Design

Figure 1. Study Design. (A) We initially collected 34,828 papers from eight major computer science conferences, from which 6,517 FM-related papers were retrieved via the OpenReview API and the ACL ARR platform. We then extracted GPU-related information from their PDFs. In the survey study, we recruited 229 first-author FM researchers, representing 312 papers in total. Participants provided self-reported responses regarding computing resources when such information was not documented in their publications. Dotted boxes indicate potentially unavailable information. (B) Percentage of papers reporting a valid GPU type, by year and conference, and whether each conference's author and reviewer checklists contain guidelines for reporting computing resource usage. Note that ARR conferences (including ACL, EACL, EMNLP, and NAACL) use the same author and reviewer checklists, with slight modifications based on each conference's requirements. (C) GPU usage and FP16 TFLOPS compared between GPT-4o-scraped and self-reported survey data.

Results

Figure 2. Temporal Evolution of FMs. (A) FM papers as a proportion of published papers over time. (B) Evolution of FM papers across phases, methods, and domains. (C) Temporal evolution of GPU model distribution in LLM-extracted and self-reported data. "Pub." denotes publications from the scraped dataset, while "Await Publish" refers to papers that are under review, rejected, or in preparation for submission.
Figure 3. Distribution of FM Papers. We analyze FM papers across various dimensions: (A) Senior Author’s Affiliations in (A1) Academia and (A2) Industry; (B) Countries by Senior Author’s Affiliation; (C) Paper Count by LLM Usage; and (D) GPU Types used. The boxplots below panel (D) display GPU Number and TFLOPs across four categories: Affiliation, Phase, Method, and Domain. TFLOPS: Tera Floating‑Point Operations Per Second; FP16: 16‑bit floating‑point format (Note: * p < 0.001).
Figure 4. Funding Distribution of FM Research. Only 15.3% of FM papers report funding country and agency information in their manuscripts. (A) Distribution of funding by country across three categories: Government, Corporate, and Foundation. (B) Relationship between each country's GDP per capita and the number of funded papers. For academic and industry settings: (C) Relationship between available GPU resources and average number of papers produced. (D) Relationship between GPU resources and average citation count per paper.

Discussion

More Compute, Better Research?

Our study provides empirical evidence that increasing the number of GPUs does not inherently lead to higher research impact. This is important as unchecked expansion of computational requirements further exacerbates environmental concerns (Schwartz et al., 2020). Furthermore, the current landscape of FM research remains highly centralized, with China and the United States disproportionately dominating the field, as access to computing resources often serves as a fundamental prerequisite for participation (Lehdonvirta, 2024).

We note that many papers utilized multiple GPUs for different tasks, making it challenging to clearly categorize GPU numbers, types, and memory configurations. Consequently, our calculated FLOPs values may differ slightly from the actual computational resources reported in these studies.
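To make the FLOPs estimate above concrete, the sketch below shows one plausible way to aggregate peak FP16 throughput from a paper's reported GPU configuration. The function name and the approach are illustrative, not the paper's actual pipeline; the throughput figures are NVIDIA datasheet values for dense Tensor Core FP16 and are listed here only as assumptions.

```python
# Hypothetical sketch: summing peak FP16 TFLOPS over a paper's reported
# {GPU model: count} configuration. Datasheet values, dense (no sparsity).
PEAK_FP16_TFLOPS = {
    "V100": 125.0,  # NVIDIA V100, Tensor Core FP16
    "A100": 312.0,  # NVIDIA A100, Tensor Core FP16 (dense)
    "H100": 989.0,  # NVIDIA H100 SXM, Tensor Core FP16 (dense)
}

def paper_peak_tflops(gpus: dict) -> float:
    """Aggregate peak FP16 TFLOPS for a {gpu_model: count} mapping."""
    return sum(PEAK_FP16_TFLOPS[model] * count for model, count in gpus.items())

# e.g. a paper reporting 8x A100 plus 4x V100:
print(paper_peak_tflops({"A100": 8, "V100": 4}))  # 2996.0
```

A sum of peak throughputs is only an upper bound: it ignores utilization, training duration, and mixed-precision details, which is one reason such estimates can drift from the compute actually consumed.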

Open Reporting

While initiatives such as the computing statements required by some conferences acknowledge the role of computational resources, those resources remain insufficiently reported. Greater transparency in GPU usage (Bommasani et al., 2025) and recognition of computing resources, including GPU availability, storage, and human labor, are integral to evaluating AI research and to ensuring its long-term sustainability (Maslej et al., 2025).

While we quantified GPU usage and authorship, other resource costs are often overlooked. The cost of failed experiments is rarely acknowledged; research highlights successful outcomes, yet unsuccessful attempts are crucial to the progress of FM research. Furthermore, infrastructure costs, which vary between countries due to gross domestic product (GDP) and AI policies, are generally not considered.

Automated Evaluation

We relied on GPT-4o to extract and summarize detailed information from PDF files, a method which is susceptible to inaccuracies. We performed ten rounds of GPT-4o extraction and summarized the results through majority voting to minimize errors. Nevertheless, the extracted information may still contain inaccuracies, necessitating careful interpretation and validation.
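The majority-voting step described above can be sketched minimally as follows. This is an illustrative reimplementation, not the study's actual code; in practice the ten extracted values would first be normalized (e.g. "8 x A100" vs. "8xA100") before voting.

```python
from collections import Counter

def majority_vote(extractions):
    """Return the most frequent value across repeated LLM extraction rounds.

    Ties are broken by first occurrence (Counter preserves insertion order).
    """
    return Counter(extractions).most_common(1)[0][0]

# Ten hypothetical extraction rounds for one paper's GPU field:
rounds = ["8x A100", "8x A100", "4x A100", "8x A100", "8x A100",
          "8x A100", "4x A100", "8x A100", "8x A100", "8x A100"]
print(majority_vote(rounds))  # 8x A100
```

Voting suppresses occasional extraction errors but cannot correct a systematic misreading that appears in most rounds, which is why the extracted values still warrant validation.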

Research Team

Yuexing Hao

EECS, MIT & Cornell University

Yue Huang

CSE, University of Notre Dame

Haoran Zhang

EECS, MIT

Chenyang Zhao

CS, University of California, Los Angeles

Zhenwen Liang

CSE, University of Notre Dame

Paul Pu Liang

EECS, MIT

Yue Zhao

School of Advanced Computing, USC

Lichao Sun

CS, Lehigh University

Saleh Kalantari

Cornell University

Xiangliang Zhang

CSE, University of Notre Dame

Marzyeh Ghassemi

EECS, MIT