AI Makes Us Stupid, Smart

TL;DR

  • AI massively improves job performance.

  • AI improves job performance for less competent, junior developers more than it does senior developers.

  • H1Bs, excessive CS graduates, and tech layoffs have created a glut of labor supply.

  • The theoretical labor supply has increased even further, since AI has made programming accessible to a much larger pool of people.

Introduction

Recently, there were debates over H1B visas. The debate’s two camps were as follows:

  • In support: Elon Musk and his followers, along with many Indian commentators. They argued that increasing H1Bs would expand the overall developer pool in both quantity and quality, keeping US companies competitive through lower wages and deeper talent pools.

  • The opposition consisted of a bipartisan coalition, from leftists on Reddit to the dissident right on X. They argued that curbing H1Bs would keep jobs for American developers and graduates, who are sufficient in quantity and quality to meet the needs of US tech companies.

Instead of subsiding as ‘current thing’ debates usually do, the controversy maintained its intensity for weeks and eventually expanded in scope. Popular conservative commentators began siding with the Elon faction, arguing that young, university-educated men should suck it up: if their jobs are taken by H1Bs, they should pursue careers at fast food chains instead.

The conversation then drifted away from tech employment and towards generational inequality, focusing on whether life is harder for Zoomers than it was for Boomers. This is an oversimplification, of course; given the intensity and scope of the debate, many peripheral subjects were touched on.

However, one subject that - in my opinion - was insufficiently addressed is the shifting landscape of tech employment under the impact of AI and of credential inflation (the devaluing of educational credentials at every level of attainment).

How are these tangentially related?

For developers in particular, AI may boost productivity by an amount comparable to more than a standard deviation increase in IQ, based on typical IQ / job performance correlations.

The underlying complaint of the H1B opposition was that, with an increased quantity of H1Bs, quality would necessarily fall as the program became less selective. H1B opponents usually have IQ in mind when they think about quality. But if the job performance gains from AI are large enough, low IQ hardly matters, because AI can simply lift performance to the point where the difference disappears.

Of course, H1B opponents would simply retort that, if low IQ fellows can program with AI, then why are H1Bs needed at all? One could simply use nationals of deficient talent - not foreigners - as they are still in sufficient supply.

This is especially true given that more students per capita are graduating with degrees in computer science.

[Figure: credential inflation]

But this got me thinking: if anyone can be a developer now, and there’s a glut of workers on the market, then being a developer isn’t very prestigious anymore, is it? All market forces would be pushing developer salaries downward, as there is now a legion of applicants with sufficient post-AI ability.

Now curious, I wanted to see some real data on the situation. How much does AI improve the performance of developers and other professionals? What is the equivalent effect size in terms of the IQ x job performance relationship? Are there any other nuances in the data?

AI makes us stupid, smart

After some digging and citation hopping, these are the main papers on productivity gains with AI. You can click on each header for the source. Unfortunately, not all effects are reported in standardized units.

Note: *** indicates statistical significance at p<0.01, * indicates significance at p<0.10. No asterisk means the effect was either not significant or its significance was not reported.

Brynjolfsson et al. (2023)

Task: Customer support agents at a Fortune 500 software firm handling customer inquiries through chat

  • 14% increase in customer support resolutions per hour overall***
  • 34% increase in resolutions per hour for novice workers***
  • 9% decrease in average customer chat duration***
  • 1.3% increase in successful chat resolution rate*

Peng et al. (2023)

Task: Professional programmers implementing an HTTP server in JavaScript

  • 55.8% reduction in time to complete server implementation task***
  • 7% higher task success rate in completing implementation requirements

Gambacorta et al. (2024)

Task: Software programmers at Ant Group working on regular coding tasks

  • 55% increase in lines of code produced overall***
  • 67% increase in lines of code produced by junior staff***
  • 11-18% of productivity gains directly attributable to LLM code output

Cui et al. (2024a)

Task: Software developers at Microsoft, Accenture, and Fortune 100 company performing regular development work

  • 54.03% increase in completed pull requests at anonymous company
  • 38.38% increase in code compilation attempts***
  • 26.08% increase in completed pull requests overall***
  • 13.55% increase in code commits
  • 5.53% decrease in successful build rate

Yeverechyahu et al. (2024)

Task: Open-source developers contributing to Python and R packages

  • 51% increase in code commits to repositories***
  • 17.82% increase in new package version releases***
  • 15.14% higher increase in maintenance-related commits compared to feature development***

Cui et al. (2024b)

Task: Software developers at Microsoft and Accenture performing regular development work

  • 84-107% increase in successful code builds at Accenture***
  • 12.92-21.83% increase in completed pull requests at Microsoft***
  • 11.53% increase in lines of code changed***
  • 7.51-8.69% increase in completed pull requests at Accenture***

The gains here are notable because not everyone in the treatment group actually used AI; they merely had the option to. Furthermore, adoption was slow.

McKinsey (2023)

Task: Software developers performing various coding tasks including documentation, generation, and refactoring

  • 45-50% reduction in time spent on code documentation
  • 35-45% reduction in time spent on new code generation
  • 20-30% reduction in time spent on code refactoring
  • <10% reduction in time spent on complex programming tasks

Vaithilingam et al. (2022)

Task: Students and engineers completing Python programming assignments

  • ~1 minute faster task completion time
  • Significantly higher helpfulness rating (6.16 vs 4.45 out of 10)***

Mozannar et al. (2024)

Task: Software developers completing pre-selected coding tasks

  • 55.8% potential reduction in overall task completion time
  • 76% of participants reported improved productivity (16/21 participants)
  • 81% of participants reported faster task completion (17/21 participants)

Campero et al. (2022)

Task: HTML “programmers” and non-programmers creating web pages

  • 27% improvement in task completion speed (using regression method)***
  • 17% improvement in task completion speed (using ratio of means)***

Noy & Zhang (2023)

Task: College-educated professionals completing occupation-specific writing tasks

  • 37% or 0.8 SD reduction in task completion time (from 27 to 17 minutes)***
  • 0.45 standard deviation increase in output quality***
  • 33% vs 18% adoption rate post-experiment***
  • 0.40 standard deviation increase in job satisfaction***
  • 0.20 standard deviation increase in self-efficacy*

Okay, so there are some pretty sizable gains. What was the method? What generation of AIs did they use, etc?

AI makes us stupid, smart.

Here’s a rough table listing each study’s method, sample size, year, the AI reported in use, and its rough OpenAI-generation equivalent. As you can see, there are many well-sampled RCTs here.

What’s astonishing here are the AIs used in the studies.

Early versions of Copilot - basically glorified autocomplete - and GPT-3.5.

THESE AIs ARE AWFUL.

GPT-3.5 was barely usable: it hallucinated continuously, couldn’t do kindergarten maths, and had a memory of only a few thousand words - and even then it remembered only the beginning or the end of your conversation. There isn’t a single AI in these studies as advanced as GPT-4o.

But DESPITE all that, as reported above, the gains were enormous!

An interesting finding from perusing the papers was an interaction effect between productivity gains and developer experience or competence.

Less experienced devs benefited more! See the Negative Interaction column below.

| Reference | Year of Study | AI | OpenAI Gen | Occupations | Negative Interaction | Sample Size | RCT |
|---|---|---|---|---|---|---|---|
| Brynjolfsson et al. (2023) | 2020-2021 | GPT-3 | GPT-3 | Customer Support | True*** | 5,179 | True |
| Peng et al. (2023) | 2022 | GitHub Copilot | GPT-3 | Software Developers | True | 95 | True |
| Gambacorta et al. (2024) | 2023 | CodeFuse (Chinese open-source AI) | GPT-3 | Software Programmers | True*** | 1,219 | True |
| Cui et al. (2024a) | 2024 | GitHub Copilot | GPT-4 | Software Developers | True*** | 4,867 | True |
| Yeverechyahu et al. (2024) | 2024 | GitHub Copilot | GPT-4 | Software Developers | Not Applicable | 3,220 packages | False |
| Cui et al. (2024b) | 2022-2023 | GitHub Copilot | GPT-3.5 | Software Developers | Not Tested | 1,974 | True |
| McKinsey (2023) | 2023 | Multiple Gen AI Tools | GPT-3.5 | Software Developers | False | Not Listed | False |
| Vaithilingam et al. (2022) | 2022 | GitHub Copilot | GPT-3 | Students & Software Engineers | Not Tested | 24 | True |
| Mozannar et al. (2024) | 2024 | GitHub Copilot | GPT-4 | Software Developers | True | 21 | False |
| Campero et al. (2022) | 2022 | GPT-3 | GPT-3 | HTML “Programmers” & Non-Programmers | True? | 145 | True |
| Noy & Zhang (2023) | 2023 | ChatGPT | GPT-3.5 | Various Professionals | True*** | 444 | True |

This negative interaction effect is theoretically devastating for developers - and, according to these papers, apparently for any professional class - because it closes the gap between the incompetent and the competent, the low and the high IQ, the inexperienced and the experienced. Why hire a talented, experienced senior dev when you can save 50% by going with a junior dev who can get nearly the same job done with AI? The same could be said of H1Bs: employ foreigners who are beholden to you through a visa, pay them less, and get the same performance because they’re using AI.

By how much does AI increase developer IQ?

Okay, but what’s the ballpark equivalent IQ gain? This depends on three assumptions:

  1. The correlation between IQ and job performance.
  2. That “job performance” is analogous to the “productivity gains” just discussed.
  3. The standard deviation gain in productivity or job performance, which must be guessed from the various papers, as they are heterogeneous in how they measure productivity.

We don’t really have to assume point one, but it is fiercely debated - check out Meng Hu’s great article. From his work, it seems the correlation between IQ and productivity is about r=0.4.

Point two is difficult to ascertain because, as with the job performance x IQ literature itself, measurements are heterogeneous. Not every paper uses the same method to measure job performance, and neither does every AI productivity paper. So we must assume the two are mostly analogous. I think this is likely true, but it is an assumption.

Lastly, and annoyingly, not every paper reports standardized gains. Completing a task 37% faster might sound impressive, but if that only equates to a performance gain of 0.2 SDs, the raw number is misleading. So we have to make a ballpark guess.
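To make the standardization step concrete, here is a minimal sketch using the one study that reports both raw and standardized numbers, Noy & Zhang (2023): mean completion time fell from 27 to 17 minutes, reported as roughly a 0.8 SD gain. The implied SD of completion time below is back-solved for illustration and is not a figure from the paper.

```python
def standardized_gain(mean_control: float, mean_treatment: float, sd: float) -> float:
    """Cohen's d-style effect size: raw improvement divided by the outcome's SD."""
    return (mean_control - mean_treatment) / sd

# Noy & Zhang (2023): 27 -> 17 minutes, reported as ~0.8 SD.
# Back-solving, a 10-minute drop at 0.8 SD implies an SD of ~12.5 minutes.
implied_sd = (27 - 17) / 0.8
print(implied_sd)                              # 12.5
print(standardized_gain(27, 17, implied_sd))   # 0.8
```

The point of the exercise: a "37% faster" headline only becomes comparable across studies once it is divided by the spread of the outcome, which most of the papers above do not report.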

Given the above, I wouldn’t take the estimates below too seriously. However, since the gains in the papers above came from low-quality AIs, they should in a way be considered a lower bound.

My personal guess:

  • Seniors: 0.35 SD improvement (95% CI: 0.1-0.6 SD).
  • Juniors: 0.65 SD improvement (95% CI: 0.3-1.0 SD).

Assuming an IQ x job performance correlation of 0.4, dividing the 0.35 and 0.65 SD performance gains by that correlation gives IQ-equivalent gains of 0.875 and 1.625 SD. That’s about ~13 or ~24 IQ points respectively.

Accessing the supplementary materials of Wolfram (2023), it appears that “Programmers and software development professionals” have a mean IQ of about 111.2.

So maybe a developer from a third-world country with an IQ of ~90 can perform as well as a first-world programmer, post-AI? That seems to be the implication if we take the ~24-point gain at face value.

Conclusion, Discussion

Recent data suggest that AI has democratized programming, effectively adding >15 IQ points to developers’ capabilities. The effect is most pronounced among less competent programmers, with multiple RCTs showing 30-70% productivity gains even from primitive AI models like GPT-3.5.

This reduced barrier to entry, combined with credential inflation (surge in CS graduates), theoretically should be negative for developer employment. Why hire senior devs when juniors with AI assistance can perform similarly at half the cost? AI doesn’t just augment talent, it flattens the skill distribution.

The implications are stark. Developer salaries will likely continue declining as the market saturates with AI-augmented developers. The profession’s prestige diminishes as programming becomes increasingly accessible.

Already, the tech job situation hasn’t returned to pre-COVID levels.

[Figure: FRED employment data]

The future may see programming transform from a high-skill profession to a commodity skill, with AI serving as the great equalizer. That is until AI replaces the occupation entirely.

References
  1. Brynjolfsson, E., Li, D., & Raymond, L. (2023). Generative AI at work. arXiv. https://doi.org/10.48550/arXiv.2304.11771
  2. Campero, A., Vaccaro, M., Song, J., Wen, H., Almaatouq, A., & Malone, T. W. (2022). A test for evaluating performance in human-computer systems. arXiv. https://doi.org/10.48550/arXiv.2206.12390
  3. Cui, Z., Demirer, M., Jaffe, S., Musolff, L., Peng, S., & Salz, T. (2024a). The effects of generative AI on high skilled work: Evidence from three field experiments with software developers. SSRN. http://dx.doi.org/10.2139/ssrn.4945566
  4. Cui, K., Demirer, M., Jaffe, S., Musolff, L., Peng, S., & Salz, T. (2024b). The productivity effects of generative AI: Evidence from a field experiment with GitHub Copilot. MIT Exploration of Generative AI. https://doi.org/10.21428/e4baedd9.3ad85f1c
  5. Gambacorta, L., Qiu, H., Shan, S., & Rees, D. (2024). Generative AI and labour productivity: A field experiment on coding (BIS Working Paper No. 1208). Bank for International Settlements. https://www.bis.org/publ/work1208.htm
  6. McKinsey Digital. (2023, June 27). Unleashing developer productivity with generative AI. McKinsey & Company. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/unleashing-developer-productivity-with-generative-ai
  7. Mozannar, H., Bansal, G., Fourney, A., & Horvitz, E. (2024). Reading between the lines: Modeling user behavior and costs in AI-assisted programming. arXiv. https://doi.org/10.48550/arXiv.2210.14306
  8. Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence. Science, 381(6654), 187-192. https://doi.org/10.1126/science.adh2586
  9. Peng, S., Kalliamvakou, E., Cihon, P., & Demirer, M. (2023). The impact of AI on developer productivity: Evidence from GitHub Copilot. arXiv. https://doi.org/10.48550/arXiv.2302.06590
  10. Vaithilingam, P., Zhang, T., & Glassman, E. L. (2022). Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models. In CHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI EA ‘22). Association for Computing Machinery. https://doi.org/10.1145/3491101.3519665
  11. Yeverechyahu, D., Mayya, R., & Oestreicher-Singer, G. (2024). The impact of large language models on open-source innovation: Evidence from GitHub Copilot. arXiv. https://doi.org/10.48550/arXiv.2409.08379
  12. Hu, M. (2024, June 2). Controversy over the predictive validity of IQ on job performance. Substack. https://menghu.substack.com/p/controversy-over-the-predictive-validity-of-iq
  13. Wolfram, T. (2023). (Not just) Intelligence stratifies the occupational hierarchy: Ranking 360 professions by IQ and non-cognitive traits. Intelligence, 98, Article 101755. https://doi.org/10.1016/j.intell.2023.101755