AI Makes Us Stupid, Smart

TL;DR

  • AI massively improves job performance.

  • AI improves job performance for less competent, junior developers more than it does senior developers.

  • H1Bs, excessive CS graduates, and tech layoffs have created a glut of labor supply.

  • The theoretical labor supply has increased even further, since AI has made programming accessible to a much larger pool of people.

Introduction

Recently, there were debates over H1B visas. The debate’s two camps were as follows:

  • In support: Elon Musk and his followers, along with many Indian commentators. They argued that increasing H1Bs would expand the overall developer pool in both quantity and quality, keeping US companies competitive through lower wages and deeper talent pools.

  • The opposition consisted of a bipartisan coalition, from leftists on Reddit to the dissident right on X. They argued that curbing H1Bs would keep jobs for American developers and graduates, who are sufficient in quantity and quality to meet the needs of US tech companies.

Instead of subsiding as ‘current thing’ debates usually do, the controversy maintained its intensity for weeks and eventually expanded in scope. Popular conservative commentators began siding with the Elon faction, arguing that young, university-educated men should suck it up: if their jobs are taken by H1Bs, they should pursue careers at fast food chains instead.

The conversation then drifted away from tech employment and towards generational inequality, focusing on whether life is harder for Zoomers than it was for Boomers. This is an oversimplification, of course; given the intensity and scope of the debate, many peripheral subjects were touched on.

However, one subject that - in my opinion - was insufficiently addressed is the shifting landscape of tech employment under the impact of AI and of credential inflation (the devaluing of educational credentials at every level of attainment).

How are these tangentially related?

For developers in particular, AI may boost productivity by an amount comparable to more than a standard deviation increase in IQ, based on typical IQ / job performance correlations.

The underlying complaint of the H1B opposition was that, with an increased quantity of H1Bs, quality would necessarily fall as the program became less selective. H1B opponents usually have IQ in mind when they think about quality. But if the job performance gains from AI are large enough, low IQ hardly matters, because AI can simply lift performance to the point where the difference disappears.

Of course, H1B opponents would simply retort that, if low IQ fellows can program with AI, then why are H1Bs needed at all? One could simply use nationals of deficient talent - not foreigners - as they are still in sufficient supply.

This is especially true given that more students per capita are graduating with degrees in computer science.

[Figure: credential inflation]

But this got me thinking: if anyone can be a developer now, and there’s a glut of workers on the market, then being a developer isn’t very prestigious anymore, is it? All market forces would be pushing developer salaries downward, as there is now a legion of applicants with sufficient post-AI ability.

Now curious, I wanted to see some real data on the situation. How much does AI improve the performance of developers and other professionals? What is the equivalent effect size in terms of the IQ x job performance relationship? Are there any other nuances in the data?

AI makes us stupid, smart

After some digging and citation hopping, these are the main papers on productivity gains with AI. You can click on each header for the source. Unfortunately, not all effects are reported in standardized units.

Note: *** indicates statistical significance at p<0.01, * indicates significance at p<0.10. No asterisk means the effect was either not significant or its significance was not reported.

Brynjolfsson et al. (2023)

Task: Customer support agents at a Fortune 500 software firm handling customer inquiries through chat

  • 14% increase in customer support resolutions per hour overall***
  • 34% increase in resolutions per hour for novice workers***
  • 9% decrease in average customer chat duration***
  • 1.3% increase in successful chat resolution rate*

Peng et al. (2023)

Task: Professional programmers implementing an HTTP server in JavaScript

  • 55.8% reduction in time to complete server implementation task***
  • 7% higher task success rate in completing implementation requirements

Gambacorta et al. (2024)

Task: Software programmers at Ant Group working on regular coding tasks

  • 55% increase in lines of code produced overall***
  • 67% increase in lines of code produced by junior staff***
  • 11-18% of productivity gains directly attributable to LLM code output

Cui et al. (2024a)

Task: Software developers at Microsoft, Accenture, and Fortune 100 company performing regular development work

  • 54.03% increase in completed pull requests at anonymous company
  • 38.38% increase in code compilation attempts***
  • 26.08% increase in completed pull requests overall***
  • 13.55% increase in code commits
  • 5.53% decrease in successful build rate

Yeverechyahu et al. (2024)

Task: Open-source developers contributing to Python and R packages

  • 51% increase in code commits to repositories***
  • 17.82% increase in new package version releases***
  • 15.14% higher increase in maintenance-related commits compared to feature development***

Cui et al. (2024b)

Task: Software developers at Microsoft and Accenture performing regular development work

  • 84-107% increase in successful code builds at Accenture***
  • 12.92-21.83% increase in completed pull requests at Microsoft***
  • 11.53% increase in lines of code changed***
  • 7.51-8.69% increase in completed pull requests at Accenture***

The gains here are notable because not everyone in the treatment group actually used AI; they merely had the option to. Furthermore, adoption was slow.

McKinsey (2023)

Task: Software developers performing various coding tasks including documentation, generation, and refactoring

  • 45-50% reduction in time spent on code documentation
  • 35-45% reduction in time spent on new code generation
  • 20-30% reduction in time spent on code refactoring
  • <10% reduction in time spent on complex programming tasks

Vaithilingam et al. (2022)

Task: Students and engineers completing Python programming assignments

  • ~1 minute faster task completion time
  • Significantly higher helpfulness rating (6.16 vs 4.45 out of 10)***

Mozannar et al. (2024)

Task: Software developers completing pre-selected coding tasks

  • 55.8% potential reduction in overall task completion time
  • 76% of participants reported improved productivity (16/21 participants)
  • 81% of participants reported faster task completion (17/21 participants)

Campero et al. (2022)

Task: HTML “programmers” and non-programmers creating web pages

  • 27% improvement in task completion speed (using regression method)***
  • 17% improvement in task completion speed (using ratio of means)***

Noy & Zhang (2023)

Task: College-educated professionals completing occupation-specific writing tasks

  • 37% or 0.8 SD reduction in task completion time (from 27 to 17 minutes)***
  • 0.45 standard deviation increase in output quality***
  • 33% vs 18% adoption rate post-experiment***
  • 0.40 standard deviation increase in job satisfaction***
  • 0.20 standard deviation increase in self-efficacy*

Okay, so there are some pretty sizable gains. What was the method? What generation of AIs did they use, etc?

AI makes us stupid, smart.

Here’s a rough table listing each study’s method, sample size, year, the AI reported in use, and its rough OpenAI-generation equivalent. As you can see, there are many well-sampled RCTs here.

What’s astonishing here are the AIs used in the studies.

Early versions of Copilot - basically glorified autocomplete - and GPT-3.5.

THESE AIs ARE AWFUL.

GPT-3.5 was barely usable: it hallucinated continuously, couldn’t do kindergarten maths, and had a memory of only a few thousand words - and even then it remembered only the beginning or the end of your conversation. There isn’t a single AI in these studies as advanced as GPT-4o.

But DESPITE all that, as reported above, the gains were enormous!

An interesting finding from perusing the papers was an interaction effect between productivity gains and developer experience or competence.

Less experienced devs benefited more! See the Negative Interaction column below.

| Reference | Year of Study | AI | OpenAI Gen | Occupations | Negative Interaction | Sample Size | RCT |
|---|---|---|---|---|---|---|---|
| Brynjolfsson et al. (2023) | 2020-2021 | GPT-3 | GPT-3 | Customer Support | True*** | 5,179 | True |
| Peng et al. (2023) | 2022 | GitHub Copilot | GPT-3 | Software Developers | True | 95 | True |
| Gambacorta et al. (2024) | 2023 | CodeFuse (Chinese open-source AI) | GPT-3 | Software Programmers | True*** | 1,219 | True |
| Cui et al. (2024a) | 2024 | GitHub Copilot | GPT-4 | Software Developers | True*** | 4,867 | True |
| Yeverechyahu et al. (2024) | 2024 | GitHub Copilot | GPT-4 | Software Developers | Not Applicable | 3,220 packages | False |
| Cui et al. (2024b) | 2022-2023 | GitHub Copilot | GPT-3.5 | Software Developers | Not Tested | 1,974 | True |
| McKinsey (2023) | 2023 | Multiple Gen AI Tools | GPT-3.5 | Software Developers | False | Not Listed | False |
| Vaithilingam et al. (2022) | 2022 | GitHub Copilot | GPT-3 | Students & Software Engineers | Not Tested | 24 | True |
| Mozannar et al. (2024) | 2024 | GitHub Copilot | GPT-4 | Software Developers | True | 21 | False |
| Campero et al. (2022) | 2022 | GPT-3 | GPT-3 | HTML “Programmers” & Non-Programmers | True? | 145 | True |
| Noy & Zhang (2023) | 2023 | ChatGPT | GPT-3.5 | Various Professionals | True*** | 444 | True |

This negative interaction effect is theoretically devastating for developers - and, according to these papers, apparently for any professional class - because it closes the gap between the incompetent and the competent, the low and the high IQ, the inexperienced and the experienced. Why hire a talented, experienced senior dev when you can save 50% by going with a junior dev who can get nearly the same job done with AI? The same could be said of H1Bs: employ foreigners who are beholden to you through a visa, pay them less, and get the same performance because they’re using AI.

By how much does AI increase developer IQ?

Okay, but what’s the ballpark equivalent IQ gain? This depends on three assumptions:

  1. The correlation between IQ and job performance.
  2. That “job performance” is analogous to the “productivity gains” just discussed.
  3. The standard deviation gain in productivity or job performance, which must be guessed from the various papers, as they are heterogeneous in how they measure productivity.

We don’t really have to assume point one, but it is fiercely debated - check out Meng Hu’s great article. From his work, it seems the correlation between IQ and productivity is about r=0.4.

Point two is difficult to ascertain because, as with the job performance x IQ literature itself, measurements are heterogeneous. Not every paper uses the same method to measure job performance, and neither does every AI productivity paper. So we must assume the two are mostly analogous. I think this is likely true, but it is an assumption.

Lastly, and annoyingly, not every paper reports standardized gains. Completing a task 37% faster might sound impressive, but if that only equates to a performance gain of 0.2 SDs, the raw number is misleading. So we have to make a ballpark guess.
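To make the standardization step concrete, here is a minimal sketch using the one study that reports both raw and standardized numbers, Noy & Zhang (2023): mean completion time fell from 27 to 17 minutes, reported as roughly a 0.8 SD gain. The implied SD of completion time below is back-solved for illustration and is not a figure from the paper.

```python
def standardized_gain(mean_control: float, mean_treatment: float, sd: float) -> float:
    """Cohen's d-style effect size: raw improvement divided by the outcome's SD."""
    return (mean_control - mean_treatment) / sd

# Noy & Zhang (2023): 27 -> 17 minutes, reported as ~0.8 SD.
# Back-solving, a 10-minute drop at 0.8 SD implies an SD of ~12.5 minutes.
implied_sd = (27 - 17) / 0.8
print(implied_sd)                              # 12.5
print(standardized_gain(27, 17, implied_sd))   # 0.8
```

The point of the exercise: a "37% faster" headline only becomes comparable across studies once it is divided by the spread of the outcome, which most of the papers above do not report.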

Given the above, I wouldn’t take the estimates below too seriously. However, since the gains in the papers above came from low-quality AIs, they should in a way be considered a lower bound.

My personal guess:

  • Seniors: 0.35 SD improvement (95% CI: 0.1-0.6 SD).
  • Juniors: 0.65 SD improvement (95% CI: 0.3-1.0 SD).

Assuming an IQ x job performance correlation of 0.4, dividing the 0.35 and 0.65 SD performance gains by that correlation gives IQ-equivalent gains of 0.875 and 1.625 SD. That’s about ~13 or ~24 IQ points respectively.

Accessing the supplementary materials of Wolfram (2023), it appears that “Programmers and software development professionals” have a mean IQ of about 111.2.

So maybe a developer from a third-world country with an IQ of ~90 can perform as well as a first-world programmer, post-AI? That seems to be the implication if we take the ~24-point gain at face value.

Conclusion, Discussion

Recent data suggest that AI has democratized programming, effectively adding >15 IQ points to developers’ capabilities. The effect is most pronounced among less competent programmers, with multiple RCTs showing 30-70% productivity gains even from primitive AI models like GPT-3.5.

This reduced barrier to entry, combined with credential inflation (surge in CS graduates), theoretically should be negative for developer employment. Why hire senior devs when juniors with AI assistance can perform similarly at half the cost? AI doesn’t just augment talent, it flattens the skill distribution.

The implications are stark. Developer salaries will likely continue declining as the market saturates with AI-augmented developers. The profession’s prestige diminishes as programming becomes increasingly accessible.

Already, the tech job situation hasn’t returned to pre-COVID levels.

[Figure: FRED employment data]

The future may see programming transform from a high-skill profession to a commodity skill, with AI serving as the great equalizer. That is until AI replaces the occupation entirely.

References
  1. Brynjolfsson, E., Li, D., & Raymond, L. (2023). Generative AI at work. arXiv. https://doi.org/10.48550/arXiv.2304.11771
  2. Campero, A., Vaccaro, M., Song, J., Wen, H., Almaatouq, A., & Malone, T. W. (2022). A test for evaluating performance in human-computer systems. arXiv. https://doi.org/10.48550/arXiv.2206.12390
  3. Cui, Z., Demirer, M., Jaffe, S., Musolff, L., Peng, S., & Salz, T. (2024a). The effects of generative AI on high skilled work: Evidence from three field experiments with software developers. SSRN. http://dx.doi.org/10.2139/ssrn.4945566
  4. Cui, K., Demirer, M., Jaffe, S., Musolff, L., Peng, S., & Salz, T. (2024b). The productivity effects of generative AI: Evidence from a field experiment with GitHub Copilot. MIT Exploration of Generative AI. https://doi.org/10.21428/e4baedd9.3ad85f1c
  5. Gambacorta, L., Qiu, H., Shan, S., & Rees, D. (2024). Generative AI and labour productivity: A field experiment on coding (BIS Working Paper No. 1208). Bank for International Settlements. https://www.bis.org/publ/work1208.htm
  6. McKinsey Digital. (2023, June 27). Unleashing developer productivity with generative AI. McKinsey & Company. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/unleashing-developer-productivity-with-generative-ai
  7. Mozannar, H., Bansal, G., Fourney, A., & Horvitz, E. (2024). Reading between the lines: Modeling user behavior and costs in AI-assisted programming. arXiv. https://doi.org/10.48550/arXiv.2210.14306
  8. Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence. Science, 381(6654), 187-192. https://doi.org/10.1126/science.adh2586
  9. Peng, S., Kalliamvakou, E., Cihon, P., & Demirer, M. (2023). The impact of AI on developer productivity: Evidence from GitHub Copilot. arXiv. https://doi.org/10.48550/arXiv.2302.06590
  10. Vaithilingam, P., Zhang, T., & Glassman, E. L. (2022). Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models. In CHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI EA ‘22). Association for Computing Machinery. https://doi.org/10.1145/3491101.3519665
  11. Yeverechyahu, D., Mayya, R., & Oestreicher-Singer, G. (2024). The impact of large language models on open-source innovation: Evidence from GitHub Copilot. arXiv. https://doi.org/10.48550/arXiv.2409.08379
  12. Hu, M. (2024, June 2). Controversy over the predictive validity of IQ on job performance. Substack. https://menghu.substack.com/p/controversy-over-the-predictive-validity-of-iq
  13. Wolfram, T. (2023). (Not just) Intelligence stratifies the occupational hierarchy: Ranking 360 professions by IQ and non-cognitive traits. Intelligence, 98, Article 101755. https://doi.org/10.1016/j.intell.2023.101755