TL;DR
- AI massively improves job performance.
- AI improves job performance for less competent, junior developers more than it does senior developers.
- H1Bs, excessive CS graduates, and tech layoffs have created a glut of labor supply.
- The theoretical labor supply has increased even further, since AI has made programming accessible to a large population pool.
Introduction
Recently, there were debates over H1B visas. The debate’s two camps were as follows:
- In support: Elon Musk, his followers, and Indians. By increasing H1Bs, they argue, companies will expand the overall developer pool in both quantity and quality, thus keeping US companies competitive via lower wages and greater talent pools.
- In opposition: a bipartisan coalition, from leftists on Reddit to the dissident right on X. By curbing H1Bs, they argue, jobs will be kept for American developers and graduates, who are sufficient in quantity and quality to fulfill the needs of US tech companies.
Instead of subsiding as ‘current thing’ debates usually do, this one maintained its intensity for weeks, eventually expanding in scope. Popular conservative commentators began siding with the Elon faction, arguing that young, university-educated men should suck it up: if their jobs are taken by H1Bs, then they should pursue careers at fast food chains instead.
The conversation then shifted somewhat away from tech employment and towards generational inequality, focusing on whether life is harder for Zoomers than it was for Boomers, and so on. This is an oversimplification, of course; given the intensity and scope of the debate, many peripheral subjects were touched on.
However, one subject that was insufficiently addressed, in my opinion, was the shifting landscape of tech employment with respect to the impact of AI and credential inflation (the devaluation of educational credentials at every level of attainment).
How are these tangentially related?
For developers especially, AI may boost productivity by an amount comparable to more than a standard deviation increase in IQ, based on typical IQ / job performance correlations.
The underlying complaint of the H1B opposition was that, given an increase in the quantity of H1Bs, quality must necessarily fall, since the program would become less selective. H1B opponents are usually thinking of IQ when they think about quality. However, if the job performance gains from AI are large, then it doesn’t matter if new entrants are lower IQ, because AI can simply enhance their performance to the point where the difference no longer matters.
Of course, H1B opponents would simply retort that, if low IQ fellows can program with AI, then why are H1Bs needed at all? One could simply use nationals of deficient talent, rather than foreigners, as they are still in sufficient supply.
This is especially true if we consider that there are more students graduating with degrees in computer science per capita.

But this got me thinking: if anyone can be a developer now, and there’s a glut of workers on the market, then being a developer isn’t very prestigious anymore, is it? All market forces would be pushing developer salaries downward, as there is now a legion of applicants with sufficient post-AI ability.
Now curious, I wanted to see some real data on the situation. How much does AI improve the performance of developers and other professionals? What’s the equivalent IQ × job performance effect size? Are there any other nuances in the data?
AI makes us stupid smart
After some digging and citation hopping, these are the main papers on productivity gains with AI. You can click on the header for the source. Unfortunately not all effects are reported in standard units.
Note: *** indicates statistical significance at p<0.01, * indicates significance at p<0.10. No * either means not significant or significance not reported.
Brynjolfsson et al. (2023)
Task: Customer support agents at a Fortune 500 software firm handling customer inquiries through chat
- 14% increase in customer support resolutions per hour overall***
- 34% increase in resolutions per hour for novice workers***
- 9% decrease in average customer chat duration***
- 1.3% increase in successful chat resolution rate*
Peng et al. (2023)
Task: Professional programmers implementing an HTTP server in JavaScript
- 55.8% reduction in time to complete server implementation task***
- 7% higher task success rate in completing implementation requirements
Gambacorta et al. (2024)
Task: Software programmers at Ant Group working on regular coding tasks
- 55% increase in lines of code produced overall***
- 67% increase in lines of code produced by junior staff***
- 11-18% of productivity gains directly attributable to LLM code output
Cui et al. (2024a)
Task: Software developers at Microsoft, Accenture, and Fortune 100 company performing regular development work
- 54.03% increase in completed pull requests at anonymous company
- 38.38% increase in code compilation attempts***
- 26.08% increase in completed pull requests overall***
- 13.55% increase in code commits
- 5.53% decrease in successful build rate
Yeverechyahu et al. (2024)
Task: Open-source developers contributing to Python and R packages
- 51% increase in code commits to repositories***
- 17.82% increase in new package version releases***
- 15.14% higher increase in maintenance-related commits compared to feature development***
Cui et al. (2024b)
Task: Software developers at Microsoft and Accenture performing regular development work
- 84-107% increase in successful code builds at Accenture***
- 12.92-21.83% increase in completed pull requests at Microsoft***
- 11.53% increase in lines of code changed***
- 7.51-8.69% increase in completed pull requests at Accenture***
The gains are notable here because not everyone in the treatment group even used AI; they merely had the option to use it. Furthermore, adoption of AI was slow.
McKinsey (2023)
Task: Software developers performing various coding tasks including documentation, generation, and refactoring
- 45-50% reduction in time spent on code documentation
- 35-45% reduction in time spent on new code generation
- 20-30% reduction in time spent on code refactoring
- <10% reduction in time spent on complex programming tasks
Vaithilingam et al. (2022)
Task: Students and engineers completing Python programming assignments
- ~1 minute faster task completion time
- Significantly higher helpfulness rating (6.16 vs 4.45 out of 10)***
Mozannar et al. (2024)
Task: Software developers completing pre-selected coding tasks
- 55.8% potential reduction in overall task completion time
- 76% of participants reported improved productivity (16/21 participants)
- 81% of participants reported faster task completion (17/21 participants)
Campero et al. (2022)
Task: HTML “programmers” and non-programmers creating web pages
- 27% improvement in task completion speed (using regression method)***
- 17% improvement in task completion speed (using ratio of means)***
Noy & Zhang (2023)
Task: College-educated professionals completing occupation-specific writing tasks
- 37% or 0.8 SD reduction in task completion time (from 27 to 17 minutes)***
- 0.45 standard deviation increase in output quality***
- 33% vs 18% adoption rate post-experiment***
- 0.40 standard deviation increase in job satisfaction***
- 0.20 standard deviation increase in self-efficacy*
Okay, so there are some pretty sizable gains. What was the method? What generation of AIs did they use, etc?
AI makes us stupid, smart.
Here’s a rough table of the method, the sample size, when each study was conducted, the rough equivalent generation of AI employed, and the AI reported in use. As you can see, there are many well-sampled RCTs here.
What’s astonishing here are the AIs used in the studies.
Early versions of Copilot, which were basically glorified autocomplete, and GPT-3.5.
THESE AIs ARE AWFUL.
GPT-3.5 was unusable: it hallucinated continuously, couldn’t do kindergarten maths, had a memory of only a few thousand words, and even then it remembered only the beginning or the end of your conversation. There isn’t a single AI in these studies that was even as advanced as GPT-4o.
But DESPITE all that, as reported above, the gains were enormous!
An interesting finding from perusing the papers was an interaction effect between productivity gains and developer experience or competence: less experienced devs benefited more! See the Negative Interaction column below.
Reference | Year of Study | AI | OpenAI Gen | Occupations | Negative Interaction | Sample Size | RCT |
---|---|---|---|---|---|---|---|
Brynjolfsson et al. (2023) | 2020-2021 | GPT-3 | GPT-3 | Customer Support | True*** | 5,179 | True |
Peng et al. (2023) | 2022 | GitHub Copilot | GPT-3 | Software Developers | True | 95 | True |
Gambacorta et al. (2024) | 2023 | CodeFuse (Chinese Open Source AI) | GPT-3 | Software Programmers | True*** | 1,219 | True |
Cui et al. (2024a) | 2024 | GitHub Copilot | GPT-4 | Software Developers | True*** | 4,867 | True |
Yeverechyahu et al. (2024) | 2024 | GitHub Copilot | GPT-4 | Software Developers | Not Applicable | 3,220 packages | False |
Cui et al. (2024b) | 2022-2023 | GitHub Copilot | GPT-3.5 | Software Developers | Not Tested | 1,974 | True |
McKinsey (2023) | 2023 | Multiple Gen AI Tools | GPT-3.5 | Software Developers | False | Not Listed | False |
Vaithilingam et al. (2022) | 2022 | GitHub Copilot | GPT-3 | Students & Software Engineers | Not Tested | 24 | True |
Mozannar et al. (2024) | 2024 | GitHub Copilot | GPT-4 | Software Developers | True | 21 | False |
Campero et al. (2022) | 2022 | GPT-3 | GPT-3 | HTML “Programmers” & Non-Programmers | True? | 145 | True |
Noy & Zhang (2023) | 2023 | ChatGPT | GPT-3.5 | Various Professionals | True*** | 444 | True |
According to these papers, this negative interaction effect is theoretically devastating for developers, and apparently for any professional class, because it closes the gap between the incompetent and the competent, the low and the high IQ, the inexperienced and the experienced. Why would you hire a talented, experienced senior dev when you can save 50% by going with a junior dev who can get nearly the same job done with AI? The same could be said about H1Bs: employ foreigners who are beholden to you through a visa, pay them less, and get the same performance because they’re using AI.
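To make the interaction effect concrete, here is a toy model in the spirit of these regressions. All coefficients are made up purely for illustration; they are not taken from any of the papers above.

```python
# Toy illustration of a negative AI x experience interaction:
#   productivity = b0 + b1*ai + b2*experience + b3*(ai * experience)
# with b3 < 0: AI helps everyone, but juniors gain more.
# All coefficients below are hypothetical, for illustration only.
def productivity(ai: int, experience_yrs: float) -> float:
    b0, b1, b2, b3 = 50.0, 20.0, 2.0, -1.5
    return b0 + b1 * ai + b2 * experience_yrs + b3 * ai * experience_yrs

# AI gain for a 1-year junior vs a 10-year senior:
junior_gain = productivity(1, 1) - productivity(0, 1)    # 20 - 1.5  = 18.5
senior_gain = productivity(1, 10) - productivity(0, 10)  # 20 - 15.0 = 5.0
print(junior_gain, senior_gain)
```

With a negative interaction coefficient, the AI gain shrinks as experience grows, which is exactly the pattern flagged in the Negative Interaction column of the table.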
By how much does AI increase developer IQ?
Okay, but what’s the ballpark equivalent IQ gain? This depends on three assumptions:
- The correlation between IQ and job performance.
- “Job performance” is analogous to the “productivity gains” we’ve just discussed.
- Guessing, from the various papers (which are heterogeneous in how they measure productivity), the standard deviation gain in productivity or job performance.
We don’t really have to assume point one, but it is fiercely debated; check out Meng Hu’s great article. From reading his work, it seems that the correlation between IQ and productivity is about r = 0.4.
Point two is difficult to ascertain because measurements are heterogeneous, just as they are in the job performance × IQ literature itself. Not every paper uses the same method to measure job performance, and neither does every AI productivity paper. So we need to assume that the two are mostly analogous. I think this is likely true, but it is an assumption.
Lastly, and annoyingly, not every paper is reporting standardized gains. Completing a task 37% faster might sound impressive, but if it only equates to a performance gain of 0.2 SDs, then it’s misleading. So we have to ballpark guess.
Given the above, I wouldn’t take the numbers below too seriously. However, given that the gains we saw in the papers above came from low quality AIs, they should in a way be considered a lower bound.
My personal guess:
- Seniors: 0.35 SD improvement (95% CI: 0.1-0.6 SD).
- Juniors: 0.65 SD improvement (95% CI: 0.3-1 SD).
Assuming an IQ × job performance correlation of 0.4, a 0.35 or 0.65 SD performance gain is equivalent to a 0.875 or 1.625 SD shift in IQ (dividing the performance gain by the correlation). That’s about ~13 or ~24 IQ points respectively.
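The arithmetic above can be sketched as a one-line conversion; the 0.4 correlation and the 0.35/0.65 SD gains are the assumptions just stated, and 15 points per SD is the conventional IQ scale.

```python
# Back-of-the-envelope: convert an assumed productivity gain (in SD units)
# into an equivalent IQ gain, given an assumed IQ x performance correlation.
R_IQ_PERF = 0.4  # assumed correlation between IQ and job performance
IQ_SD = 15       # IQ points per standard deviation

def iq_equivalent(gain_sd: float) -> float:
    # A performance gain of g SDs requires an IQ shift of g / r SDs,
    # so the equivalent gain in IQ points is (g / r) * 15.
    return gain_sd / R_IQ_PERF * IQ_SD

print(iq_equivalent(0.35))  # seniors: ~13 points
print(iq_equivalent(0.65))  # juniors: ~24 points
```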
Accessing the supplementary materials of Wolfram (2023), it appears that “programmers and software development professionals” have an average IQ of about 111.2.
So maybe a developer from a third world country with an IQ of ~90 can perform as well as a first world programmer post AI? That seems to be the implication here if we take the ~24 point gain at face value.
Conclusion, Discussion
Recent data suggests AIs have democratized programming by effectively adding >15 IQ points to developers’ capabilities. The effect is most pronounced among less competent programmers, with multiple RCTs showing 30-70% productivity gains using even primitive AI models like GPT-3.5.
This reduced barrier to entry, combined with credential inflation (surge in CS graduates), theoretically should be negative for developer employment. Why hire senior devs when juniors with AI assistance can perform similarly at half the cost? AI doesn’t just augment talent, it flattens the skill distribution.
The implications are stark. Developer salaries will likely continue declining as the market saturates with AI-augmented developers. The profession’s prestige diminishes as programming becomes increasingly accessible.
Already, the job situation hasn’t returned to pre-COVID levels.

The future may see programming transform from a high-skill profession into a commodity skill, with AI serving as the great equalizer. That is, until AI replaces the occupation entirely.
References
- Brynjolfsson, E., Li, D., & Raymond, L. (2023). Generative AI at work. arXiv. https://doi.org/10.48550/arXiv.2304.11771
- Campero, A., Vaccaro, M., Song, J., Wen, H., Almaatouq, A., & Malone, T. W. (2022). A test for evaluating performance in human-computer systems. arXiv. https://doi.org/10.48550/arXiv.2206.12390
- Cui, Z., Demirer, M., Jaffe, S., Musolff, L., Peng, S., & Salz, T. (2024a). The effects of generative AI on high skilled work: Evidence from three field experiments with software developers. SSRN. http://dx.doi.org/10.2139/ssrn.4945566
- Cui, K., Demirer, M., Jaffe, S., Musolff, L., Peng, S., & Salz, T. (2024b). The productivity effects of generative AI: Evidence from a field experiment with GitHub Copilot. MIT Exploration of Generative AI. https://doi.org/10.21428/e4baedd9.3ad85f1c
- Gambacorta, L., Qiu, H., Shan, S., & Rees, D. (2024). Generative AI and labour productivity: A field experiment on coding (BIS Working Paper No. 1208). Bank for International Settlements. https://www.bis.org/publ/work1208.htm
- Hu, M. (2024, June 2). Controversy over the predictive validity of IQ on job performance. Substack. https://menghu.substack.com/p/controversy-over-the-predictive-validity-of-iq
- McKinsey Digital. (2023, June 27). Unleashing developer productivity with generative AI. McKinsey & Company. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/unleashing-developer-productivity-with-generative-ai
- Mozannar, H., Bansal, G., Fourney, A., & Horvitz, E. (2024). Reading between the lines: Modeling user behavior and costs in AI-assisted programming. arXiv. https://doi.org/10.48550/arXiv.2210.14306
- Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence. Science, 381(6654), 187-192. https://doi.org/10.1126/science.adh2586
- Peng, S., Kalliamvakou, E., Cihon, P., & Demirer, M. (2023). The impact of AI on developer productivity: Evidence from GitHub Copilot. arXiv. https://doi.org/10.48550/arXiv.2302.06590
- Vaithilingam, P., Zhang, T., & Glassman, E. L. (2022). Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models. In CHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI EA ‘22). Association for Computing Machinery. https://doi.org/10.1145/3491101.3519665
- Wolfram, T. (2023). (Not just) Intelligence stratifies the occupational hierarchy: Ranking 360 professions by IQ and non-cognitive traits. Intelligence, 98, Article 101755. https://doi.org/10.1016/j.intell.2023.101755
- Yeverechyahu, D., Mayya, R., & Oestreicher-Singer, G. (2024). The impact of large language models on open-source innovation: Evidence from GitHub Copilot. arXiv. https://doi.org/10.48550/arXiv.2409.08379