
AI Code Has 2.74x More Vulnerabilities

Author: 오늘의 바이브

1.6 Million Apps, 141.3 Million Findings


On February 24, 2026, Veracode published its 16th annual "State of Software Security" report. The dataset: 1.6 million unique applications. The methods: static analysis (SAST) producing 115.6 million findings, software composition analysis (SCA) producing 22.1 million, and dynamic analysis (DAST) adding another 3.6 million. In total, 141.3 million raw security findings.

The numbers tell a clear story. The AI coding productivity party is over. The security hangover has arrived. AI-generated code contains 2.74x more vulnerabilities than human-written code. 82% of organizations carry security debt older than one year. High-risk vulnerability share jumped 36% year over year. The same tools that accelerated code production accelerated flaw production right along with it.

Why AI Code Is 2.74x More Dangerous

Veracode's GenAI Code Security Report ran a controlled experiment. Over 100 LLMs were tested across 4 languages -- Java, JavaScript, Python, and C# -- on 80 coding tasks. Each task was designed around the MITRE CWE framework, targeting scenarios where security vulnerabilities commonly emerge.

The results are blunt. 45% of AI-generated code samples introduced OWASP Top 10 vulnerabilities. Nearly half the output was insecure.

Break it down by vulnerability type and it gets worse.

Vulnerability Type         CWE       AI Failure Rate
Log Injection              CWE-117   88%
Cross-Site Scripting       CWE-80    86%
SQL Injection              CWE-89    Tested
Insecure Deserialization   --        Tested

88% failure rate on log injection. 86% on XSS. These are not obscure edge cases. These are the most basic web security principles, and AI fails at them almost every time. Veracode CTO Jens Wessling put it directly: "Our research reveals GenAI models make the wrong choices nearly half the time, and it's not improving."

The Language Gap: Java Is the Worst Offender


The same AI produces vastly different security outcomes depending on the programming language.

Language     Security Failure Rate
Java         72%
Python       38-45%
C#           38-45%
JavaScript   38-45%

Java is the clear outlier at 72%. Python, C#, and JavaScript cluster in the 38-45% range -- better, but that still means more than a third of generated code is insecure. Java's high failure rate likely stems from its complex security-sensitive APIs: serialization, JDBC, logging frameworks. These APIs have subtle security implications that AI models haven't learned to handle.

This should alarm anyone using AI to generate backend code. If 7 out of 10 AI-generated Java code samples fail security tests, what happens when that code hits production? The vibe coding trend -- letting AI handle everything from prototype to deployment -- runs headfirst into this data.
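The failure class itself is language-agnostic. Here is a sketch of the SQL injection mistake (CWE-89) next to the parameterized fix, written in Python with the standard library's sqlite3 for brevity; the table and data are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 1), ('bob', 0)")

def find_user_unsafe(name: str):
    # Vulnerable pattern (CWE-89): user input concatenated into SQL.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver binds the value, never interprets it.
    return conn.execute("SELECT name FROM users WHERE name = ?",
                        (name,)).fetchall()

payload = "' OR '1'='1"
leaked = find_user_unsafe(payload)   # classic bypass: returns every row
scoped = find_user_safe(payload)     # returns nothing: no such user
print(leaked, scoped)
```

In Java the same split is string-concatenated Statement versus PreparedStatement with bound parameters.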

Security Debt Hits 82%

Veracode's bigger finding isn't about AI code generation specifically. It's about security debt: vulnerabilities left unfixed for more than a year. Think of it as technical debt's more dangerous cousin.

Metric                                  2026 Value   YoY Change
Organizations with security debt        82%          +11%
Organizations with critical debt        60%          +20%
High-risk vulnerability share           11.3%        +36%
Apps with any flaw                      78%          Slight decrease
Apps with open-source vulnerabilities   62%          Down from 70%

Four out of five organizations are sitting on year-old vulnerabilities. 60% carry flaws severe enough to cause catastrophic damage if exploited -- up 20% from last year. The irony: overall flaw prevalence and open-source vulnerability rates actually dipped slightly. Fewer new flaws per app, but more unfixed flaws accumulating. The creation rate is outpacing the fix rate.
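The compounding is simple arithmetic: backlog grows at the net of creation minus remediation. A toy model with illustrative rates (these are not Veracode figures):

```python
def backlog_after(months: int, created_per_month: int,
                  fixed_per_month: int) -> int:
    # Toy model: unfixed findings accumulate at the net monthly rate.
    backlog = 0
    for _ in range(months):
        backlog += created_per_month
        backlog -= min(fixed_per_month, backlog)
    return backlog

# Illustrative: a team creating 100 findings/month while fixing 70.
print(backlog_after(12, 100, 70))   # 360 findings of debt after one year

# Double code velocity with AI, same fix capacity: debt more than quadruples.
print(backlog_after(12, 200, 70))   # 1560 findings after one year
```

The fix capacity is the bottleneck: doubling creation while remediation stays flat doesn't double the debt, it multiplies it.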

Chris Wysopal, Veracode's Chief Security Evangelist, summarized: "The speed of software development has skyrocketed, meaning the pace of flaw creation is outstripping the current capacity for remediation."

66% of the Worst Vulnerabilities Aren't Your Code


One statistic from the report stands out. 66% of the most dangerous, longest-lived vulnerabilities originate from third-party libraries and open-source dependencies. Two-thirds of the problem lives in code you didn't write -- code you imported.

This gets worse in the AI coding era. AI models aggressively pull in external packages. They npm install and pip install without checking whether a package is actively maintained or whether the recommended version has known CVEs. AI doesn't ask "is there a safer alternative?" It recommends whatever was popular in its training data.
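A basic countermeasure is a gate that rejects known-vulnerable dependency versions before install. The sketch below uses a hardcoded, hypothetical advisory list; a real pipeline would feed it from an SCA scanner or a vulnerability database such as OSV rather than maintain it by hand:

```python
# Hypothetical advisory data -- placeholder package names and IDs,
# not real CVEs. In practice this comes from an SCA tool or OSV feed.
ADVISORIES = {
    ("example-lib", "1.2.0"): "CVE-XXXX-0001 (placeholder id)",
}

def check_requirements(requirements: list[str]) -> list[str]:
    """Return a problem line for every pinned, known-bad version."""
    problems = []
    for line in requirements:
        name, _, version = line.partition("==")
        advisory = ADVISORIES.get((name.strip(), version.strip()))
        if advisory:
            problems.append(f"{line}: {advisory}")
    return problems

reqs = ["example-lib==1.2.0", "other-lib==2.0.1"]
print(check_requirements(reqs))
```

Wired into CI as a failing check, this turns "AI recommended whatever was popular in its training data" from a silent import into a blocked build.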

Apiiro's research on Fortune 50 companies backs this up. AI-generated code showed 322% more privilege escalation paths, 153% more design flaws, and a 40% jump in secrets exposure. CVSS 7.0+ vulnerabilities appeared 2.5x more frequently in AI-generated code. As of June 2025, these companies were seeing 10,000 new security findings per month -- a 10x increase from December 2024.

Reasoning Models: The Only Sign of Progress

In October 2025, Veracode updated its model-by-model security benchmarks. The results reveal a sharp divide.

Model                              Security Pass Rate
GPT-5 Mini (reasoning)             72%
GPT-5 (reasoning)                  70%
GPT-5-chat (non-reasoning)         52%
Claude Sonnet 4.5 (Anthropic)      50%
Claude Opus 4.1 (Anthropic)        49%
Other models (Google, Qwen, xAI)   50-59%

GPT-5's reasoning models posted the highest security pass rates at 70-72%. The gap between reasoning and non-reasoning versions of the same model is 18-20 percentage points. Reasoning models appear to perform an internal code review -- "thinking through" security implications before generating output.

But a ceiling of 72% means 3 out of 10 code samples are still insecure. Non-reasoning models, regardless of size or training sophistication, stall around 50%: Anthropic's Claude Sonnet 4.5 (50%) and Opus 4.1 (49%) sit essentially flat. Veracode concluded that "reasoning models are the only ones showing meaningful security improvement."

The Remediation Crisis


The report's core message is simple: vulnerabilities are being created faster than they're being fixed. The Register characterized it as "the velocity of development in the AI era makes comprehensive security unattainable."

The data is concrete. Apiiro's Fortune 50 research showed AI-related security findings increased 10x between December 2024 and June 2025 -- 10,000 new findings per month. A study of 1,645 vibe-coded applications on Sweden's Lovable platform found 170 (10.3%) contained exploitable vulnerabilities.

Wysopal's proposed framework is "Prioritize, Protect, Prove." Focus on the 11.3% of flaws that pose actual real-world risk. Deploy automated remediation at scale. Demonstrate compliance. The logic: 78% of apps have flaws, but only 11.3% are genuinely dangerous. With limited resources, triage ruthlessly.
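"Prioritize" implies ranking, not treating all 78% of flawed apps equally. A minimal triage sketch -- field names and the high-risk threshold are assumptions for illustration, not Veracode's schema -- that filters to high-severity findings and ranks them by severity, exposure, and age:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    id: str
    cvss: float           # severity score, 0-10
    internet_facing: bool # reachable from the outside?
    age_days: int         # how long the flaw has been open

def triage(findings: list[Finding],
           high_risk_cvss: float = 7.0) -> list[Finding]:
    # Keep only plausibly dangerous findings, then rank most urgent first.
    urgent = [f for f in findings if f.cvss >= high_risk_cvss]
    return sorted(urgent,
                  key=lambda f: (f.cvss, f.internet_facing, f.age_days),
                  reverse=True)

backlog = [
    Finding("F1", 5.0, True, 400),   # below threshold: deferred
    Finding("F2", 9.8, True, 30),
    Finding("F3", 7.5, False, 500),
]
print([f.id for f in triage(backlog)])
```

The point of the ruthlessness: with a queue this shape, remediation capacity goes to the F2s and F3s first, and the long tail waits by design rather than by neglect.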

The Hangover Is Just Starting

AI coding's security hangover is here, and it will get worse before it gets better. Faster code generation means faster vulnerability generation. If remediation can't keep pace, security debt compounds.

The irony is that AI is both the cause and the likely cure. The same technology producing insecure code is the only technology that can fix it at scale. Reasoning models showing meaningful security gains hints at the direction: generate fast, validate automatically, remediate immediately. Without that pipeline, AI-era security has no viable path forward.

Most teams today lack this pipeline. They generate code with AI but rely on manual review for security validation. Production velocity is 10x higher. Security review bandwidth hasn't moved. The 82% security debt figure is the direct result of that gap.

If you adopted an AI code generator, you need an AI code security tool right alongside it. Enjoying the productivity gains while deferring the security bill is not a strategy. 1.6 million apps worth of data just proved it.

