

AI & Modernization

Two directions, one dependency. AI tools now cut code migration cost by 70% and testing cost by 86%. But 82% of enterprises can't use AI at all — because their data infrastructure isn't ready. Here are both sides of the equation.

  • 80%+ COBOL-to-Java structural accuracy (IBM WCA4Z, 2025)
  • 86% cost reduction: AI-native vs manual testing
  • 82% of enterprises blocked by fragmented data (Cisco, 2025)
  • 5% of firms are AI future-built, with 5× the revenue gains of laggards (BCG, 2025)

Two Directions, One Research Hub

This hub covers a bidirectional relationship between AI and modernization that is frequently conflated into a single, confusing narrative. We separate it into two distinct questions:

Direction A
AI as a Modernization Tool

Using LLMs and AI tools to accelerate legacy code migration, generate test suites, and document undocumented systems. The tooling is real and maturing fast.

Direction B
Modernization for AI Readiness

What organizations must modernize — data infrastructure, APIs, MLOps pipelines — before AI can deliver value in production. Most companies are blocked here.

The AI and modernization relationship has reached an inflection point in 2026. AI-assisted code translation tools have crossed the threshold of practical usefulness: IBM's watsonx Code Assistant for Z achieves 80%+ structural accuracy on COBOL-to-Java translation, and AI-native testing platforms cut annual testing costs from $6M to $840K. At the same time, BCG research establishes that only 5% of firms are "AI future-built" — delivering 5× the revenue increases of laggards — while 60% report negligible gains despite substantial investment, almost always because their data infrastructure cannot support production AI.


The AI tooling market for software modernization has bifurcated. General-purpose tools (GitHub Copilot, Google Gemini Code Assist) excel at everyday developer assistance but produce inadequate results for legacy language translation — they generate syntactically correct code that lacks the semantic understanding of COBOL's data division structure, packed-decimal arithmetic, and CICS transaction context. Specialized tools (IBM WCA4Z, Amazon Q Developer Transform) are purpose-built for legacy modernization and produce demonstrably better results for that specific use case.
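As a concrete illustration of the data-division semantics a generic translation misses, here is a minimal sketch (in Python, purely for illustration) of decoding a COBOL COMP-3 packed-decimal field such as PIC S9(5)V99. A naive line-by-line translation that maps such a field to a float or String silently loses exactly this behavior; the field layout shown in the comments is standard packed decimal, while the example values are hypothetical.

```python
from decimal import Decimal

def decode_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode COMP-3 (packed decimal): two BCD digits per byte, and the low
    nibble of the last byte carries the sign (0xC/0xF positive, 0xD negative)."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign = -1 if (raw[-1] & 0x0F) == 0x0D else 1
    value = 0
    for d in digits:
        value = value * 10 + d
    return Decimal(sign * value).scaleb(-scale)

# A PIC S9(5)V99 COMP-3 field occupies 4 bytes: 0x00 0x12 0x34 0x5C -> 123.45
assert decode_comp3(bytes([0x00, 0x12, 0x34, 0x5C]), scale=2) == Decimal("123.45")
```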

On the readiness side, Gartner's prediction that 40% of agentic AI projects will be cancelled by 2027 due to infrastructure constraints reflects the gap between AI ambition and data reality. Pilots succeed in sandboxed environments with clean, curated data. Production deployment fails when the agent must call a 1990s batch system via SFTP, read from five inconsistent CRMs, or wait 24 hours for a nightly ETL job to refresh the data it needs in real time. Modernization is the prerequisite — not an alternative — to AI value.

Direction A: AI as a Modernization Tool

AI tools are demonstrably useful for three specific modernization tasks: code translation, test generation, and documentation of undocumented systems.

AI Code Translation: What the Benchmarks Actually Show

IBM watsonx Code Assistant for Z

The most documented results for enterprise COBOL-to-Java translation. Achieves median structural quality above 80% and functional correctness above 75% across client deployments. A 2025 study found that summary augmentation — adding natural language descriptions to COBOL source before translation — improves 36% of CodeNet benchmark samples (727 samples) and 50% of low-scoring enterprise samples (571 samples). The tool handles CICS integration, VSAM/DB2 data access patterns, and COBOL-compatible serialization that general-purpose LLMs consistently miss.

Best choice for COBOL-to-Java at enterprise scale
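The summary-augmentation approach described above can be sketched as a two-step prompt pipeline. This is an illustration of the idea from the 2025 study, not WCA4Z's actual pipeline or API; `summarize` and `translate` are placeholder callables for whatever LLM endpoint is in use.

```python
def augment_and_translate(cobol_source: str, summarize, translate) -> str:
    # Step 1: ask a model for a plain-English description of the paragraph's intent.
    summary = summarize(
        "Describe in plain English what this COBOL paragraph does, including "
        "file I/O, CICS calls, and data-item semantics:\n\n" + cobol_source
    )
    # Step 2: include that description as context in the translation prompt.
    prompt = (
        "Context (intended behaviour):\n" + summary + "\n\n"
        "Translate the following COBOL to idiomatic Java, preserving "
        "packed-decimal precision and CICS transaction boundaries:\n\n" + cobol_source
    )
    return translate(prompt)
```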

GitHub Copilot & General-Purpose LLMs

Produces "plain class skeletons" for COBOL translation — syntactically valid Java that lacks business logic understanding. Misses database handling specifics, CICS context, and COBOL data structure semantics (COMP-3 packed decimal, REDEFINES, 88-level condition names). Appropriate for everyday developer assistance and greenfield code but not for legacy language migration at scale. GPT-4-class models exhibit 33–60% hallucination rates in general translation tasks; best models achieve sub-1% but most fall in the 2–5% range (Vectara 2026).

Use for modern code; avoid for COBOL/legacy migration
| Tool | Best Use Case | COBOL Quality | Pricing |
|------|---------------|---------------|---------|
| IBM watsonx Code Assistant for Z | COBOL → Java, mainframe migration | 80%+ structural, 75%+ functional | Enterprise contract |
| Amazon Q Developer Transform | Java 8 → 17 upgrades, .NET migration | Moderate (Java-focused) | $19/user/month |
| GitHub Copilot | Daily coding assistance, greenfield | Poor (class skeletons only) | $19–$39/user/month |
| Google Gemini Code Assist | GCP workloads, modern languages | No published legacy benchmarks | $19/user/month |
| Moderne | Large-scale Java/Spring refactoring | Excellent for Java, no COBOL | Enterprise contract |

AI Test Generation for Legacy Code: Documented Results

  • UnitTenX (2025): 100% line coverage achieved on a real-world C codebase starting from 0%; 186 of 199 functions (93.5%) covered automatically
  • Salesforce / Cursor AI: 85% reduction in time to reach legacy code coverage targets, applied to repos with <10% initial test coverage
  • Virtuoso (2025): $840K annual testing cost with an AI-native platform, vs $6.03M manual and $2.29M traditional automation

Where AI Test Generation Fails

AI-generated tests cover code paths reliably but cannot validate that the business rules encoded in those paths are correct. For undocumented legacy systems, the business logic is the unknown — the AI has no way to verify that a 1998 calculation produces the right answer because there is no specification to check against. The Salesforce approach — mandatory human review of all generated tests, AI-generated JavaDoc comments for human validation, SonarQube as final gate — is the right model. Treat AI test generation as coverage scaffolding, not correctness verification.
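The shape such scaffolding typically takes is a characterization test: it pins down current behaviour and raises coverage, but only a human reviewer can confirm the pinned value matches the intended business rule. The sketch below uses hypothetical module and function names and is not Salesforce's actual output.

```python
import pytest
from legacy_billing import late_fee  # hypothetical legacy module under test

def test_late_fee_30_days_overdue():
    # Value captured from current production behaviour; the implied 1.5%
    # monthly rate is an observation, not a verified requirement.
    assert late_fee(balance=1000.00, days_overdue=30) == pytest.approx(15.00)
```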

AI Legacy Code Documentation Tools

LegacyMap (Sector7)

Specialized for COBOL, FORTRAN, BASIC, C++, and Pascal on OpenVMS and Mainframe (z/OS). Generates structured callgraphs, SQL access diagrams, and procedure maps without requiring code changes. Supports platform-specific dialects including VAX, Alpha, and z/OS variants. Best tool for generating structural maps of undocumented mainframe estates — accelerates onboarding and identifies dead code without modifying production systems.

Codegram

Targets VB, Delphi, and COBOL to modern languages (Java, C#, Python) with integrated documentation generation, debugging tools, and code optimization. Provides structured documentation templates and maintains conversion history for auditing. Combined conversion + documentation toolchain for mid-market legacy application migrations. Appropriate for organizations that want a unified tool rather than separate documentation and translation workflows.

Quality ceiling on documentation tools: All documentation tools excel at structural analysis — identifying callers/callees, visualizing system flows, and accelerating engineer onboarding. None can reconstruct business intent from undocumented logic. They map what the code does, not what it was supposed to do or why specific decisions were made. Human domain experts remain required for semantic validation.
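A toy version of the structural analysis these tools perform, using Python's standard ast module to build a caller-to-callee map. Commercial tools do this for COBOL, FORTRAN, and Pascal dialects, which is considerably harder; the point of the sketch is that the output describes structure, not business intent.

```python
import ast
from collections import defaultdict

def call_graph(source: str) -> dict[str, set[str]]:
    """Map each function name to the set of names it calls."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = defaultdict(set)
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                graph[func.name].add(node.func.id)
    return dict(graph)

# call_graph("def a():\n    b()\n\ndef b():\n    pass\n") -> {'a': {'b'}}
```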

Direction B: Modernization for AI Readiness

Before AI delivers value in production, the underlying infrastructure must support it. Most enterprises are blocked at data and integration layers.

The Blockers: Why 60% of Companies See No AI Value

Data Fragmentation (82% of enterprises)

Cisco's 2025 global study: 82% of enterprises report fragmented data blocks accessibility and slows AI integration. 69% report poor data quality limits decisions. 45% cite fragmented, unstructured data as the top AI adoption blocker. 86% of companies now prioritize data unification as their #1 AI readiness initiative. The specific failure: siloed CRM, POS, ecommerce, service, and transaction data that cannot be joined — making it impossible to calculate true customer lifetime value or provide AI agents with full context.
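A stylized example (hypothetical identifiers and values) of why fragmentation blocks even a basic AI input like lifetime value: each silo keys customers differently, so the aggregation is trivial only after an identity crosswalk exists, and building that crosswalk is the actual modernization work.

```python
import pandas as pd

# Three silos, three incompatible customer identifiers.
crm  = pd.DataFrame({"crm_id": ["C-001"], "email": ["a@example.com"]})
pos  = pd.DataFrame({"pos_card": ["884512"], "amount": [42.10]})
ecom = pd.DataFrame({"web_user": ["u_9f2"], "amount": [129.00]})

# No column is shared by all three systems, so there is nothing to join on
# until an identity-resolution table maps the IDs to one customer.
crosswalk = pd.DataFrame({"crm_id": ["C-001"], "pos_card": ["884512"], "web_user": ["u_9f2"]})

orders = pd.concat([
    pos.merge(crosswalk, on="pos_card")[["crm_id", "amount"]],
    ecom.merge(crosswalk, on="web_user")[["crm_id", "amount"]],
])
lifetime_value = orders.groupby("crm_id")["amount"].sum()  # C-001 -> 171.10
```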

Legacy System Integration (60% of AI leaders)

Deloitte Tech Trends 2026: 60% of AI leaders identify legacy system integration as their primary barrier to agentic AI adoption. Gartner predicts 40% of agentic AI projects will be cancelled by end of 2027 due to infrastructure constraints. The failure pattern: AI pilots succeed in isolated environments with clean test data, then production deployment fails when connecting to legacy systems without APIs, real-time data, or the query patterns that AI agents generate. Monolithic architectures requiring full-stack coordination for minor updates, tightly coupled systems with brittle integrations, and waterfall development cycles that bury AI pilots in backlogs are the specific culprits.

The Cost of Not Modernizing for AI (BCG, October 2025)

  • AI future-built companies: 5× revenue gains and 3× cost reductions vs AI laggards
  • Share of firms that are future-built: 5% globally; 35% are scaling and 60% reap minimal value
  • AI adopters with no real gains: 80%+ (McKinsey, June 2025), most stuck in copilot mode

What AI-Ready Data Infrastructure Looks Like

AI-Ready Enterprise

  • API integration layers: 80% adoption rate among AI-ready organizations
  • Data lake/lakehouse architecture (77% adoption)
  • Enterprise data warehouse for structured analytics (72%)
  • Data governance with contracts, freshness SLAs, and error budgets
  • Event-driven pipelines that activate instantly on business events
  • Federated data mesh: teams publish data products under shared contracts
  • Policy-as-code: consent, residency, and retention in version control
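A minimal sketch of the "contracts, freshness SLAs, and error budgets" item above, with hypothetical dataset and field names. In practice this logic usually lives in a data-quality or orchestration tool; the point is that freshness is declared and checked, not assumed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class DataContract:
    dataset: str
    required_columns: set[str]
    freshness_sla: timedelta      # maximum age before the dataset counts as stale
    monthly_error_budget: int     # allowed SLA breaches per month before escalation

def meets_freshness_sla(contract: DataContract, last_loaded_at: datetime) -> bool:
    """True if the dataset's last successful load is within its SLA window."""
    return datetime.now(timezone.utc) - last_loaded_at <= contract.freshness_sla

orders_contract = DataContract(
    dataset="orders_gold",
    required_columns={"order_id", "customer_id", "amount", "ordered_at"},
    freshness_sla=timedelta(minutes=15),
    monthly_error_budget=3,
)
```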

Legacy Enterprise (AI Blocked)

  • Siloed CRM, POS, ecommerce, service data — no joins possible
  • No APIs on core systems — batch SFTP or database polling only
  • Nightly ETL — agents can't get real-time data
  • "One-off" integrations with no reusable contracts or schemas
  • No data lineage or quality controls — models train on stale data
  • Manual approvals for deploying or retraining models
  • Inflexible permission structures blocking experimentation

MLOps Maturity: The Gap Between Pilot and Production

A 2025 consolidated MLOps lifecycle framework identifies the most common gaps that prevent AI from reaching production at legacy organizations. The enterprise AI readiness framework requires an 80%+ data quality score, executive sponsorship, cloud ML infrastructure, and defined ROI metrics before model deployment — criteria most organizations pursuing AI pilots cannot currently meet.

No MLOps CI/CD Pipeline

Models are trained by data scientists but deployed manually — inconsistently and infrequently. No automation for model validation, staging, rollback, or A/B testing. Each deployment is a one-off project.
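The kind of decision a model CI/CD pipeline automates can be as simple as a promotion gate: a candidate is promoted only if it clears a quality floor and beats the current production model on the same held-out data. The thresholds below are placeholders, not recommendations.

```python
def should_promote(candidate_auc: float, production_auc: float,
                   min_auc: float = 0.75, min_gain: float = 0.002) -> bool:
    """Return True if the candidate model may be promoted to production."""
    return candidate_auc >= min_auc and candidate_auc - production_auc >= min_gain

# Wired into CI, this replaces the manual, one-off deployment decision:
# should_promote(candidate_auc=0.81, production_auc=0.79) -> True
```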

No Production Monitoring

Models degrade silently when production data drifts from training data. No alerting when model performance drops. No feedback loops into retraining cycles. Models deployed in 2024 still running without updates in 2026.
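An illustrative drift check, assuming you retain a sample of the training feature distribution and compare it against a recent production window. A population stability index above roughly 0.2 is a common rule-of-thumb trigger for investigation or retraining, not a universal threshold.

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a training-time sample and recent production values."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    expected_pct = np.clip(expected_pct, 1e-6, None)  # avoid log of zero
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Alert (and consider retraining) when the index drifts above ~0.2:
# population_stability_index(training_sample, last_24h_sample) > 0.2
```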

Undefined Roles

Organizations have data scientists but no ML engineers. No one owns the boundary between model development and production deployment. MLOps responsibilities fall into a gap between data science and IT operations.

No Data Governance

Models trained on data with no quality controls, freshness guarantees, or bias audits. Regulatory exposure increases as AI decisions become consequential. EU AI Act compliance requires documentation that doesn't exist.

AI Readiness Modernization Path: Three Phases

Phase 1 (3–6 months): Data Foundation

  • Data quality audit (target 80%+ score)
  • API layer over legacy core systems
  • Master data governance baseline
  • Cloud infrastructure rebalancing
  • Data sovereignty compliance mapping

Phase 2 (6–12 months): Integration & Pipelines

  • Data lakehouse implementation
  • Real-time event-driven pipelines
  • MLOps CI/CD pipeline build
  • Model monitoring & observability
  • Feature store for shared ML features

Phase 3 (12–24 months): Production AI

  • Federated data mesh with guardrails
  • Policy-as-code for AI governance
  • Fine-tuning on proprietary data
  • Agentic system deployment
  • EU AI Act / regulatory compliance

Only one-third of enterprises are currently fully equipped to deploy AI at scale (NetApp, October 2025). The timeline compresses when data quality is already high and extends when core systems are on-premises with no APIs.
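A minimal sketch of the Phase 3 policy-as-code item, with hypothetical policy names and fields: residency and retention rules are expressed as data, live in version control, and are enforced by a check that can run in CI against dataset metadata.

```python
POLICIES = {
    "eu_customer_data": {
        "allowed_regions": {"eu-west-1", "eu-central-1"},
        "max_retention_days": 730,
    },
}

def policy_violations(policy_key: str, region: str, retention_days: int) -> list[str]:
    """Return human-readable violations for a dataset against its policy."""
    policy = POLICIES[policy_key]
    violations = []
    if region not in policy["allowed_regions"]:
        violations.append(f"{policy_key}: region {region} violates residency rules")
    if retention_days > policy["max_retention_days"]:
        violations.append(
            f"{policy_key}: retention of {retention_days} days exceeds the "
            f"{policy['max_retention_days']}-day limit"
        )
    return violations

# policy_violations("eu_customer_data", region="us-east-1", retention_days=365)
# -> ["eu_customer_data: region us-east-1 violates residency rules"]
```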

Service Guides

AI readiness assessment and modernization strategy services.

Cost Benchmarks

AI-assisted vs manual modernization cost comparison, sourced from published 2025 data.

AI-Assisted vs Manual Modernization Costs

* Costs are industry averages based on market research

Testing Cost Transformation: Manual → AI-Native (Virtuoso 2025)

| Approach | Annual Cost | vs Manual | 3-Year Savings |
|----------|-------------|-----------|----------------|
| Manual testing | $6,030,000 | Baseline | |
| Traditional test automation (Selenium/Cypress) | $2,290,000 | −62% | $11.2M |
| AI-native testing | $840,000 | −86% | $15.6M |

* Source: Virtuoso QA ROI calculator, 2025. 3-year savings vs manual baseline. Does not include competitive advantage value ($47M+ estimated by Virtuoso for faster release velocity).

Looking for implementation partners?

AI Modernization Services & Vendor Guide

Compare 10 AI modernization tools and strategy partners, see adoption data, and explore service offerings.

View Services Guide →

AI & Modernization FAQ

Q1 How accurate is AI code translation from COBOL to Java in 2026?

IBM watsonx Code Assistant for Z (WCA4Z) achieves a median structural quality score above 80% and functional correctness above 75% across enterprise COBOL-to-Java deployments — the most documented results available. A 2025 academic study found that summary augmentation (adding natural language descriptions to source code before translation) improves 36% of benchmark samples and 50% of low-scoring enterprise samples. In direct comparisons, WCA4Z substantially outperforms GitHub Copilot for mainframe code: Copilot produces 'plain class skeletons' lacking business logic, while WCA4Z delivers context-aware translations with proper database handling and CICS integration. Manual refinement effort is substantially reduced but not eliminated — expect 20–30% human review on any enterprise COBOL migration.

Q2 What are the real hallucination rates for AI code translation tools?

Published data is sparse, but: broader LLM translation research (October 2025) shows GPT-4-class models exhibit 33–60% hallucination rates in general translation tasks, adding or removing content not present in source material. The best-performing specialized models (2026 Vectara benchmarks) achieve sub-1% hallucination rates. Most enterprise-grade tools fall into a 'medium hallucination group' with 2–5% error rates. For production code, this means automated evaluation is non-negotiable — manual review of AI-translated code by engineers who understand both the source language semantics and target language idioms is required. Do not deploy AI-translated code without a regression test suite.

Q3 Can AI generate test suites for legacy code that has no existing tests?

Yes, with documented results. UnitTenX (an open-source AI multi-agent system, 2025) achieved 100% line coverage on a real-world C codebase starting from 0% coverage, covering 186 of 199 functions (93.5%) automatically. Salesforce's production deployment using Cursor AI cut legacy code coverage time by 85% for repositories with less than 10% initial coverage. The Salesforce approach combined AI generation with mandatory human oversight: engineers reviewed all generated code, checking test intentions through AI-generated JavaDoc comments, with SonarQube as the final validation gate. Where AI-generated tests fail: business logic validation in undocumented legacy systems — the AI can cover code paths but cannot verify that the business rules encoded in those paths are correct.

Q4 What are the biggest blockers preventing enterprises from adopting AI in 2026?

Data fragmentation is the #1 barrier by a significant margin: 82% of enterprises report fragmented data blocks AI accessibility (Cisco global study, 2025), 69% report poor data quality limits decisions, and 45% cite fragmented/unstructured data as the top AI adoption blocker. Legacy system integration is the #2 barrier: 60% of AI leaders identify legacy system integration as their primary barrier to agentic AI adoption (Deloitte Tech Trends 2026), and Gartner predicts 40% of agentic AI projects will be cancelled by end of 2027 due to infrastructure constraints. The failure pattern: AI pilots succeed in isolated environments with clean test data, then fail when connecting to legacy systems without APIs or real-time data capabilities.

Q5 What does 'AI-ready data infrastructure' actually look like?

AI-ready enterprises deploy three core integration layers: API integration layers (80% adoption among AI-ready companies), data lake/lakehouse architectures (77%), and enterprise data warehouses (72%). Beyond technology, AI-ready infrastructure requires: data governance with operational data contracts, freshness SLAs, and error budgets; real-time event-driven pipelines that activate instantly on business events; federated data mesh with centralized guardrails where regional teams publish data products under shared contracts; and policy-as-code embedding consent, residency, and retention rules. Legacy infrastructure fails on all of these: siloed CRM, POS, ecommerce, and service data creating blind spots, no lineage or quality controls, and 'one-off' integrations lacking reusable contracts.

Q6 What is the cost of NOT modernizing for AI — what are companies actually losing?

BCG research (October 2025) quantifies the gap: AI future-built companies achieve 5x the revenue increases and 3x the cost reductions compared to AI laggards. However, only 5% of firms worldwide are 'AI future-built,' while 60% report negligible gains despite substantial investment. McKinsey (June 2025) found that over 80% of companies embracing AI see no real productivity gains — most remain in 'copilot mode' rather than deploying agentic systems. Accenture data: competitors successfully leveraging AI are cutting costs by 30% and boosting productivity by 50%. The window for catching up is closing: in a world where competitors condense a month's work into a single day using agentic systems, the cost of remaining in pilot mode may soon exceed the cost of doing nothing.

Q7 How long does an AI readiness transformation actually take?

The typical AI readiness modernization path has three phases: Phase 1 (3–6 months) — data audit, master data governance, API layer over legacy systems, cloud rebalancing to get workloads on infrastructure that supports AI inference; Phase 2 (6–12 months) — data lakehouse implementation, real-time event pipelines, MLOps CI/CD pipeline build, model monitoring infrastructure; Phase 3 (12–24 months) — federated data mesh, policy-as-code, retraining infrastructure on proprietary data, agentic system deployment. The timeline compresses when data quality is already high (rare) and extends when ERP/CRM systems are on-premise with no APIs (common). Only one-third of enterprises are currently fully equipped to deploy AI at scale (NetApp, October 2025).

Q8 What are the MLOps maturity gaps that prevent AI from reaching production?

A 2025 consolidated MLOps lifecycle framework identifies the most common organizational gaps: lack of defined roles and skill requirements for MLOps (teams have data scientists but no ML engineers), insufficient CI/CD pipelines for ML model deployment (models are trained but deployment is manual), inadequate monitoring and observability for production models (no alerting when model performance degrades), absence of data governance frameworks (models trained on stale or biased data), and no clear resource allocation for MLOps at various maturity levels. The enterprise AI readiness framework requires an 80%+ data quality score, executive sponsorship, and defined ROI metrics before model deployment — criteria that most organizations pursuing AI pilots cannot meet.