Modernization Intel / Research
AI & Modernization
Two directions, one dependency. AI tools now cut code migration cost by 70% and testing cost by 86%. But 82% of enterprises can't use AI at all, because their data infrastructure isn't ready. Here's both sides of the equation.
Two Directions, One Research Hub
This hub covers a bidirectional relationship between AI and modernization that is frequently conflated into a single, confusing narrative. We separate it into two distinct questions:
Direction A, AI as a modernization tool: using LLMs and AI tools to accelerate legacy code migration, generate test suites, and document undocumented systems. The tooling is real and maturing fast.
Direction B, modernization for AI readiness: what organizations must modernize — data infrastructure, APIs, MLOps pipelines — before AI can deliver value in production. Most companies are blocked here.
The AI and modernization relationship has reached an inflection point in 2026. AI-assisted code translation tools have crossed the threshold of practical usefulness: IBM's watsonx Code Assistant for Z achieves 80%+ structural accuracy on COBOL-to-Java translation, and AI-native testing platforms cut annual testing costs from $6M to $840K. At the same time, BCG research establishes that only 5% of firms are "AI future-built" — delivering 5× the revenue increases of laggards — while 60% report negligible gains despite substantial investment, almost always because their data infrastructure cannot support production AI.
The AI tooling market for software modernization has bifurcated. General-purpose tools (GitHub Copilot, Google Gemini Code Assist) excel at everyday developer assistance but produce inadequate results for legacy language translation — they generate syntactically correct code that lacks the semantic understanding of COBOL's data division structure, packed-decimal arithmetic, and CICS transaction context. Specialized tools (IBM WCA4Z, Amazon Q Developer Transform) are purpose-built for legacy modernization and produce demonstrably better results for that specific use case.
On the readiness side, Gartner's prediction that 40% of agentic AI projects will be cancelled by 2027 due to infrastructure constraints reflects the gap between AI ambition and data reality. Pilots succeed in sandboxed environments with clean, curated data. Production deployment fails when the agent must call a 1990s batch system via SFTP, read from five inconsistent CRMs, or wait 24 hours for a nightly ETL job to refresh the data it needs in real time. Modernization is the prerequisite — not an alternative — to AI value.
Direction A: AI as a Modernization Tool
AI tools are demonstrably useful for three specific modernization tasks: code translation, test generation, and documentation of undocumented systems.
AI Code Translation: What the Benchmarks Actually Show
IBM watsonx Code Assistant for Z
The most documented results for enterprise COBOL-to-Java translation. Achieves median structural quality above 80% and functional correctness above 75% across client deployments. A 2025 study found that summary augmentation — adding natural language descriptions to COBOL source before translation — improves 36% of CodeNet benchmark samples (727 samples) and 50% of low-scoring enterprise samples (571 samples). The tool handles CICS integration, VSAM/DB2 data access patterns, and COBOL-compatible serialization that general-purpose LLMs consistently miss.
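To make "summary augmentation" concrete, here is a minimal Python sketch of the idea: a natural-language description is embedded ahead of the COBOL source before the code is handed to a translation model. The helper names and the COBOL fragment are illustrative, and `translate_cobol_to_java` is a placeholder for whatever model endpoint you use, not the watsonx Code Assistant API.

```python
# Minimal sketch of summary augmentation: prepend a natural-language
# description to COBOL source before passing it to a translation model.
# translate_cobol_to_java is a placeholder, NOT the WCA4Z API.

def augment_with_summary(cobol_source: str, summary: str) -> str:
    """Embed a human- or LLM-written summary as COBOL comment lines ahead
    of the program, so the translator sees intent as well as syntax."""
    comment_block = "\n".join(
        f"      *> {line}" for line in summary.strip().splitlines()
    )
    return f"{comment_block}\n{cobol_source}"


def translate_cobol_to_java(augmented_source: str) -> str:
    """Placeholder: call your translation model of choice here."""
    raise NotImplementedError


if __name__ == "__main__":
    cobol = """       IDENTIFICATION DIVISION.
       PROGRAM-ID. CALCPREM.
       PROCEDURE DIVISION.
           COMPUTE WS-PREMIUM = WS-BASE * WS-RISK-FACTOR.
    """
    summary = (
        "Calculates an insurance premium by applying a risk factor to the "
        "base rate; the result feeds the nightly billing batch."
    )
    print(augment_with_summary(cobol, summary))
```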
GitHub Copilot & General-Purpose LLMs
Produce "plain class skeletons" for COBOL translation — syntactically valid Java that lacks business logic understanding. Miss database handling specifics, CICS context, and COBOL data structure semantics (COMP-3 packed decimal, REDEFINES, 88-level condition names). Appropriate for everyday developer assistance and greenfield code, but not for legacy language migration at scale. GPT-4-class models exhibit 33–60% hallucination rates in general translation tasks; the best models achieve sub-1%, but most fall in the 2–5% range (Vectara 2026).
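The COMP-3 point is worth making concrete. A packed-decimal field stores two digits per byte with the sign in the final nibble, so it cannot be read as text, and a translator that does not model this produces Java that silently corrupts values. The short Python decoder below is an illustrative sketch of the rule, not production mainframe tooling.

```python
def decode_comp3(raw: bytes, scale: int = 0):
    """Decode a COBOL COMP-3 (packed decimal) field.

    Each nibble holds one decimal digit except the last, which holds the
    sign: 0xC or 0xF means positive, 0xD means negative. `scale` is the
    number of implied decimal places (the 'V' positions in the PICTURE
    clause, e.g. PIC S9(5)V99 COMP-3 has scale=2).
    """
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)

    sign_nibble = nibbles.pop()  # last nibble carries the sign
    if sign_nibble not in (0xC, 0xD, 0xF):
        raise ValueError(f"invalid sign nibble: {sign_nibble:#x}")

    value = 0
    for digit in nibbles:
        if digit > 9:
            raise ValueError(f"invalid digit nibble: {digit:#x}")
        value = value * 10 + digit

    if sign_nibble == 0xD:
        value = -value
    return value / (10 ** scale) if scale else value


# A PIC S9(5)V99 COMP-3 field containing -1234.56 is packed as 0x0123456D.
assert decode_comp3(bytes([0x01, 0x23, 0x45, 0x6D]), scale=2) == -1234.56
```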
| TOOL | BEST USE CASE | COBOL QUALITY | PRICING |
|---|---|---|---|
| IBM watsonx Code Asst. for Z | COBOL → Java, mainframe migration | 80%+ structural, 75%+ functional | Enterprise contract |
| Amazon Q Developer Transform | Java 8→17 upgrades, .NET migration | Moderate (Java-focused) | $19/user/month |
| GitHub Copilot | Daily coding assistance, greenfield | Poor (class skeletons only) | $19-$39/user/month |
| Google Gemini Code Assist | GCP workloads, modern languages | No published legacy benchmarks | $19/user/month |
| Moderne | Large-scale Java/Spring refactoring | Excellent for Java, no COBOL | Enterprise contract |
AI Test Generation for Legacy Code: Documented Results
UnitTenX, an open-source AI multi-agent system (2025), took a real-world C codebase from 0% to 100% line coverage, reaching 186 of 199 functions (93.5%) automatically. Salesforce's production deployment of Cursor AI cut legacy coverage time by 85% on repositories starting below 10% coverage, with mandatory human review of every generated test.
Where AI Test Generation Fails
AI-generated tests cover code paths reliably but cannot validate that the business rules encoded in those paths are correct. For undocumented legacy systems, the business logic is the unknown — the AI has no way to verify that a 1998 calculation produces the right answer because there is no specification to check against. The Salesforce approach — mandatory human review of all generated tests, AI-generated JavaDoc comments for human validation, SonarQube as final gate — is the right model. Treat AI test generation as coverage scaffolding, not correctness verification.
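A minimal example of what "coverage scaffolding" means in practice: a characterization test that pins the legacy routine's current output so a migration or refactor can be regression-checked. The routine and figures below are hypothetical stand-ins; the point is the closing TODO, which is the human validation step the Salesforce workflow makes mandatory.

```python
# Illustrative sketch of "coverage scaffolding": characterization tests pin
# the legacy code's CURRENT output so refactors and translations can be
# regression-checked. They do not prove the business rule is right.
# calculate_late_fee is a hypothetical stand-in for an undocumented routine.

def calculate_late_fee(balance: float, days_overdue: int) -> float:
    """Stand-in legacy routine: tiered late fee, rules undocumented."""
    if days_overdue <= 0:
        return 0.0
    rate = 0.045 if days_overdue >= 30 else 0.02
    return round(balance * rate, 2)


def test_late_fee_pins_current_behavior():
    # Expected values were captured by running the legacy code,
    # not taken from a specification.
    assert calculate_late_fee(1200.00, 45) == 54.00
    assert calculate_late_fee(1200.00, 10) == 24.00


def test_late_fee_zero_when_not_overdue():
    assert calculate_late_fee(1200.00, 0) == 0.0

# TODO(domain expert): confirm the 4.5% tier at 30+ days is the intended
# business rule before treating these tests as a specification.
```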
AI Legacy Code Documentation Tools
LegacyMap (Sector7)
Specialized for COBOL, FORTRAN, BASIC, C++, and Pascal on OpenVMS and Mainframe (z/OS). Generates structured callgraphs, SQL access diagrams, and procedure maps without requiring code changes. Supports platform-specific dialects including VAX, Alpha, and z/OS variants. Best tool for generating structural maps of undocumented mainframe estates — accelerates onboarding and identifies dead code without modifying production systems.
Codegram
Targets VB, Delphi, and COBOL to modern languages (Java, C#, Python) with integrated documentation generation, debugging tools, and code optimization. Provides structured documentation templates and maintains conversion history for auditing. Combined conversion + documentation toolchain for mid-market legacy application migrations. Appropriate for organizations that want a unified tool rather than separate documentation and translation workflows.
Quality ceiling on documentation tools: All documentation tools excel at structural analysis — identifying callers/callees, visualizing system flows, and accelerating engineer onboarding. None can reconstruct business intent from undocumented logic. They map what the code does, not what it was supposed to do or why specific decisions were made. Human domain experts remain required for semantic validation.
Direction B: Modernization for AI Readiness
Before AI delivers value in production, the underlying infrastructure must support it. Most enterprises are blocked at data and integration layers.
The Blockers: Why 60% of Companies See No AI Value
Data Fragmentation (82% of enterprises)
Cisco's 2025 global study: 82% of enterprises report that fragmented data blocks accessibility and slows AI integration. 69% report that poor data quality limits decisions. 45% cite fragmented, unstructured data as the top AI adoption blocker. 86% of companies now prioritize data unification as their #1 AI readiness initiative. The specific failure: siloed CRM, POS, ecommerce, service, and transaction data that cannot be joined — making it impossible to calculate true customer lifetime value or provide AI agents with full context.
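To illustrate what unification unlocks, here is a toy pandas sketch with hypothetical column names: once CRM and transaction records share a customer key, lifetime value is a short join and aggregation. Across siloed systems with no common key, the same query cannot be written at all.

```python
# Illustrative sketch (hypothetical column names): with a shared customer_id,
# lifetime value is a simple join + aggregation across unified data.
import pandas as pd

crm = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "segment": ["smb", "enterprise", "smb"],
})
transactions = pd.DataFrame({
    "customer_id": [101, 101, 102, 103, 103, 103],
    "amount": [120.0, 80.0, 5400.0, 60.0, 45.0, 90.0],
})

clv = (
    transactions.groupby("customer_id", as_index=False)["amount"].sum()
    .rename(columns={"amount": "lifetime_value"})
    .merge(crm, on="customer_id", how="left")
)
print(clv.sort_values("lifetime_value", ascending=False))
```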
Legacy System Integration (60% of AI leaders)
Deloitte Tech Trends 2026: 60% of AI leaders identify legacy system integration as their primary barrier to agentic AI adoption. Gartner predicts 40% of agentic AI projects will be cancelled by end of 2027 due to infrastructure constraints. The failure pattern: AI pilots succeed in isolated environments with clean test data, then production deployment fails when connecting to legacy systems without APIs, real-time data, or the query patterns that AI agents generate. Monolithic architectures requiring full-stack coordination for minor updates, tightly coupled systems with brittle integrations, and waterfall development cycles that bury AI pilots in backlogs are the specific culprits.
The Cost of Not Modernizing for AI (BCG, October 2025)
AI future-built companies achieve 5x the revenue increases and 3x the cost reductions of AI laggards, yet only 5% of firms worldwide qualify. 60% report negligible gains despite substantial investment, and McKinsey (June 2025) found that over 80% of companies adopting AI see no real productivity gains because they remain in "copilot mode" rather than deploying agentic systems.
What AI-Ready Data Infrastructure Looks Like
AI-Ready Enterprise
- ✓ API integration layers: 80% adoption rate among AI-ready organizations
- ✓ Data lake/lakehouse architecture (77% adoption)
- ✓ Enterprise data warehouse for structured analytics (72%)
- ✓ Data governance with contracts, freshness SLAs, and error budgets (see the sketch after this list)
- ✓ Event-driven pipelines that activate instantly on business events
- ✓ Federated data mesh: teams publish data products under shared contracts
- ✓ Policy-as-code: consent, residency, and retention in version control
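Of the items above, "data governance with contracts, freshness SLAs, and error budgets" is the least self-explanatory, so here is a minimal sketch of an operational data contract check. The field names, thresholds, and one-hour SLA are illustrative assumptions; real implementations typically live in a schema registry or in tools such as Great Expectations or dbt tests.

```python
# Minimal sketch of an operational data contract with a freshness SLA and a
# null-rate error budget. Names and thresholds are hypothetical.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataContract:
    dataset: str
    required_columns: set
    freshness_sla: timedelta   # maximum age of the newest record
    max_null_rate: float       # error budget for missing values


def check_contract(contract, columns, last_updated, null_rate):
    """Return a list of violations; an empty list means the contract holds."""
    violations = []
    missing = contract.required_columns - set(columns)
    if missing:
        violations.append(f"missing columns: {sorted(missing)}")
    age = datetime.now(timezone.utc) - last_updated
    if age > contract.freshness_sla:
        violations.append(f"stale data: {age} old, SLA is {contract.freshness_sla}")
    if null_rate > contract.max_null_rate:
        violations.append(f"null rate {null_rate:.1%} exceeds budget {contract.max_null_rate:.1%}")
    return violations


customer_contract = DataContract(
    dataset="customer_360",
    required_columns={"customer_id", "email", "segment", "lifetime_value"},
    freshness_sla=timedelta(hours=1),   # event-driven, not nightly ETL
    max_null_rate=0.02,
)

print(check_contract(
    customer_contract,
    columns=["customer_id", "email", "segment"],
    last_updated=datetime.now(timezone.utc) - timedelta(hours=26),
    null_rate=0.05,
))
```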
Legacy Enterprise (AI Blocked)
- ✗ Siloed CRM, POS, ecommerce, service data — no joins possible
- ✗ No APIs on core systems — batch SFTP or database polling only
- ✗ Nightly ETL — agents can't get real-time data
- ✗ "One-off" integrations with no reusable contracts or schemas
- ✗ No data lineage or quality controls — models train on stale data
- ✗ Manual approvals for deploying or retraining models
- ✗ Inflexible permission structures blocking experimentation
MLOps Maturity: The Gap Between Pilot and Production
A 2025 consolidated MLOps lifecycle framework identifies the most common gaps that prevent AI from reaching production at legacy organizations. The enterprise AI readiness framework requires an 80%+ data quality score, executive sponsorship, cloud ML infrastructure, and defined ROI metrics before model deployment — criteria most organizations pursuing AI pilots cannot currently meet.
No MLOps CI/CD Pipeline
Models are trained by data scientists but deployed manually — inconsistently and infrequently. No automation for model validation, staging, rollback, or A/B testing. Each deployment is a one-off project.
No Production Monitoring
Models degrade silently when production data drifts from training data. No alerting when model performance drops. No feedback loops into retraining cycles. Models deployed in 2024 still running without updates in 2026.
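A minimal sketch of the monitoring that is missing: compare a feature's production distribution against its training distribution with the population stability index (PSI) and alert when it crosses a conventional threshold. The data, bin count, and 0.2 cutoff are illustrative choices, not a requirement of any particular platform.

```python
# Illustrative drift check: population stability index (PSI) between a
# training sample and a production sample, with a rule-of-thumb alert.
import numpy as np


def population_stability_index(expected, actual, bins=10):
    """PSI between a training sample (expected) and a production sample (actual)."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_counts, _ = np.histogram(expected, bins=edges)
    actual_counts, _ = np.histogram(actual, bins=edges)
    # Convert counts to proportions, flooring at a tiny value to avoid log(0).
    expected_frac = np.clip(expected_counts / expected_counts.sum(), 1e-6, None)
    actual_frac = np.clip(actual_counts / actual_counts.sum(), 1e-6, None)
    return float(np.sum((actual_frac - expected_frac) * np.log(actual_frac / expected_frac)))


rng = np.random.default_rng(42)
training_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)
production_feature = rng.normal(loc=0.6, scale=1.3, size=10_000)  # drifted

psi = population_stability_index(training_feature, production_feature)
if psi > 0.2:  # a common rule of thumb: PSI above 0.2 signals significant drift
    print(f"ALERT: feature drift detected (PSI={psi:.3f}), flag model for retraining")
else:
    print(f"OK: PSI={psi:.3f}")
```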
Undefined Roles
Organizations have data scientists but no ML engineers. No one owns the boundary between model development and production deployment. MLOps responsibilities fall into a gap between data science and IT operations.
No Data Governance
Models trained on data with no quality controls, freshness guarantees, or bias audits. Regulatory exposure increases as AI decisions become consequential. EU AI Act compliance requires documentation that doesn't exist.
AI Readiness Modernization Path: Three Phases
Data Foundation
- → Data quality audit (target 80%+ score; see the sketch after this list)
- → API layer over legacy core systems
- → Master data governance baseline
- → Cloud infrastructure rebalancing
- → Data sovereignty compliance mapping
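As a rough illustration of the 80%+ data quality target, the sketch below scores a dataset on completeness, validity, and freshness. The dimensions, weights, and thresholds are illustrative assumptions rather than a standard formula; real audits usually add uniqueness, consistency, and lineage checks.

```python
# Illustrative data quality score: weighted completeness, validity, and
# freshness. Weights and thresholds are assumptions, not a standard.
from datetime import datetime, timedelta, timezone


def quality_score(records, required_fields, max_age=timedelta(days=1)):
    """Weighted average of completeness, validity, and freshness in [0, 1]."""
    now = datetime.now(timezone.utc)
    total_cells = len(records) * len(required_fields)

    filled = sum(
        1 for r in records for f in required_fields if r.get(f) not in (None, "")
    )
    completeness = filled / total_cells if total_cells else 0.0

    valid_emails = sum(1 for r in records if "@" in str(r.get("email", "")))
    validity = valid_emails / len(records) if records else 0.0

    fresh = sum(1 for r in records if now - r["updated_at"] <= max_age)
    freshness = fresh / len(records) if records else 0.0

    return 0.4 * completeness + 0.3 * validity + 0.3 * freshness


now = datetime.now(timezone.utc)
sample = [
    {"customer_id": 1, "email": "a@example.com", "updated_at": now - timedelta(hours=2)},
    {"customer_id": 2, "email": "", "updated_at": now - timedelta(days=40)},
    {"customer_id": 3, "email": "not-an-email", "updated_at": now - timedelta(hours=5)},
]
score = quality_score(sample, required_fields=["customer_id", "email"])
print(f"data quality score: {score:.0%}  (target: 80%+)")
```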
Integration & Pipelines
- → Data lakehouse implementation
- → Real-time event-driven pipelines
- → MLOps CI/CD pipeline build
- → Model monitoring & observability
- → Feature store for shared ML features
Production AI
- → Federated data mesh with guardrails
- → Policy-as-code for AI governance
- → Fine-tuning on proprietary data
- → Agentic system deployment
- → EU AI Act / regulatory compliance
Only one-third of enterprises are currently fully equipped to deploy AI at scale (NetApp, October 2025). The timeline compresses when data quality is already high and extends when core systems are on-premise with no APIs.
Service Guides
AI readiness assessment and modernization strategy services.
Cost Benchmarks
AI-assisted vs manual modernization cost comparison, sourced from published 2025 data.
AI-Assisted vs Manual Modernization Costs
Testing Cost Transformation: Manual → AI-Native (Virtuoso 2025)
| APPROACH | ANNUAL COST | VS MANUAL | 3-YEAR SAVINGS |
|---|---|---|---|
| Manual Testing | $6,030,000 | Baseline | — |
| Traditional Test Automation (Selenium/Cypress) | $2,290,000 | −62% | $11.2M |
| AI-Native Testing | $840,000 | −86% | $15.6M |
* Source: Virtuoso QA ROI calculator, 2025. 3-year savings vs manual baseline. Does not include competitive advantage value ($47M+ estimated by Virtuoso for faster release velocity).
Looking for implementation partners?
AI Modernization Services & Vendor Guide
Compare 10 AI modernization tools and strategy partners, see adoption data, and explore service offerings.
AI & Modernization FAQ
Q1 How accurate is AI code translation from COBOL to Java in 2026?
IBM watsonx Code Assistant for Z (WCA4Z) achieves a median structural quality score above 80% and functional correctness above 75% across enterprise COBOL-to-Java deployments — the most documented results available. A 2025 academic study found that summary augmentation (adding natural language descriptions to source code before translation) improves 36% of benchmark samples and 50% of low-scoring enterprise samples. In direct comparisons, WCA4Z substantially outperforms GitHub Copilot for mainframe code: Copilot produces 'plain class skeletons' lacking business logic, while WCA4Z delivers context-aware translations with proper database handling and CICS integration. Manual refinement effort is substantially reduced but not eliminated — expect 20–30% human review on any enterprise COBOL migration.
Q2 What are the real hallucination rates for AI code translation tools?
Published data is sparse, but broader LLM translation research (October 2025) shows that GPT-4-class models exhibit 33–60% hallucination rates in general translation tasks, adding or removing content not present in the source material. The best-performing specialized models (2026 Vectara benchmarks) achieve sub-1% hallucination rates, while most enterprise-grade tools fall into a 'medium hallucination group' with 2–5% error rates. For production code, this means automated evaluation is non-negotiable — manual review of AI-translated code by engineers who understand both the source language semantics and the target language idioms is required. Do not deploy AI-translated code without a regression test suite.
Q3 Can AI generate test suites for legacy code that has no existing tests?
Yes, with documented results. UnitTenX (an open-source AI multi-agent system, 2025) achieved 100% line coverage on a real-world C codebase starting from 0% coverage, covering 186 of 199 functions (93.5%) automatically. Salesforce's production deployment using Cursor AI cut legacy code coverage time by 85% for repositories with less than 10% initial coverage. The Salesforce approach combined AI generation with mandatory human oversight: engineers reviewed all generated code, checking test intentions through AI-generated JavaDoc comments, with SonarQube as the final validation gate. Where AI-generated tests fail: business logic validation in undocumented legacy systems — the AI can cover code paths but cannot verify that the business rules encoded in those paths are correct.
Q4 What are the biggest blockers preventing enterprises from adopting AI in 2026?
Data fragmentation is the #1 barrier by a significant margin: 82% of enterprises report fragmented data blocks AI accessibility (Cisco global study, 2025), 69% report poor data quality limits decisions, and 45% cite fragmented/unstructured data as the top AI adoption blocker. Legacy system integration is the #2 barrier: 60% of AI leaders identify legacy system integration as their primary barrier to agentic AI adoption (Deloitte Tech Trends 2026), and Gartner predicts 40% of agentic AI projects will be cancelled by end of 2027 due to infrastructure constraints. The failure pattern: AI pilots succeed in isolated environments with clean test data, then fail when connecting to legacy systems without APIs or real-time data capabilities.
Q5 What does 'AI-ready data infrastructure' actually look like?
AI-ready enterprises deploy three core integration layers: API integration layers (80% adoption among AI-ready companies), data lake/lakehouse architectures (77%), and enterprise data warehouses (72%). Beyond technology, AI-ready infrastructure requires: data governance with operational data contracts, freshness SLAs, and error budgets; real-time event-driven pipelines that activate instantly on business events; federated data mesh with centralized guardrails where regional teams publish data products under shared contracts; and policy-as-code embedding consent, residency, and retention rules. Legacy infrastructure fails on all of these: siloed CRM, POS, ecommerce, and service data creating blind spots, no lineage or quality controls, and 'one-off' integrations lacking reusable contracts.
Q6 What is the cost of NOT modernizing for AI — what are companies actually losing?
BCG research (October 2025) quantifies the gap: AI future-built companies achieve 5x the revenue increases and 3x the cost reductions compared to AI laggards. However, only 5% of firms worldwide are 'AI future-built,' while 60% report negligible gains despite substantial investment. McKinsey (June 2025) found that over 80% of companies embracing AI see no real productivity gains — most remain in 'copilot mode' rather than deploying agentic systems. Accenture data: competitors successfully leveraging AI are cutting costs by 30% and boosting productivity by 50%. The window for catching up is closing: in a world where competitors condense a month's work into a single day using agentic systems, the cost of remaining in pilot mode may soon exceed the cost of doing nothing.
Q7 How long does an AI readiness transformation actually take?
The typical AI readiness modernization path has three phases: Phase 1 (3–6 months) — data audit, master data governance, API layer over legacy systems, cloud rebalancing to get workloads on infrastructure that supports AI inference; Phase 2 (6–12 months) — data lakehouse implementation, real-time event pipelines, MLOps CI/CD pipeline build, model monitoring infrastructure; Phase 3 (12–24 months) — federated data mesh, policy-as-code, retraining infrastructure on proprietary data, agentic system deployment. The timeline compresses when data quality is already high (rare) and extends when ERP/CRM systems are on-premise with no APIs (common). Only one-third of enterprises are currently fully equipped to deploy AI at scale (NetApp, October 2025).
Q8 What are the MLOps maturity gaps that prevent AI from reaching production?
A 2025 consolidated MLOps lifecycle framework identifies the most common organizational gaps: lack of defined roles and skill requirements for MLOps (teams have data scientists but no ML engineers), insufficient CI/CD pipelines for ML model deployment (models are trained but deployment is manual), inadequate monitoring and observability for production models (no alerting when model performance degrades), absence of data governance frameworks (models trained on stale or biased data), and no clear resource allocation for MLOps at various maturity levels. The enterprise AI readiness framework requires an 80%+ data quality score, executive sponsorship, and defined ROI metrics before model deployment — criteria that most organizations pursuing AI pilots cannot meet.