A U.S. Air Force F-35A Lightning II from the 58th Fighter Squadron, 33rd Fighter Wing, Eglin Air Force Base, Fla., takes off for a training mission during Northern Lightning at Volk Field Air National Guard Base, Wisc., in August. Airman Christian Corley

Strategy & Policy: Accelerate Weapons Testing Or … Fail?

Feb. 16, 2023

The Pentagon’s Director of Operational Test and Evaluation said weapons testing must speed up or the U.S. won’t be able to stay ahead of China and other potential adversaries. Writing in his annual report to Congress, released in January, DOT&E Director Nickolas H. Guertin said the test enterprise must be reimagined, so weapons are no longer tested in isolation, but rather alongside the dissimilar systems they’ll have to work with. 

The Intelligence Community should also play a bigger role in ensuring weapons are measured against current threats. The Pentagon faces a bow wave of testing programs for which ranges, resources, and time are all short; testing against simulated threats that don’t accurately represent real ones; and rapid development programs that outran themselves by failing to develop strategies to deal with setbacks and delays.

The 412-page report is also more candid than last year’s edition, when two versions were released, one to the public, and one containing “controlled unclassified” data.    

The DOT&E enterprise must do its part to accelerate “delivery of weapons that work,” Guertin said. Automation and “more widespread use of digital technologies,” such as simulation and predictive modeling, are needed to speed up development, he said. Keeping the test strategy, equipment, and workforce “static” will slow things down, he noted.

Accurately simulating threats today “often takes three to five years,” Guertin said. “We will need to continue to innovate on the use of simulation and/or emulation of threats in representative environments to ensure that weapon systems will be effective when called upon.” 

Broadening the Test Enterprise

Guertin sees a “threat realism gap,” he said, and in 2022 developed an updated strategy so that programs relying on similar technologies can share knowledge and test results. This could have “dramatic effects” on research, development, testing, and acquisition.  So far, “we do not have a clear view of the test-related data that exists across the entire DOD,” he said. The new strategy recognizes “the need to improve access to that data in order to extract new insights.”

Aggregating data and sharing it with all affected parties “will accelerate the fielding of robust, combat-credible capabilities,” Guertin said.

The Pentagon must “test the way we fight,” Guertin stated, to determine whether a system “will be effective, suitable, survivable, and lethal in the hands of a warfighter facing a thinking enemy.” Weapons don’t succeed on their own, but as part of a “kill web,” so they should not be tested in isolation. Rather, test plans should incorporate realistic scenarios, including the joint and allied capabilities that will also be part of the fight. Testing should include live, virtual, and constructive elements.

Cyber, electromagnetic spectrum, and space threats are particularly challenging, Guertin said. The T&E ecosystem must develop more robust testing and figure out ways to replicate “the space environment and space threats, both kinetic and nonkinetic.” These efforts will require funding. 

New “T&E infrastructure, tools, and processes” are needed to keep up, Guertin asserted. They must be “able to scale and adapt quickly to reflect changes when they arise, and efficiently evaluate kill web and system-of-systems performance.”

The T&E enterprise also has to be more cognizant that upgrades and system changes throughout a program’s life will necessarily have a domino effect on the other systems they touch. 

“We must therefore ‘look right’ into the life cycle of a system,” he said, revisiting systems after they have “evolved substantially after fielding” to look for those unintended consequences. 

This will be especially important in assessing systems using artificial intelligence, autonomy, and machine learning, according to Guertin. This approach will require developing “a framework to evaluate iterative software improvements and their impact to a system’s role in, and interoperability with, the kill web.”

Going Digital

The T&E community will also have to broaden its use of digital twins as a testing method, he said. These approaches will allow the T&E community to “keep pace with rapid and frequent changes … with minimal disruption” to operational units.

Digital twins will “aid, but not obviate, the need for live operational and live-fire” test events, Guertin said. Modeling and simulation systems must be consistently “validated, verified and accredited,” and if simulation is seen to “diverge substantially” from the real world, the T&E community must be willing to throw out the models and conduct live tests “to reconnect us to an accurate reflection of the operating environment.”

The key to all this adaptation is workforce, but Guertin sees a shortage of experts “in the use of automation, cyber survivability, data management, artificial intelligence, and digital engineering.” A combination of training, cooperation with industry, and both internships and externships can help. 

“We need … heavy investment in individual brainstorming; collaborative brainstorming among government entities, the private sector, and academia; and smartly timed planning and programming in the amounts required,” Guertin asserted.

More Candor

Past complaints about issuing public and controlled versions of last year’s DOT&E report were amplified when negative testing results from the F-35 were withheld from the public version.  

“Issuing two documents allowed DOT&E to be more transparent with congressional and DOD personnel, while maintaining the integrity of information related to programs under oversight,” Guertin said in his foreword to the new edition. The new, single version “reflects careful consultation with the program offices that determine the classification of information about systems under DOT&E oversight, and contains the maximum detail permitted.” Guertin offered to provide additional information to lawmakers “on request.”

Pentagon spokesman Air Force Brig. Gen. Patrick Ryder said last year’s “controlled, unclassified” version was required by law, and not intended to shield embarrassing information from public view.  

Some Notable Air Force Test Programs

The DOT&E report examined 17 Air Force programs. Several of them, listed here, exemplified the problems Guertin called out with Defense programs overall. To read what DOT&E said about the KC-46, see “World” p. 32.

Lockheed Martin AGM-183 Air-Launched Rapid Response Weapon (ARRW). The hypersonic ARRW missile was launched as a mid-tier, rapid acquisition program. But the ARRW, to be launched from a bomber, does not yet have an integrated test plan blessed by DOT&E. A shortage of range space and long waits between test shots, coupled with snafus unrelated to its design, have slowed progress. ARRW lacks approved modeling and simulation tools to augment physical testing and DOT&E has yet to see a plan for testing its warhead. It’s unclear whether ARRW might be vulnerable to cyber disruption, DOT&E said.  So far, it “has not yet demonstrated the required warfighting capability.”

BAE Systems/Boeing AN/ALQ-250(V)1 Eagle Passive Active Warning and Survivability System (EPAWSS). EPAWSS was criticized for frequent hardware failures that threaten required rates for mean time between failures. As a result, pilots might not trust EPAWSS or may not know when it has failed. EPAWSS completed cybersecurity tests, however, and the Air Force is eliminating the vulnerabilities identified. DOT&E assessed that EPAWSS will probably otherwise be operationally effective.

Boeing MH-139 Grey Wolf Helicopter. Several deficiencies are keeping the MH-139 from proceeding to production this year, largely the result of assumptions that the commercial version of the helicopter would easily translate to military capability. But issues with the automatic flight control, the intercom, and the layout of the crew cabin and deficiencies in the flight manual for crosswind operations have arisen, as have concerns about whether the powertrain might need extra maintenance under military use. Ballistic protection and electromagnetic hardening are “areas to watch,” the DOT&E said, suggesting more testing before the Air Force commits to a production decision. 

Boeing F-15EX Eagle II. The DOT&E commended the Air Force for putting its first two F-15EXs into high-level wargames early, and said the aircraft performed as well as, or better than, the F-15C and E that it will replace. It found the EX operationally effective and probably suitable. Only about half the test flights planned for 2022 were flown, though, mostly due to an FAA restriction on the aircraft using Link 16. The EX will be assessed for its survivability in mid-2023. It needs to be integrated with the Open-Air Battle Shaping system (OABS), which is a wargame model. DOT&E wants the EX to be tested against threat-representative radars, and for the test aircraft to be updated with any changes made to production versions.

Northrop Grumman F-16 Radar Modernization Program. There’s “compelling evidence” the F-16’s new APG-83 active, electronically scanned array (AESA) radar “is a significant improvement over the legacy” APG-68 radar. Testing is “on track, with some schedule risk,” although its resilience against cyberattack cannot yet be assessed, the DOT&E said. It wants an updated test and evaluation master plan to answer those questions, and others.

Lockheed Martin F-22 Raptor. DOT&E was generally pleased with the Raptor’s capability upgrades. A single suitability concern and a cyber resilience issue persist but are being fixed. Testing is being hampered, though, by the FAA’s slowness in allowing the aircraft to use Link 16 data transmission. DOT&E also urged the Air Force to develop a plan so that the test and evaluation community can “keep pace” with a series of rapid planned improvements in the F-22’s capabilities.

Lockheed Sikorsky HH-60W Jolly Green II. While DOT&E found the HH-60W likely to be both operationally effective and suitable, “the Air Force is tracking several deficiencies that result in degraded crew situational awareness from threat warnings and indications on navigation displays during engagements.” Software updates over the next few years are expected to resolve these problems. The fuel system and aerial refueling apparatus also have some issues that will need further testing. Testing is on track and will support a full-rate production decision in mid-2023.

Boeing T-7A Redhawk Trainer. DOT&E said the T-7’s crew escape system and the canopy’s bird-strike resistance “failed to meet minimum safety requirements” during subsystem qualification tests and need “design changes” before the T-7A can enter low-rate initial production. The DOT&E also wants more testing of the T-7’s oxygen generation system, believes the aircraft should have an Automatic Ground Collision Avoidance System, and wants more cyber resiliency testing.

Raytheon Technologies AIM-120 Advanced Medium Range Air-to-Air Missile (AMRAAM). The DOT&E found that the AIM-120D3 version of the AMRAAM is effective and suitable, but the program’s testing would benefit from realistic, stealthy targets.

Space Command and Control System. The Space C2 program fell well behind on testing in 2022 “primarily due to delayed product delivery, understaffed development teams, unclear test team constructs and responsibilities, and development focus on non-critical capabilities.” To address those concerns, “the program changed key leadership personnel, restructured development teams, more clearly defined their integrated testing construct, and refocused capability development to only the most crucial capabilities,” Guertin said. Yet, despite the lack of operational testing, one Data-as-a-Service capability, Warp Core, “was conditionally accepted for operations by the U.S. Space Force (USSF) in FY22, pending completion of cyber survivability testing.”

Lockheed Martin F-35 Joint Strike Fighter. The F-35 has been on the verge of initial operational capability for years, but its integration into the Joint Simulation Environment, a wargaming system, remains a hurdle. Guertin said initial operational test and evaluation will be cleared late this summer, although there could be more slips due to “further discoveries of deficiencies and potential delays” in the verification, validation, and accreditation process. The DOT&E enterprise was dissatisfied with “immature, deficient” software updates, which are costing the program “time and resources.” Guertin said the Joint Program Office hasn’t “adequately planned” for testing of Tech Refresh 3 (TR-3), necessary to support most of the Block 4 upgrades. The JPO hasn’t yet put flight-test gear for Tech Refresh 2 on contract, and this is delaying TR-3 testing as well. There might not be enough test aircraft available, Guertin noted. And the Operational Data Integrated Network (ODIN), which is to replace the much-maligned Autonomic Logistics Information System (ALIS), is now a year behind schedule due to safety deficiencies.