AWS and NVIDIA advance rack-scale AI infrastructure for hyperscalers

AWS has deepened its collaboration with NVIDIA to advance rack-scale AI infrastructure for hyperscalers, aligning Trainium4 with NVLink Fusion and MGX to reduce deployment complexity and speed up custom AI silicon rollouts.

DQC Bureau

Amazon Web Services has moved a step deeper into custom AI infrastructure by working with NVIDIA on NVLink Fusion, a rack-scale platform intended to speed up deployment of racks that combine Trainium4 accelerators, Graviton CPUs, Elastic Fabric Adapter networking and the Nitro System virtualisation stack. The collaboration brings NVLink scale-up interconnect technology and the NVIDIA MGX rack architecture into the heart of AWS's next-generation silicon roadmap.

AWS strengthens rack-scale plans

AWS is building Trainium4 to work with NVLink 6 and MGX, marking the first phase of a multigenerational tie-up around NVLink Fusion. The platform combines scale-up networking, a full technology stack and a broad supplier base. According to the release, the aim is to boost performance, increase return on investment and shorten the path to market for custom AI silicon.

This move arrives as AI workloads grow more demanding. Training and inference for planning, reasoning and agentic AI now rely on models with hundreds of billions to trillions of parameters, often built on mixture-of-experts architectures. These require many accelerators working in parallel and operating as a single fabric. Meeting such needs demands a high-bandwidth, low-latency scale-up network, which is the design goal of NVLink.
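As a rough illustration of that bandwidth pressure (the model dimensions below are hypothetical, not figures from the release), the following sketch estimates how much activation data a single mixture-of-experts layer pushes across the scale-up fabric in one forward pass:

```python
# Back-of-envelope estimate of cross-accelerator traffic for one
# mixture-of-experts (MoE) layer. All figures are illustrative
# assumptions, not numbers from the AWS/NVIDIA release.

def moe_alltoall_bytes(tokens: int, hidden_dim: int, top_k: int,
                       bytes_per_value: int = 2) -> int:
    """Bytes one MoE layer moves over the fabric per forward pass.

    Each token's activation (hidden_dim values) is dispatched to
    top_k experts and the results are gathered back, so it crosses
    the network roughly 2 * top_k times.
    """
    return tokens * hidden_dim * top_k * 2 * bytes_per_value

# Hypothetical workload: 32k-token batch, 8,192 hidden dim, top-2 routing.
traffic = moe_alltoall_bytes(tokens=32_768, hidden_dim=8_192, top_k=2)
print(f"~{traffic / 1e9:.1f} GB of all-to-all traffic per MoE layer")
```

With dozens of such layers per training step, sustained terabyte-per-second scale-up bandwidth is needed to keep accelerators from idling, which is exactly the pressure a fabric like NVLink is built to absorb.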

The bottlenecks behind custom AI silicon

Hyperscalers face several barriers when building custom AI systems at rack scale.

  • Slow development cycles: Creating a custom chip is only one part of the job. The organisation must also design networking for scale-up and scale-out workloads, integrate storage pathways and build full-rack designs covering trays, cooling, power delivery, system management and AI software. The process can cost billions and take years.

  • Complex supplier dependencies: Rack-scale builds involve CPUs, GPUs, networking layers, racks and trays, power equipment, cooling systems and thousands of components. A single delay or change can disrupt the entire process.

NVLink Fusion is positioned to reduce these risks by offering a tested rack-scale platform with established interoperability between components, helping hyperscalers avoid common deployment delays.

How NVLink Fusion supports custom silicon

NVLink Fusion gives hyperscalers and ASIC designers a way to integrate their chips with NVLink and MGX, forming a common rack-scale infrastructure.

Scale-up networking with NVLink 6

The core is the NVLink Fusion chiplet, which can be added to custom ASICs to connect directly to the NVLink interconnect and NVLink Switch. The platform includes the Vera Rubin NVLink Switch tray, which uses sixth-generation switching and 400G custom SerDes.

This setup lets adopters link up to 72 custom ASICs in an all-to-all configuration, delivering 3.6 TB/s per ASIC and 260 TB/s of total scale-up bandwidth. NVLink Switch also supports peer-to-peer memory access and NVIDIA’s SHARP protocol for in-network reductions and multicast acceleration.
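The headline figures are internally consistent: 72 ASICs at 3.6 TB/s each gives roughly 260 TB/s in aggregate. The short sketch below verifies that arithmetic and, using a textbook approximation rather than NVIDIA's published data, illustrates why a switch-side (SHARP-style) reduction injects roughly half the traffic of a ring all-reduce:

```python
# Sanity-check the quoted NVLink Fusion figures and sketch why
# in-network reduction (the idea behind SHARP) cuts fabric traffic.
# The all-reduce maths is a textbook approximation, not NVIDIA data.

ASICS = 72            # accelerators in one scale-up domain
PER_ASIC_TBPS = 3.6   # quoted per-ASIC bandwidth, TB/s

print(f"Aggregate bandwidth: {ASICS * PER_ASIC_TBPS:.1f} TB/s")  # ~259.2

GRAD_GB = 10.0  # hypothetical gradient size per all-reduce, in GB

# Ring all-reduce: each device injects ~2*(n-1)/n of the data.
ring_gb = 2 * (ASICS - 1) / ASICS * GRAD_GB
# In-network reduction: each device sends its data once; the switch
# reduces and multicasts the result back.
sharp_gb = GRAD_GB

print(f"Ring all-reduce:   each device sends ~{ring_gb:.1f} GB")
print(f"In-network reduce: each device sends ~{sharp_gb:.1f} GB")
```

Halving the injected traffic and collapsing the multi-step ring into a single network traversal is what makes switch-side reduction attractive at this scale.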

The release notes that NVLink's track record, paired with NVIDIA AI software, can deliver up to three times higher inference performance, and correspondingly higher revenue, by combining 72 accelerators in one scale-up domain.

Lower development costs and faster deployment

NVLink Fusion adopters can use a modular portfolio of technology that includes MGX rack designs, GPUs, Vera CPUs, co-packaged optics switches, ConnectX SuperNICs, BlueField DPUs and Mission Control software. The ecosystem also includes CPU and IP providers and manufacturers.

For AWS, the platform extends to OEMs and ODMs supplying complete rack-scale building blocks, including racks, chassis, power systems and cooling. This simplified supply chain reduces the risk normally associated with building full data centre-scale racks.

One rack, heterogeneous silicon

NVLink Fusion allows hyperscalers to run heterogeneous AI silicon using the same physical footprint, cooling and power distribution they already use. Each part of the platform can be used independently, letting organisations scale incrementally as workloads shift towards heavier inference and agentic AI training demands.

The release states that bringing custom AI chips to market remains challenging, and NVLink Fusion provides a way to use the tested MGX rack architecture and NVLink networking to speed up innovation cycles. With Trainium4 aligned to this architecture, AWS expects to push new silicon to market more quickly.

Closing view

The collaboration marks a notable moment in the race to optimise data centres for larger, more complex AI models. Rack-scale approaches are becoming a strategic priority for hyperscalers, and NVLink Fusion signals a move towards shared architectures that can reduce cost, cut deployment cycles and remove supply-chain friction. For AWS, integrating Trainium4 into this framework suggests a future where custom silicon arrives faster and scales across data centres with far fewer dependencies.
