F5 and NVIDIA team up to boost AI traffic with smart LLM routing

F5 BIG-IP Next for Kubernetes now runs on NVIDIA BlueField-3 DPUs, boosting AI efficiency, LLM routing, and security for modern infrastructure.

DQC Bureau

F5 has introduced advanced capabilities for its BIG-IP Next for Kubernetes platform, powered by NVIDIA BlueField-3 DPUs and the NVIDIA DOCA software framework. The upgrades, validated through deployment by European AI infrastructure provider Sesterce, are designed to meet the escalating demands of large-scale AI workloads by combining high-performance traffic management with low-latency routing and enhanced GPU efficiency.

The partnership aims to simplify the delivery, security, and management of large language model (LLM) traffic in distributed environments, giving enterprises a clear path toward scalable, multi-tenant AI systems.

Sesterce validates multi-model AI traffic gains

Sesterce, a European AI infrastructure operator focused on sovereign AI and accelerated computing, tested the F5-NVIDIA integration across performance, security, and operational metrics. Their validation revealed:

  • 20% improvement in GPU utilisation

  • Enhanced multi-tenancy and traffic management in Kubernetes

  • Reduced inference latency using NVIDIA Dynamo and KV Cache Manager

  • Seamless integration with NVIDIA NIM microservices for routing across multiple LLMs

“The integration between F5 and NVIDIA was enticing even before we conducted any tests,” said Youssef El Manssouri, CEO and Co-Founder of Sesterce. “We can now dynamically balance traffic, optimise GPU workloads, and deliver more value to our customers.”

At the core of the solution is intelligent routing of LLM traffic. Simple queries are sent to lightweight models, while complex prompts are directed to advanced LLMs. This approach balances cost, performance, and response time.
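
To make the idea concrete, here is a minimal sketch (in Python, not F5's actual implementation) of complexity-based routing; the model names, keyword heuristic, and threshold are illustrative assumptions only.

```python
# Illustrative sketch only: route a prompt to a lightweight or an advanced model
# based on a crude complexity heuristic. Model names and thresholds are
# hypothetical, not part of the F5 or NVIDIA products described above.

LIGHTWEIGHT_MODEL = "small-llm"   # hypothetical lightweight model endpoint
ADVANCED_MODEL = "large-llm"      # hypothetical advanced model endpoint

def complexity_score(prompt: str) -> float:
    """Rough proxy for prompt complexity: length plus a bonus for reasoning-style keywords."""
    keywords = ("explain", "compare", "analyse", "step by step", "derive")
    score = len(prompt.split()) / 100.0
    score += sum(0.5 for k in keywords if k in prompt.lower())
    return score

def route(prompt: str, threshold: float = 1.0) -> str:
    """Send simple queries to the lightweight model and complex prompts to the
    advanced model, trading off cost, latency, and answer quality."""
    return ADVANCED_MODEL if complexity_score(prompt) >= threshold else LIGHTWEIGHT_MODEL

if __name__ == "__main__":
    print(route("What time is it in Paris?"))                    # -> small-llm
    print(route("Explain step by step how KV caching works."))   # -> large-llm
```

In production this scoring would typically come from a trained classifier or the routing policies F5 describes, but the cost/quality trade-off it encodes is the same.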

F5’s deep programmability enables routing logic to be deployed directly on NVIDIA BlueField-3 DPUs. This helps enterprises scale AI applications while keeping latency low and throughput high.

“Routing and classifying LLM traffic can be compute-heavy,” said Kunal Anand, Chief Innovation Officer at F5. “By programming that logic at the infrastructure level, we unlock new efficiencies across AI data centres.”

Accelerating AI inference with NVIDIA Dynamo

The updated platform integrates tightly with NVIDIA Dynamo, a framework introduced to simplify AI inference across distributed systems. By leveraging BlueField DPUs and NVIDIA Dynamo's KV Cache Manager, the system reroutes requests intelligently and avoids costly recomputations, saving GPU memory and increasing throughput.
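
As a rough illustration of the cache-reuse idea (not the actual Dynamo or BIG-IP Next logic), the Python sketch below routes a request to the inference worker that already holds the longest cached prefix of the prompt, so fewer tokens need recomputing; the worker names and cached prefixes are hypothetical.

```python
# Illustrative sketch only: prefer the inference worker whose KV cache already
# covers the longest prefix of the incoming prompt, avoiding recomputation.
# Worker names and cached prefixes are hypothetical.

def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common character prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_worker(prompt: str, cached_prefixes: dict[str, str]) -> str:
    """Return the worker with the best cache hit; fall back to the first
    worker when nothing useful is cached."""
    best_worker, best_hit = None, 0
    for worker, prefix in cached_prefixes.items():
        hit = shared_prefix_len(prompt, prefix)
        if hit > best_hit:
            best_worker, best_hit = worker, hit
    return best_worker or next(iter(cached_prefixes))

if __name__ == "__main__":
    caches = {
        "gpu-node-a": "Summarise the following contract:",
        "gpu-node-b": "Translate the following text into French:",
    }
    print(pick_worker("Summarise the following contract: ...", caches))  # -> gpu-node-a
```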

Offloading key functions from CPUs to DPUs streamlines tasks such as model scheduling, caching, and memory orchestration, especially in real-time inference environments. This translates to faster responses for generative AI applications and improved resource utilisation.

MCP protection and rapid protocol adaptation

F5 also adds a security layer for Model Context Protocol (MCP), an open standard by Anthropic for supplying LLMs with contextual data. Acting as a reverse proxy, F5 strengthens MCP server protections and helps enterprises build scalable, secure, and compliant AI environments.
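
As a loose illustration of what a policy gate in front of an MCP server might check (not F5's actual reverse-proxy logic), the Python sketch below screens JSON-RPC-style requests against an assumed method allowlist and size limit before they would be forwarded upstream.

```python
# Illustrative sketch only: a reverse-proxy-style gate that screens requests
# before they reach an MCP server. The allowlist, size limit, and method names
# are assumed policy examples, not F5's actual rules.

import json

ALLOWED_METHODS = {"tools/list", "tools/call", "resources/read"}  # assumed examples
MAX_BODY_BYTES = 64 * 1024  # assumed payload limit

def screen_request(raw_body: bytes) -> tuple[bool, str]:
    """Return (allowed, reason): block oversized payloads, malformed JSON,
    and methods outside the allowlist."""
    if len(raw_body) > MAX_BODY_BYTES:
        return False, "payload too large"
    try:
        msg = json.loads(raw_body)
    except json.JSONDecodeError:
        return False, "malformed JSON"
    method = msg.get("method", "")
    if method not in ALLOWED_METHODS:
        return False, f"method not allowed: {method}"
    return True, "ok"

if __name__ == "__main__":
    print(screen_request(b'{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'))
    print(screen_request(b'{"jsonrpc": "2.0", "id": 2, "method": "admin/shutdown"}'))
```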

The platform’s iRules programmability enables organisations to adapt rapidly to emerging LLM protocol changes and evolving security threats.

Now generally available, F5 BIG-IP Next for Kubernetes with NVIDIA BlueField-3 DPUs delivers:

  • Smart LLM routing and query optimisation

  • Multi-model inference with NVIDIA NIM

  • AI-driven load balancing and reverse proxy features

  • Full programmability and security for MCP workflows

As AI adoption grows, this joint F5–NVIDIA effort offers enterprises a plug-and-play path to deploy intelligent, high-performance, and secure AI services at scale.

 
