F5 and NVIDIA team up to boost AI traffic with smart LLM routing
F5 has introduced advanced capabilities for its BIG-IP Next for Kubernetes platform, powered by NVIDIA BlueField-3 DPUs and the NVIDIA DOCA software framework. The upgrades, validated through deployment by European AI infrastructure provider Sesterce, are designed to meet the escalating demands of large-scale AI workloads by combining high-performance traffic management with low-latency routing and enhanced GPU efficiency.
The partnership aims to simplify the delivery, security, and management of large language model (LLM) traffic in distributed environments, giving enterprises a clear path toward scalable, multi-tenant AI systems.
Sesterce validates multi-model AI traffic gains
Sesterce, a European AI infrastructure operator focused on sovereign AI and accelerated computing, tested the F5-NVIDIA integration across performance, security, and operational metrics. Their validation revealed:
- 20% improvement in GPU utilisation
- Enhanced multi-tenancy and traffic management in Kubernetes
- Reduced inference latency using NVIDIA Dynamo and KV Cache Manager
- Seamless integration with NVIDIA NIM microservices for routing across multiple LLMs
“The integration between F5 and NVIDIA was enticing even before we conducted any tests,” said Youssef El Manssouri, CEO and Co-Founder of Sesterce. “We can now dynamically balance traffic, optimise GPU workloads, and deliver more value to our customers.”
At the core of the solution is intelligent routing of LLM traffic. Simple queries are sent to lightweight models, while complex prompts are directed to advanced LLMs. This approach balances cost, performance, and response time.
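To make the idea concrete, here is a minimal Python sketch of that tiered-routing pattern. The complexity heuristic, model names, and endpoints are purely illustrative assumptions, not F5's actual implementation:

```python
# Illustrative sketch of complexity-based LLM routing (hypothetical heuristic
# and endpoint names; not F5's product logic).

def estimate_complexity(prompt: str) -> float:
    """Crude proxy for prompt complexity: length plus reasoning keywords."""
    keywords = ("explain", "compare", "derive", "step by step", "analyse")
    score = min(len(prompt) / 2000, 1.0)
    score += 0.2 * sum(kw in prompt.lower() for kw in keywords)
    return min(score, 1.0)

def route(prompt: str) -> str:
    """Send simple queries to a lightweight model, complex ones to an advanced LLM."""
    if estimate_complexity(prompt) < 0.4:
        return "small-model-endpoint"   # cheap, low-latency model
    return "large-model-endpoint"       # advanced LLM for complex prompts

print(route("What time is it in Paris?"))                                        # small-model-endpoint
print(route("Explain step by step how KV caching reduces inference latency."))  # large-model-endpoint
```

In production such a classifier would itself be a small model or a set of programmable rules; the point is that the decision runs at the infrastructure layer, before any GPU cycles are spent.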
F5’s deep programmability enables routing logic to be deployed directly on NVIDIA BlueField-3 DPUs. This helps enterprises scale AI applications while keeping latency low and throughput high.
“Routing and classifying LLM traffic can be compute-heavy,” said Kunal Anand, Chief Innovation Officer at F5. “By programming that logic at the infrastructure level, we unlock new efficiencies across AI data centres.”
Accelerating AI inference with NVIDIA Dynamo
The updated platform integrates tightly with NVIDIA Dynamo, a framework introduced to simplify AI inference across distributed systems. By leveraging BlueField DPUs and F5’s KV Cache Manager, the system reroutes requests intelligently and avoids costly recomputations, saving GPU memory and increasing throughput.
Offloading key functions from CPUs to DPUs streamlines tasks such as model scheduling, caching, and memory orchestration, especially in real-time inference environments. This translates to faster responses for generative AI applications and improved resource utilisation.
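The article does not detail the internals of Dynamo or the KV Cache Manager, but the core idea of KV-cache-aware scheduling can be sketched: route requests that share a prompt prefix to the worker that already holds the cached key-value state, so the shared tokens are not recomputed. The following toy router is an assumption-laden illustration, not NVIDIA's or F5's design:

```python
# Sketch of KV-cache-aware scheduling (illustrative only; the real systems
# track cache occupancy, eviction, and load far more precisely).
import hashlib

class CacheAwareRouter:
    def __init__(self, workers):
        self.workers = workers
        self.prefix_owner = {}  # prefix hash -> worker holding its KV cache

    def _prefix_key(self, prompt: str, prefix_len: int = 256) -> str:
        return hashlib.sha256(prompt[:prefix_len].encode()).hexdigest()

    def route(self, prompt: str) -> str:
        key = self._prefix_key(prompt)
        if key in self.prefix_owner:
            # A worker already cached this prefix: reuse it and skip
            # recomputing attention over the shared tokens.
            return self.prefix_owner[key]
        # Otherwise assign a worker (here: naive round-robin).
        worker = self.workers[len(self.prefix_owner) % len(self.workers)]
        self.prefix_owner[key] = worker
        return worker

router = CacheAwareRouter(["gpu-node-0", "gpu-node-1"])
system_prompt = "You are a helpful assistant. " * 10
print(router.route(system_prompt + "Question A"))  # caches prefix on a node
print(router.route(system_prompt + "Question B"))  # reuses the same node
```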
MCP protection and rapid protocol adaptation
F5 also adds a security layer for Model Context Protocol (MCP), an open standard by Anthropic for supplying LLMs with contextual data. Acting as a reverse proxy, F5 strengthens MCP server protections and helps enterprises build scalable, secure, and compliant AI environments.
The platform’s iRules programmability enables organisations to adapt rapidly to emerging LLM protocol changes and evolving security threats.
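MCP messages are JSON-RPC 2.0, so a reverse proxy can apply policy before traffic reaches the MCP server. The sketch below shows one hypothetical policy check in Python; the allowlist and filtering logic are illustrative assumptions, not F5 iRules syntax or part of the MCP specification:

```python
# Hypothetical reverse-proxy policy check for MCP traffic. The method
# allowlist and drop policy are illustrative, not F5's implementation.
import json

ALLOWED_METHODS = {"initialize", "tools/list", "tools/call", "resources/list"}

def should_forward(raw: bytes) -> bool:
    """Return True if the JSON-RPC request may pass to the MCP server."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return False  # drop malformed payloads at the proxy
    if msg.get("jsonrpc") != "2.0":
        return False
    return msg.get("method") in ALLOWED_METHODS

request = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}).encode()
print(should_forward(request))  # True: forwarded to the backend
```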
Now generally available, F5 BIG-IP Next for Kubernetes with NVIDIA BlueField-3 DPUs delivers:
- Smart LLM routing and query optimisation
- Multi-model inference with NVIDIA NIM
- AI-driven load balancing and reverse proxy features
- Full programmability and security for MCP workflows
As AI adoption grows, this joint F5–NVIDIA effort offers enterprises a plug-and-play path to deploy intelligent, high-performance, and secure AI services at scale.