Video summarisation for traffic management moves closer to real-time use

Video systems generate more data than operators can practically review. New AI-driven video summarisation tools aim to turn traffic footage into searchable, structured insights, reducing manual effort and improving response times.

DQC Bureau


Traffic surveillance systems have grown more powerful over the years, but one problem has remained stubbornly unchanged. The footage keeps piling up. Operators still spend hours watching screens, scrubbing timelines, and filtering out noise to find events that matter.

A new push towards video summarisation for traffic management aims to change that balance by turning raw video into readable, searchable text, delivered in seconds rather than hours.

Milestone Systems has introduced two related offerings built on a specialised vision language model focused on traffic environments. One is a video summarisation tool integrated directly into its XProtect video management software. The other is a vision language model offered as a service for developers and system integrators.

Together, they target a common bottleneck in traffic operations: too much video, too little time.

Turning traffic footage into usable summaries

The video summarisation tool works as a plug-in within the XProtect Smart Client. Instead of manually reviewing long video sequences, operators can submit a short video clip along with a prompt describing what they want to know. The system then generates a structured text summary of what occurred in the footage.

This approach shifts video review from a timeline-based task to a content-based one. Operators can search summaries by what happened in the scene rather than by time or manually added tags.
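To illustrate the difference, here is a minimal sketch of content-based lookup. It assumes each clip's summary has been stored as a plain record with a camera name, a start time and the generated text. The `ClipSummary` record, the `search_summaries` function and the sample data are hypothetical and are not Milestone's API. The sketch only shows how a query can match on what happened rather than on when it happened, using naive case-insensitive word matching where a production system would likely use richer search.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ClipSummary:
    """Hypothetical record: one text summary generated for one video clip."""
    camera: str
    start: datetime
    text: str


def search_summaries(summaries, query):
    """Return summaries whose text contains every query word (case-insensitive substring match)."""
    terms = query.lower().split()
    return [s for s in summaries if all(t in s.text.lower() for t in terms)]


# Hypothetical sample summaries, invented purely for illustration.
summaries = [
    ClipSummary("cam-12", datetime(2025, 12, 1, 8, 15),
                "A truck stops in the left lane; traffic queues behind it."),
    ClipSummary("cam-07", datetime(2025, 12, 1, 9, 40),
                "Rain on the lens causes apparent motion; no vehicles affected."),
    ClipSummary("cam-12", datetime(2025, 12, 1, 17, 5),
                "Two cars collide at the junction; a pedestrian crosses."),
]

# Search by scene content ("truck", "lane") instead of scrubbing a timeline.
for hit in search_summaries(summaries, "truck lane"):
    print(hit.camera, hit.start.isoformat(), hit.text)
```

The operator never specifies a time range here; the timestamp comes back as part of the result, which is the inversion of a timeline-first review.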

According to early reports shared by the company, the tool has the potential to reduce false alarm fatigue by up to 30 percent. For traffic control rooms dealing with constant motion, weather effects, and background noise, this filtering capability is critical.

Key functions of the tool include:

  • Converting video segments into structured text summaries inside the XProtect Smart Client

  • Searching summaries based on scene content rather than timestamps

  • Bookmarking and filtering summaries to speed up review workflows

  • Triggering automated summaries through existing event and rule logic

  • Filtering out irrelevant motion to help operators focus on valid traffic events

The tool is available as a free download for XProtect users, with usage billed only when the vision language model is prompted.

Video intelligence as a service for developers

Alongside the operator-facing tool, Milestone has launched a vision language model as a service, aimed at developers building traffic and mobility applications.

The service provides API-based access to a production-ready model that can interpret traffic video and respond to prompt-based instructions. Developers can integrate video intelligence into existing systems without setting up their own AI infrastructure or fine-tuning models from scratch.

This model-as-a-service approach is designed to support both early-stage testing and large-scale deployments. Milestone states that it can significantly reduce development effort compared with building and training a similar model in-house.

Key capabilities of the service include:

  • Access to a traffic-optimised vision language model

  • Prompt-based control for traffic-related analysis

  • API-first delivery over HTTPS

  • Region-specific models for the US and EU, with additional regions planned

  • Support for standalone applications or integration with existing Milestone products

Pricing follows a pay-per-use model based on API calls, avoiding upfront investments or custom training costs.

Built on traffic data and responsible AI principles

Both offerings are powered by Milestone’s Hafnia vision language model, which has been fine-tuned using 75,000 hours of responsibly sourced real-world traffic video from Europe and the US. Data preparation uses NVIDIA Cosmos Curator, while reasoning capabilities are built on NVIDIA Cosmos Reason.

The model can be deployed on cloud infrastructure or regional datacentres, depending on customer requirements. Milestone has positioned data lineage and regulatory compliance as central design considerations, stating that the training data used for fine-tuning is auditable and aligned with GDPR and EU AI Act requirements.

Andrew Burnett, Acting Chief Technology Officer, Milestone Systems, said the focus is on removing manual friction from video operations while maintaining trust in AI-driven outputs.

“With the vision language model as a service and video summarisation for XProtect, we’re tackling video overload and time-consuming manual work,” he said. “Operators get immediate insight directly within XProtect; builders get API-first access to production-ready intelligence without bespoke training or heavy infrastructure.”

Early interest from city deployments

Cities including Genoa in Italy and Dubuque in the US have expressed interest in applying these capabilities to traffic management use cases. The emphasis is on improving situational awareness while reducing the operational burden on control room staff.

As traffic volumes increase and urban monitoring expands, video summarisation for traffic management may become less about innovation and more about necessity. The shift from watching video to reading it could redefine how traffic operations scale in the years ahead.
