Cloudflare Upgrades AI Inference Platform with Enhanced GPUs, Faster Performance, and Updated Features

Cloudflare has announced powerful new features for Workers AI, its serverless AI platform, that enable developers to build faster, more efficient AI applications. With this latest update, applications built on Workers AI benefit from quicker inference, support for larger models, enhanced performance analytics, and more.

Designed for global accessibility, Workers AI runs AI inference close to users, no matter where they are located. As large language models (LLMs) become smaller and more efficient, network speed is emerging as a critical factor for customer adoption and seamless AI interactions. Cloudflare’s globally distributed network minimizes latency, setting it apart from traditional setups that concentrate resources in a handful of data centers.

With GPUs available in over 180 cities worldwide, Workers AI has one of the largest global footprints among AI platforms, facilitating local AI inference while keeping customer data closer to home.

Matthew Prince, Co-Founder and CEO of Cloudflare, highlighted the growing importance of network speed as AI becomes a more integral part of daily life. “As AI workloads shift from training to inference, performance and regional availability will be essential for supporting the next phase of AI development,” he stated.

Key New Features of Workers AI:

  1. Enhanced Performance and Support for Larger Models: Cloudflare has upgraded its global network with more powerful GPUs, improving AI inference performance. Workers AI can now handle significantly larger models, including Llama 3.1 70B and the Llama 3.2 family, letting applications tackle more complex tasks efficiently (see the first sketch after this list).
  2. Improved Monitoring with Persistent Logs: The new persistent logs feature in AI Gateway, currently in open beta, lets developers store user prompts and model responses over extended periods. This offers better insight into application behavior, helping developers refine user experiences based on data about request cost and duration (the first sketch below also shows a request routed through a gateway so it can be logged).
  3. Faster and More Affordable Queries: Cloudflare’s vector database, Vectorize, is now generally available and supports indexes of up to five million vectors each. Median query latency has dropped dramatically, from 549 milliseconds to just 31 milliseconds, enabling quicker access to relevant information and reducing operational costs for AI applications (see the second sketch after this list).
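
To make the first two items concrete, here is a minimal sketch of a Cloudflare Worker that runs one of the newly supported larger models and routes the request through an AI Gateway, which is what allows persistent logs to capture the prompt and response. The model identifier, gateway ID, and binding name are illustrative assumptions, not confirmed by this announcement; verify them against the Workers AI model catalog and your own gateway configuration.

```ts
// Sketch of a Worker calling Workers AI with a larger model (assumptions noted below).
export interface Env {
  AI: Ai; // Workers AI binding, configured in the project's wrangler configuration
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // "@cf/meta/llama-3.1-70b-instruct" is an assumed identifier for the
    // Llama 3.1 70B model mentioned above; check the model catalog for the exact name.
    const result = await env.AI.run(
      "@cf/meta/llama-3.1-70b-instruct",
      {
        messages: [
          { role: "system", content: "You are a concise assistant." },
          { role: "user", content: "Summarize what serverless AI inference means." },
        ],
      },
      {
        // "my-ai-gateway" is a hypothetical gateway ID. Routing the request
        // through an AI Gateway with logging enabled is what lets persistent
        // logs retain the request and response for later analysis.
        gateway: { id: "my-ai-gateway" },
      }
    );

    return Response.json(result);
  },
};
```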

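For the third item, here is a sketch of querying a Vectorize index from a Worker, using Workers AI to embed the query text first. The embedding model, the VECTOR_INDEX binding name, and the topK value are assumptions for illustration; the actual binding comes from your wrangler configuration, and the embedding model must match the dimensions the index was created with.

```ts
// Sketch: embed a query with Workers AI, then search a Vectorize index.
export interface Env {
  AI: Ai;                       // Workers AI binding
  VECTOR_INDEX: VectorizeIndex; // Assumed binding name for a Vectorize index
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const query = "fast global inference";

    // "@cf/baai/bge-base-en-v1.5" is an assumed embedding model identifier.
    // The cast reflects the { shape, data } output shape of text embedding models.
    const embedding = (await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: [query],
    })) as { data: number[][] };

    // Query the index with the first (and only) embedding vector.
    const matches = await env.VECTOR_INDEX.query(embedding.data[0], {
      topK: 5, // return the five nearest vectors
    });

    return Response.json(matches);
  },
};
```

Each nearest-neighbor lookup of this kind is a single Vectorize query, so the latency improvement cited above applies directly to this step of a retrieval-augmented application.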