Inference Decode KV Cache

Dnotitia Unveils STAR-KV, Achieving Up to 20x KV Cache Compression, Selected as an ICML 2026 Spotlight Paper

Introduces a low-rank-based approach to KV cache compression, one of the key bottlenecks in long-context AI; Speeds up ...

2d on MSN

DeepSeek's DSpark just made Nvidia's most important new bet harder to close

DeepSeek just released DSpark, an inference module that makes its AI models 60% to 85% faster without new hardware. Nvidia is ...

SDxCentral

DDN, Google Cloud claim Lustre KV cache trick boosts AI inference throughput 75%

DDN added new capabilities to the Lustre platform it manages with Google Cloud, including means to share key-value (KV) cache to boost AI inference workloads. Unveiled at Google’s annual Next event, ...

Semiconductor Engineering

Dynamic KV Cache Scheduling in Heterogeneous Memory Systems for LLM Inference (Rensselaer Polytechnic Institute, IBM)

A new technical paper titled “Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System” was published by researchers at Rensselaer Polytechnic Institute and IBM. “Large ...

Why AI Infrastructure Bottlenecks Are Moving Beyond GPUs

The variable most organizations are missing isn’t compute — it’s storage purpose-built for AI context, not just data capacity ...

Network World

Tether is shipping TurboQuant KV-cache quantization with Vulkan support into its QVAC SDK

The latest release of qvac-fabric-llm.cpp, the inference engine of the QVAC Fabric LLM, features TurboQuant integration for resource management in long-running inference sessions. Tether adopts the ...

Business Wire

Penguin Solutions Introduces Industry's First Production-Ready CXL-Based KV Cache Server

FREMONT, Calif.--(BUSINESS WIRE)--Penguin Solutions, Inc. (Nasdaq: PENG), the AI factory platform company, today announced the industry's first production-ready KV cache server that utilizes CXL ...

16d

AI hit the memory wall — now it needs a new context tier

As inference workloads evolve from discrete question-and-answer exchanges into persistent, multi-step agentic systems, GPU ...

Yahoo Finance

VAST Data Redesigns AI Inference Architecture for the Agentic Era with NVIDIA

Remote-First-Company | NEW YORK CITY, Jan. 05, 2026 (GLOBE NEWSWIRE) -- VAST Data, the AI Operating System company, today announced a new inference architecture that enables the NVIDIA Inference ...
