Researchers at the Tokyo-based startup Sakana AI have developed a new technique that enables language models to use memory more efficiently, helping enterprises cut the costs of building applications ...
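The teaser doesn't say how the savings are achieved, but techniques in this space generally work by evicting low-value tokens from the transformer's KV cache so that less GPU memory is held per request. The sketch below is a generic score-and-evict pass under that assumption; the importance score, `keep_ratio`, and function names are all illustrative, not Sakana AI's published method.

```python
# Generic KV-cache token eviction: a minimal sketch of the family of
# memory-saving techniques the teaser alludes to. The scoring rule
# (key-vector norm) is a placeholder; real systems learn or measure
# per-token importance instead.
import numpy as np

def evict_kv_cache(keys, values, keep_ratio=0.5):
    """Keep the highest-scoring fraction of cached tokens.

    keys, values: (seq_len, head_dim) arrays for one attention head.
    """
    n_keep = max(1, int(keys.shape[0] * keep_ratio))
    scores = np.linalg.norm(keys, axis=-1)        # placeholder importance
    keep = np.sort(np.argsort(scores)[-n_keep:])  # top-n, original order kept
    return keys[keep], values[keep]

rng = np.random.default_rng(0)
k = rng.normal(size=(128, 64)).astype(np.float32)
v = rng.normal(size=(128, 64)).astype(np.float32)
k2, v2 = evict_kv_cache(k, v, keep_ratio=0.25)
print(k2.shape)  # (32, 64): three quarters of the cache freed
```

Because the cache grows linearly with context length, even a modest keep_ratio translates directly into lower per-request memory, which is where the cost savings come from.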
NVIDIA Dynamo 1.0 provides a production-grade, open source foundation for inference at scale. Dynamo and NVIDIA TensorRT-LLM ...
A new technical paper titled “Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference” was published by researchers at the University of Cambridge, Imperial College London ...
Marketing, technology, and business leaders today are asking an important question: how do you optimize for large language models (LLMs) like ChatGPT, Gemini, and Claude? LLM optimization is taking ...
BELLEVUE, Wash.--(BUSINESS WIRE)--MangoBoost, a provider of cutting-edge system solutions designed to maximize AI data center efficiency, is announcing the launch of Mango LLMBoost™, system ...
The AI chip giant says the open-source software library, TensorRT-LLM, will double the H100’s performance for running inference on leading large language models when it comes out next month. Nvidia ...
A research article by Horace He and Thinking Machines Lab (the startup founded by ex-OpenAI CTO Mira Murati) addresses a long-standing issue in large language models (LLMs). Even with greedy decoding by setting ...
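The teaser cuts off, but the issue it points at is well defined: floating-point addition is not associative, so a GPU kernel whose reduction order changes with batch size or server load can return slightly different logits for the same prompt, and greedy decoding turns a last-bit difference into a different token. Below is a minimal, self-contained demonstration of that root cause only (reduction-order sensitivity); it is not the article's batch-invariant kernel work.

```python
# Float addition is order-sensitive: summing the same numbers in two
# different orders typically disagrees in the low bits. Batched GPU
# kernels change reduction order as batch size changes, which is how
# identical prompts can yield different logits.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000).astype(np.float32)

order_a = np.float32(np.sum(x))           # NumPy's default reduction order
order_b = np.float32(np.sum(np.sort(x)))  # same values, different order

print(order_a, order_b)
print("bit-identical:", order_a == order_b)  # typically False
```

When such sums feed the logits, two near-tied tokens can swap ranks, so even temperature-0 (greedy) decoding is not reproducible unless the kernels are made order-invariant.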
Qubrid AI, a leading Open, Inference-First Full-Stack AI Platform company, today at NVIDIA GTC 2026 announced the addition ...
BEIJING--(BUSINESS WIRE)--On January 4th, the inaugural ceremony for the 2024 ASC Student Supercomputer Challenge (ASC24) was held in Beijing. Attracting global interest, ASC24 has garnered the ...
XDA Developers on MSN: Local LLMs are powerful, but cloud AI is still better at these 3 things
There are trade-offs when using a local LLM ...