Learn how to optimize large language models (LLMs) using TensorRT-LLM for faster and more efficient inference on NVIDIA GPUs.
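As a minimal sketch of what that looks like in practice, assuming the high-level Python LLM API that recent TensorRT-LLM releases expose (the model name, prompts, and sampling settings below are illustrative, not a definitive recipe):

```python
# Minimal TensorRT-LLM sketch: build/load an engine for a Hugging Face model
# and run batched generation. Model and sampling values are assumptions.
from tensorrt_llm import LLM, SamplingParams

prompts = [
    "Explain what a TensorRT engine is in one sentence.",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Compiles (or loads a cached) optimized engine for the target GPU.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Batched inference across all prompts in one call.
for output in llm.generate(prompts, sampling_params):
    print(output.prompt, "->", output.outputs[0].text)
```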
NVIDIA (NASDAQ: NVDA) has long been unrivaled in the GPU space, and its lead in AI applications in particular looks hard to overcome. While that may not change tomorrow, ...
Given the high costs and slow speed of training large language models (LLMs), there is an ongoing discussion about whether spending more compute cycles on inference can help improve the ...
After OpenAI released the o1-preview model on September 13, the o1 series drew wide attention for its strong reasoning and problem-solving abilities. Some in the industry have said that o1 pioneered a "Scaling Law for reinforcement learning," i.e., "Inference ...
2. The core idea of the work rests on a hypothesis: by investing additional compute at test time, an LLM should in principle be able to perform better than it does with training-time compute alone, and this test-time capability should in turn open new research directions for agent and reasoning tasks.
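A rough sketch of what "spending extra compute at test time" can mean in practice: sample several reasoning paths and keep the answer most of them agree on (self-consistency / majority voting). The `generate` callable and the "Answer:" output format below are placeholder assumptions, not any particular paper's API:

```python
from collections import Counter

def extract_answer(completion: str) -> str:
    """Illustrative parser: assumes the model ends its output with 'Answer: <x>'."""
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistency(generate, prompt: str, n_samples: int = 16) -> str:
    """Trade extra test-time compute for accuracy: sample n_samples reasoning
    paths with a stochastic `generate(prompt)` call and return the final
    answer that most of the sampled paths agree on."""
    answers = [extract_answer(generate(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```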
After OpenAI o1 was released, its complex logical reasoning stunned the industry, with mathematical ability reaching doctoral level. For example, the question that had long tripped up LLMs, "which is larger, 9.9 or 9.11?", was finally solved in the o1 era. This gave rise to a claim that in this wave of AI, the harder China chases, the further behind it falls, with the gap to OpenAI growing ever wider.
At a deeper level, the bottleneck in large-model inference speed stems from inherent limits of the underlying compute architecture, chiefly the bandwidth available for moving data between memory and compute units, the so-called "memory wall." In the traditional von Neumann architecture, compute and memory are separate, and data must constantly shuttle between them, a process that consumes a great deal of time and energy. And as processors ...
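A back-of-the-envelope illustration of that memory wall: during autoregressive decoding, each new token requires streaming roughly all of the model weights from HBM, so per-token latency is bounded below by weight bytes divided by memory bandwidth. The model size and bandwidth figures here are illustrative assumptions, not measurements:

```python
# Memory-bandwidth-bound decoding, batch size 1. Illustrative numbers only.
params = 70e9                  # 70B-parameter model
bytes_per_param = 2            # FP16 weights
weight_bytes = params * bytes_per_param          # ~140 GB read per token

hbm_bandwidth = 4.8e12         # ~4.8 TB/s (H200-class HBM)

latency_per_token = weight_bytes / hbm_bandwidth
print(f"lower bound: {latency_per_token * 1e3:.1f} ms/token, "
      f"~{1 / latency_per_token:.0f} tokens/s per GPU")
# -> roughly 29 ms/token, ~34 tokens/s: the 'memory wall' in practice.
```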
Nvidia multi-tasks its AI inference chips so that more people can be served at once. A cluster of Nvidia H200s is designed to give AI answers to thousands of people at the same time. The 60-90 ...
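A rough capacity sketch of why batching across many users matters on such a cluster: the weights are read once per decoding step regardless of batch size, so the practical ceiling on concurrent users is often KV-cache memory rather than compute. All figures below are illustrative assumptions (a Llama-2-70B-like model in FP16 with grouped-query attention), not vendor-published numbers:

```python
# How many concurrent sequences fit on one 8x H200 node? Illustrative only.
gpus, hbm_per_gpu = 8, 141e9            # ~141 GB HBM per H200
weight_bytes = 70e9 * 2                 # 70B params, FP16, sharded across GPUs

layers, kv_heads, head_dim, dtype_bytes = 80, 8, 128, 2
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * dtype_bytes  # K and V

context_len = 4096
kv_bytes_per_user = kv_bytes_per_token * context_len

free_for_kv = gpus * hbm_per_gpu - weight_bytes
print(f"~{free_for_kv / kv_bytes_per_user:.0f} concurrent "
      f"{context_len}-token sequences fit in the remaining HBM")
# -> on the order of several hundred users per node before paging or eviction.
```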
A rival chipmaker is raising the stakes in its battle against Nvidia Corp., launching what it says is the world's fastest AI inference service, available now in the cloud. AI inference refers to the ...