Package: wnpp
Severity: wishlist
Owner: Christian Kastner <c...@debian.org>
X-Debbugs-Cc: debian-de...@lists.debian.org, debian...@lists.debian.org
* Package name    : llama.cpp
  Version         : b2116
  Upstream Author : Georgi Gerganov
* URL             : https://github.com/ggerganov/llama.cpp
* License         : MIT
  Programming Lang: C++
  Description     : Inference of Meta's LLaMA model (and others) in pure C/C++

The main goal of llama.cpp is to enable LLM inference with minimal setup
and state-of-the-art performance on a wide variety of hardware, both
locally and in the cloud.

 * Plain C/C++ implementation without any dependencies
 * Apple silicon is a first-class citizen - optimized via the ARM NEON,
   Accelerate, and Metal frameworks
 * AVX, AVX2, and AVX512 support for x86 architectures
 * 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for
   faster inference and reduced memory use
 * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
   GPUs via HIP)
 * Vulkan, SYCL, and (partial) OpenCL backend support
 * CPU+GPU hybrid inference to partially accelerate models larger than
   the total VRAM capacity (a usage sketch follows below)

This package will be maintained by the Debian Deep Learning Team.
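To make the quantization claim concrete: a 7B-parameter model stored as
fp16 needs roughly 7e9 * 2 bytes, about 14 GB, of weights, while 4-bit
quantization cuts that to roughly 7e9 * 0.5 bytes, about 3.5 GB, plus a
small overhead for per-block scales. That reduction is what makes
CPU-only and CPU+GPU hybrid inference practical on commodity hardware.

For illustration, here is a minimal sketch of a program linking against
the library, assuming the llama.h C API roughly as of b2116 (signatures
shift between upstream tags, so treat this as a sketch rather than a
pinned example; the n_gpu_layers value is illustrative):

    // load-test.cpp - hedged sketch against the llama.h C API circa b2116
    #include "llama.h"

    #include <cstdio>

    int main(int argc, char ** argv) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]);
            return 1;
        }

        // NUMA bool argument is dropped in later upstream tags
        llama_backend_init(false);

        llama_model_params mparams = llama_model_default_params();
        // CPU+GPU hybrid inference: offload only as many layers as fit
        // in VRAM; the remaining layers stay on the CPU
        mparams.n_gpu_layers = 20; // illustrative, tune to available VRAM

        llama_model * model = llama_load_model_from_file(argv[1], mparams);
        if (model == NULL) {
            fprintf(stderr, "failed to load %s\n", argv[1]);
            return 1;
        }

        printf("model loaded, vocab size: %d\n", llama_n_vocab(model));

        llama_free_model(model);
        llama_backend_free();
        return 0;
    }

This would be built with something like "g++ load-test.cpp -lllama"; the
exact compile flags depend on how the Debian package ends up shipping the
headers and shared library.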