On 2025-02-06 01:33, Petter Reinholdtsen wrote:
> [Christian Kastner]
> I checked in a few minor fixes.

Looks fine, though I deliberately skipped the poetry dependency for now,
as it looked more like a false positive.

> I noticed llama.cpp depend on llama.cpp-backend with no concrete
> dependency first.  This led to unpredictable behaviour, and I suggest
> depending on for example 'llama.cpp-cpu | llama.cpp-backend' to make
> sure 'apt install llama.cpp' behaves predictably.

This was my intention, but I initially wasn't sure what the default
would be (-cpu or -blas). Looks like I forgot to add one before upload.
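For illustration, a minimal d/control sketch of that alternation, assuming
-cpu ends up being the default (that choice is still open, per the above):

```
Package: llama.cpp
Depends: llama.cpp-cpu | llama.cpp-backend,
         ${misc:Depends}
```

With a concrete package listed first in the alternation, 'apt install
llama.cpp' deterministically pulls in llama.cpp-cpu unless another package
providing llama.cpp-backend is already installed.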

> I was sad to discover the server example is missing, as it is the
> llama.cpp program I use the most.  Without it, I will have to continue
> using my own build.

It'll be re-enabled soon. There were a few generated and minified files
in that example, so I opted to skip those for now and focus on the build
process.

> I hope to get whisper.cpp to the same state, so it can have a fighting
> chance to get into testing before the freeze.

Seeing how closely llama.cpp and whisper.cpp are related, in the ideal
case you should be able to just carry over some patches and mostly copy
d/rules, as llama.cpp and whisper.cpp share the ggml library on a source
basis.

Note that for amd64, there actually is a form of dynamic dispatching in
ggml, though it needs to be patched, as the lowest supported level is
AVX. That is also something I had planned for the next iteration.
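For reference, a quick way to check whether a given amd64 machine meets
that AVX floor; a plain shell sketch reading the standard Linux
/proc/cpuinfo flags:

```shell
# Report whether the CPU advertises AVX, the minimum level the
# patched ggml dispatch would require.
if grep -qw avx /proc/cpuinfo; then
    echo "AVX supported"
else
    echo "AVX not supported"
fi
```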

Best,
Christian
