I maintain Fedora's llama-cpp. This package is fast moving, so it's best to pick a reasonably recent build and stick with it for a while. I picked ours to fix some CVEs reported against llama-cpp and to stay in sync with our python-llama-cpp package. I have stripped out almost all of it and export only what python-llama-cpp needs (see the sketch in the P.S. below).

If you want to coordinate with Fedora on versions, let me know.

Tom
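P.S. To sketch what I mean by the split: something along these lines, simplified and hypothetical, not the actual Fedora llama-cpp.spec:

    # Hypothetical subpackage split: ship only the shared library that
    # python-llama-cpp links against, plus a -devel package for building.
    %package libs
    Summary: Shared library for llama-cpp

    %files libs
    %{_libdir}/libllama.so.*

    %package devel
    Summary: Development files for llama-cpp
    Requires: %{name}-libs%{?_isa} = %{version}-%{release}

    %files devel
    %{_includedir}/llama.h
    %{_libdir}/libllama.so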
-----Original Message-----
From: Cordell Bloor <c...@slerp.xyz>
Sent: Saturday, December 14, 2024 11:46 PM
To: 1063...@bugs.debian.org; Christian Kastner <c...@debian.org>; Petter Reinholdtsen <p...@hungry.com>
Cc: Debian ROCm Team <debian...@lists.debian.org>
Subject: Re: Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++

Hi Christian and Petter,

On Sat, 9 Mar 2024 10:20:32 +0100 Christian Kastner <c...@debian.org> wrote:
> I've discarded the simple package and now plan another approach: a
> package that ships a helper to rebuild the utility when needed,
> similar to DKMS. Rationale:
> * Continuously developed upstream, no build suited for stable
> * Build optimized for the current host's hardware, which is a key
>   feature. Building for our amd64 ISA standard would be absurd.
> I'm open for better ideas, though.

Perhaps we are letting the perfect be the enemy of the good? There are lots of fast-moving projects that get frozen at some version for stable. While that can be annoying for maintenance, it is also something that provides value. It's hard to build on top of something that keeps changing.

I would also argue that you're taking on too much responsibility in trying to enable -march=native optimizations. It's true that you can get significantly more performance using the AVX instructions available on most modern computers, but if llama.cpp really wanted that, they could implement dynamic dispatch themselves (a sketch of the technique follows at the end of this message). The CPU instruction set is also irrelevant for the GPU-accelerated version of the package. Why not deliver the basics before we try to do something fancy?

In the time that has passed between the creation of this issue and now, Fedora has created their own llama.cpp package [1]. I think they had the right idea. There's value in providing a working package to users today, even if it's imperfect.

Sincerely,
Cory Bloor

[1]: https://packages.fedoraproject.org/pkgs/llama-cpp/llama-cpp/
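P.S. For concreteness, here is a minimal sketch of the kind of runtime dispatch I mean. This is not llama.cpp's actual code; it only illustrates the technique using the GCC/Clang CPU feature-test builtins, with hypothetical matmul_* functions standing in for real kernels:

    /* Hypothetical example: pick an AVX2 kernel at runtime when the CPU
     * supports it, otherwise fall back to the portable amd64 baseline.
     * Requires GCC or Clang for the __builtin_cpu_* feature tests. */
    #include <stdio.h>

    static void matmul_avx2(void)     { puts("AVX2 kernel"); }
    static void matmul_baseline(void) { puts("baseline x86-64 kernel"); }

    int main(void)
    {
        __builtin_cpu_init();  /* initialize the CPU feature cache */
        if (__builtin_cpu_supports("avx2"))
            matmul_avx2();      /* fast path on most modern CPUs */
        else
            matmul_baseline();  /* works on any amd64 machine */
        return 0;
    }

A distribution could then build one binary for the amd64 baseline and still get the faster paths on hardware that supports them.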