Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++

M. Zhou Thu, 06 Feb 2025 06:18:25 -0800

On Thu, 2025-02-06 at 09:13 +0100, Christian Kastner wrote:
> 
> I meant to ask anyway: performance-wise, is it comparable to your local
> build? I mean, I wouldn't know what in the code would alter this, but I
> built and tested this on platti.d.o and performance was poor, so another
> data point would be useful.


For ppc64el, the llama.cpp-blas backend is way slower than the -cpu backend.
I did not test on amd64. But on ppc64el the package does not feel any
different from a local build.

CPU is slow anyway. How does HIP perform?

phi-4-q4.gguf | power9, cpu (8 threads) | 0.62 tokens/s
phi-4-q4.gguf | amd64, 13900H           | 6.7 tokens/s

GPU is way faster than this. The phi-4 model does not fit in my NVIDIA GPU,
so no GPU numbers this time.
