On 2025/01/30 13:27, Dave Voutila wrote:
> Stuart Henderson <s...@spacehopper.org> writes:
> 
> > On 2025/01/30 08:15, Dave Voutila wrote:
> >>
> >> FWIW we should be able to include Vulkan support as its in ports. I've
> >> played with llama.cpp locally with it, but I don't have a GPU that's
> >> worth a damn to see if it's an improvement over pure CPU-based
> >> inferencing.
> >
> > Makes sense, though I think it would be better to commit without it and
> > add that later.
> >
> >> Also should this be arm64 and amd64 specific? I'm not a ports person so
> >> not sure :)
> >
> > Do you mean for llama.cpp at all, or just the vulkan support?
> > (If it's "at all", afaik the original intention was that - like
> > whisper.cpp - it would run without anything special).
> 
> I think some of its CPU-based inferencing relies on specific CPU
> extensions, like AVX. Not sure it's truly cross-platform. I may be
> wrong.

I _think_ it should be ok:

- Plain C/C++ implementation without any dependencies

as well as

- Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate 
and Metal frameworks
- AVX, AVX2, AVX512 and AMX support for x86 architectures
- 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization 
for faster inference and reduced memory use
- Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via 
HIP and Moore Threads MTT GPUs via MUSA)
- Vulkan and SYCL backend support
- CPU+GPU hybrid inference to partially accelerate models larger than the total 
VRAM capacity
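
So as far as I can tell the AVX/AVX2/AVX512 bits are optional
optimizations on top of the plain C/C++ code rather than hard
requirements. If you want to see what a given machine actually
reports, something like this works with a GCC/Clang toolchain (just
an illustration of the point, nothing llama.cpp-specific; the file
name is made up):

  /* cpu-feats.c: print which of the x86 SIMD extensions mentioned
   * above the running CPU reports; on other archs the plain C/C++
   * path is what matters anyway. */
  #include <stdio.h>

  int main(void)
  {
  #if defined(__x86_64__) || defined(__i386__)
          __builtin_cpu_init();
          printf("avx:     %d\n", __builtin_cpu_supports("avx"));
          printf("avx2:    %d\n", __builtin_cpu_supports("avx2"));
          printf("avx512f: %d\n", __builtin_cpu_supports("avx512f"));
  #else
          printf("not an x86 host, plain C/C++ path applies\n");
  #endif
          return 0;
  }

Build and run with "cc -O2 cpu-feats.c -o cpu-feats && ./cpu-feats".
Whether the port picks those extensions up is then just a build-time
decision, as I understand it.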
