Here's one way that NEON could be employed to accelerate Huffman
decoding. The 32 most common symbols typically account for over 99%
of the Huffman codes in a JPEG image, and are usually encoded with
codons of length 2-10 bits. Four 128-bit registers can hold these 32
codons as left-justified 16-bi
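
A minimal sketch of how a table of left-justified codons might be
searched, written in portable scalar C rather than NEON intrinsics.
This is one possible reading of the layout described above, not a
confirmed implementation: the FastTable type, its field names, and
fast_lookup() are all hypothetical. It assumes the codons are sorted
in ascending order and form a prefix-free set, so the matching entry
is the largest codon that is <= the next 16 bits of the bitstream.
The counting loop is the part that four 8-lane 16-bit compares could
plausibly replace.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fast-path table: up to 32 codons, each left-justified
 * in 16 bits (unused low bits are zero) and sorted ascending. */
typedef struct {
    uint16_t code[32];   /* left-justified codon */
    uint8_t  length[32]; /* codon length in bits, 1..16 */
    uint8_t  symbol[32]; /* decoded symbol for this codon */
    int      n;          /* number of valid entries */
} FastTable;

/* Return the index of the entry whose codon is a prefix of `window`
 * (the next 16 bits of the bitstream, left-justified), or -1 if the
 * window starts with a codon that is not in the table, in which case
 * the caller would fall back to the ordinary decoder. */
static int fast_lookup(const FastTable *t, uint16_t window)
{
    int count = 0;
    int idx;
    uint16_t mask;

    assert(t->n >= 0 && t->n <= 32);

    /* Count the codons that are <= window. With SIMD, this loop is
     * the candidate for lane-wise compares followed by a sum. */
    for (int i = 0; i < t->n; i++)
        count += (t->code[i] <= window);

    if (count == 0)
        return -1;
    idx = count - 1;

    /* Confirm that the candidate really is a prefix of the window;
     * rare symbols outside the table fail this check. */
    mask = (uint16_t)(0xFFFFu << (16 - t->length[idx]));
    if ((uint16_t)(window & mask) != t->code[idx])
        return -1;
    return idx;
}
```

Note that the result of the lookup still has to come back to the
regular registers before the decoder can consume length[idx] bits and
branch on a miss.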
On 10/27/11 2:30 PM, Siarhei Siamashka wrote:
> Also, the Huffman decoder optimizations (which are C code, not SIMD) in
> libjpeg-turbo seem to provide only a barely measurable
> improvement on ARM, while the Huffman speedup is clearly more impressive
> on x86. This gives libjpeg-turbo more points o
> I have spent much time investigating
> that as well, and I couldn't manage to find a method that didn't require
> moving data back and forth between the SIMD registers and the regular
> registers (because you can't branch when using SIMD instructions, and
> branching is somewhat critical to the H