Re: [Libjpeg-turbo-devel] libjpeg8c vs libjpeg-turbo with libjpeg8 compat on

2011-10-28 Thread Michael K. Edwards
Here's one way that NEON could be employed to accelerate Huffman decoding. The most common 32 symbols typically account for over 99% of Huffman codes in a JPEG image, and are typically encoded with codons of length 2-10 bits. Four 128-bit registers can hold these 32 codons as left-justified 16-bi
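The excerpt above is cut off before it says how the packed codons are used, so what follows is only a guess at one possible matching step, written as portable scalar C rather than NEON intrinsics. It assumes the 32 codes are stored left-justified in 16 bits and sorted in ascending order; `fast_table` and `fast_lookup` are hypothetical names for illustration and are not part of libjpeg-turbo.

```c
#include <stdint.h>

#define NCODES 32  /* four 128-bit registers x eight 16-bit lanes */

/* Hypothetical table of the most common Huffman codes (not a libjpeg-turbo type). */
typedef struct {
    uint16_t code[NCODES];   /* codeword left-justified in 16 bits, ascending */
    uint8_t  len[NCODES];    /* codeword length in bits, 1..16 */
    uint8_t  symbol[NCODES]; /* symbol each codeword decodes to */
    int      count;          /* number of valid entries, at most NCODES */
} fast_table;

/*
 * Given the next 16 bits of the stream (left-justified), return the index of
 * the table entry whose codeword is a prefix of the window, or -1 if none of
 * the table's codes match and a slower general decoder must take over.
 */
static int fast_lookup(const fast_table *t, uint16_t window)
{
    int le = 0;

    /* Branch-free compare-and-count. With SIMD this would presumably become
       lane-wise unsigned ">=" compares followed by a horizontal add. */
    for (int i = 0; i < t->count; i++)
        le += (t->code[i] <= window);

    if (le == 0)
        return -1;

    /* In a prefix code, the largest left-justified codeword that is <= window
       is the only candidate; confirm that its leading bits really match. */
    int idx = le - 1;
    uint16_t mask = (uint16_t)(0xFFFFu << (16 - t->len[idx]));
    return ((window & mask) == t->code[idx]) ? idx : -1;
}
```

On a hit, the caller would emit `t->symbol[idx]` and consume `t->len[idx]` bits. The single check at the end is the only data-dependent branch, which is the property that would make this shape attractive for SIMD, if this is indeed the kind of scheme the message goes on to describe.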

Re: [Libjpeg-turbo-devel] libjpeg8c vs libjpeg-turbo with libjpeg8 compat on

2011-10-27 Thread DRC
On 10/27/11 2:30 PM, Siarhei Siamashka wrote:
> Also huffman decoder optimizations (which are C code, not SIMD) in
> libjpeg-turbo seem to be providing only some barely measurable
> improvement on ARM, while huffman speedup is clearly more impressive
> on x86. This gives libjpeg-turbo more points o

Re: [Libjpeg-turbo-devel] libjpeg8c vs libjpeg-turbo with libjpeg8 compat on

2011-10-27 Thread Justin Lebar
> I have spent much time investigating
> that as well, and I couldn't manage to find a method that didn't require
> moving data back and forth between the SIMD registers and the regular
> registers (because you can't branch when using SIMD instructions, and
> branching is somewhat critical to the H