Re: Android libjpeg

2011-06-23 Thread DRC
Yes, I'd think we'd want to merge the v6 support into libjpeg-turbo and
verify its correct operation before trying to replace the version of
libjpeg in Android.  Also, v6 would need to be selected using the same
mechanisms as (or ones similar to) those we currently use to select NEON.
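
As a rough illustration of what selecting v6 "the same way as NEON"
could look like, here is a minimal sketch of run-time selection based on
CPU feature bits. The names CPU_FEATURE_ARMV6, CPU_FEATURE_NEON,
idct_impl and select_idct are hypothetical and are not taken from
libjpeg-turbo or AOSP. By contrast, the AOSP tree described in the
quoted message below makes the choice at compile time through the
ANDROID_ARMV6_IDCT define.

```c
/* Hypothetical CPU feature bits, for illustration only. */
#define CPU_FEATURE_ARMV6 (1u << 0)
#define CPU_FEATURE_NEON  (1u << 1)

/* Hypothetical identifiers for the available IDCT routines. */
typedef enum { IDCT_PLAIN_C, IDCT_ARMV6, IDCT_NEON } idct_impl;

/* Choose the most capable IDCT the CPU reports support for, falling
 * back to the portable C routine when no extension is available. */
idct_impl select_idct(unsigned int cpu_features)
{
    if (cpu_features & CPU_FEATURE_NEON)
        return IDCT_NEON;
    if (cpu_features & CPU_FEATURE_ARMV6)
        return IDCT_ARMV6;
    return IDCT_PLAIN_C;
}
```

With this ordering, a CPU that reports both extensions gets the NEON
routine, and an ARMv6-only CPU still gets the v6 routine rather than
plain C.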

I also wanted to let you guys know that I have set up a
libjpeg-turbo-devel list
(https://lists.sourceforge.net/lists/listinfo/libjpeg-turbo-devel) which
can be used to submit patches to the project or talk about development
topics specific to the libjpeg-turbo code.  You can also use the Patch
tracker on SourceForge to submit patches and discuss them.


On 6/22/11 8:30 PM, Christian Robottom Reis wrote:
> Hi there,
> 
> I took a look at the AOSP libjpeg code which is included in
> 
>  git://android.git.kernel.org/platform/external/jpeg
> 
> during my flight back home (which incidentally had been diverted and
> landed me in Rio de Janeiro; not sure if I celebrate or cry) and noted
> the following things:
> 
> - There is a v6 implementation of the fast IDCT algorithm which
>   lives in armv6_idct.S.
> 
> - The commit that adds this implementation dates from October 2010,
>   and there haven't been any changes since.
> 
> - The code that selects the decoder IDCT implementation in
>   jddctmgr.c always uses that implementation if ANDROID_ARMV6_IDCT
>   is defined.
> 
> - Google have an "ashmem" backing store implementation, and have
>   code to enable tile-based mode. It's a fairly non-intrusive change
>   to use ashmem since it just replaces jpeg_open_backing_store.
> 
> - The code is pretty much standard libjpeg without any structural
>   changes to it.
> 
> - There isn't any NEON code in this branch.
> 
> - Mans has an optimized version here:
> 
> http://git.mansr.com/?p=libjpeg;a=summary
> 
>   I don't know if he's submitted this to AOSP or not.
> 
> This suggests to me that a simple drop-in of libjpeg-turbo might
> actually be easy to do, and that there is probably a significant
> performance benefit to be achieved. One thing to keep in mind is that
> this code still supports armv6, so we'd probably want to preserve that.
> 
> Thanks!

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


Re: Android libjpeg

2011-06-30 Thread DRC
I still posit that it's possible to avoid many of those inefficiencies by using 
a sufficiently large buffer in libjpeg-turbo and using an in-memory 
source/destination manager. Much of the inefficiency in the code relates to the 
buffering that it does to avoid reading the entire image into memory.

I also hasten to point out that not all of the compute-intensive parts of the 
code are NEON-accelerated. The general speedup we're seeing in NEON vs non-NEON 
is about 1.5-2x rather than the 3-4x we see with x86-64. Not sure whether ARM 
is 64-bit, but using 64-bit code will improve Huffman en/decoding performance 
significantly. It may also be the case that the hand-tuned code I wrote in the 
Huffman codec is making performance assumptions based on x86 that aren't true 
for ARM. It would be interesting to see what the speedup is with the 
unoptimized Huffman code out of libjpeg. At least on x86, Huffman can account 
for 40% of the compute time, so optimizing it further has a potentially big 
pay-off. However, I've personally spent hundreds of hours getting it where it 
is, and I have a gut feeling that further optimization of it would require 
dropping down to assembly.

On Jun 29, 2011, at 3:03 PM, Måns Rullgård wrote:

> Vladimir Pantelic writes:
> 
>> Mandeep Kumar wrote:
>>> Hi All,
>>> 
>>> I have done some benchmarking on OMAP4 running Ubuntu for various
>>> versions of libjpeg. Benchmarks were collected with a modified version
>>> of djpeg that prints the time taken for decoding in ms. The sample used
>>> for benchmarking is a 12MP image downloaded from a photography website.
>>> Here are the results:
>> 
>> ...
>> 
>>> libjpeg-turbo trunk version that has NEON patches (5 runs).
>>> http://libjpeg-turbo.svn.sourceforge.net/viewvc/libjpeg-turbo/
>>>  Decoding Time for Run 1: 1068 ms
>>>  Decoding Time for Run 2: 1065 ms
>>>  Decoding Time for Run 3: 1093 ms
>>>  Decoding Time for Run 4: 1066 ms
>>>  Decoding Time for Run 5: 1067 ms
>>> Median Decoding Time: 1067 ms
>> 
>> One remark:
>> 
>> a 12MP image decoded in 1067ms equals ~12MP/s decoding speed.
>> 
>> decoding a 640x480 MJPEG file on a 1GHz OMAP4 using libavcodec
>> gives me an average decoding time per frame of ~10ms which yields:
>> 
>> 640x480/10ms = ~30MP/s
>> 
>> so roughly 2.5 times faster.
>> 
>> Either I am doing something wrong or this libjpeg-turbo is not so turbo.
> 
> Libjpeg (turbo or regular) is full of inefficiencies.  I guess they all
> add up.
> 
> -- 
> Måns Rullgård
> m...@mansr.com
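
The throughput comparison quoted above can be checked with a small
helper. This is only a sketch of the arithmetic: mp_per_sec is a
hypothetical name, "12MP" is assumed to mean exactly 12,000,000 pixels,
and the 10 ms per frame figure is the approximate average given in the
quoted message.

```c
/* Decoding throughput in megapixels per second, given the number of
 * pixels decoded and the time taken in milliseconds. */
double mp_per_sec(double pixels, double milliseconds)
{
    return (pixels / 1e6) / (milliseconds / 1e3);
}
```

Under those assumptions, mp_per_sec(12e6, 1067.0) comes to about 11.2
MP/s for the libjpeg-turbo run and mp_per_sec(640.0 * 480.0, 10.0) to
about 30.7 MP/s for the libavcodec run, a ratio of roughly 2.7, which
is in line with the "roughly 2.5 times" estimate.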



Re: Linaro 11.08 + libjpeg-turbo for linux + android

2011-08-19 Thread DRC
Sounds good.  Siarhei has just submitted a new patch for implementing
accelerated ISLOW decoding, and he plans to tweak that over the coming
days.  Unless anyone sees a reason not to, I would like to release the
official libjpeg-turbo 1.2 beta in September or early October.


On 7/22/64 1:59 PM, Tom Gall wrote:
> All,
> 
> The current 1.1.90 code with Mandeep's reworked change that was
> accepted yesterday passes all make test and correctly displays the
> android reference image that was showing quality problems with the
> older 1.1.1 androidized proof of concept.
> 
> Given this situation for Linaro's 11.08 release we are going to ship
> the upstream 1.1.90 version. I do not believe we should do any further
> development with the older 1.1.x branch of code.  This works well for
> linux and will make Monday's RC build. (for those on the
> libjpeg-turbo-devel list, Linaro ships a reference image every month)
> 
> For android the situation is a little less clear. Basically we'll need
> to re-forward-port the android specific changes to the 1.1.90 code. It
> will take some time and we will submit this upstream to the
> libjpeg-turbo project of course. I don't want a hack like the POC for
> android was. So for the android team short term I think it's your
> choice, you can continue to include the 1.1.1 POC in your builds but
> for things that might be busted, you get to keep both halves. I think
> the POC has served its purpose and now's the time to focus on what
> will benefit both the Linaro Android WG and the upstream libjpeg-turbo
> community longer term.
> 
> Thanks!
> 



Cross-posting to libjpeg-turbo-devel

2011-10-27 Thread DRC
Hi.  If you are attempting to cross-post to both linaro-dev and
libjpeg-turbo-devel, please subscribe to libjpeg-turbo-devel.
Otherwise, every time you CC libjpeg-turbo-devel, the message will
automatically be moderated, and I have to manually log in and approve
each one.

DRC



Re: [Libjpeg-turbo-devel] libjpeg8c vs libjpeg-turbo with libjpeg8 compat on

2011-10-27 Thread DRC
On 10/27/11 2:30 PM, Siarhei Siamashka wrote:
> Also huffman decoder optimizations (which are C code, not SIMD) in
> libjpeg-turbo seem to be providing only some barely measurable
> improvement on ARM, while huffman speedup is clearly more impressive
> on x86. This gives libjpeg-turbo more points over IJG jpeg on x86 as a
> result.

In general, the Huffman codec improvements produce a greater speedup on
64-bit vs. 32-bit and a greater speedup when compressing vs.
decompressing.  So, whereas libjpeg-turbo's Huffman codec realizes about
a 25-50% improvement vs. the libjpeg Huffman codec when doing
compression using 64-bit code, it only realizes a few percent speedup
vs. libjpeg when doing decompression using 32-bit code.  The Huffman
algorithm uses a single register as a bit bucket, and the fewer times it
has to shift in new bits to that register, the faster it is.  That's why
it's so much faster on 64-bit vs. 32-bit.
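
To make the bit-bucket argument concrete, here is a toy model that
counts how often the bucket has to be refilled for a given register
width. It is a sketch under simplifying assumptions and not the actual
libjpeg-turbo code: count_refills is a hypothetical name, every code is
assumed to have the same length, and a refill is assumed to top the
bucket up one whole byte at a time.

```c
/* Toy model: count how often a bit bucket of `bucket_bits` bits must be
 * refilled while consuming `n_codes` codes of `code_len` bits each.
 * Assumes code_len <= 16 (the JPEG maximum) and bucket_bits >= 32, so
 * a refilled bucket always holds at least one complete code. */
unsigned long count_refills(unsigned bucket_bits, unsigned code_len,
                            unsigned long n_codes)
{
    unsigned bits_left = 0;
    unsigned long refills = 0;
    unsigned long i;

    for (i = 0; i < n_codes; i++) {
        if (bits_left < code_len) {
            /* Shift in as many whole bytes as still fit. */
            while (bits_left + 8 <= bucket_bits)
                bits_left += 8;
            refills++;
        }
        bits_left -= code_len;
    }
    return refills;
}
```

In this model, count_refills(32, 8, 100) gives 25 refills while
count_refills(64, 8, 100) gives 13, so the 64-bit bucket takes the
refill path about half as often for the same stream.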

The Huffman codec is probably the single biggest piece of low-hanging
fruit in the entire code base, since it represents something like 40-50%
of total execution time in many cases.  I've spent hundreds of hours
looking at it, and the basic problem with the 32-bit code seems to be
register exhaustion.  After trying many different approaches, the C
code, as currently written, seems to produce the best possible
performance on 32-bit x86 without sacrificing any performance on 64-bit
x86.  However, that doesn't mean that it couldn't be improved upon--
perhaps even dramatically-- by using hand-written assembly.  Other
codecs, such as the Intel Performance Primitives, manage to produce
similar Huffman performance on both 64-bit and 32-bit.  libjpeg-turbo
can mostly match their 64-bit performance but not their 32-bit
performance, which leads me to believe that they're doing something
fundamentally different with their Huffman codec.  Perhaps they are even
using SIMD instructions, although I have spent much time investigating
that as well, and I couldn't manage to find a method that didn't require
moving data back and forth between the SIMD registers and the regular
registers (because you can't branch when using SIMD instructions, and
branching is somewhat critical to the Huffman algorithm.)

If someone could manage to fix, or even improve, the way registers are
used in the 32-bit Huffman codec, it would greatly benefit both ARM and x86.



Re: Linaro multimedia work group mini-summit minutes and actions

2011-06-15 Thread DRC
Just FYI-- for those interested in the ARM porting effort in
libjpeg-turbo, the iOS and Android camps reached a resolution, and the
patches they agreed upon have been committed to trunk.  Independent
evaluation by any of you who have an interest in this would be very much
appreciated.


On 6/14/11 9:41 AM, Kurt Taylor wrote:
> I'd like to thank everyone who attended the Linaro multimedia work group
> mini-summit last week, especially on such short notice. It was great to
> get to meet everyone face-to-face. I think we made excellent progress on
> the technical challenges for the MMWG team for the next cycle.
> 
> I have attached a consolidated and partially formatted log of the
> minutes and actions from the meeting.  I think they are fairly complete,
> please let me know if there is anything missed that needs to be
> mentioned. You can also access the individual logs from the event
> agenda: http://wiki.linaro.org/Events/2011-06-MMWG
> 
> We will be working the actions of this meeting and may be contacting you
> for assistance. The resulting plan will be initially reviewed at the
> public plan review: http://wiki.linaro.org/Cycles//PublicPlanReview
> 
> The next time we will meet to fully discuss MMWG work topics will be at
> the Linaro Developer Summit at Orlando, a part of the Ubuntu Developer
> Summit. Please join us if you can. More information on LDS/UDS is
> available here: http://uds.ubuntu.com/  I hope to see you there!
> 
> Regards,
> -- 
> 
> Kurt Taylor (irc krtaylor)
> Linaro Multimedia Team Lead
> 
