Hi James,

The old mar345 images were compressed with the "pack" scheme which Bill is
referring to. This is supported in CBFlib.

PNG and jpeg2000 may well do "better" at compression (would like to see
the numbers with this) but are likely to be much slower than something
customised for use with diffraction images. Anything doing complex
mathematical analysis is likely to be slow...

On an example set packed with the scripts in another email I got a
compression ratio with bzip2 of 3.49:1 for 270 frames. This exceeds the
value you quote below, but was from images where some of the detector
was unused, so the packing would probably work better.
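For anyone who wants to reproduce this sort of measurement, here is a
minimal Python sketch on a synthetic 16-bit frame (Poisson background
plus a few bright pixels, so the exact ratio will not match real data):

```python
import bz2

import numpy as np

# Synthetic stand-in for a 16-bit detector frame: low-count Poisson
# background plus a handful of bright "spots". Real frames will differ.
rng = np.random.default_rng(0)
frame = rng.poisson(5, size=(1024, 1024)).astype(np.uint16)
frame[rng.integers(0, 1024, 200), rng.integers(0, 1024, 200)] = 30000

raw = frame.tobytes()
packed = bz2.compress(raw, compresslevel=9)
print(f"bzip2 ratio: {len(raw) / len(packed):.2f}:1")
```

On a frame like this most of the 16 bits per pixel carry no information,
which is where bzip2 gets its ratio.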

On the question of lossy compression, I think we'd have to ask some data
reduction gurus how much the "noise" would affect the data reduction. I
suspect that the main problem is that the noise added would be
correlated across the image and would therefore affect the background
statistics in a non-trivial way. Although the intensity measurements may
not be badly affected, the error estimates on them could be...

Cheers,

Graeme

-----Original Message-----
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
James Holton
Sent: 23 August 2007 18:47
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] diffraction images images/jpeg2000

Well, I know it's not the definitive source of anything, but the
wikipedia entry on JPEG2000 says:
"The PNG (Portable Network Graphics) format is still more
space-efficient in the case of images with many pixels of the same
color, and supports special compression features that JPEG 2000 does
not." 

So would PNG be better?  It does support 16 bit greyscale.  Then again,
so does TIFF, and Mar already uses that.  Why don't they use the LZW
compression feature of TIFF?  The old Mar345 images were compressed
after all. I think only Mar can answer this, but I imagine the choice to
drop compression was because the advantages of compression (a factor of
2 or so in space) are outweighed by the disadvantages (limited speed and
limited compatibility with data processing packages).

How good could lossless compression of diffraction images possibly be?  
I just ran an entropy calculation on the 44968 images on "/data" at the
moment at ALS 8.3.1.  I am using a feature of Andy Hammersley's program
"FIT2D" to compute the entropy.  I don't pretend to completely
understand the algorithm, but I do understand that the entropy of the
image reflects the maximum possible compression ratio.  For these
images, the "theoretical maximum compression ratio" ranged from 1.2 to
4.8 with mean 2.7 and standard deviation 0.7.  The values for Huffman
encoding ranged from 0.95 to 4.7 with mean 2.4 and standard deviation
1.0.  The correlation coefficient between the Huffman and "theoretical"
compression ratios was 0.97.  I had a look at a few of the outlier cases.
As one might expect, the best compression ratios are from blank images
(where all the high-order bits are zero).  The #1 hardest-to-compress
image had many overloads, clear protein diffraction and a bunch of ice
rings. 
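I can't speak for the exact algorithm FIT2D uses, but a first-order
version of this calculation is just the Shannon entropy of the
pixel-value histogram; a sketch in Python (synthetic frames, function
name my own):

```python
import numpy as np

def entropy_compression_ratio(frame):
    """Upper bound on lossless per-pixel compression of 16-bit storage:
    stored bits per pixel (16) over Shannon entropy bits per pixel."""
    counts = np.bincount(frame.ravel())
    p = counts[counts > 0] / frame.size
    bits_per_pixel = -(p * np.log2(p)).sum()
    return 16.0 / bits_per_pixel

rng = np.random.default_rng(1)
# Near-blank frame: few distinct values, high-order bits all zero.
blank = rng.poisson(1, (512, 512)).astype(np.uint16)
# Pathological frame: pixels spread over the whole 16-bit range.
busy = rng.integers(0, 65536, (512, 512)).astype(np.uint16)
print(entropy_compression_ratio(blank))
print(entropy_compression_ratio(busy))
```

One caveat: this is only the zeroth-order (per-pixel) entropy, so a
scheme that exploits spatial correlation between neighbouring pixels
can in principle beat this "theoretical maximum" on real images.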

So, unless I am missing something, I think the best we are going to get
with lossless compression is about 2.5:1.  At least, for individual
frames.  Compressing a data set as a "video" sequence might have
substantial gains since only a few pixels change significantly from
frame-to-frame.  Are there any lossless video codecs out there?  If so,
can they handle 6144x6144 video?
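A quick way to test the video intuition without hunting for a codec is
to bzip2 the frame-to-frame difference instead of the frame itself; a
sketch with synthetic frames (deliberately optimistic, since the real
gain depends on how much actually changes between frames):

```python
import bz2

import numpy as np

rng = np.random.default_rng(2)
# Two consecutive "frames": identical background, 5000 pixels change.
# (Real frames re-roll the Poisson noise everywhere, so this flatters
# the delta scheme.)
base = rng.poisson(5, (1024, 1024)).astype(np.int32)
nxt = base.copy()
nxt.ravel()[rng.integers(0, base.size, 5000)] += rng.integers(1, 5, 5000)

solo = len(bz2.compress(nxt.astype(np.uint16).tobytes()))
delta = len(bz2.compress((nxt - base).astype(np.int16).tobytes()))
print(f"frame alone: {solo} bytes; delta vs previous frame: {delta} bytes")
```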

  What about lossy compression?  Yes yes, I know it sounds like a
horrible idea to use lossy compression on scientific data, because it
would change the values of that most precious of numbers: Fobs.  
However, the question I have never heard a good answer to is HOW MUCH
would it change Fobs?  More practically: how much compression can you do
before Fobs changes by more than the value of SIGFobs?  Diffraction
patterns are inherently noisy.  If you take the same image twice, then
photon counting statistics make sure that no two images are exactly the
same.  So which one is "right"?  If the changes in pixel values from a
lossy compression algorithm are always smaller than those introduced by
photon-counting noise, then is lossy compression really such a bad idea?
The errors introduced could be small when compared to errors in say,
scale factors or bulk solvent parameters.  A great deal can be gained in
compression ratio if only "random noise" is removed.  I remember the
days before MP3 when it was lamented that sampled audio files could
never be compressed very well.  Even today bzip2 does not work very well
at all at compressing sampled audio (about 1.3:1), but
mp3 files can be made at a "compression ratio" of 10:1 over CD-quality
audio and we all seem to still enjoy the music.
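One concrete scheme in this spirit (my own sketch, not anyone's
production code) is square-root quantisation: store round(2*sqrt(N)) in
a byte, so the quantisation step tracks the Poisson sigma sqrt(N) and
the introduced error stays below the photon-counting noise:

```python
import bz2

import numpy as np

rng = np.random.default_rng(3)
frame = rng.poisson(100, (1024, 1024)).astype(np.uint16)  # ~100 counts/pixel

# Encode: round(2*sqrt(N)) fits in a byte for counts up to ~16000,
# and the implied step size in N scales with the Poisson sigma sqrt(N).
coded = np.round(2.0 * np.sqrt(frame)).astype(np.uint8)
decoded = np.round((coded / 2.0) ** 2).astype(np.uint16)

lossless = len(bz2.compress(frame.tobytes()))
lossy = len(bz2.compress(coded.tobytes()))
err_over_sigma = np.abs(decoded.astype(float) - frame) / np.sqrt(frame.clip(min=1))
print(f"lossless: {lossless} B, lossy: {lossy} B, "
      f"max |error|/sigma: {err_over_sigma.max():.2f}")
```

On this synthetic frame the rounding error stays well below one sigma of
the photon-counting noise while the compressed stream shrinks a lot;
whether that is actually acceptable is exactly the question for the
data-reduction gurus.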

I suppose the best "lossy compression" is the one that preserves the
features of the image you want and throws out the stuff you don't care
about.  So, in a way, data-reduction programs are probably the best
"lossy compression" we are going to get.  Unfortunately, accurate
"external information" is required (such as the beam center
convention!), so the interface to this "compression algorithm" is still a
little more complicated than pkzip.  ;)

Nevertheless, there are still only about 40-50 "formats" of diffraction
images floating around (30 operating beamlines, a few in-house vendors
and some "history").  I would like to collect them all.  So, if anybody
out there has a lysozyme data set (or even just a single image) from
anywhere but ALS 8.3.1, please let me know how I can get a copy of it!

-James Holton
MAD Scientist


Harry Powell wrote:
> Hi
>
> Just to add to this. imgCIF (or CBF, which amounts to pretty well the 
> same thing) has fast and efficient compression built in, and has been 
> developed with protein crystallography (particularly) in mind. There 
> are even (a few) detectors out there which will write these instead of
> (or as well as) the manufacturer's native format, saving the user the 
> trouble of conversion.
>
> If you're looking for a standard format for storing image data in, I 
> wouldn't look any further, since (in principle) imgCIF/CBF can store 
> all the image information you (or a fussy^H^H^H^H^H conscientious 
> reviewer who could be bothered to re-process your dataset) would want 
> about your data collection and you wouldn't need to come up with 
> inventive tags for data items that might be required for other 
> (general purpose) image formats.
>
> There are even conversion programs available to convert to imgCIF/CBF 
> files from some native formats - if your favourite detector isn't one 
> of these, drop Herb Bernstein a line and ask for support ;-)
>
>> I looked at jpeg2000 as a compression for diffraction images for 
>> archiving purposes - it works well but is *SLOW*. It's designed with 
>> the idea in mind of compressing a single image, not the several 
>> hundred typical for our work. There is also no place to put the
>> header.
>>
>> Bzip2 works pretty much as well and is standard, but again slow. This
>> is what people mostly seem to use for putting diffraction images on 
>> the web, particularly the JCSG.
>>
>> The ccp4 "pack" format which has been around for a very long time 
>> works very well and is jolly quick, and is supported in a number of 
>> data processing packages natively (Mosflm, XDS). Likewise there is a 
>> new compression being used for the Pilatus detector which is quicker
>> again.
>> These two have the advantage of being designed for diffraction images
>> and with speed in mind.
>>
>> So there are plenty of good compression schemes out there - and if 
>> you use CBF these can be supported natively in the image standard... 
>> So you don't even need to know or care...
>>
>> Just my 2c on this one.
>>
>> Cheers,
>>
>> Graeme
>>
>> -----Original Message-----
>> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
>> Maneesh Yadav
>> Sent: 18 August 2007 00:02
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: [ccp4bb] diffraction images images/jpeg2000
>>
>> FWIW, I don't agree with storing image data, I don't think they 
>> justify the cost of storage even remotely (some people debate the 
>> value of the structures themselves....)...but if you want to do it 
>> anyway, maybe we should use a format like jpeg2000.
>>
>> Last time I checked, none of the major image processing suites used 
>> it, but it is a very impressive and mature format that (I think) 
>> would be suitable for diffraction images.  If anyone is up for 
>> experimenting, you can get a nice suite of tools from kakadu (just 
>> google kakadu + jpeg2000).
>>
>
> Harry
