Re: [ccp4bb] diffraction images images/jpeg2000

James Holton Thu, 23 Aug 2007 11:56:36 -0700

Well, I know it's not the definitive source of anything, but the Wikipedia entry on JPEG2000 says: "The PNG (Portable Network Graphics) format is still more space-efficient in the case of images with many pixels of the same color, and supports special compression features that JPEG 2000 does not." So would PNG be better? It does support 16 bit greyscale. Then again, so does TIFF, and Mar already uses that. Why don't they use the LZW compression feature of TIFF? The old Mar325 images were compressed after all. I think only Mar can answer this, but I imagine the choice to drop compression was because the advantages of compression (a factor of 2 or so in space) are outweighed by the disadvantages (limited speed and limited compatibility with data processing packages).

How good could lossless compression of diffraction images possibly be? I just ran an entropy calculation on the 44968 images on "/data" at the moment at ALS 8.3.1. I am using a feature of Andy Hammersley's program "FIT2D" to compute the entropy. I don't pretend to completely understand the algorithm, but I do understand that the entropy of the image reflects the maximum possible compression ratio. For these images, the "theoretical maximum compression ratio" ranged from 1.2 to 4.8 with mean 2.7 and standard deviation 0.7. The values for Huffman encoding ranged from 0.95 to 4.7 with mean 2.4 and standard deviation 1.0. The correlation coefficient between the Huffman and "theoretical" compression ratio was 0.97. I had a look at a few of the outlier cases. As one might expect, the best compression ratios are from blank images (where all the high-order bits are zero). The #1 hardest-to-compress image had many overloads, clear protein diffraction and a bunch of ice rings.

So, unless I am missing something, I think the best we are going to get with lossless compression is about 2.5:1. At least, for individual frames. Compressing a data set as a "video" sequence might have substantial gains since only a few pixels change significantly from frame-to-frame. Are there any lossless video codecs out there? If so, can they handle 6144x6144 video?
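
A minimal sketch of the kind of entropy estimate discussed above, in Python. This is not FIT2D's algorithm, which is not described in this thread; it assumes a plain zeroth-order Shannon entropy over the histogram of pixel values and 16-bit storage per pixel, and the function names are hypothetical. Because it ignores correlations between neighbouring pixels, a coder that models neighbours could in principle do better than this figure.

```python
import math
from collections import Counter

def shannon_entropy_bits(pixels):
    """Zeroth-order Shannon entropy of the pixel values, in bits per pixel."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def max_compression_ratio(pixels, bits_per_pixel=16):
    """Stored bits per pixel divided by entropy per pixel.

    Assumes each pixel is stored as an unsigned 16-bit integer.
    A constant image has zero entropy, so the ratio is unbounded.
    """
    h = shannon_entropy_bits(pixels)
    return float("inf") if h == 0 else bits_per_pixel / h

# Toy example: two equally likely values carry 1 bit per pixel.
toy = [0, 0, 1, 1]
print(shannon_entropy_bits(toy))   # 1.0
print(max_compression_ratio(toy))  # 16.0
```

On a real frame the pixel list would come from whatever reader is available for the detector format; the toy list here only shows the arithmetic.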

What about lossy compression? Yes yes, I know it sounds like a horrible idea to use lossy compression on scientific data, because it would change the values of that most precious of numbers: Fobs. However, the question I have never heard a good answer to is HOW MUCH would it change Fobs? More practically: how much compression can you do before Fobs changes by more than the value of SIGFobs? Diffraction patterns are inherently noisy. If you take the same image twice, then photon counting statistics make sure that no two images are exactly the same. So which one is "right"? If the changes in pixel values from a lossy compression algorithm are always smaller than that introduced by photon-counting noise, then is lossy compression really such a bad idea? The errors introduced could be small when compared to errors in, say, scale factors or bulk solvent parameters. A great deal can be gained in compression ratio if only "random noise" is removed. I remember the days before MP3 when it was lamented that sampled audio files could never be compressed very well. Even today bzip2 does not work very well at all at compressing sampled audio (about 1.3:1), but mp3 files can be made at a "compression ratio" of 10:1 over CD-quality audio and we all seem to still enjoy the music.
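
To make the "smaller than photon-counting noise" criterion concrete, here is a toy sketch in Python. It is not a scheme proposed in this thread: it swaps in simple square-root quantization as an example of a lossy encoding, and it assumes that pixel values are raw photon counts with Poisson noise sigma = sqrt(count) (no detector gain or offset). The function names and the `step` parameter are hypothetical.

```python
import math

def encode(count, step=1.0):
    """Lossy step: quantize a non-negative count on a square-root scale."""
    return round(2.0 * math.sqrt(count) / step)

def decode(code, step=1.0):
    """Approximate inverse of encode(); returns a float count."""
    return (code * step / 2.0) ** 2

def worst_error_in_sigmas(max_count=65535, step=1.0):
    """Largest |decoded - original| / sqrt(original) over 1..max_count."""
    return max(
        abs(decode(encode(n, step), step) - n) / math.sqrt(n)
        for n in range(1, max_count + 1)
    )

# With step=1 the full 16-bit range maps onto codes 0..512,
# i.e. about 10 bits per pixel before any entropy coding.
print(encode(65535))                  # 512
print(worst_error_in_sigmas() < 1.0)  # True
```

With `step=1.0` the rounding error on the square-root scale is at most 0.5, which works out to at most 0.5*sqrt(n) + 1/16 counts, so it stays below one Poisson sigma for every nonzero count. Whether an error of that size is acceptable for Fobs is exactly the open question raised above; the sketch only shows how the comparison could be set up.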

I suppose the best "lossy compression" is the one that preserves the features of the image you want and throws out the stuff you don't care about. So, in a way, data-reduction programs are probably the best "lossy compression" we are going to get. Unfortunately, accurate "external information" is required (such as the beam center convention!), so the interface to this "compression algorithm" is still a little more complicated than pkzip. ;)

Nevertheless, there are still only about 40-50 "formats" of diffraction images floating around (30 operating beamlines, a few in-house vendors and some "history"). I would like to collect them all. So, if anybody out there has a lysozyme data set (or even just a single image) from anywhere but ALS 8.3.1, please let me know how I can get a copy of it!


-James Holton
MAD Scientist


Harry Powell wrote:

Hi
Just to add to this. imgCIF (or CBF, which amounts to pretty well the same thing) has fast and efficient compression built in, and has been developed with protein crystallography (particularly) in mind. There are even (a few) detectors out there which will write these instead of (or as well as) the manufacturer's native format, saving the user the trouble of conversion.
If you're looking for a standard format for storing image data in, I wouldn't look any further, since (in principle) imgCIF/CBF can store all the image information you (or a fussy^H^H^H^H^H conscientious reviewer who could be bothered to re-process your dataset) would want about your data collection and you wouldn't need to come up with inventive tags for data items that might be required for other (general purpose) image formats.
There are even conversion programs available to convert to imgCIF/CBF files from some native formats - if your favourite detector isn't one of these, drop Herb Bernstein a line and ask for support ;-)
I looked at jpeg2000 as a compression for diffraction images for
archiving purposes - it works well but is *SLOW*. It's designed with the
idea in mind of compressing a single image, not the several hundred
typical for our work. There is also no place to put the header.

Bzip2 works pretty much as well and is standard, but again slow. This is
what people mostly seem to use for putting diffraction images on the
web, particularly the JCSG.
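
A rough way to check this kind of claim on one's own data is to measure the ratio directly. The sketch below uses only the Python standard library (`bz2`, `zlib`). The frame is synthetic, a crude low-count background stored as unsigned 16-bit integers, not a real diffraction image, so the numbers it gives say nothing about real data; `synthetic_frame` and `compression_ratio` are hypothetical helper names.

```python
import bz2
import random
import zlib
from array import array

def synthetic_frame(n_pixels=4096, mean_background=5, seed=0):
    """Made-up 16-bit frame: low-count background, NOT real detector data."""
    rng = random.Random(seed)
    values = [
        min(65535, int(rng.expovariate(1.0 / mean_background)))
        for _ in range(n_pixels)
    ]
    return array("H", values).tobytes()

def compression_ratio(raw, compress):
    """Uncompressed size divided by compressed size for one codec."""
    return len(raw) / len(compress(raw))

raw = synthetic_frame()
for name, codec in (("bzip2", bz2.compress), ("zlib", zlib.compress)):
    print(name, round(compression_ratio(raw, codec), 2))

# Both codecs are lossless: decompressing returns the original bytes.
assert bz2.decompress(bz2.compress(raw)) == raw
```

To try it on a real image, replace `synthetic_frame()` with the raw bytes of the pixel array (header stripped), and time the calls if speed is the concern.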

The ccp4 "pack" format which has been around for a very long time works
very well and is jolly quick, and is supported in a number of data
processing packages natively (Mosflm, XDS). Likewise there is a new
compression being used for the Pilatus detector which is quicker again.
These two have the advantage of being designed for diffraction images
and with speed in mind.

So there are plenty of good compression schemes out there - and if you
use CBF these can be supported natively in the image standard... So you
don't even need to know or care...

Just my 2c on this one.

Cheers,

Graeme

-----Original Message-----
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
Maneesh Yadav
Sent: 18 August 2007 00:02
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] diffraction images images/jpeg2000

FWIW, I don't agree with storing image data, I don't think they justify
the cost of storage even remotely (some people debate the value of the
structures themselves....)...but if you want to do it anyway, maybe we
should use a format like jpeg2000.

Last time I checked, none of the major image processing suites used it,
but it is a very impressive and mature format that (I think) would be
suitable for diffraction images.  If anyone is up for experimenting, you
can get a nice suite of tools from kakadu (just google kakadu +
jpeg2000).
Harry
