A second observation of the same experimental quantity does not
double the amount of "information".  We know from the many discussions
on this forum that the benefit of additional multiplicity diminishes
with each repetition.
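
   To put a number on that diminishing return: if each measurement has
the same standard deviation, the uncertainty of the merged value falls
only as 1/sqrt(N), so each extra repeat buys less than the one before.
A minimal sketch (assuming numpy is available; the numbers are purely
illustrative):

    import numpy as np

    sigma = 1.0                      # standard deviation of one measurement
    n = np.arange(1, 11)             # number of repeated measurements
    sigma_mean = sigma / np.sqrt(n)  # uncertainty of the merged average

    # fractional improvement gained by each additional measurement
    gain = 1.0 - sigma_mean[1:] / sigma_mean[:-1]
    print(gain)                      # ~29% for the 2nd, only ~5% for the 10th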

   Measuring "information content" is very hard.  You can't just count
the bytes and say that measures the information content.  My example of
an oversampled map proves the point - The file is much bigger but can be
calculated exactly from the same, relatively small, number of
reflections.  The ultimate extreme is a map calculated from just the
F000 term.  One number can produces a map with gigibytes of data - It
just happens that all the numbers are equal.
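
   A quick way to see this (a minimal sketch, assuming numpy; the grid
size is arbitrary): put a single F000 term into an otherwise empty
array of Fourier coefficients and back-transform it.  However large
you make the grid, every density value comes out identical.

    import numpy as np

    n = 128                                # arbitrary grid size
    coeffs = np.zeros((n, n, n), complex)  # every coefficient zero...
    coeffs[0, 0, 0] = 1000.0               # ...except the F000 term

    rho = np.fft.ifftn(coeffs).real        # back-transform to a "map"
    print(rho.min(), rho.max())            # the same value everywhere
    print(rho.nbytes)                      # megabytes of storage, one number of content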

   While our Bragg spots, after merging, are pretty much independent
measurements, Herman is right about microscopes.  The physical nature
of the instrument introduces relationships between the values of the
voxels, so the information content is smaller, perhaps by a lot, than
the number of bytes in the image.  You have to have a deep
understanding of the lens system to work out what is going on.  And a
second image of the same object, taken on the same instrument a
millisecond later, will be very highly correlated with the first and
will add very little new "information" to the experiment.

   BTW, while we write maps as a set of numbers arranged in a 3D array,
such a map is not equivalent to an image.  A pixel, or voxel in 3D,
represents the average value over its region, while our map files
contain the value of the density at a particular point.  Our point
samples have a precise meaning, while pixels can be quite confusing.
In many detectors the area averaged over is somewhat larger than the
spacing of the pixels, which gives the illusion of greater detail
without actually providing more information.  This occurs in our CCD
detectors, where the X-ray photons are converted to lower-frequency
light by some sort of phosphor, and in a microscope by a poor lens (as
also mentioned by Herman).
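
   The effect is easy to reproduce in one dimension (a sketch, assuming
numpy; the widths are made up for illustration): blur a sharp signal
with a response wider than the pixel spacing, then read out one value
per pixel.  The trace looks finely detailed, but neighbouring pixels
are almost identical and carry very little independent information.

    import numpy as np

    signal = np.zeros(1000)
    signal[500] = 1.0                    # one sharp feature

    # detector/phosphor response much wider than the pixel spacing
    t = np.arange(-50, 51)
    psf = np.exp(-t**2 / (2 * 15.0**2))
    psf /= psf.sum()

    pixels = np.convolve(signal, psf, mode="same")  # one value per pixel

    # adjacent pixels are nearly identical -> little independent information
    print(np.corrcoef(pixels[:-1], pixels[1:])[0, 1])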

   Measuring information content is hard, which is why it is usually
not treated as a rigorous quantity.  The classic example is the value
of the ratio of the circumference of a circle to its diameter.  This
number has an infinite number of digits, which could be considered an
infinite amount of information.  I can simply type "Pi", however, and
accurately express that infinity of information.  Just how much
information is present?

Dale Tronrud

On 11/10/2017 6:47 AM, Keller, Jacob wrote:
> It seems, then, to be generally agreed that the conversion between voxels and 
> Fourier terms was valid, each containing the same amount of information, but 
> the problem was in the representation, and there was just trickery of the 
> eye. I was thinking and hoping this would be so, since it allows a pretty 
> direct comparison of crystal data to microscopic imaging data. I guess a 
> litmus test would be to decide whether a voxel version of the electron 
> density map would work equivalently well in crystallographic software, which 
> I suspect it would. If so, then the same techniques--so effective in 
> extracting information for the relatively information-poor crystal 
> structures--could be used on fluorescence imaging data, which come in voxels.
> 
> Regarding information-wealth, in Dale's example, the whole hkl set was 4.1 
> MB. One frame in a garden-variety XYZT fluorescence image, however, contains 
> about 2000 x 2000 x 100 voxels at 16-bit, i.e., 400 million voxels or 800 MB. In 
> some data sets, these frames come at 10 Hz or more. I suspect that the 
> I/sigma is also much better in the latter. So, with these data, and keeping a 
> data:parameters ratio of ~4, one could model about 100 million parameters. 
> This type of modelling, or any type of modelling for that matter, remains 
> almost completely absent in the imaging world, perhaps because the data size 
> is currently so unwieldy, perhaps also because sometimes people get nervous 
> about model biases, perhaps also because people are still improving the 
> imaging techniques. But just imagine what could be done with some 
> crystallography-style modelling!
> 
> Jacob Keller
> 
> 
> 
> -----Original Message-----
> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Tristan 
> Croll
> Sent: Friday, November 10, 2017 8:36 AM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] AW: Re: [ccp4bb] Basic Crystallography/Imaging Conundrum
> 
> Or a nice familiar 2D example: the Ramachandran plot with 7.5 degree binning, 
> as a grid (left) or with bicubic smoothing (right). Different visualisations 
> of the same data, but the right-hand image uses it better.
> 
> On 2017-11-10 08:24, herman.schreu...@sanofi.com wrote:
>> In line with Dale's suggestions, I would suggest that you reformat 
>> your voxel map into the format of an electron density map and look at 
>> it with coot. I am sure it will look much better and much more like 
>> the electron density we are used to looking at. Alternatively, you 
>> could display a bona fide electron density map as voxel blocks, and I 
>> am sure it will look similar to the voxel map you showed in your 
>> first email.
>>
>> Best,
>> Herman
>>
>> -----Original Message-----
>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of 
>> Dale Tronrud
>> Sent: Friday, 10 November 2017 08:08
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: [EXTERNAL] Re: [ccp4bb] Basic Crystallography/Imaging 
>> Conundrum
>>
>>    Ethan and I apparently agree that anomalous scattering is "normal"
>> and Friedel's Law is just an approximation.  I'll presume that your 
>> "unique" is assuming otherwise and your 62,500 reflections only 
>> include half of reciprocal space.  The full sphere of data would 
>> include 125,000 reflections.  Since the cube root of 125,000 is 50, 
>> you get a range of indices from -25 to +25, which would give you 2 A 
>> resolution, still far from your hope of 1 A.
>>
>>    For your test case of 1 A resolution with 50 A cell lengths you 
>> want your indices to run from -50 to +50, giving a box of reflections 
>> in reciprocal space 101 spots wide in each direction and a total of 
>> 101^3 = 1,030,301 reflections (or 515,150.5 reflections for your 
>> Friedel unique, with the "half" reflection being the F000, which 
>> would then be purely real valued).
>>
>>    Assuming you can fit your structure factors into 16 bits (you had 
>> better not have many more than 10,000 atoms if you don't want your 
>> F000 to overflow), the information content will be 1,030,301 * 2 * 16 
>> bits (the "2" because they are complex), giving 32,969,632 bits.
>>
>>    If you spread this same amount of information across real space you 
>> will have 1,030,301 complex density values in a 50x50x50 A space 
>> giving a sampling rate along each axis of 101 samples/unit cell.
>>
>>    Complex density values?  The real part of the density is what we 
>> call the electron density and the imaginary part we call the anomalous 
>> density.  If there is no anomalous scattering then Friedel's Law holds 
>> and the number of unique reflections is cut in half and the density 
>> values are purely real valued - The information content in both spaces 
>> is cut in half and they remain equal.
>>
>>    By sampling your unit cell with 101 samples along each axis, the 
>> sampling interval is half the wavelength of the highest frequency 
>> reflection (e.g. a sampling interval of 0.5 A for 1 A resolution 
>> data).  This is, of course, the Nyquist Theorem, which states that 
>> you have to sample at twice the frequency of the highest resolution 
>> Fourier coefficient.
>>
>>    This is exactly how an FFT works.  It allocates the memory 
>> required to store the structure factors and it returns the map in 
>> that same array - the number of bytes is unchanged.  It also 
>> guarantees that the calculation is reversible, as no information is 
>> lost in either direction.
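>>
>>    That round trip is easy to verify (a minimal sketch, assuming 
>> numpy; the grid size is arbitrary):
>>
>>     import numpy as np
>>
>>     rho = np.random.random((101, 101, 101))     # any map on a 101^3 grid
>>     back = np.fft.ifftn(np.fft.fftn(rho)).real  # forward, then inverse
>>     print(back.shape == rho.shape)              # same number of values: True
>>     print(np.allclose(back, rho))               # nothing lost: True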
>>
>>    So, why does your blocky image look so bad?  First you have sampled 
>> too coarsely.  You should have twice the sampling rate in each 
>> direction.
>>
>>    The next point is more subtle.  You are displaying each voxel as a 
>> block.  This is not correct.  The sharp lines that occur at the 
>> boundaries between the blocks are a high-frequency feature which is 
>> not consistent with a 1 A resolution image.  Your samples should be 
>> displayed as discrete points, since they are not the average density 
>> within a block but the value of the density at one specific point.
>>
>>    What is the density of the map between the sampled points?  The 
>> Fourier series provides all the information needed to calculate it, 
>> and you can compute values at as fine a sampling rate as you like; 
>> just remember that you are not adding any more information, because 
>> these new points are correlated with each other.
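>>
>>    One way to do that calculation (a sketch, assuming numpy; a 1D 
>> example for brevity): pad the Fourier coefficients with zeros beyond 
>> the resolution limit and back-transform onto a finer grid.  The finer 
>> map has four times as many samples, but every new value is completely 
>> determined by the original coefficients.
>>
>>     import numpy as np
>>
>>     coarse = np.random.random(101)       # a coarsely sampled 1D "map"
>>     coeffs = np.fft.fft(coarse)          # its Fourier coefficients
>>
>>     padded = np.zeros(404, complex)      # room for 4x finer sampling
>>     padded[:51] = coeffs[:51]            # keep the original terms...
>>     padded[-50:] = coeffs[-50:]          # ...positive and negative halves
>>
>>     fine = np.fft.ifft(padded).real * (404 / 101)  # rescale for the new length
>>     print(np.allclose(fine[::4], coarse))           # original points reproduced: True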
>>
>>    If you have only the samples of a map and want to calculate 
>> Fourier coefficients, there are many sets of Fourier coefficients 
>> that will reproduce the sampled points equally well.  We specify a 
>> unique solution in the FFT by requiring that all reflections of 
>> resolution higher than 1 A be identically equal to zero.  When you 
>> calculate a map from a set of coefficients that only go to 1 A 
>> resolution, this is guaranteed.
>>
>>    When you are calculating coefficients from any old map, you had 
>> better ensure that the map you are sampling does not contain 
>> information at a resolution higher than twice your sampling interval.  
>> This is a problem when calculating Fcalc from an atomic model.  You 
>> calculate a map from the model and FFT it, but you can't sample that 
>> map at 1/2 the resolution of your interest.  You must sample that map 
>> much more finely, because an atomic model implies Fourier 
>> coefficients of very high resolution (otherwise phase extension would 
>> be impossible).  This problem was discussed in detail in Lynn Ten 
>> Eyck's 1976 paper on Fcalc FFT's, but it is often forgotten.  Gerard 
>> Bricogne's papers on NCS averaging from the 1970's also discuss these 
>> matters in great depth.
>>
>>    In summary, your blocky picture (even with double sampling) is not 
>> a valid representation because it is not blurry, as a 1 A resolution 
>> map should be.  To create an accurate image, you need to oversample 
>> the map sufficiently to prevent the human eye from detecting aliasing 
>> artifacts such as the straight lines visible in your blocky picture.  
>> This requires very fine sampling, because the eye is very sensitive 
>> to straight lines.  When using a map for any purpose other than 
>> FFTing, you will need to oversample it by some amount to prevent 
>> aliasing artifacts, and the amount of oversampling will depend on 
>> what you are doing to the map (again, see Gerard's papers).  Such an 
>> oversampled map will have many more voxels but no more information, 
>> because the density values are correlated.
>>
>> Dale Tronrud
>>
>> On 11/9/2017 4:10 PM, Keller, Jacob wrote:
>>> Dear Crystallographers,
>>>
>>>  
>>>
>>> I have been considering a thought-experiment of sorts for a while, 
>>> and wonder what you will think about it:
>>>
>>>  
>>>
>>> Consider a diffraction data set which contains 62,500 unique 
>>> reflections from a 50 x 50 x 50 Angstrom unit cell, with each 
>>> intensity measured perfectly with 16-bit depth. (I am not sure what 
>>> resolution this corresponds to, but it would be quite high even in 
>>> p1, I think--probably beyond 1.0 Angstrom?). Thus, there are 62,500 x 
>>> 16 bits (125 KB) of information in this alone, and there is an HKL 
>>> index associated with each intensity, so that I suppose contains 
>>> information as well. One could throw in phases at 16-bit as well, and 
>>> get a total of 250 KB for this dataset.
>>>
>>>  
>>>
>>> Now consider a parallel (equivalent?) data set, but this time 
>>> instead of reflection intensities you have a real space voxel map of 
>>> the same
>>> 50 x 50 x 50 unit cell consisting of 125,000 voxels, each of which 
>>> has a 16-bit electron density value, and an associated xyz index 
>>> analogous to the hkl above. That makes a total of 250 KB, with each 
>>> voxel a 1 Angstrom cube. It seems to me this level of graininess 
>>> would be really hard to interpret, especially for a static picture of 
>>> a protein structure. (see attached: top is a ~1 Ang/pixel 
>>> down-sampled version of the image below).
>>>
>>>  
>>>
>>> Or, if we wanted smaller voxels still, let's say by half, we would 
>>> have to reduce the bit depth to 2 bits. But this would still only 
>>> yield half-Angstrom voxels, each with only four possible electron 
>>> density values.
>>>
>>>  
>>>
>>> Is this comparison apt? Off the cuff, I cannot see how a 50 x 50 
>>> pixel image corresponds at all to the way our maps look, especially 
>>> at around
>>> 1 Ang resolution. Please, if you can shoot down the analogy, do.
>>>
>>>  
>>>
>>> Assuming that it is apt, however: is this a possible way to see the 
>>> power of all of our Bayesian modelling? Could one use our modelling 
>>> tools on such a grainy picture and arrive at similar results?
>>>
>>>  
>>>
>>> Are our data sets really this poor in information, and we just model 
>>> the heck out of them, as perhaps evidenced by our scarily low 
>>> data:parameters ratios?
>>>
>>>  
>>>
>>> My underlying motivation in this thought experiment is to illustrate 
>>> the richness in information (and poorness of modelling) that one 
>>> achieves in fluorescence microscopic imaging. If crystallography is 
>>> any measure of the power of modelling, one could really go to town on 
>>> some of these terabyte 5D functional data sets we see around here at 
>>> Janelia (and on YouTube).
>>>
>>>  
>>>
>>> What do you think?
>>>
>>>  
>>>
>>> Jacob Keller
>>>
>>>  
>>>
>>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>> Jacob Pearson Keller
>>>
>>> Research Scientist / Looger Lab
>>>
>>> HHMI Janelia Research Campus
>>>
>>> 19700 Helix Dr, Ashburn, VA 20147
>>>
>>> (571)209-4000 x3159
>>>
>>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>  
>>>
