Hi Ian,

Unix/Linux uses a single byte 0A (the linefeed character) as the line end 
marker for text, while DOS/Win use two bytes 0D 0A (carriage return + line 
feed). Old Mac systems use 0D. (what a mess!) 

So a simple UNIX to DOS operation should expand every 0A in a file to 0D 0A:
perl -pe 's/\n/\r\n/' input.mtz > output.mtz 

In an MTZ file the main body is an array of REAL*4 values, so this operation 
inserts an extra 0D wherever a data byte happens to be 0A, producing a lot of 
‘insertional mutations’. No wonder the data is scrambled and the header cannot 
be found: its position is shifted, so it is no longer where the pointer points.
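
To make the ‘insertional mutation’ concrete, here is a minimal Python sketch. It 
is not real MTZ parsing: it only assumes three little-endian REAL*4 values, the 
first chosen so that its low byte is 0A, and the function name unix2dos_naive is 
made up for the illustration.

```python
import struct

def unix2dos_naive(data: bytes) -> bytes:
    """Insert 0D before every 0A, as the perl one-liner does."""
    return data.replace(b"\n", b"\r\n")

# Three REAL*4 values (little-endian is an assumption of this sketch).
# The first is built from raw bytes so that its low byte is 0A.
first = struct.unpack("<f", b"\x0a\x00\x80\x3f")[0]  # slightly above 1.0
original = struct.pack("<3f", first, 2.0, 3.0)
converted = unix2dos_naive(original)

# One byte was inserted, so every later value is read one byte out of step.
reread = struct.unpack("<3f", converted[:12])
print(len(original), len(converted))
print(struct.unpack("<3f", original))
print(reread)
```

A single inserted byte is enough to misalign every value after it, and anything 
located by an absolute offset (such as the header) moves as well.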

During the conversion from UNIX to DOS, there is a complication: there could be 
a few 0D 0A already in the original file. Smarter unix2dos programs will keep 
these 0D 0A unchanged so that there won’t be weird-looking 0D 0D 0A (for 
example http://www.thefreecountry.com/tofrodos/ states this in its source file 
and has this behavior). Some dumber methods will simply convert all of them 
into 0D 0D 0A, for example: 
perl -pe 's/\n/\r\n/' input.mtz > output.mtz
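
The two behaviours can be sketched in Python as follows. The smart variant is 
only my approximation of what is described above (leave an existing 0D 0A alone), 
not the actual tofrodos code, and both function names are made up.

```python
import re

def unix2dos_dumb(data: bytes) -> bytes:
    """Put 0D in front of every 0A, even one that already follows 0D."""
    return data.replace(b"\n", b"\r\n")

def unix2dos_smart(data: bytes) -> bytes:
    """Put 0D only in front of a 0A that is not already preceded by 0D."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

sample = b"A\nB\r\nC"  # one bare 0A and one existing 0D 0A
print(unix2dos_dumb(sample).hex(" "))   # 41 0d 0a 42 0d 0d 0a 43
print(unix2dos_smart(sample).hex(" "))  # 41 0d 0a 42 0d 0a 43
```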


When converting from the DOS format back to UNIX format, in our binary file 
case, the dumber method will work but the smarter programs will cause problems. 
In the DOS file generated by a smarter program, there is no way to tell which 
0D 0A used to be 0A and which were already 0D 0A in the original, so they are 
all converted back to 0A and some 0D bytes from the original file are lost. 
With the dumber method, every 0D 0A in the DOS file was a 0A in the original 
file, so there is no problem changing them all back to 0A.
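
A small round-trip check in Python shows the asymmetry. This is a sketch under 
the same assumptions as the description above: the helper names are made up, 
and to_dos_smart only imitates the "keep existing 0D 0A" behaviour.

```python
import re

def to_dos_dumb(data: bytes) -> bytes:
    return data.replace(b"\n", b"\r\n")

def to_dos_smart(data: bytes) -> bytes:
    # Assumed behaviour: an existing 0D 0A is left unchanged.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

def to_unix(data: bytes) -> bytes:
    return data.replace(b"\r\n", b"\n")

# Binary data with one bare 0A and one genuine 0D 0A pair.
original = bytes([0x01, 0x0A, 0x02, 0x0D, 0x0A, 0x03])

via_dumb = to_unix(to_dos_dumb(original))
via_smart = to_unix(to_dos_smart(original))

print(via_dumb == original)            # True: every inserted 0D is removed again
print(via_smart == original)           # False: the genuine 0D was removed too
print(len(original) - len(via_smart))  # 1 byte lost per original 0D 0A
```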


In a given MTZ, there will almost certainly be a lot of 0A bytes, but 0D 0A 
pairs could be rare or non-existent. So after a unix-dos then dos-unix 
conversion, the result depends on how many 0D 0A pairs there were in the 
original file, and on how the program did the UNIX-DOS conversion.
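
To see where a particular file stands, one can simply count the bytes. A minimal 
Python sketch (the function name is made up; for a real file the bytes would come 
from open(path, "rb").read()):

```python
def count_line_end_bytes(data: bytes) -> dict:
    """Count 0A bytes, 0D 0A pairs, and 0A bytes not preceded by 0D."""
    lf_total = data.count(b"\n")
    crlf = data.count(b"\r\n")
    return {"lf_total": lf_total, "crlf": crlf, "bare_lf": lf_total - crlf}

# Toy data: two bare 0A bytes and one 0D 0A pair.
sample = b"\x01\n\x02\r\n\x03\n"
print(count_line_end_bytes(sample))
```

If bare_lf is zero in a file that has many 0A bytes, that may hint that the file 
has already been through a unix2dos-style conversion.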

With things like the following, it should be OK:
perl -pe 's/\n/\r\n/' test.mtz > test1.mtz    
perl -pe 's/\r\n/\n/' test1.mtz > test2.mtz

To test the dumber (perl) method, I used an MTZ file that contains 7 0D 0A pairs 
in the data section. Here is the result:
test.mtz 2030184 bytes    : MTZdump OK
test1.mtz 2032486 bytes  : MTZdump error
test2.mtz 2030184 bytes  : MTZdump OK
cmp test.mtz test2.mtz     : the two files are identical

With todos/fromdos (http://www.thefreecountry.com/tofrodos/ ):
test.mtz 2030184 bytes : MTZdump OK
todos.mtz 2032479 bytes : MTZdump error (note the size difference compared to 
test1.mtz: the 7 0D 0A in the original file were kept unchanged)
fromdos.mtz 2030177 bytes : MTZdump error (the 7 original 0D 0A were shrunk to 0A)
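
The file sizes above are consistent with simple byte counting. In the Python 
sketch below, the total of 2302 0A bytes is not stated anywhere; it is derived 
from test1.mtz minus test.mtz (2032486 - 2030184). The function and key names 
are made up.

```python
def predicted_sizes(size: int, lf_total: int, crlf: int) -> dict:
    """Expected sizes after each step, from the byte counts of the original."""
    return {
        "dumb_to_dos": size + lf_total,           # 0D added before every 0A
        "dumb_round_trip": size,                  # every added 0D removed again
        "smart_to_dos": size + lf_total - crlf,   # existing 0D 0A left alone
        "smart_round_trip": size - crlf,          # genuine 0D bytes removed too
    }

lf_total = 2032486 - 2030184  # 2302, derived from the perl test
sizes = predicted_sizes(2030184, lf_total, crlf=7)
for step, n in sizes.items():
    print(step, n)
```

The four predictions match test1.mtz, test2.mtz, todos.mtz and fromdos.mtz 
respectively.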


Ian, in your test with todos and tounix, it seems that the final MTZ still has 
header information at the correct location, so that MTZdump could read it. 
But some of the numbers saved in the data array seem damaged, so some of the 
stats in MTZdump were out of range. It would be interesting to read the todos 
and tounix source code to see why that happened. Or with a binary file 
comparison tool we might be able to guess the cause by taking a look at the two 
files.
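
As a starting point for such a comparison, here is a minimal Python sketch that 
reports the first offset at which two byte strings differ, with a little hex 
context. The helper names are made up, and the toy data stands in for the two 
files, which would really be read with open(path, "rb").read().

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One input is a prefix of the other: they differ where the shorter ends.
    return None if len(a) == len(b) else min(len(a), len(b))

def hex_context(data: bytes, offset: int, width: int = 4) -> str:
    """Hex dump of a few bytes on either side of offset."""
    start = max(0, offset - width)
    return data[start:offset + width].hex(" ")

# Toy stand-ins: 'bad' has lost the 0D of a genuine 0D 0A pair.
good = bytes([0x11, 0x0D, 0x0A, 0x22, 0x33])
bad = bytes([0x11, 0x0A, 0x22, 0x33])

where = first_difference(good, bad)
print(where, "|", hex_context(good, where), "|", hex_context(bad, where))
```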

Zhijie






From: Ian Tickle 
Sent: Friday, March 06, 2015 5:59 AM
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] how to recover my data

Hi, just for fun and to demonstrate what can go horribly wrong if you blindly 
use utilities that were specifically designed for changing the line terminators 
in an ASCII file, I applied these utilities to an MTZ file.


First I used 'todos' to simulate what I suspect the technician has done:


todos < original.mtz > original+todos.mtz


Then I used 'tounix' in an attempt to recover the original file, as others have 
suggested:

tounix < original+todos.mtz > original+todos+tounix.mtz


What can possibly go wrong?  The log files from mtzdump are attached - see for 
yourself! (the mtzdump on original+todos.mtz went into an infinite loop and I 
had to kill the process).


Cheers


-- Ian





On 6 March 2015 at 07:20, Smith Lee 
<00000459ef8548d5-dmarc-requ...@jiscmail.ac.uk> wrote:

  Dear All,

  For the issue of the recovery of the mtz file, I tried, more or less at 
random, to open one specific mtz file with Excel; after that, all the mtz files 
on the computer have an Excel icon (X), although the file extension is still 
.mtz. If I then open one specific mtz file (with the Excel icon) with Notepad, 
all the mtz files get the Notepad icon. If I then open one specific mtz file 
(with the Notepad icon) with WordPad, all the mtz files get the WordPad icon. 

  I hope these clues can help you advise me on the recovery of the mtz files.

  Smith



  On Thursday, March 5, 2015 11:51 PM, Robbie Joosten 
<robbie_joos...@hotmail.com> wrote:




  Hi Smith,

  If this is really the problem Ian describes, you can try the Linux programs 
unix2dos and dos2unix to change the line endings. A potential source of the 
problem might be copying the file with certain (S)FTP clients: in 'text-mode' 
they change the line endings to your OS default to be user friendly.

  Cheers,
  Robbie


  > -----Original Message-----
  > From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
  > Ian Tickle
  > Sent: Thursday, March 05, 2015 14:04
  > To: CCP4BB@JISCMAIL.AC.UK
  > Subject: Re: [ccp4bb] how to recover my data
  > 
  > Hi Smith
  > 
  > 
  > I sympathise with your plight - I have had to do similar things in the past
  > for other people!  I think your most fruitful course of action would be to
  > talk to the technician who recovered your data because only he knows what he
  > actually did to recover it.
  > 
  > 
  > From your description of your recovery of the PDB file it looks to me like a
  > line terminator issue, i.e. was the original file created in Linux, Windows
  > or Mac?  This is relevant because the line terminators are different and it
  > sounds like the technician didn't simply copy the file, he changed the line
  > terminators.  If he did the same with the MTZ file thinking it was a text
  > file the additional line terminators would corrupt the binary data making it
  > impossible to read with any of the CCP4 MTZ utilities.  If you can
  > understand exactly what the technician did you may be able to reverse it and
  > recover the binary data.
  > 
  > 
  > Hope this helps!
  > 
  > 
  > Cheers
  > 
  > 
  > -- Ian
  > 
  > 
  > On 5 March 2015 at 05:36, Smith Lee <00000459ef8548d5-dmarc-
  > requ...@jiscmail.ac.uk> wrote:
  > 
  > 
  > 
  >     Dear All,
  > 
  >     Recently my computer hardware broke and all the data
  > was recovered to a removable drive by a technician. However I find the
  > recovered PDB file and the MTZ file cannot be opened by Coot. Then I opened
  > the recovered PDB file with WordPad, and from WordPad I copied it to
  > Notepad and saved it as a pdb file. I find that Coot can open the Notepad-saved
  > pdb file, thus my pdb files can be successfully recovered from the hardware.
  > 
  >     But will you please tell me how to have Coot open my mtz file? After
  > data recovery by the technician, the data size of the mtz file did not
  > decrease, thus I think there is a way to have it recovered.
  > 
  >     I have not noticed there were similar or identical posts as mine for
  > recovery data before in the CCP4 mail list.
  > 
  >     Thus I am looking forward to getting a reply from you on how to
  > recover my mtz file.
  > 
  > 
  >     Smith
  > 
  > 


