Max, all,

Thank you for the pointers and help. I'm pleased to say that I seem to have recovered my data. Still to be found are the customizations I made to the standard report, but if what I have recovered stands up to some checks against known bank balances, etc., then I won't be too far from where I was a month ago.

What I have been trying to sift through (to recap a bit) is the result of a recovery done with testdisk/photorec, which left a blizzard of files and file fragments on a multi-terabyte hard drive. By and large the filenames were lost (though not in all cases), and photorec uses a list of known file signatures to try to append the appropriate file extension. This largely works, but not always. Finally, if I had known of a definitive file signature *before* I started the recovery, that might have helped; but for text-oriented files (vs. JPEGs, PDFs, executables, etc.) such a signature isn't always reliable or available.
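
As an aside, for anyone in a similar spot in the future: if you still have a known-good copy of the file type you care about, it's easy to look at its leading bytes before starting a recovery, to see whether there's a signature worth feeding to the recovery tool. A minimal sketch (the filenames here are just examples, not my actual files):

  # dump the first 64 bytes of a sample file in hex and ASCII
  head -c 64 sample.gnucash | xxd

  # check whether a file starts with the gzip magic bytes 1f 8b
  head -c 2 sample-compressed.gnucash | xxd -p

As Max points out below, an uncompressed GnuCash file shows the <gnc-v2 marker near the start, and a compressed one starts with 1f 8b.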

Fortunately, photorec seems to recognize XML and xml.gz formatted files. Diving headfirst into a pool I hadn't been in before, I came up with bash functions (this is a Linux machine I'm working on) to do recursive searches. Basically, I would open a terminal window, run

  gedit ~/.bashrc

and add the following to the end:


# Recursively search every .ods spreadsheet's content.xml for a string (case-insensitive).
function odsgrep(){
  term="$1"
  echo "Start search : $term"
  OIFS="$IFS"
  IFS=$'\n'
  for file in $(find . -name "*.ods"); do
    echo "$file";
    # extract the spreadsheet's content.xml to stdout, tidy it, and grep for the term
    unzip -p "$file" content.xml | tidy -q -xml 2> /dev/null | grep -i -F "$term" > /dev/null;
    if [ $? -eq 0 ]; then
      echo "FOUND FILE $file";
    fi;
  done
  IFS="$OIFS"
  echo "Finished search : $term"
}

# Recursively search the text (and HTML metadata) of every .pdf for a string, showing matches.
function mattpdfgrep(){
  term="$1"
  echo "Start search : $term"
  OIFS="$IFS"
  IFS=$'\n'
  for file in $(find . -name "*.pdf"); do
    #echo "$file";
    pdftotext -htmlmeta "$file" - | grep --with-filename --label="$file" --color -i -F "$term";
    if [ $? -eq 0 ]; then
      echo "$file";
      pdfinfo "$file";
    fi;
  done
  IFS="$OIFS"
  echo "Finished search : $term"
}

# Recursively search .xlsx and .xls spreadsheets for a string by converting them to CSV first.
function mattxlsgrep(){
  term="$1"
  echo "Start search : $term"
  OIFS="$IFS"
  IFS=$'\n'
  for file in $(find . -name "*.xlsx"); do
    #echo "$file";
    xlsx2csv "$file" | grep --with-filename --label="$file" --color -i -F "$term";
    if [ $? -eq 0 ]; then
      echo "$file";
    fi;
  done
  for file in $(find . -name "*.xls"); do
    #echo "$file";
    xls2csv "$file" | grep --with-filename --label="$file" --color -i -F "$term";
    if [ $? -eq 0 ]; then
      echo "$file";
    fi;
  done
  IFS="$OIFS"
  echo "Finished search : $term"
}

# Recursively search gzipped XML (*.xml.gz) files for a string.
function mattxmlgzgrep(){
  term="$1"
  echo "Start search : $term"
  OIFS="$IFS"
  IFS=$'\n'
  for file in $(find . -name "*.xml.gz"); do
    #echo "$file";
    gunzip -c "$file" | tidy -q -xml 2> /dev/null | grep -i -F "$term" > /dev/null;
    if [ $? -eq 0 ]; then
      echo "FOUND FILE $file";
    fi;
  done
  IFS="$OIFS"
  echo "Finished search : $term"
}

# Recursively search plain .txt files for a string.
function matttxtgrep(){
  term="$1"
  echo "Start search : $term"
  OIFS="$IFS"
  IFS=$'\n'
  for file in $(find . -name "*.txt"); do
    #echo "$file";
    grep -i -F "$term" "$file" > /dev/null;
    if [ $? -eq 0 ]; then
      echo "FOUND FILE $file";
    fi;
  done
  IFS="$OIFS"
  echo "Finished search : $term"
}
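
(One note for anyone copying these: after saving ~/.bashrc the new functions are only available in shells started afterwards; in an already-open terminal you can pull them in with

  source ~/.bashrc

and then call them by name from whatever directory you want to search under.)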

These custom commands (built from a 'net search that turned up a variant of the first one) allow for recursive file searches as well as the subsequent unzipping and string-search operations. Importantly, they attempt to look inside spreadsheets and PDFs, which aren't otherwise "grep-able".
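
For example, from the top of the recovered-files tree (the path and search string below are just placeholders, not my actual ones):

  cd /mnt/recovered
  odsgrep "Opening Balances"

walks every *.ods file below the current directory and prints a FOUND FILE line for each one whose content.xml contains the string, case-insensitively.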

To find the data, I used the mattxmlgzgrep routine to search *backwards* in time, starting with the following:
  <ts:date>2017-06

It found no files, which was expected, since I had last worked on this account in March or April, around US tax season. The next search, for
  <ts:date>2017-05
also turned up nothing. But searching for <ts:date>2017-04 turned up one hit, and <ts:date>2017-03 turned up a large number. So even though the timestamp on each recovered file was simply the date of the recovery, by searching backwards for dated entries I was able to narrow things down.
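
The same backwards sweep can be scripted as a small loop, something along these lines (newest month first):

  for month in 2017-06 2017-05 2017-04 2017-03; do
    mattxmlgzgrep "<ts:date>$month"
  done

The first month that produces FOUND FILE lines is the most recent one with real entries, which is what narrows the candidates down.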

Examining the file in GnuCash (it seemed to have been pulled in cleanly) showed all the categories, accounts, data, etc. that I expected to see.

It would be great to find the files related to the standard report customizations, and I'll spend a little time trying to do that. I'm not sure yet what would be a suitable "marker", but I think I have a candidate or two (see the one-liner below). After that I need to find the other records that made up some of this workflow; fortunately, they were all digital to begin with, and I believe I still have access to them online.
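
Once I have a candidate marker string, the same approach should work on the as-yet-unidentified text files; something like the one-liner below (the marker here is only a placeholder, since I don't yet know what to search for):

  grep -r -l -i -F "CANDIDATE_MARKER" .

would list every file under the current directory that contains it, regardless of extension.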

Thanks again to everyone who helped. If there's anything I can share in return, let me know.

Matt


On 2017-06-30 14:06, m...@hyre.net wrote:
Dear Matt:

The problem is that the recovery operation (using
Testdisk/Photorec) results in files and file fragments
that may or may not be correctly identified by file
extensions.

   It sounds like what you want is a magic number (file-format ID:
https://en.wikipedia.org/wiki/File_format#Magic_number) for .gnucash
files.  Looking at my file it appears that ``<gnc-v2'' starting at the
41st character in the file would do it.  (I presume the `2' in
``-v2'' is a version number, and could change at some future date, but
for now that's not a problem.)

   It would be nice if the recovery program lets you add to the
file-ID list, otherwise you're back to grep.  I hope that it
recognizes gzipped files (possible GNUCash files, compressed), but if
not you want to look for the first two characters = 0x1f 0x8b.  Of
course, then you'll have to unzip them to see whether they're really
what you want.  :-/

   Gurus:  Is this right?  For future-proofing, can we assume the
magic number will always be in position 41?  Is there an actual,
designated, magic number for GNUCash files somewhere?

   Hope this makes sense/helps...


       Best wishes,

           Max Hyre