I'm trying to get my Linux-based NTFS backup drive to pass a CHKDSK and came 
upon a curious situation in which CHKDSK reports errors.

It seems to be some issue with how ntfs-3g modifies a directory index when 
renaming many files.

The CHKDSK error always seems to be of the form:

        Stage 2: Examining file name linkage ...
        The first free byte, 0xc0, and bytes available, 0x150, for root index $I30 in file 0x40 are not equal.

I've attached a Python script (mkbaddir.py) that creates two (apparently) 
identical directories. One of them reliably causes this CHKDSK error; the 
other doesn't.

How to demonstrate:
        - Format an NTFS partition or thumb drive using Windows or mkfs.ntfs.
        - Mount the partition on a Linux system.
                I used Mint 20 with ntfs-3g 2017.3.23AR.3 integrated FUSE 28 and
                Python 3.8.2.
        - Chdir to the new NTFS partition and run the script:
                /tmp/mkbaddir.py        # creates 'baddir' in current dir.
                /tmp/mkbaddir.py -G     # creates 'gooddir' in current dir.
                diff -r baddir gooddir  # no difference
                du -sB1 baddir gooddir  # same size (128K)
        - Boot into Windows (10 v1903) and run (from a terminal) chkdsk X:
          (where X: is the NTFS drive).
                - This will say:
                "Errors found.  CHKDSK cannot continue in read-only mode."
        - Delete baddir (I used cygwin's rm -rf), and run chkdsk X: again.
                - This will now report no errors.
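
The "diff -r" and "du" checks in the steps above can also be done from 
Python. This is only a minimal sketch: the helper names listing() and 
same_listing() are made up for illustration, and it assumes a flat directory 
like the ones the script creates. It compares only what is visible through 
the mounted filesystem (names and sizes), not the on-disk index layout that 
CHKDSK complains about.

```python
import os

def listing(dirname):
    """Return {file name: size in bytes} for a flat directory."""
    return {entry.name: entry.stat().st_size
            for entry in os.scandir(dirname) if entry.is_file()}

def same_listing(dir_a, dir_b):
    """True if both directories hold the same file names with the same sizes."""
    return listing(dir_a) == listing(dir_b)
```

Calling same_listing('baddir', 'gooddir') from the NTFS mount point should 
return True if the two directories really are identical at this level.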

My guess at what's happening:  
The script creates a directory of 410 empty files and then renames them with 
slightly longer names, which, as I understand it, leaves a bunch of unused 
nodes in the b-tree.  With the -G option, the script simply renames the 410 
files by their known names; without it, the script uses os.walk() to traverse 
the directory and renames the files as the traversal returns them, which I'm 
guessing leaves the b-tree in a slightly different state with even more 
unused nodes.
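
One way to test this guess is to check whether os.walk() actually returns the 
files in a different order than the creation order used by the -G path. The 
sketch below is an illustration only; walk_order() and first_divergence() are 
hypothetical helpers, not part of mkbaddir.py.

```python
import os

def walk_order(dirname):
    """Collect file names in the order os.walk() yields them."""
    names = []
    for dirpath, dirs, files in os.walk(dirname, topdown=False):
        names.extend(files)
    return names

def first_divergence(expected, actual):
    """Index of the first position where the two orders differ, else None."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        # One list is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None
```

Run between the create step and the rename step, 
first_divergence([mkname(i) for i in range(410)], walk_order('baddir')) 
returns None if the two code paths would rename the files in the same order.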

The 410 was chosen by trial and error so that some internal threshold is just 
exceeded by the baddir but not by the gooddir.  With more than 410 (using the 
-c option; say -c 500), both baddir and gooddir will cause CHKDSK errors.
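
To put a number on how much each rename grows a directory index entry, here 
is a rough calculation. It rests on assumptions about the on-disk layout 
that I have not verified against ntfs-3g itself: a 16-byte index entry 
header, a 66-byte fixed part of the $FILE_NAME attribute, the name stored as 
UTF-16, and each entry padded to a multiple of 8 bytes. It does not by 
itself explain where the threshold lies.

```python
# Assumed NTFS layout constants (not taken from the ntfs-3g source).
INDEX_ENTRY_HEADER = 16   # MFT reference, entry length, key length, flags
FILE_NAME_FIXED = 66      # fixed part of $FILE_NAME before the name itself

def index_entry_size(name):
    """Estimated size in bytes of one $I30 index entry for this file name."""
    name_bytes = len(name.encode("utf-16-le"))
    raw = INDEX_ENTRY_HEADER + FILE_NAME_FIXED + name_bytes
    return (raw + 7) & ~7   # round up to a multiple of 8

old_name = "detaildetaildetail-N-%04d" % 0
new_name = old_name.replace("N", "KK3F")
growth = index_entry_size(new_name) - index_entry_size(old_name)
```

Under these assumptions every rename in the script makes the entry 8 bytes 
larger, so an entry cannot simply be rewritten in place in a full index block.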

If I run the script on Windows/cygwin (Python 3.6.9) to create the folders, it 
does not give any CHKDSK errors even with many more files.

So there seems to be some issue with how ntfs-3g modifies the b-tree when 
renaming many files that is causing CHKDSK to complain.


I encountered this issue when trying to get my Linux-based NTFS backup drive to 
consistently pass a CHKDSK.  I use a script to first rename POSIX names to 
valid Windows names, replacing '?' with '@@3F', etc., so that I can reverse the 
renaming afterwards.  I have some website mirror folders with many files of the 
form:  
        details.asp?id=xxxxx&key=val
which gave rise to this issue.  (In the mkbaddir script I use only 
alphanumeric names to make clear that this is not an illegal-character issue.)
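
For context, the reversible renaming looks roughly like the sketch below. 
This is not my actual backup script: escape_name(), unescape_name() and the 
exact set of escaped characters are assumptions for illustration. The only 
detail taken from above is that '?' (0x3F) becomes '@@3F'.

```python
import re

# Characters Windows does not allow in file names. '/' is left out because
# it cannot occur in a POSIX file name either. (Assumed set, for illustration.)
ILLEGAL = '<>:"\\|?*'
MARK = "@@"

def escape_name(name):
    """Replace each illegal character with '@@' plus its two-digit hex code."""
    return "".join(
        "%s%02X" % (MARK, ord(ch)) if ch in ILLEGAL else ch for ch in name
    )

def unescape_name(name):
    """Reverse escape_name(); assumes '@@XX' never occurs in original names."""
    return re.sub(MARK + r"([0-9A-F]{2})",
                  lambda m: chr(int(m.group(1), 16)), name)
```

For example, escape_name('details.asp?id=12345&key=val') gives 
'details.asp@@3Fid=12345&key=val', and unescape_name() restores the original.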



--------------------- mkbaddir.py ---------------------
#!/usr/bin/python3
import os, re, argparse
def mkname(i): return "detaildetaildetail-N-%04d" % i

parser = argparse.ArgumentParser()
parser.add_argument('-G', '--good', action='store_true')
parser.add_argument( '-c', '--count', action='store', default=410)
opts = parser.parse_args()
count = int(opts.count)
if opts.good: dirname = 'gooddir'
else: dirname = 'baddir'

# Create a dir of files 
os.mkdir(dirname)
for i in range(count):
    f = dirname + '/' + mkname(i)
    open(f,'a').close()   # touch

# rename them
if opts.good:
    for i in range(count):
        f = mkname(i)
        nf = re.sub('N', "KK3F", f)
        os.rename(dirname+'/'+f, dirname+'/'+nf)
else:
    for dirpath, dirs, files in os.walk(dirname, topdown=False):
        for f in files:
            nf = re.sub('N', "KK3F", f)
            os.rename(dirname+'/'+f, dirname+'/'+nf)
_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
