On Sun, 2008-12-07 at 15:39 +0000, Mick wrote:
> On Friday 05 December 2008, Albert Hopkins wrote:
> > On Thu, 2008-12-04 at 07:10 +0000, Mick wrote:
> > > Almost every time I split a large file >1G into say 200k chunks, then ftp
> > > it to a server and then:
> >
> > That's thousands of files!  Have you gone mad?!
> 
> Ha! small error in units . . . it is 200M (of course this is no disclaimer of 
> me going/gone mad . . .)  I think the server drops the connection above 230M 
> file uploads or something like that, so I tried 200M files and it seems to 
> work.
> 
> > >  cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
> > >
> > > it fails.  Checking the split files in turn, I often find one or two chunks
> > > that fail their own md5 checks.  Despite that, the concatenated file
> > > often works (e.g. if it is a video file it'll play alright).
> >
> > Let me understand this. Are [1..7] the split files or the checksums of
> > the split files?  
> 
> They are the split files, which I concatenate into the complete file.

Well, unless you made another error in your OP, you are using md5sum
incorrectly.  When you use "-c", md5sum expects a file containing a list
of checksums and filenames, not the data file itself.  For example:

        $ dd if=/dev/urandom of=bigfile bs=1M count=5
        5+0 records in
        5+0 records out
        5242880 bytes (5.2 MB) copied, 2.29361 s, 2.3 MB/s
        $ md5sum bigfile > checksum # create checksum file
        $ split -b1M bigfile 
        $ rm bigfile 
        $ cat xa* > bigfile
        $ # This is correct
        $ md5sum -c checksum 
        bigfile: OK
        $ # This is wrong!
        $ md5sum -c bigfile 
        md5sum: bigfile: no properly formatted MD5 checksum lines found
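
If you also want to check the pieces themselves, checksum them as
arguments and keep the resulting list around; something like this should
do it (chunks.md5 is just a name I made up):

        $ md5sum xa* > chunks.md5   # one checksum line per chunk
        $ md5sum -c chunks.md5      # run again after the transfer
        xaa: OK
        xab: OK
        xac: OK
        xad: OK
        xae: OK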

[SNIP!]
> > Maybe if you give the exact commands used I might understand this
> > better.
> >
> > I have a feeling that this is not the most efficient method of file
> > transfer.
> 
> split --verbose -b 20000000 big_file
> 
> tnftp -r 45 -u 
> ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xaa xab xac 
> xad . . .
> 
> The above would fail after xaa was uploaded and about 1/3 or less of xab.
> So, I split up the individual file uploads:
> 
> tnftp -r 45 -u 
> ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xaa ; sleep 
> 1m ; tnftp -r 45 -u 
> ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xab ; 
> sleep ... ; etc.
> 
> Does this make sense?

Yes, but if you really are passing the split files themselves to "-c",
that would explain how you can get checksum errors even though the
concatenated file is fine.
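
If you do stick with split+ftp, a loop is also less error-prone than
chaining the uploads with sleeps by hand.  A rough sketch reusing your
tnftp invocation (the 60 second pause is just your "sleep 1m" spelled
out):

for f in xa*; do
    # same retry/upload flags as in your command above
    tnftp -r 45 -u ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ "$f"
    sleep 60
done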

Here's how I would do it (I'm not saying you have to): I'd use rsync.
Rsync does the file transfer and has checksumming built in.  You say you
split because you get disconnected, right?  I'm not sure whether rsync
handles re-connects by itself, but you can write a loop so that if rsync
fails it picks up where it left off:

status=30                       # any non-zero value, just to enter the loop
until [ "$status" -eq 0 ]; do
    # --append-verify resumes a partial upload and re-checks the appended
    # data, so an interrupted transfer continues where it stopped
    rsync --append-verify big_file server_name:/htdocs/<directory_path>/
    status=$?
done

No splitting/concatenating and no need to checksum.
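
(A caveat: --append-verify is fairly new, rsync 3.0 or thereabouts if I
remember right; on an older rsync, --partial in the same loop gets you
most of the way there.)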


