On Sun, 2008-12-07 at 15:39 +0000, Mick wrote:
> On Friday 05 December 2008, Albert Hopkins wrote:
> > On Thu, 2008-12-04 at 07:10 +0000, Mick wrote:
> > > Almost every time I split a large file >1G into say 200k chunks, then ftp
> > > it to a server and then:
> >
> > That's thousands of files!  Have you gone mad?!
>
> Ha! small error in units . . . it is 200M (of course this is no disclaimer of
> me going/gone mad . . .)  I think the server drops the connection above 230M
> file uploads or something like that, so I tried 200M files and it seems to
> work.
>
> > > cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
> > >
> > > it fails.  Checking the split files in turn I often find one or two chunks
> > > that fail on their own md5 checks.  Despite that, the concatenated file
> > > often works (e.g. if it is a video file it'll play alright).
> >
> > Let me understand this.  Are [1..7] the split files or the checksums of
> > the split files?
>
> They are the split files which I concatenate into the complete file.
Well, unless you made another error in your OP, you are using md5sum
incorrectly.  When you use "-c", md5sum expects a file that is a list of
filenames and checksums.  For example:

$ dd if=/dev/urandom of=bigfile bs=1M count=5
5+0 records in
5+0 records out
5242880 bytes (5.2 MB) copied, 2.29361 s, 2.3 MB/s
$ md5sum bigfile > checksum  # create checksum file
$ split -b1M bigfile
$ rm bigfile
$ cat xa* > bigfile
$ # This is correct
$ md5sum -c checksum
bigfile: OK
$ # This is wrong!
$ md5sum -c bigfile
md5sum: bigfile: no properly formatted MD5 checksum lines found

[SNIP!]

> > Maybe if you give the exact commands used I might understand this
> > better.
> >
> > I have a feeling that this is not the most efficient method of file
> > transfer.
>
> split --verbose -b 20000000 big_file
>
> tnftp -r 45 -u
> ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xaa xab xac
> xad . . .
>
> The above would fail after xaa was uploaded and about 1/3 or less of xab.
> So, I split up the individual file upload:
>
> tnftp -r 45 -u
> ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xaa ; sleep
> 1m ; tnftp -r 45 -u
> ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xab ;
> sleep ... ; etc.
>
> Does this make sense?

Yes, but if you are truly using "-c" then it would make sense that you
could get a checksum error yet the file be OK.

Here's how I would do it.  I'm not saying you should do it this way.

I'd use rsync.  Rsync does file transfer and has checksumming built in.
You say you split because you get disconnected, right?  I'm not sure if
rsync handles re-connects, but you can write a loop so that if rsync
fails you continue where you left off:

status=30
until [ $status -eq 0 ] ; do
    rsync --append-verify big_file server_name:/htdocs/<directory_path>/
    status=$?
done

No splitting/concatenating and no need to checksum.
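Incidentally, if you do stick with the split/ftp workflow, you can extend the
checksum-file idea above to cover each chunk as well as the whole file, so a
corrupt chunk can be spotted and re-sent on its own.  A sketch (the dd line is
just a stand-in for your real big_file, and all filenames here are examples):

```shell
set -e
dd if=/dev/urandom of=big_file bs=1M count=5 2>/dev/null  # stand-in for the real file
md5sum big_file > whole.md5     # checksum of the complete file
split -b1M big_file             # produces xaa, xab, ...
md5sum xa* > chunks.md5         # one checksum line per chunk
rm big_file
# ... transfer the chunks plus chunks.md5 and whole.md5 here ...
md5sum -c chunks.md5            # verifies each chunk individually
cat xa* > big_file              # reassemble under the original name
md5sum -c whole.md5             # verifies the reassembled file
```

That way "md5sum -c chunks.md5" on the server tells you exactly which chunk
(if any) got mangled in transit, instead of only learning that the
concatenated file is bad.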