Any clue as to how to tackle this problem, or any trick around it?

I really do not understand the problem here. But you might be able to
detect sparse files by comparing the size vs. the number of blocks they use.
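
For what it's worth, a rough sketch of that check, assuming GNU find and
Linux, where st_blocks is counted in 512-byte units (/data is just a
placeholder path):

    # print every file whose apparent size is larger than the space
    # actually allocated on disk, i.e. files containing holes (sparse files)
    find /data -type f -printf '%s %b %p\n' |
    awk '$1 > $2 * 512 { $1 = $2 = ""; sub(/^  /, ""); print }'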

Without making a big write-up out of it: let's say the problem is, for now, a storage capacity problem on the destination servers, a timing problem in the extended transfer process, and the additional bandwidth required at some of the destination points, all multiplied by the volume of files. If it were just syncing 100K files it would be a piece of cake, but it's much bigger than that.

Just for example, a badly sparse source file doesn't really have its disk blocks allocated yet, but when copied over via scp or rsync it will actually consume that space on the destination servers. All the servers are identical (or are supposed to be, anyway), but the copies are running out of space partway through the copy process. While the files are being copied, they may easily use twice the amount of space, sadly filling up the destinations; the sync process then stops, making the distribution of the load unusable. Yes, I need to increase the capacity, except that it will take me time to do so.
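
A quick way to see the effect, as a sketch (truncate and cp --sparse are
GNU coreutils; the 1 GiB size is just for illustration):

    # create a 1 GiB file with no blocks allocated
    truncate -s 1G sparse.img
    ls -l sparse.img     # apparent size: 1 GiB
    du -h sparse.img     # space actually used: ~0

    # a naive copy, as scp does, writes out every null byte for real
    cp --sparse=never sparse.img copy.img
    du -h copy.img       # space actually used: 1 GiB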

Sparse files are a very good thing for databases, for example, but not for everything.

The problem is not the sparse file at the source; that can surely stay as is. It's just holes in the block map anyway.

The problem is in the sync process between multiple servers over the Internet: the bandwidth wasted, as well as the lack of space available at the destination. Plus, because the copy ends up a different size on disk, the sync process sees them as different files and will copy them again.

Or the files can be copied using -S with rsync; however, that process still inflates the files at the destination, running out of space along the way, and only makes them smaller at the end. Plus this obviously takes a lot more time, so the timely sync process that was good for a long time is now, let's say, not reliable. Syncing without concern for sparseness is done in just a few minutes, but then uses a lot more space on the destination. Doing it with -S fixes the capacity issue, but then it takes a HUGE amount more time, and sadly there is a useless transfer of null data caused by the empty space in the sparse source.
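
For reference, the two variants above look roughly like this (host and
paths are placeholders):

    # fast, but every hole is written out as real blocks at the destination:
    rsync -av /data/ remote:/data/

    # keeps the destination sparse; much slower here, since rsync still has
    # to read through all the null data (-z should at least shrink the runs
    # of nulls on the wire -- an assumption worth testing on your link):
    rsync -avSz /data/ remote:/data/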

I can manage: I find ways to use ls -laR or du -k, diff the listings between servers, find the files that are getting out of whack, replace them and then continue, but this really is painful.
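
A sketch of automating that comparison, assuming ssh access and GNU find
on both sides (remote and /data are placeholders):

    # list allocated blocks per file on both ends, then diff the listings;
    # the lines that differ are the files that have ballooned
    find /data -type f -printf '%b %p\n' | sort -k2 > /tmp/local.lst
    ssh remote "find /data -type f -printf '%b %p\n' | sort -k2" > /tmp/remote.lst
    diff /tmp/local.lst /tmp/remote.lst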

Obviously once the capacity is there it will be a non-issue; however, I am sadly not at that point yet and it will take me some time.

Not sure if that explains it any better; I hope so.

But I was looking to see if it's possible to identify these files in a more efficient way.

If not, I will just deal with it.

It's just going to be painful for some time, that's all.

The issue is really in the transfer process and at the final destination, not at the source.

I hope it makes more sense explained this way; if not, I apologize for not thinking of a better way to explain it at the moment.

Best,

Daniel
