Any clue as to how to tackle this problem, or any trick around it?
I really do not understand the problem here. But you might be able to
detect sparse files by comparing their size against the number of blocks
they use.
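
For what it's worth, a rough sketch of that check (GNU find; /data is
just a placeholder path): a file whose apparent size is larger than its
allocated blocks account for has holes in it, i.e. it is sparse.

    # %s is the apparent size in bytes, %b the allocated 512-byte blocks;
    # print the paths of files that occupy less space than they claim
    find /data -type f -printf '%s %b %p\n' | awk '$1 > $2 * 512 {print $3}'
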
Without making a big write-up out of it: let's say that the problem is,
for now, a storage capacity problem on the destination servers, a timing
problem in the extended transfer process, and the additional bandwidth
required at some of the destination points, given the volume of files.
Let's just say that if it were syncing 100K files it would be a piece of
cake, but it's much bigger than that.
Just as an example: a badly sparse source file doesn't really have its
disk blocks allocated yet, but when it is copied over via scp or rsync
it will actually use that space on the destination servers. All the
servers are identical (or are supposed to be, anyway), but what is
happening is that the copies run out of space at times during the copy
process. While it is copying them, it can easily use twice the amount of
space, filling up the destinations; the sync process then stops, making
the distribution of the load unusable. I do need to increase the
capacity, yes, except that it will take me time to do so.
Sparse files are a very good thing for databases, for example, but not
for everything.
The problem is not the sparse file at the source. It can certainly stay
as is; it's just offset pointers anyway.
The problem is in the sync process between multiple servers over the
Internet: the bandwidth wasted as well as the lack of space available at
the destination. Plus, because the copy ends up a different size, the
sync process sees it as a different file and will copy it again.
Or they can be copied with rsync -S, but that process inflates the
files at the destination, runs out of space along the way, and only
shrinks them back at the end. Plus it obviously takes a lot more time,
so the timely sync process that was good for a long time is now,
well... let's say, not reliable. Put it this way: a sync without any
concern for sparseness is done in just a few minutes, but then uses a
lot more space on the destination. Doing it with -S to address the
capacity issue fixes that, but then it takes a HUGE amount of extra
time, and sadly there is useless transfer of null data caused by the
empty space in the sparse source.
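
One possible middle ground, and this is just an idea assuming a
reasonably recent util-linux fallocate and a destination filesystem that
supports hole punching (ext4, XFS): do the fast sync without -S, then
punch the holes back out of the copies on the destination afterwards to
reclaim the space. It only helps if the destination can hold the
inflated copies long enough, but it avoids the slow -S transfer.

    # sketch only; /dest/data and destserver are placeholders.
    # --dig-holes scans each file for runs of zeros and deallocates them,
    # turning the copies sparse again without changing their contents.
    ssh destserver 'find /dest/data -type f -exec fallocate --dig-holes {} \;'
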
I can manage; I have found ways to use ls -laR or du -k, diff the
outputs, find the files that are getting out of whack, replace them and
then continue, but this really is painful.
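
Something along these lines is what I mean by the du diffs (hostnames
and paths are placeholders); the entries that diverge are the copies
that got inflated on the destination.

    # per-file disk usage in KB on both sides, then diff the two lists
    du -ak /data | sort -k2 > /tmp/src.du
    ssh destserver 'du -ak /data | sort -k2' > /tmp/dst.du
    diff /tmp/src.du /tmp/dst.du
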
Obviously, once the capacity is there it will be a non-issue, but sadly
I am not at that point yet and it will take me some time.
Not sure if that explains it any better; I hope so.
But I was looking to see if it is possible to identify these files in a
more efficient way.
If not, I will just deal with it.
It's just going to be painful for some time, that's all.
The issue is really in the transfer process and at the final
destination. Not at the source.
I hope it makes more sense explained this way; if not, I apologize for
the lack of better thinking at the moment in explaining it.
Best,
Daniel