Kevin Day wrote:
Shachar-
I think Wayne is mostly pointing you to the correct location here. If
you look at the code where the checksum is computed, you'll find that
the rolling checksum actually consists of two parts - one is the sum
of all of the byte values in the window, the other is the offset-weighted
sum of those same bytes.
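For readers skimming the thread, here is a minimal, self-contained sketch of such a two-part rolling checksum (the names and the packing are illustrative, not copied from rsync's source):

/* Sketch of a two-part rolling checksum over a window buf[0..len-1]:
 *   s1 = plain sum of the byte values in the window
 *   s2 = sum of each byte weighted by its distance from the window end
 * Sliding the window by one byte only needs the byte that leaves (out)
 * and the byte that enters (in). */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t s1, s2;   /* the two halves; only their low 16 bits matter */
    size_t   len;      /* window length */
} rolling_sum;

static void rsum_init(rolling_sum *r, const unsigned char *buf, size_t len)
{
    r->s1 = r->s2 = 0;
    r->len = len;
    for (size_t i = 0; i < len; i++) {
        r->s1 += buf[i];
        r->s2 += (uint32_t)(len - i) * buf[i];   /* offset-weighted part */
    }
}

static void rsum_roll(rolling_sum *r, unsigned char out, unsigned char in)
{
    r->s1 += (uint32_t)in - out;                 /* drop old byte, add new */
    r->s2 += r->s1 - (uint32_t)r->len * out;     /* reweight in O(1)       */
}

static uint32_t rsum_digest(const rolling_sum *r)
{
    return (r->s1 & 0xffff) | ((r->s2 & 0xffff) << 16);   /* pack halves */
}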
On Sat, Mar 05, 2005 at 10:58:12PM +0200, Shachar Shemesh wrote:
> What about my proposal, though? Should I send in a patch so it can be
> evaluated (hoping I'll manage to find my way around, that is).
I'm always open to potential patches. If we can make things faster without
taking up large chunks
Wayne Davison wrote:
But how will you find it there? If you are going to have 740K blocks
(i.e. - 740,000 strong hashes) in a 16bit hash table, you are going to
have lots of collisions there (190 per bucket, on average), and you
gained nothing.
You'll note that a no-match bitmap has no effe
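For context, a hedged sketch of what the "no-match bitmap" proposal appears to be, whatever its merits: one bit per possible 16-bit tag, set while the remote block checksums are read in, and tested in the rolling loop before any table work (all names here are mine, not rsync's):

#include <stdint.h>

#define TAG_COUNT 65536u

static unsigned char have_tag[TAG_COUNT / 8];   /* 8 KB, zero-initialized */

/* called once per remote block while its checksums are read in */
static void note_block_tag(uint32_t tag)
{
    have_tag[tag >> 3] |= (unsigned char)(1u << (tag & 7));
}

/* called in the inner rolling loop: a clear bit proves no block has
 * a weak sum with this tag, so the table lookup can be skipped */
static int might_match(uint32_t tag)
{
    return (have_tag[tag >> 3] >> (tag & 7)) & 1;
}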
On Sat, Mar 05, 2005 at 08:07:20PM +0200, Shachar Shemesh wrote:
> Definitely not [block-size]! I was talking about the hash table load.
Ah yes, I can see that I misread what the "it" was referring to. Thanks
for the clarification.
> But [the bitmap lookup] only works if the checksum function a
> Wayne Davison wrote:
> >He's talking about potentially losing an even distribution of values
> >if the lowest order bits aren't random enough.
On Sat, Mar 05, 2005 at 08:10:37PM +0200, Shachar Shemesh wrote:
> Even for N which is not a power of 2?
In the case of the weak checksum, it is compute
Wayne Davison wrote:
On Sat, Mar 05, 2005 at 02:18:01PM +0200, Shachar Shemesh wrote:
However, if you choose to start with the 32 bit rolling hash and mod
that, you will have problems. The rolling checksum has two distinct
parts, and modding will only pull info from the low order bits,
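A small illustration of that point, assuming the packed 32-bit weak sum keeps s1 in the low 16 bits and s2 in the high 16 bits: for a power-of-two table size up to 65536, reducing with "% N" is the same as masking the low bits and so ignores s2 entirely, whereas folding the halves together first uses both (sketch only, not rsync's code):

#include <stdint.h>

/* table_size assumed to be a power of two, no larger than 65536 */
static uint32_t bucket_low_bits(uint32_t sum, uint32_t table_size)
{
    return sum & (table_size - 1);          /* only s1 contributes  */
}

static uint32_t bucket_folded(uint32_t sum, uint32_t table_size)
{
    uint32_t s1 = sum & 0xffff;             /* plain byte sum       */
    uint32_t s2 = sum >> 16;                /* offset-weighted sum  */
    return (s1 + s2) & (table_size - 1);    /* both halves used     */
}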
Wayne Davison wrote:
On Thu, Mar 03, 2005 at 10:18:01AM +0200, Shachar Shemesh wrote:
And I'm suggesting making it static, by adjusting the hash table's
size according to the number of blocks.
The block-size?
Definitely not! I was talking about the hash table load. I.e. - the
ratio bet
[I accidentally sent this reply only to Lars instead of the list.]
On Fri, Mar 04, 2005 at 06:44:48PM +0100, Lars Karlslund wrote:
> I read somewhere about the default blocksize of 700 bytes. Now I'm
> told the blocksize is calculated automatically. Which is it?
700 is the minimum block-size. Th
On Thu, Mar 03, 2005 at 10:18:01AM +0200, Shachar Shemesh wrote:
> And I'm suggesting making it static, by adjusting the hash table's
> size according to the number of blocks.
The block-size? No thanks. The smaller the block-size for a particular
file size, the more checksum data needs to be gen
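To put rough numbers on that for the 500GB file mentioned at the start of this thread (assuming roughly 4 bytes of weak plus 16 bytes of strong checksum per block - an assumption of mine, not a figure from the thread):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double file_size = 500e9;              /* the 500GB file from this thread   */
    double per_block = 4 + 16;             /* assumed bytes of checksum / block */

    double fixed_blocks = file_size / 700;              /* 700-byte blocks      */
    double sqrt_blocks  = file_size / sqrt(file_size);  /* sqrt-sized blocks    */

    printf("700-byte blocks  : %.0f blocks, %.1f GB of checksum data\n",
           fixed_blocks, fixed_blocks * per_block / 1e9);
    printf("sqrt-sized blocks: %.0f blocks, %.1f MB of checksum data\n",
           sqrt_blocks, sqrt_blocks * per_block / 1e6);
    return 0;
}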
On Sat, Mar 05, 2005 at 02:18:01PM +0200, Shachar Shemesh wrote:
> That's not what I read into it. It seems to me that the checksum
> function gives a 32bit result, and we are squashing that into a 16bit
> hash table. Can you point me to the code? Wayne?
Look in match.c. The hash table is "tag_table"
Kevin Day wrote:
As a quick FYI, the block size absolutely has an impact on the
effectiveness of the checksum table - larger blocks means fewer
blocks, which means fewer hash collisions.
Since you wouldn't expect that many blocks in the file, a 32bit weak
checksum would only produce about 4 or 5
Lars Karlslund wrote:
And I'm suggesting making it static, by adjusting the hash table's size
according to the number of blocks. Just do
"hashtablesize=(numblocks/8+1)*10;", and you should be set.
Or maybe it should really be dynamic.
I'm talking about the hash table load. I.e. - the ratio b
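A minimal sketch of that sizing rule, with the result rounded up to a power of two so the bucket index is a cheap mask (the rounding is an addition of mine, not part of the original proposal):

#include <stddef.h>

static size_t choose_table_size(size_t num_blocks)
{
    size_t want = (num_blocks / 8 + 1) * 10;   /* keeps the load near 0.8    */
    size_t size = 1;
    while (size < want)
        size <<= 1;                            /* round up to a power of two */
    return size;
}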
Shachar-
I agree with what you are saying about trying to identify opportunities
to improve the algorithm - sorry it seemed like a troll - that was
certainly not my intention. And certainly, my apologies for getting the
identities mixed up!
As a quick FYI, the block size absolutely has
Hi,
On Thu, 2005-03-03 at 10:18 +0200, Shachar Shemesh wrote:
Thanks for all the attention you are giving my problem. I'm really glad you all seem so interested in it.
My lack of responses is the combination of having lots of work to do this last week, and also a realisation of whe
Kevin Day wrote:
Shachar-
True enough - with one additional thought - if the block size is set
to be the square root of the file size, then the load factor on the
hash table becomes dynamic in and of itself (bigger block size = less
master table entries = fewer hash collisions).
And I'm sugges
Shachar-
True enough - with one additional thought - if the block size is set to
be the square root of the file size, then the load factor on the hash
table becomes dynamic in and of itself (bigger block size = less master
table entries = fewer hash collisions).
In the case of relative
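Worked numbers for that point, using the 500GB file from this thread: with block-size = sqrt(file size) the block count is also sqrt(file size), so the load on a fixed 65536-entry table grows only with the square root of the file:

#include <math.h>
#include <stdio.h>

int main(void)
{
    double file_size  = 500e9;
    double block_size = sqrt(file_size);         /* ~707 KB per block      */
    double num_blocks = file_size / block_size;  /* also ~707,000 blocks   */
    double load       = num_blocks / 65536.0;    /* entries per bucket     */

    printf("block size : %.0f bytes\n", block_size);
    printf("block count: %.0f\n", num_blocks);
    printf("table load : %.1f entries per bucket on average\n", load);
    return 0;
}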
Kevin Day wrote:
I would *strongly* recommend that you dig into the thesis a bit (just
the section that describes the rsync algorithm itself).
I tried a few weeks ago. I started to print it, and my printer ran out
of ink :-). I will read it electronically eventually (I hope).
Now, if you have hu
Lars Karlslund writes:
> Also the numbers speak for themselves, as the --whole-file option is
> *way* faster than the block-copy method on our setup.
At the risk of jumping into the middle of this thread without
remembering everything that was discussed...
Remember that by default rsync writes a
On Mon, Feb 28, 2005 at 08:33:52PM +0200, Shachar Shemesh wrote:
> If so, we can probably make it much much (much much much) more
> efficient by using a hash table instead.
That's what "tag_table" is -- it's an array of 65536 pointers into
the checksum data sorted by weak checksum. The code then
Shachar, It does use a hash table.
rsync adds the two components of the rolling checksum together to come
up with a 16 bit hash, and performs a table lookup. The table contains
an offset into the sorted rsync checksum table (which contains both
weak and strong checksums), and does a linear
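A hedged sketch of that lookup flow as described above (identifiers are illustrative, not rsync's; the checksum list is assumed sorted by tag so entries sharing a tag sit together):

#include <stdint.h>
#include <string.h>

struct block_sum {
    uint32_t      weak;        /* full 32-bit rolling checksum of the block */
    unsigned char strong[16];  /* strong checksum of the block              */
};

struct sum_table {
    struct block_sum *sums;    /* assumed sorted by tag, so entries sharing
                                  a tag are contiguous                      */
    int32_t count;
    int32_t tag_table[65536];  /* first index for each tag, -1 if none      */
};

static uint32_t tag_of(uint32_t weak)
{
    return ((weak & 0xffff) + (weak >> 16)) & 0xffff;   /* add the halves */
}

/* returns the index of the matching block, or -1 if there is none */
int find_block(const struct sum_table *t, uint32_t weak,
               const unsigned char strong[16])
{
    uint32_t tag = tag_of(weak);
    int32_t  i   = t->tag_table[tag];

    if (i < 0)
        return -1;                                  /* no block has this tag */
    for (; i < t->count && tag_of(t->sums[i].weak) == tag; i++) {
        if (t->sums[i].weak != weak)
            continue;                               /* weak sum differs      */
        if (memcmp(t->sums[i].strong, strong, 16) == 0)
            return i;                               /* weak and strong match */
    }
    return -1;
}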
Wayne Davison wrote:
However, you should be sure to have measured what is causing the
slowdown first to know how much that will help. If it is not memory
that is swapping on the sender, it may be that the computing of the
checksums is maxing out your CPU, and removing the caching of the
remote che
Shachar Shemesh wrote:
No, because the rsync algorithm can detect single-byte moves of this
700-byte block.
I will just mention that I opened the ultimate documentation for rsync
(the source), and it says that the default block size is the rounded
square root of the file's size. This means that
Lars Karlslund wrote:
Maybe I didn't express myself thoroughly enough :-)
Or me.
Yes, a block is a minimum storage unit, which is considered for transfer.
In size, yes. Not in position.
But it's a fact that the rsync algorithm as it is now checks to see if
a block should have moved. And in that c
On Mon, 2005-02-28 at 12:23 +0200, Shachar Shemesh wrote:
> Also as far as I could read, the default block size is 700 bytes? What
> kind of application would default to moving data around 700 bytes at a
> time internally in a file? I'm not criticizing rsync, merely
> questioning the funct
Lars Karlslund wrote:
Also as far as I could read, the default block size is 700 bytes? What
kind of application would default to moving data around 700 bytes at a
time internally in a file? I'm not criticizing rsync, merely
questioning the functionality of this feature.
I believe you may have m
Hi everyone,
Thank you for your replies.
On Thu, 2005-02-24 at 17:59 +0100, Paul Slootman wrote:
> It would certainly be possible to change the algorithm to not cache the
> data (and thus only allow the current block to be compared), but I don't
> think that idea has general enough interest
On Thu 24 Feb 2005, Wayne Davison wrote:
>
> It would certainly be possible to change the algorithm to not cache the
> data (and thus only allow the current block to be compared), but I don't
> think that idea has general enough interest for me to work on for
> inclusion in rsync. You might want
On Thu, Feb 24, 2005 at 02:01:59PM +0100, Lars Karlslund wrote:
> As I understand it, the way rsync works is:
> - remote server calculates all checksum blocks and transmits these to client
> - local client calculates all checksum blocks
> - local client compares local blocks to remote blocks, check
Hello everyone,
I have a question regarding syncing of large files. We have a setup where
we need to sync a 500GB file with changes in random blocks.
The changes are made in a manner, so moving blocks is not probable
(actually it will never happen).
As I understand it, the way rsync works is:
-