On Wed, Dec 17, 2003 at 03:43:40AM -0800, [EMAIL PROTECTED] wrote:
> jw schultz writes:
>
> > On Tue, Dec 16, 2003 at 03:18:15PM -0600, John Van Essen wrote:
> > > On Mon, 15 Dec 2003, jw schultz <[EMAIL PROTECTED]> wrote:
> > > > Hard-link handling
> > > >
> > > > At the moment hardlink handling is very expensive, so it's off by
> > > > default. It does not need to be so.
> > > >
> > > > Since most of the solutions are rather intertwined with the file
> > > > list it is probably better to fix that first, although fixing
> > > > hardlinks is possibly simpler.
> > > >
> > > > We can rule out hardlinked directories since they will probably
> > > > screw us up in all kinds of ways. They simply should not be used.
> > > >
> > > > At the moment rsync only cares about hardlinks to regular files. I
> > > > guess you could also use them for sockets, devices and other beasts,
> > > > but I have not seen them.
> > > >
> > > > When trying to reproduce hard links, we only need to worry about
> > > > files that have more than one name (nlinks>1 && !S_ISDIR).
> > >
> > > It would be very helpful if file_struct.flags could have a bit set to
> > > indicate that the node count was greater than 1. This info could be
> > > used later to optimize the hardlink search by only considering those
> > > flist entries with this flag bit set.
> > >
> > > It'd be nice to implement this bit setting in this protocol number so
> > > it can be widely distributed before 2.6.1 is released which could have
> > > the code to actually make use of it. I'd be interested in doing the
> > > later changes, but if Martin or jw could at least get the bit set...
> > > It doesn't even have to be --hwlink option dependent. Just examine
> > > the node count and set the bit.
> >
> > I'm not keen on squeezing that in at this time. Lets get it
> > out the door, hardlink performance improvements can be made
> > in a minor release. I'm also a bit more inclined to pass
> > nlinks (IFF non-zero and ~IS_DIR).
>
> The nlinks > 1 optimization would be a good one to add, but after
> the next release.
>
> For hardlinks it would be great to only send the device and inode
> information in the file list IFF nlinks > 1 and ~IS_DIR. Currently
> --hard-links sends device and inode for every file. This causes a
> lot of unnecessary data to be sent, and also means the receiver has
> to store and search inode information for every file, rather than
> just candidate hardlinks.
>
> Unfortunately all the bits in the flag byte in the file list are used,
> so we need to figure out some other way to indicate which files include
> (dev,inode) data.
>
> The goal would be to make --hard-links have little impact on network
> traffic, memory and speed.
>
> However, this would require a protocol bump.
It might be time to increase the size of flags to 16 bits,
protocol dependent of course. We'll need more bits anyway
if we ever add ACLs and EAs.

We'd want to use at least two bits (rough approximation)
in send_file_entry():

	if (protocol_version >= 28 && !S_ISDIR(file->mode)
	    && preserve_hard_links && file->st_nlink > 1) {
		flags |= HARD_LINKED;
		if (file->st_dev == last_dev)
			flags |= SAME_DEV;
		else
			last_dev = file->st_dev; /* keep in step with the receiver */
	}
	...
	if (protocol_version < 28 && preserve_hard_links
	    && S_ISREG(file->mode)) {
		if (protocol_version < 26) {
			/* 32-bit dev_t and ino_t */
			write_int(f, (int) file->dev);
			write_int(f, (int) file->inode);
		} else {
			/* 64-bit dev_t and ino_t */
			write_longint(f, file->dev);
			write_longint(f, file->inode);
		}
	} else if (flags & HARD_LINKED) {
		if (!(flags & SAME_DEV))
			write_longint(f, file->dev);
		write_longint(f, file->inode);
	}

and in recv_file_entry (even rougher):

	if (flags & HARD_LINKED) {
		if (!(flags & SAME_DEV))
			last_dev = read_longint(f);
		file->dev = last_dev;
		file->inode = read_longint(f);
	} else {
		file->dev = file->inode = 0;
	}

Using 0 for inode to indicate no hardlinks also fixes the
recently reported problem of erroneously trying to preserve
links on filesystems that do not support inode numbers and
report 0 for all inodes.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		[EMAIL PROTECTED]

		Remember Cernan and Schmitt
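
As a self-contained illustration of the encoding sketched above, here
is a toy round trip through an in-memory buffer. Everything in it is
an assumption made for the example: the flag values, struct wire,
struct entry, and the put/get helpers are stand-ins and not rsync's
real file list code, and the buffer has no bounds checking. It only
shows that sender and receiver stay in step when both track last_dev
the same way.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag values are assumptions for this sketch, not rsync's real bits. */
#define HARD_LINKED 0x0100
#define SAME_DEV    0x0200

/* Hypothetical stand-in for the socket; no bounds checking. */
struct wire {
	unsigned char buf[256];
	size_t len;	/* bytes written so far */
	size_t pos;	/* bytes read so far */
};

/* Hypothetical, cut-down file list entry. */
struct entry {
	int64_t dev, inode;
	unsigned nlink;
	int is_dir;
};

static void put16(struct wire *w, uint16_t v)
{ memcpy(w->buf + w->len, &v, sizeof v); w->len += sizeof v; }

static void put64(struct wire *w, int64_t v)
{ memcpy(w->buf + w->len, &v, sizeof v); w->len += sizeof v; }

static uint16_t get16(struct wire *w)
{ uint16_t v; memcpy(&v, w->buf + w->pos, sizeof v); w->pos += sizeof v; return v; }

static int64_t get64(struct wire *w)
{ int64_t v; memcpy(&v, w->buf + w->pos, sizeof v); w->pos += sizeof v; return v; }

/* Sender: write 16-bit flags, then dev/inode only for link candidates. */
static void send_ids(struct wire *w, const struct entry *e, int64_t *last_dev)
{
	uint16_t flags = 0;

	if (!e->is_dir && e->nlink > 1) {
		flags |= HARD_LINKED;
		if (e->dev == *last_dev)
			flags |= SAME_DEV;
	}
	put16(w, flags);
	if (flags & HARD_LINKED) {
		if (!(flags & SAME_DEV)) {
			put64(w, e->dev);
			*last_dev = e->dev;	/* mirror the receiver's update */
		}
		put64(w, e->inode);
	}
}

/* Receiver: rebuild dev/inode; 0/0 means "not a hard-link candidate". */
static void recv_ids(struct wire *w, struct entry *e, int64_t *last_dev)
{
	uint16_t flags = get16(w);

	if (flags & HARD_LINKED) {
		if (!(flags & SAME_DEV))
			*last_dev = get64(w);
		e->dev = *last_dev;
		e->inode = get64(w);
	} else {
		e->dev = e->inode = 0;
	}
}
```

Both sides must start with the same last_dev value (0 here). With
that, a second candidate on the same device costs the flags plus one
64-bit inode, and a file with a single link costs only the flags.
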