On Mon, Jan 05, 2004 at 02:46:52PM -0800, Wayne Davison wrote:
> On Mon, Jan 05, 2004 at 01:12:16PM -0800, jw schultz wrote:
> > - init_hard_links() sets nlinks in file_struct.
>
> I assume you mean by counting adjacent runs of identical inodes after
> the sort is done (while eliminating single-item runs from the list at
> the same time).

Yes, that is what I meant.

> I'd like to avoid adding the nlinks variable to the
> per-file structure that would add more memory when --hard-link wasn't
> used, though.  One way to do this and to cut down on all the adjacent-
> entry comparisons would be to create a parallel array (after the sort)
> of counts back to the first entry.  This way, whenever we have a match,
> we know exactly how many items back the first item is in the array.  It
> would also let you scan to the end of each grouping by stopping when
> you get down to an item with a 0 offset (which would require an extra
> item at the end of the list for a terminating 0).

I don't like that idea at all.  It puts us back into the business of
doing a link lookup for every regular file.

> > - change the return value of check_hard_link() from int to
> >   file_struct** so it returns the pointer to the first entry
> >   in the hlink_list array that matches, or NULL if false.
>
> Just to be sure we're on the same page: we need three states: we need
> to know "master linked" (with pointer to first item), "slave linked"
> (so we can skip it until the end-processing) and "not linked".  If the
> first two states return a pointer to the first item, we can
> differentiate them by checking if the returned value is "us" or not.
>
> > - basis_path() now can check nlinks and if !0 iterate over the paths
> >   provided by the list returned from check_hard_link() looking for
> >   the first that exists in dest or compare_dest.
>
> This seems like a good way to go to me.

Your comments have prompted a bit more thought, and I'm thinking we
might eliminate check_hard_link().  If we walk the hlink list after
sorting, we have enough info to push the data back up to the file_list
and eliminate searching altogether.  One way of doing that would be to
have

	struct hlinks {
		int count;		 /* count of entries in link_list */
		file_struct **link_list; /* array of links */
	};

Put a pointer to this in the file_struct; NULL means no links.
Put into perspective, this reduces hlink_list by 68 bytes per entry,
eliminates non-linked entries from hlink_list, adds 4 bytes per file
in file_struct, and adds another 8 bytes per link set.  In sum, the
worst case (every file has two links and nothing is !IS_REG) saves
60 bytes per file and eliminates the binary searches.

If you really want to save some per-file space when not preserving
links (hand written):

	struct file_struct {
	-	INO64_T inode;
	-	/** Device this file lives upon */
	-	DEV64_T dev;
	+	union links *links;
	}

	union links {
		struct idev {
			INO64_T inode;
			DEV64_T dev;
		} idev;
		struct hlinks {
			int count;
			file_struct **link_list;
		} hlinks;
	};

Or use casts, with file_struct.links being void *, so access would use
link_set = (struct hlinks *)file->links.

Once we have identified the link sets we no longer have any use for
the device and inode numbers, so we overwrite them with struct hlinks.
We can free file->links if nlinks == 1.  Don't even populate
file->links if we aren't linking or if !IS_REG.  Also, there is no
point populating file->links if dev and inode are 0.  Freeing a
16-byte structure interleaved with file_struct and string space is
unlikely to be much benefit, so overwriting the union may be better
than a free+malloc pair.

This saves another 12 bytes per file when no -H, and when we do -H it
adds 4 bytes/file, which should be offset by the savings on non-linked
files.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		[EMAIL PROTECTED]

		Remember Cernan and Schmitt