Hi,

As a Gentoo user I frequently run the emerge sync command, which in turn runs rsync against the main server. The 'problem' is that the portage directory tree contains about 19,000 directories and 96,000 files, so building the file list takes a long time because of the many disk accesses that are necessary. On the server side the disk I/O problem is probably less severe, since after the first run the whole tree is held in the OS disk cache (though I think all the syscalls still cost a lot of CPU).
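
To give a rough picture of where the time goes, here is a small Python sketch of what building a file list involves: walk every directory and lstat every entry. This is not rsync's actual code (rsync is written in C); build_file_list is a name made up for illustration, and the fields collected are only an assumption about what a file list needs.

```python
import os

def build_file_list(root):
    """Walk root and lstat every entry (hypothetical stand-in for a file-list build)."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # one system call per entry
            entries.append((os.path.relpath(path, root), st.st_size, int(st.st_mtime)))
    return entries
```

For the portage tree that is at least 19,000 + 96,000 = 115,000 lstat calls per run, plus the directory reads, before any data is transferred.
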
My idea is to write a patch that adds something like a --cache option, which would use a cached version of the file list. Instead of building the file list every time (hundreds of thousands of system calls and disk accesses), rsync could load it in one go. This would be even more useful for rsync servers, which usually serve read-only trees (like the Gentoo mirrors, or kernel.org, which always seems to have a load above 100 ;)

I see the following problem with this: the cache will go out of sync if something changes the local files by hand. So the cache option wouldn't be recommended for users who don't know what is going on, but it could be enabled manually under the right circumstances. Maybe it is even possible to do some extra checks, for example on directory ctimes in the main directory.

- What are the opinions of other people on this list?
- Would it be easy to implement, or would it give too much trouble?
- What are the most likely problems I would run into when I try to implement this?
- Any ideas on WHERE to store such a cache? (A magic hidden file in the directory that is being built, perhaps?)

Thanks,
Edwin

-- 
 //||\\   Edwin Eefting
|| || ||  DatuX, Linux solutions and innovations
 \\||//   http://www.datux.nl
          Nieuw Amsterdamsestraat 40
          7814 VA Emmen
          Tel. 0591-857037
          Fax. 0591-633001
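
P.S. To make the idea concrete, here is a minimal sketch in Python (rsync itself is C, so this only illustrates the logic). Everything in it is made up for illustration and is not part of rsync: the names scan, save_cache and load_file_list, the JSON layout, and the cache_path argument. Where the cache should live is still an open question, so the sketch simply takes the path as a parameter. The check compares only the ctime of the top directory, which catches entries added to or removed from that directory but not changes deeper in the tree.

```python
import json
import os

def scan(root):
    """Slow path: walk the tree and lstat every entry."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            entries.append([os.path.relpath(path, root), st.st_size, int(st.st_mtime)])
    return entries

def save_cache(root, cache_path):
    """Scan once and store the list together with the top directory's ctime."""
    entries = scan(root)
    data = {"root_ctime_ns": os.stat(root).st_ctime_ns, "entries": entries}
    with open(cache_path, "w") as f:
        json.dump(data, f)
    return entries

def load_file_list(root, cache_path):
    """Fast path: use the cache if it exists and the ctime check passes."""
    try:
        with open(cache_path) as f:
            data = json.load(f)
    except (OSError, ValueError):
        return scan(root)  # no usable cache, fall back to a full scan
    if data.get("root_ctime_ns") != os.stat(root).st_ctime_ns:
        return scan(root)  # top directory changed, cache is considered stale
    return data["entries"]
```

A server with a read-only tree would call save_cache once after each update of the tree and load_file_list for every client; the fast path then costs one file read and one stat instead of a full walk.
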