An explanation for the kernel team: certain types of processes seem to freeze the entire user desktop for long periods of time (many seconds). I don't know all the factors that are involved, but programs that start up and immediately make a large number of I/O calls exhibit this behavior. One of these is the startup process of a typical apt frontend: on startup, it scans the apt database to build some additional information in memory.

For reasons that are unclear to me, this only occurs when the process is running as root, and only when it's run via "sudo" from an X session. I'm not 100% positive that this is a kernel bug, but I don't think it's an apt bug, and I think we need help from someone who knows more about the kernel.

For your convenience, I've attached the three test programs I wrote to reproduce this behavior. They use the apt cache by default, but I imagine you can point them at any large cache and see the same behavior.

It looks to me like the kernel isn't doing a very good job of allowing interactive processes to run while a newly started process is doing a lot of I/O. I'm not surprised by a little bit of jitter, but I've seen the X server become unresponsive for over 30 seconds when running the "test2" program I mocked up to demonstrate this problem. That seems wrong to me.

On Fri, Mar 21, 2008 at 12:10:44PM -0400, Justin Pryzby <[EMAIL PROTECTED]> was heard to say:
> Thanks for the analysis.  Why does it only affect aptitude sometimes
> (after an upgrade is aborted)?  Is that due to some cache?  Why isn't
> apt/itude using mmap (?)?

They *are* using mmap; I wrote the example without mmap to check whether the problem was specific to mmap or independent of it (e.g., not using mmap might -- I'm not up to date on how the kernel is organized -- eliminate some of the work the memory management subsystem has to do).

I don't know whether caching effects are involved, but it wouldn't surprise me. OTOH, neither would finding out that caching effects aren't involved. The only pattern I can see is that I'm more likely to see a freeze if I've recently run a program that reads the package cache. That would suggest that the problem is more likely to occur when the package cache is already loaded into the system cache. That makes me wonder whether the sheer volume of requests for buffers is overloading the system somehow (but then why does it only happen with UID 0?).

> Out of curiosity, what kernel and hardware are you using?

Kernel 2.6.26-1-686 on a Fujitsu P7120. I haven't tried this with other kernel versions.

> Apparently, all that's necessary is to loop around lseek with a
> nonzero "offset".

Ooh, that's right. Nice catch. I've attached a test3.c that does just this.

I'm going to reassign this to the kernel+apt -- I'm not sure what's going on, but I don't think I have the expertise to track it down fully, and I think it *may* be some sort of kernel misbehavior.

  Daniel

/* mmap variant: map both apt caches and read one byte per iteration. */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  int fd = open("/var/cache/apt/pkgcache.bin", O_RDONLY);
  int fd2 = open("/var/cache/apt/srcpkgcache.bin", O_RDONLY);

  struct stat buf1, buf2;
  char *where1, *where2;
  int i;
  int tmp = 0;
  unsigned int loc = 0;

  /* fstat() also fails if the corresponding open() failed. */
  if(fstat(fd, &buf1) != 0)
    return -1;
  if(fstat(fd2, &buf2) != 0)
    return -1;

  /* Each cache is mapped through its own descriptor. */
  where1 = (char*)mmap(NULL, buf1.st_size, PROT_READ, MAP_SHARED, fd, 0);
  where2 = (char*)mmap(NULL, buf2.st_size, PROT_READ, MAP_SHARED, fd2, 0);

  /* mmap() reports failure with MAP_FAILED, not NULL. */
  if(where1 == MAP_FAILED || where2 == MAP_FAILED)
    return -1;

  /* One read per byte of the file; each offset is derived from the last. */
  for(i = 0; i < buf1.st_size; ++i)
    {
      loc = ((loc + 10) * (buf1.st_size - 1)) % buf1.st_size;
      tmp += where1[loc];
    }

  for(i = 0; i < buf2.st_size; ++i)
    {
      loc = ((loc + 10) * (buf2.st_size - 1)) % buf2.st_size;
      tmp += where2[loc];
    }

  return 0;
}

/* lseek + read variant: same access pattern as above, without mmap. */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  int fd = open("/var/cache/apt/pkgcache.bin", O_RDONLY);
  int fd2 = open("/var/cache/apt/srcpkgcache.bin", O_RDONLY);

  struct stat buf1, buf2;
  int i;
  unsigned int loc = 0;
  int tmp = 0;
  char buf[1];

  /* fstat() also fails if the corresponding open() failed. */
  if(fstat(fd, &buf1) != 0)
    return -1;
  if(fstat(fd2, &buf2) != 0)
    return -1;

  /* One seek and one 1-byte read per byte of the first cache. */
  for(i = 0; i < buf1.st_size; ++i)
    {
      loc = ((loc + 10) * (buf1.st_size - 1)) % buf1.st_size;
      lseek(fd, loc, SEEK_SET);
      read(fd, buf, 1);
      tmp += buf[0];
    }

  /* The same walk over the second cache, through its own descriptor. */
  for(i = 0; i < buf2.st_size; ++i)
    {
      loc = ((loc + 10) * (buf2.st_size - 1)) % buf2.st_size;
      lseek(fd2, loc, SEEK_SET);
      read(fd2, buf, 1);
      tmp += buf[0];
    }

  return 0;
}

/* lseek-only variant: loop around lseek with a nonzero offset, no reads. */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  int fd = open("/var/cache/apt/pkgcache.bin", O_RDONLY);
  int fd2 = open("/var/cache/apt/srcpkgcache.bin", O_RDONLY);

  struct stat buf1, buf2;
  int i;
  unsigned int loc = 0;

  /* fstat() also fails if the corresponding open() failed. */
  if(fstat(fd, &buf1) != 0)
    return -1;
  if(fstat(fd2, &buf2) != 0)
    return -1;

  /* One seek per byte of the first cache. */
  for(i = 0; i < buf1.st_size; ++i)
    {
      loc = ((loc + 10) * (buf1.st_size - 1)) % buf1.st_size;
      lseek(fd, loc, SEEK_SET);
    }

  /* The same walk over the second cache, through its own descriptor. */
  for(i = 0; i < buf2.st_size; ++i)
    {
      loc = ((loc + 10) * (buf2.st_size - 1)) % buf2.st_size;
      lseek(fd2, loc, SEEK_SET);
    }

  return 0;
}
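
Since the tests can presumably be pointed at any large file, here is a minimal sketch of the lseek-only loop as a helper that takes the path as a parameter. The names next_offset and seek_walk are hypothetical and not part of the attachments. The offset arithmetic is done in uint32_t on the assumption that the original programs wrapped modulo 2^32 on the 32-bit system described above; without that wraparound the expression reduces to -(loc + 10) mod size, so the offsets would simply alternate between two values.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Next offset in the walk.  The multiplication wraps modulo 2^32; this is
   an assumption meant to mirror a 32-bit build of the test programs. */
uint32_t next_offset(uint32_t loc, uint32_t size)
{
  return (uint32_t)((loc + 10u) * (size - 1u)) % size;
}

/* Hypothetical helper: issues one lseek() per byte of the file at `path`
   and returns the number of seeks issued, or -1 on error. */
long seek_walk(const char *path)
{
  struct stat st;
  uint32_t size, loc = 0, i;
  int fd = open(path, O_RDONLY);

  if(fd < 0)
    return -1;

  /* Reject empty files and files too large for the 32-bit arithmetic. */
  if(fstat(fd, &st) != 0 || st.st_size <= 0 || st.st_size > INT32_MAX)
    {
      close(fd);
      return -1;
    }

  size = (uint32_t)st.st_size;
  for(i = 0; i < size; ++i)
    {
      loc = next_offset(loc, size);
      if(lseek(fd, (off_t)loc, SEEK_SET) == (off_t)-1)
        {
          close(fd);
          return -1;
        }
    }

  close(fd);
  return (long)size;
}
```

Calling seek_walk("/var/cache/apt/pkgcache.bin") from a small main(), run via sudo from an X session, should roughly correspond to the lseek-only test; whether it triggers the same freeze on other systems is untested.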