I need to maintain a filesystem in which I keep only the most recently used (MRU) files; the least recently used (LRU) ones have to be removed to make room for newer ones. The filesystem in question is a clustered filesystem (glusterfs), which is very slow at "find"-style operations. To add complexity, there are more than 10^6 files in 2 levels: 16³ (4096) directories, each holding roughly the same number of files.
My first idea was to "os.walk" the filesystem, find the oldest files and remove them until I reach the threshold. But walking the tree proves to be too slow. My second thought was to run find -atime several times: remove the oldest files first, then repeat with a more recent atime until the threshold is reached. Again, this needs several walks through the filesystem. Then I thought about tmpwatch, but, like find, it needs a date from which to start removing.

The ideal way would be to keep a list of files sorted by atime, probably in a cache, something like updatedb. This list could also be built based only on the directory atime (diratime) of the first level of directories, visiting them in order and so on, but it still seems expensive to get this first level of directories sorted. Any suggestions on how to do this effectively?
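A minimal sketch of the updatedb-like idea, assuming the "threshold" is a total size budget in bytes: walk the tree once, store (path, atime, size) in an SQLite table indexed by atime, then evict in atime order from that table instead of walking again. The names `build_index`, `evict_lru`, the `files` table and `max_bytes` are hypothetical, chosen for illustration. Because the index is a snapshot, the eviction pass re-stats each candidate and spares files read since the scan.

```python
import os
import sqlite3


def build_index(root, db_path):
    """Walk `root` once and record each file's path, atime and size (hypothetical helper)."""
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS files")  # also drops its index
    conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, atime REAL, size INTEGER)")

    def rows():
        # A single traversal of the whole tree; stat() does not change atime.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except FileNotFoundError:
                    continue  # removed between listing and stat
                yield (path, st.st_atime, st.st_size)

    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows())
    conn.execute("CREATE INDEX files_atime ON files (atime)")  # keeps ORDER BY atime cheap
    conn.commit()
    return conn


def evict_lru(conn, max_bytes):
    """Delete least recently accessed files until the indexed total is <= max_bytes."""
    total = conn.execute("SELECT COALESCE(SUM(size), 0) FROM files").fetchone()[0]
    removed = []
    # fetchall() first so the table can be modified safely inside the loop.
    candidates = conn.execute("SELECT path, atime, size FROM files ORDER BY atime").fetchall()
    for path, atime, size in candidates:
        if total <= max_bytes:
            break
        try:
            current_atime = os.stat(path).st_atime
        except FileNotFoundError:
            current_atime = None  # already gone from disk
        if current_atime is not None and current_atime > atime:
            # Read since the index was built, so it is no longer LRU: keep it.
            conn.execute("UPDATE files SET atime = ? WHERE path = ?", (current_atime, path))
            continue
        if current_atime is not None:
            try:
                os.remove(path)
                removed.append(path)
            except FileNotFoundError:
                pass  # deleted concurrently
        conn.execute("DELETE FROM files WHERE path = ?", (path,))
        total -= size
    conn.commit()
    return removed
```

The expensive walk then happens once (for example from a periodic job, as updatedb does), and each eviction run touches only as many files as it needs to remove, plus the few that turn out to have been read since the scan.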