I've got about 45TB (wasn't that once considered 'large'?) of data on a GPFS filesystem.
I'm looking for efficient ways to find files based on metadata. Running "find / -ls" is not a good option anymore. :)

I'd like to be able to query some kind of stored index of name, path, owner, size, modification timestamp, and ideally a checksum. I don't want to run desktop-oriented tools like updatedb or Nepomuk & Strigi, due to concerns about overhead.

Real-time indexing is not a requirement; it's fine if metadata is scanned for updates at fairly long intervals (weekly?) to keep the impact on the filesystem lower. Regex queries would be great, but not required. Statistical queries (a histogram of file sizes, etc.) would be great, but not required. I would also like the ability to exclude some paths from indexing (i.e., don't index /human_resources/employee_complaints_by_name/). A rough sketch of the sort of index I have in mind is at the end of this message.

Has anyone seen or used an enterprise-level tool for this kind of search? I read a paper about Spyglass [1], which looked great, but I can't find the software or any evidence that it's actually in use.

Thanks,
Mark

[1] https://www.usenix.org/legacy/events/fast09/tech/full_papers/leung/leung_html/
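
For reference, here is roughly the kind of thing I could hack together myself in Python and SQLite if nothing exists off the shelf. The paths, table name, and exclusion list below are made up for illustration, and I've left checksumming out since hashing 45TB weekly is its own problem. I'd rather run something supported, but it shows the scale of index I'm after:

    #!/usr/bin/env python3
    """Rough sketch: walk the filesystem weekly from cron and dump file
    metadata into SQLite. Paths and names here are illustrative only."""
    import os
    import sqlite3
    import stat

    DB_PATH = "/var/lib/fsindex/metadata.db"    # hypothetical index location
    ROOTS = ["/gpfs"]                           # hypothetical mount point
    EXCLUDE = ["/gpfs/human_resources"]         # path prefixes to skip entirely

    def build_index(db_path=DB_PATH):
        conn = sqlite3.connect(db_path)
        conn.execute("""CREATE TABLE IF NOT EXISTS files (
                            path TEXT PRIMARY KEY, name TEXT, uid INTEGER,
                            size INTEGER, mtime REAL)""")
        for root in ROOTS:
            for dirpath, dirnames, filenames in os.walk(root):
                # prune excluded subtrees so we never descend into them
                dirnames[:] = [d for d in dirnames
                               if not any(os.path.join(dirpath, d).startswith(p)
                                          for p in EXCLUDE)]
                for fname in filenames:
                    full = os.path.join(dirpath, fname)
                    try:
                        st = os.lstat(full)
                    except OSError:
                        continue                # vanished or unreadable
                    if not stat.S_ISREG(st.st_mode):
                        continue                # skip symlinks, devices, etc.
                    conn.execute("INSERT OR REPLACE INTO files VALUES (?,?,?,?,?)",
                                 (full, fname, st.st_uid, st.st_size, st.st_mtime))
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        build_index()

A histogram of file sizes is then just a GROUP BY over the size column, and regex matching could be done in Python over query results. But I'd much rather point at something that already does this well at scale.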
