I've got about 45TB (wasn't that once considered 'large'?) of data on
a GPFS filesystem.

I'm looking for efficient ways to find files based on metadata.

Running "find / -ls" is not a good option anymore. :)

        I'd like to be able to query some kind of stored index of name,
        path, owner, size, modification timestamp, and ideally a checksum.

        I don't want to run desktop-oriented tools like updatedb or
        Nepomuk & Strigi, due to concerns about overhead.

        Real-time indexing is not a requirement; it's fine if metadata
        is scanned for updates at fairly long intervals (weekly?) to
        keep the impact on the filesystem low.

        Regex queries would be great but not required.

        Statistical queries (a histogram of filesizes, etc.) would be
        great, but not required.

        I would like the ability to restrict some search paths (i.e.,
        don't index /human_resources/employee_complaints_by_name/); a
        rough sketch of the kind of scan I'm imagining follows this list.
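
To make that concrete, here's a minimal sketch of the sort of thing I have
in mind: a cron-driven walk that records metadata into SQLite, with queries
going against that index instead of the live filesystem. The root path,
database name, exclusion list, and regex pattern below are hypothetical,
just to show the shape of it, not a recommendation of any particular tool.

    #!/usr/bin/env python3
    # Sketch only: walk a tree and record basic metadata in SQLite.
    # ROOT, DB, EXCLUDE, and the sample regex are hypothetical values.
    import collections
    import os
    import re
    import sqlite3

    ROOT = "/gpfs"                     # hypothetical mount point
    DB = "file_index.db"
    EXCLUDE = ("/gpfs/human_resources/employee_complaints_by_name",)

    conn = sqlite3.connect(DB)
    conn.execute("""CREATE TABLE IF NOT EXISTS files
                    (path TEXT PRIMARY KEY, name TEXT, owner INTEGER,
                     size INTEGER, mtime INTEGER)""")
    # Checksums are left out here; hashing 45TB of data is a far bigger
    # I/O hit than stat()ing it, which is exactly the overhead concern.

    for dirpath, dirnames, filenames in os.walk(ROOT):
        # Prune excluded subtrees before descending into them.
        dirnames[:] = [d for d in dirnames
                       if not os.path.join(dirpath, d).startswith(EXCLUDE)]
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue
            conn.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)",
                         (full, name, st.st_uid, st.st_size, int(st.st_mtime)))
    conn.commit()

    # Regex query against the index rather than the filesystem.
    pat = re.compile(r"core\.\d+$")    # hypothetical pattern
    for (path,) in conn.execute("SELECT path FROM files"):
        if pat.search(path):
            print(path)

    # Crude filesize histogram: count files per power-of-two bucket.
    hist = collections.Counter()
    for (size,) in conn.execute("SELECT size FROM files"):
        hist[size.bit_length()] += 1
    for bucket in sorted(hist):
        print("~2^%d bytes: %d files" % (bucket, hist[bucket]))

Something shaped like that, run weekly from cron, would keep queries off the
live filesystem entirely; what I'm hoping exists is a tool that already does
this at scale (and does it better).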

Has anyone seen or used an enterprise-level tool for this kind of search?

I read a paper about Spyglass[1], which looked great, but I can't
find the software or any evidence that it's actually in use.

Thanks,

Mark

[1] https://www.usenix.org/legacy/events/fast09/tech/full_papers/leung/leung_html/


_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/

Reply via email to