On Tue, 12 Nov 2013, berg...@merctech.com wrote:

I've got about 45TB (wasn't that once considered 'large'?) of data on
a GPFS filesystem.

I'm looking for efficient ways to find files, based on metadata.

Running "find / -ls" is not a good option anymore. :)

        I'd like to be able to query some kind of stored index of name,
        path, owner, size, modification timestamp, and ideally a checksum.

        I don't want to run desktop-oriented tools like updatedb or
        Nepomuk&Strigi, due to concerns about overhead.

        Real-time indexing is not a requirement; it's fine if metadata
        is scanned at fairly long intervals (weekly?) for updates, to
        keep the impact on the filesystem lower.

        Regex queries would be great but not required.

        Statistical queries (a histogram of file sizes, etc.) would be
        great, but not required.

        I would like the ability to restrict some search paths (i.e.,
        don't index /human_resources/employee_complaints_by_name/).

Just thinking about the problem here: you are either going to have to run something like find periodically to update your index, or you are going to have to hook into the filesystem code (*notify) to detect changes as they happen.
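
To make the periodic scan concrete, something along these lines is roughly what I have in mind (untested, and the mount point, excluded path, database location and table layout are all made up for the example): walk the tree and dump name, path, owner, size and mtime into SQLite. I've left checksums out, since hashing the full 45TB on every pass would dwarf the cost of the scan itself.

#!/usr/bin/env python3
"""Periodic metadata scan sketch: walk a tree and record file metadata
in SQLite. The paths and table layout below are placeholders."""
import os
import sqlite3

ROOT = "/gpfs/data"                            # made-up mount point
EXCLUDE = {"/gpfs/data/human_resources"}       # subtrees to leave unindexed
DB = "/var/tmp/file-index.sqlite"              # made-up index location

conn = sqlite3.connect(DB)
conn.execute("""CREATE TABLE IF NOT EXISTS files (
                    path  TEXT PRIMARY KEY,
                    name  TEXT,
                    uid   INTEGER,  -- owner stored as uid; pwd.getpwuid() maps it to a name
                    size  INTEGER,
                    mtime REAL)""")

def scan(root):
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded subtrees in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames
                       if os.path.join(dirpath, d) not in EXCLUDE]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue                       # vanished or unreadable; skip it
            yield path, name, st.st_uid, st.st_size, st.st_mtime

with conn:                                     # one transaction for the whole load
    conn.execute("DELETE FROM files")          # simple full rebuild on each run
    conn.executemany(
        "INSERT INTO files (path, name, uid, size, mtime) VALUES (?, ?, ?, ?, ?)",
        scan(ROOT))
conn.close()

Restricting search paths like /human_resources/... is then just a matter of pruning them from the walk, which is what the dirnames[:] line does.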

The *notify approach adds overhead continually, while the periodic scan only adds overhead when it runs.
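
For contrast, the *notify route looks something like the sketch below. It leans on the third-party watchdog package (my assumption, nothing you mentioned), and the recursive watch is exactly where the continual overhead comes from: every directory under the root gets watched, and every change wakes the indexer.

#!/usr/bin/env python3
"""Continuous *notify-style indexing sketch using the third-party 'watchdog'
package (pip install watchdog). The watched path is a placeholder."""
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class IndexUpdater(FileSystemEventHandler):
    def on_any_event(self, event):
        # A real indexer would update the database row for event.src_path
        # here; printing keeps the sketch self-contained.
        print(event.event_type, event.src_path)

observer = Observer()
observer.schedule(IndexUpdater(), "/gpfs/data", recursive=True)   # made-up path
observer.start()
try:
    while True:
        time.sleep(60)        # events are handled on the observer's own thread
except KeyboardInterrupt:
    observer.stop()
observer.join()

I'd also double-check whether inotify even sees changes made from other nodes on a clustered filesystem like GPFS; events are generated by the kernel on the node where the change happens.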

If you can live with a periodic scan, then something like updatedb is at least worth trying (you may find it doesn't work for you, but you will have a baseline to compare everything else against).

Once updatedb has done its scan, I don't know what other overhead you would run into when using it (unless it's particularly inefficient in storing the metadata).
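
As far as I know, locate's database only covers pathnames, so owner/size/date and statistical queries would still need something like the SQLite index sketched above. Against that kind of index, the regex and histogram queries you asked about come out roughly like this (same made-up database path and table as before):

#!/usr/bin/env python3
"""Example queries against the index built by the earlier sketch; the
database path and table name are the same placeholders as before."""
import math
import re
import sqlite3
from collections import Counter

conn = sqlite3.connect("/var/tmp/file-index.sqlite")

# SQLite understands the REGEXP operator but ships no implementation for it;
# supply one from Python's re module so "path REGEXP ?" works.
conn.create_function(
    "REGEXP", 2,
    lambda pattern, value: value is not None and re.search(pattern, value) is not None)

# Regex query: files owned by a given uid whose path matches a pattern.
for (path,) in conn.execute(
        "SELECT path FROM files WHERE uid = ? AND path REGEXP ?",
        (1000, r"\.log(\.gz)?$")):        # uid and pattern are examples only
    print(path)

# Statistical query: a rough histogram of file sizes by power of two,
# bucketed in Python rather than relying on SQLite's optional math functions.
buckets = Counter()
for (size,) in conn.execute("SELECT size FROM files"):
    buckets[int(math.log2(size)) if size else 0] += 1
for bucket in sorted(buckets):
    print(f"~2^{bucket} bytes: {buckets[bucket]} files")

conn.close()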

David Lang

