Joe Shaw wrote:
Hi Prakash,

Thanks for your reply!

On Thu, 2006-11-16 at 12:15 -0800, Prakash Sangappa wrote:
As previously discussed, watching for file events on an entire
filesystem or a directory tree can be a scalability issue. So the file events API may not be suitable for an application like desktop search.

I beg to differ, as we're already using inotify pretty effectively on
Linux for just this purpose.

It's important to maintain a sense of scope.  I fear that we as software
developers try to find a perfect general solution at the expense of
real, more limited use cases.

In the previous thread I recall someone saying that this won't scale on
a 10 TB filesystem, and that person is probably right.  However, I use
Beagle on my 120 GB home directory and there definitely isn't a torrent
of activity most of the time.

Beagle is a bit of a pathological case, though; a more realistic case
would be GNOME VFS, which monitors directories based on folders that are
open in the Nautilus file manager.

I understand.  You can certainly use the file events API to do that.
But this may not be the efficient way.

For a desktop search system like Beagle or Spotlight, it appears that a better and more useful method would be for the filesystem to provide an interface through which we could efficiently collect all the changes
that have occurred since some given time.

This would be an incredibly useful feature -- the initial crawl of the
filesystem in Beagle is expensive and painful -- but it's only truly
useful when used in tandem with a file notification system.  Otherwise
we're still stuck in the world of polling, just with a nicer API.
Beagle will be forced to run that files_changed_since_time() function
once a second in a loop to pick up changes.  Is that really more
efficient?

We can certainly add a notification type to the file events API for these
file-system-level changes once it is defined and implemented.
The approach we are taking with the file events notification API is to address the needs of applications that have to repeatedly stat files/directories for changes. With the file events API, applications just have to wait for the file events that are sent when a file or directory's status changes.

This is essentially what Beagle wants too, just for all the files under
your home directory. :)

File events interface:

Event types:
      * FILE_ACCESS          /* Monitored file/directory was accessed */
      * FILE_MODIFIED        /* Monitored file/directory was modified */
      * FILE_ATTRIB          /* Monitored file/directory's ATTRIB was changed */


Exception events:
      * FILE_DELETE       /* Monitored file/directory was deleted */
      * FILE_RENAME_TO    /* Monitored file/directory was renamed */
      * FILE_RENAME_FROM  /* Monitored file/directory was renamed */
      * UNMOUNTED         /* Monitored file system got unmounted */

How is file creation handled?  It seems like you would need an event
analogous to FILE_DELETE here to notice any newly added files.

File creation will result in a FILE_MODIFIED event on the directory. So
the application needs to be watching the directory.
The application can register only for the following events. The exception
events are reported as they occur; they don't have to be watched for.

FILE_ACCESS,
FILE_MODIFIED,
FILE_ATTRIB.
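
The split between watchable and exception events can be sketched as a pair of bit masks. This is only an illustration: the thread names the events but not their encoding, so the bit values and the helper names (is_valid_watch_mask, events_to_report) are assumptions, not part of the proposed API.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values are assumptions for illustration; the thread names the
 * events but does not give their encoding. */
enum {
    FILE_ACCESS      = 1 << 0,
    FILE_MODIFIED    = 1 << 1,
    FILE_ATTRIB      = 1 << 2,
    FILE_DELETE      = 1 << 3,
    FILE_RENAME_TO   = 1 << 4,
    FILE_RENAME_FROM = 1 << 5,
    UNMOUNTED        = 1 << 6
};

/* Events an application may ask for. */
#define WATCHABLE_EVENTS \
    ((uint32_t)(FILE_ACCESS | FILE_MODIFIED | FILE_ATTRIB))

/* Events reported whether or not they were asked for. */
#define EXCEPTION_EVENTS \
    ((uint32_t)(FILE_DELETE | FILE_RENAME_TO | FILE_RENAME_FROM | UNMOUNTED))

/* Hypothetical helper: returns 1 if the mask is non-empty and contains
 * only watchable events, 0 otherwise. */
int is_valid_watch_mask(uint32_t requested)
{
    return requested != 0 && (requested & ~WATCHABLE_EVENTS) == 0;
}

/* Hypothetical helper: of the events that occurred, keep those that
 * were requested plus any exception events. */
uint32_t events_to_report(uint32_t requested, uint32_t occurred)
{
    assert(is_valid_watch_mask(requested));
    return occurred & (requested | EXCEPTION_EVENTS);
}
```

With these assumed values, a watch for FILE_MODIFIED alone would still report FILE_DELETE if the file is removed, but would not report FILE_ACCESS.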

typedef struct file_obj {
        timestruc_t     atime;          /* Access time got from stat(2) */
        timestruc_t     mtime;          /* Modification time from stat(2) */
        timestruc_t     ctime;          /* Change time from stat(2) */
        char            *name;          /* Null terminated file name */
} file_obj_t;
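
As a rough sketch of how an application might fill in this structure before registering a file, the timestamps can be copied from stat(2). The helper name file_obj_init is hypothetical, and the thread does not say whether the application or the library is expected to populate these fields; the sketch assumes the application does. The timestruc_t alias is also an assumption, made so the sketch compiles on systems that do not provide that type.

```c
#define _POSIX_C_SOURCE 200809L  /* for st_atim, st_mtim, st_ctim */
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

#if !defined(__sun)
/* Assumption: outside Solaris the system headers do not provide
 * timestruc_t, so alias it to struct timespec for this sketch. */
typedef struct timespec timestruc_t;
#endif

typedef struct file_obj {
        timestruc_t     atime;          /* Access time got from stat(2) */
        timestruc_t     mtime;          /* Modification time from stat(2) */
        timestruc_t     ctime;          /* Change time from stat(2) */
        char            *name;          /* Null terminated file name */
} file_obj_t;

/* Hypothetical helper: record the current timestamps of `path` in
 * `fobj`. Returns 0 on success, -1 if stat(2) fails. The path string
 * is borrowed, not copied, so it must outlive `fobj`. */
int file_obj_init(file_obj_t *fobj, char *path)
{
    struct stat sb;

    assert(fobj != NULL && path != NULL);
    if (stat(path, &sb) != 0)
        return -1;
    fobj->atime = sb.st_atim;
    fobj->mtime = sb.st_mtim;
    fobj->ctime = sb.st_ctim;
    fobj->name = path;
    return 0;
}
```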

Does watching a directory imply that all the contained files are
watched?  That is, if I watch /home/joe, will touching /home/joe/foo
cause a FILE_MODIFIED event to be thrown for /home/joe/foo?  The two use
cases I can think of (Beagle and gnome-vfs) are more interested in
watching all the files in a directory rather than individual files.
(Although there are certainly use cases for individual files as well.)

No, not with the current implementation. The FILE_MODIFIED event on the directory only represents creates, deletes, and renames of files under that directory, i.e. operations that would update the modification time of the directory. Otherwise this would be inconsistent
with the definition of directory modification.

Also, a file could be linked under many directories. To be correct, every time a file gets modified, all its parent directories that are being watched will have to be notified. In order to do that, we will need to have a file -> directories mapping, which is generally not available. I know this
is an implementation detail.
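
The hard-link point can be seen from user space with plain stat(2): two names in different directories can resolve to the same inode, and the only thing stat reports about the other names is their count. The helper names below (same_file, link_count) are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths name the same file
 * (same device and inode), 0 if they differ, -1 if stat(2) fails. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Hypothetical helper: returns the number of hard links to the file,
 * or -1 if stat(2) fails. struct stat carries this count but no list
 * of the directories that hold the links. */
long link_count(const char *path)
{
    struct stat sb;

    assert(path != NULL);
    if (stat(path, &sb) != 0)
        return -1;
    return (long)sb.st_nlink;
}
```

For example, if a regular file with no other links were linked as both /home/joe/a/foo and /home/joe/b/foo (illustrative paths), same_file() on the two names would return 1 and link_count() would return 2, yet nothing in struct stat says which directories would need to be notified.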

To activate monitoring (watching) of a file, it needs to be registered.
Upon delivering an event, the file monitor is disabled. It needs to be
re-registered to reactivate the monitor and receive further events.

Why is this?  Usually apps that monitor a file want to do it on their
own terms, because they have the state they need to determine when to
watch a file or not.  This constraint seems like it will just be a
burden on programmers.

This behavior aids proper multithreaded programming. The only overhead
is the re-registration, which re-enables the file monitor. Note that with the current approach events don't get queued up. The queuing issues have been discussed before. The main aim is to keep the kernel implementation simple without exposing the system to scalability problems that could open the door to denial-of-service (DoS) attacks.
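
The register, deliver, re-register cycle can be modelled in a few lines of user-space C. This is a toy model of the semantics described in this thread, not the kernel interface: the names watch_t, watch_register and watch_deliver are invented for the sketch, and the dropped counter exists only to make the "no queuing" behavior visible.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a one-shot file monitor (all names are hypothetical). */
typedef struct watch {
    const char *name;     /* file being watched */
    int         armed;    /* 1 while the monitor is active */
    unsigned    dropped;  /* events that arrived while disarmed */
} watch_t;

/* Register, or re-register, the monitor. */
void watch_register(watch_t *w)
{
    assert(w != NULL);
    w->armed = 1;
}

/* Called when something happens to the file. Returns 1 if an event is
 * delivered to the application, 0 if it is dropped. Delivery disarms
 * the monitor; nothing is queued while it is disarmed. */
int watch_deliver(watch_t *w)
{
    assert(w != NULL);
    if (!w->armed) {
        w->dropped++;
        return 0;
    }
    w->armed = 0;
    return 1;
}
```

In this model a second change that arrives before the application re-registers produces no event, so a consumer would presumably want to re-check the file with stat(2) after calling watch_register() again.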

-Prakash.
Joe


_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
