Darren Kenny wrote:
Hi Prakash,

I don't think it's the implementation that bridges between the kernel and the user spaces that's important to JDS, and probably most other people - it's the ultimate API that people will have to write to, and I think from this perspective that sysevents is not what we want - it may be part of the implementation of the final API, but it shouldn't really be the main API that developers have to write to. At the moment there are two projects, both with the same ABI (AFAIK), that are prevalent in the Linux developer community:

I understand. It is the API. We should be implementing to support some standard API for file event notification. Unfortunately there is no standard API, nor is it clear what the requirements are. Hence the discussion is about understanding what the requirements are for such a feature.


   * libfam - http://oss.sgi.com/projects/fam/
   * gamin - http://www.gnome.org/~veillard/gamin/index.html

They are compared (with some bias of course) at: http://www.gnome.org/~veillard/gamin/differences.html

It's an API like these that we need, so I think any solution proposed (albeit with sysevents underneath) should expose one of these APIs to the user. If the same people produce this API as those that expose things via sysevents, then things should work at their optimum. Whether it's the SAME API is up for debate, but whatever we expose should at least provide the same functionality.

When you say 'sysevents' what are you referring to?

Support for a commonly used API can be provided by means of a library.
The implementation will provide a native set of interfaces to support the desired functionality and requirements, for example by using the Event ports interfaces. The library can
use these native interfaces and expose a common API.

One thing that people have mentioned before as well is the handling of distributed file systems. How do you propose we handle these? This is especially important on Solaris, given that the majority of our customers use NFS for their home directories. From what I understand, the File Event Mechanism (FEM) in the kernel is used in NFSv4; can it provide us with the ability to see changes to files that occur on an NFSv4 filesystem mounted on my desktop, for instance?

The proposed solution will not provide file events generated on a distributed filesystem from a remote node. But it certainly can provide file events generated locally on this distributed filesystem (e.g. on the NFS client side). I don't think the 'FEM' framework in the kernel, which is used for NFSv4 delegation, can provide support for events from remote nodes.
Clearly, this should be transparent to the API.

It appears that the distributed filesystem implementation should provide the necessary means to collect such events. I think the responsibility falls squarely on the distributed file system implementation. I don't know if there are any distributed file system implementations that can
do that.

-Prakash.
Thanks,

Darren.


Prakash Sangappa wrote:
Glynn Foster wrote:

Yeah, currently Beagle only indexes a relatively small amount of per-user data, generally in $HOME - however, as has been mentioned, it's probably one of the first proper use cases of inotify.

I'd suggest that it's definitely worth looking at what inotify does - given that there seems to have been a lot of churn with dnotify/inotify/FAM/gamin/ etc.., there's probably some implementation lessons to be learned from our Linux neighbours [some of http://kerneltrap.org/node/3847 might be interesting reading].

FILE_CLOSE_WRITE is missing from the list of events posted previously that is apparently useful, as suggested by some GNOME developers.

Yes, I have looked at some of the issues with 'dnotify' that got addressed
by 'inotify', and also at some of the issues with inotify.

Here is another pointer which lists some issues with 'inotify'. These may have
been addressed already with the later version of inotify.
http://manic.desrt.ca/inotify


Proposed interfaces:

- Unlike the 'inotify' interfaces, which use 'ioctls' to a device, our interfaces will be based on the existing Event ports framework. The events
  will be delivered to a specified 'Event port' (which is an fd).

   The Event port can receive events from multiple sources. The event
   sources currently available in Solaris are 'poll, aio, timer, user,
   message queue' (the last of these was recently integrated).

   More event sources can be added.

   For example, if the application needs to wait on a 'poll' event and also
   receive file event notifications, it can receive both these types of
   events on one Event port.

- The application does not have to open the file/directory being watched.
   Therefore it will not prevent the file from being deleted or
   the filesystem from being unmounted. These were issues with 'dnotify'.

- If the file system gets unmounted, it will automatically de-register
   the event notification and send an event indicating that.
   It can also send a 'file deleted' event when a watched file gets deleted.

- A pointer to some user data can be passed in when registering a file
   event notification. The user data will be returned along with the event
   notification. This was something missing from the inotify interface.

I think 'inotify' returns something called 'watch descriptors' when registering the watches. They had some issues with 'watch descriptors' being reused, which would cause confusion in identifying the received events. This issue may have been addressed already in their
 new version.

This problem should not exist with the event ports interface since a user data pointer will be returned with the event. The application can differentiate the events based on that.
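
As a rough illustration of the point above, the sketch below shows how an application could route events using only the user data pointer that comes back with each event, with no lookup table keyed by a reusable descriptor. The types `watch_ctx` and `fake_event` and the function `dispatch` are hypothetical names made up for this example; `fake_event` is a simplified stand-in, not the real event ports structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-watch state owned by the application. */
struct watch_ctx {
    const char *path;   /* what the application asked to watch */
    unsigned    hits;   /* number of events seen for this watch */
};

/* Simplified stand-in for a delivered event (an assumption for this
 * sketch, not the actual event ports layout). */
struct fake_event {
    int       events;   /* event bits */
    uintptr_t object;   /* object the event refers to */
    void     *user;     /* pointer supplied at registration time */
};

/* Route an event straight to its watch via the user pointer.
 * Returns the watch context, or NULL if none was registered. */
static struct watch_ctx *dispatch(const struct fake_event *ev)
{
    struct watch_ctx *ctx = ev->user;

    if (ctx != NULL)
        ctx->hits++;
    return ctx;
}
```

Because the pointer is chosen by the application and lives as long as the application keeps it, a stale event cannot be confused with a newer watch the way a recycled integer descriptor could.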

- To de-register or to re-register, the object (file) needs to be passed again. But the file name could disappear (get removed) from the directory. To accommodate this, we could change the passed-in object to a structure which will include the filename. Once it gets registered, the system can return a unique id. For subsequent actions
 (de-register, re-register), we would depend on this 'id'.

Ex.
   struct file_event {
               uintptr_t    id;          /* id returned after registering */
               int          len;         /* length of the file name */
               char         fname[];     /* filename (flexible array member) */
   } fobj;


  So, now the interface will be

  port_associate(port, PORT_SOURCE_FILE,  (uintptr_t)&fobj,  ... )

  So this 'fobj' structure can be passed to de-register(port_dissociate)
  or re-register(port_associate again) the file events.
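
Since the file name is stored inline in the proposed structure, the caller has to allocate room for it. A minimal sketch of how that might look is below. The helper `file_event_new` is hypothetical, and two details are assumptions not stated in the proposal: that `len` excludes the terminating NUL, and that an `id` of 0 means "not yet registered".

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Layout from the proposal, written with a C99 flexible array member. */
struct file_event {
    uintptr_t id;       /* id returned after registering (0 = none, assumed) */
    int       len;      /* length of the file name, excluding NUL (assumed) */
    char      fname[];  /* file name, stored inline */
};

/* Hypothetical helper: build a file_event for the given path.
 * Returns NULL on allocation failure; the caller frees the result. */
static struct file_event *file_event_new(const char *path)
{
    size_t n = strlen(path);
    struct file_event *fe = malloc(sizeof(*fe) + n + 1);

    if (fe == NULL)
        return NULL;
    fe->id = 0;
    fe->len = (int)n;
    memcpy(fe->fname, path, n + 1);   /* copies the terminating NUL too */
    return fe;
}
```

The resulting pointer would then be cast to uintptr_t and passed as the object argument, in the same way as &fobj in the call shown above.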


FILE_CLOSE_WRITE - What is the purpose of this event?  If it is
found to be useful, we could include it.

Rgds,
-Prakash.


Glynn



Does this mean we don't get told /what/ got created? Is an application that wants to know "what files are disappearing/appearing under /foo/bar/?" going to have to readdir() the whole directory every time it gets an event?
Otherwise we get into the queued event problem; what happens if the
application is watching a directory w/ a million files, and someone does rm * in that directory? Do we generate a million events? Clearly there
are limits to the number of events we can queue in the kernel, esp.
since the application isn't obligated to read them in a timely fashion.

Forcing a (recursive!) readdir() every time can't scale either; it just pushes the problem out to all the userspace apps. Perhaps a compromise approach would work, so at least the readdir() cost is amortized; i.e. give names up to a
particular limit.
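
One way to picture the compromise is a bounded per-directory queue that records names until it is full and then only sets an overflow flag, telling the reader to fall back to readdir(). The sketch below is a userspace model of that idea only; the names `dir_events`, `dir_events_post`, `dir_events_drain` and the limits are all made up for illustration and nothing like them is part of the proposal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 4    /* illustrative limit on queued names */
#define NAME_LEN  64   /* illustrative maximum name size, including NUL */

struct dir_events {
    char   names[MAX_NAMES][NAME_LEN];
    size_t count;      /* names currently queued */
    bool   overflow;   /* names were dropped; the reader must readdir() */
};

/* Record one changed name, or just mark overflow once the queue is full. */
static void dir_events_post(struct dir_events *q, const char *name)
{
    size_t n;

    if (q->count == MAX_NAMES) {
        q->overflow = true;
        return;
    }
    n = strlen(name);
    if (n >= NAME_LEN)
        n = NAME_LEN - 1;              /* truncate over-long names */
    memcpy(q->names[q->count], name, n);
    q->names[q->count][n] = '\0';
    q->count++;
}

/* Copy out the queued names and reset the queue. *rescan is set when
 * some names were dropped and a full readdir() is needed. */
static size_t dir_events_drain(struct dir_events *q,
                               char out[][NAME_LEN], bool *rescan)
{
    size_t n = q->count;

    for (size_t i = 0; i < n; i++)
        memcpy(out[i], q->names[i], NAME_LEN);
    *rescan = q->overflow;
    q->count = 0;
    q->overflow = false;
    return n;
}
```

With this shape, the common case of a few changes is served without any directory scan, and the rm * case costs a bounded amount of queue memory plus one rescan.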

Or how do you expect Beagle to be able to work nicely? Is this just going to
remain something explicitly unsupportable?

I'd rather have a model like signals; multiple file writes are combined into one event until that event is read by the application; any subsequent writes generate another event.

Would work fine for modifications, yes.
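
The signal-like model for modifications can be sketched as a single pending flag per watch: writes that arrive while an event is still unread are absorbed, and the first write after a read queues a fresh event. This is only a userspace model of the behaviour described; `watch`, `watch_note_write` and `watch_read` are hypothetical names.

```c
#include <stdbool.h>

struct watch {
    bool     pending;    /* an unread "modified" event is queued */
    unsigned delivered;  /* events handed to the application so far */
};

/* Called for each write to the watched file.
 * Returns true only when a new event is actually queued. */
static bool watch_note_write(struct watch *w)
{
    if (w->pending)
        return false;    /* coalesced into the event already queued */
    w->pending = true;
    return true;
}

/* Called when the application reads from the watch.
 * Returns true if an event was pending and has now been consumed. */
static bool watch_read(struct watch *w)
{
    if (!w->pending)
        return false;
    w->pending = false;
    w->delivered++;
    return true;
}
```

The kernel-side cost is one bit per watch regardless of how many writes happen, which is why this works for modifications but cannot by itself report which names were created or removed.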

I see this as very useful to avoid the {sleep(); stat() } loops we often
see.  It's not a mechanism to insert an application as a synchronous
interposer into the filesystem VOPS.

I wasn't trying to suggest it was. Synchronisation is neither needed nor
wanted.

The nscd could use this to watch for modifications to configuration
files rather than stat'ing them before each cache lookup.

I wasn't suggesting that a non-recursive approach doesn't solve a whole class of such situations; it does. In fact, I was merely trying to raise awareness of what applications like Beagle actually need in terms of notifications. If it's
really too hard to do, that's a pity.

regards,
john
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org