Hi Joe,
As discussed previously, watching for file events on an entire filesystem
or a directory tree can be a scalability issue, so the file events API may
not be suitable for an application like desktop search.
For a desktop search system like Beagle or Spotlight, a better and more
useful method would be for the filesystem to provide an interface through
which we could efficiently collect all the changes that have occurred since
some given time.
The ZFS file system should be able to provide such information very
efficiently.
Bart Smaalders can comment more on this.
The approach we are taking for the file events notification API is to
address the needs of applications that currently have to stat files and
directories repeatedly to detect changes. With the file events API, such
applications just wait for the file events that are sent when the status of
a file or directory changes.
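For contrast, the polling pattern that this API is meant to replace can be
sketched roughly as follows. This is only an illustration in portable C: the
helper name mtime_changed() is made up for this sketch, and it assumes a
POSIX.1-2008 struct stat with the st_mtim member.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper (not part of any API): stat 'path' and report
 * whether its modification time differs from '*last'.
 * Returns 1 if it changed (and updates *last), 0 if unchanged,
 * -1 if the file could not be examined.
 */
int
mtime_changed(const char *path, struct timespec *last)
{
    struct stat sbuf;

    assert(path != NULL && last != NULL);
    if (stat(path, &sbuf) == -1)
        return (-1);
    if (sbuf.st_mtim.tv_sec == last->tv_sec &&
        sbuf.st_mtim.tv_nsec == last->tv_nsec)
        return (0);
    *last = sbuf.st_mtim;
    return (1);
}
```

A poller has to make this call for every file of interest on every pass,
whether or not anything changed, which is the cost an event-based interface
avoids.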
The interface is described below. As mentioned in the previous discussions,
it is based on the event ports interface.
--------------------------------------------------------------------------------------------------
File events interface:
Event types:
* FILE_ACCESS /* Monitored file/directory was accessed */
* FILE_MODIFIED /* Monitored file/directory was modified */
* FILE_ATTRIB /* Monitored file/directory's attributes were changed */
Exception events:
* FILE_DELETE /* Monitored file/directory was deleted */
* FILE_RENAME_TO /* Monitored file/directory was renamed */
* FILE_RENAME_FROM /* Monitored file/directory was renamed */
* UNMOUNTED /* Monitored file system got unmounted */
An application can register to watch only the following events:
FILE_ACCESS,
FILE_MODIFIED,
FILE_ATTRIB.
The exception events are reported as they occur. They do not have to be
watched for.
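As a rough illustration of the split between watchable and exception events,
the sketch below checks a requested mask and tests a delivered mask for
exceptions. The bit values are placeholders chosen for this example, the
helper names are made up, and treating FILE_EXCEPTION as the OR of the four
exception events is an assumption. Rejecting an empty mask is also an
assumption.

```c
#include <assert.h>

/* Placeholder bit values for this sketch only; not taken from any header. */
#define FILE_ACCESS       0x01
#define FILE_MODIFIED     0x02
#define FILE_ATTRIB       0x04
#define FILE_DELETE       0x10
#define FILE_RENAME_TO    0x20
#define FILE_RENAME_FROM  0x40
#define UNMOUNTED         0x80

/* Events an application may ask to watch. */
#define WATCHABLE_EVENTS  (FILE_ACCESS | FILE_MODIFIED | FILE_ATTRIB)
/* Assumed definition: all exception events OR-ed together. */
#define FILE_EXCEPTION \
    (FILE_DELETE | FILE_RENAME_TO | FILE_RENAME_FROM | UNMOUNTED)

/* Hypothetical check: non-empty and made up of watchable events only. */
int
valid_watch_mask(int events)
{
    /* The two sets must not overlap. */
    assert((WATCHABLE_EVENTS & FILE_EXCEPTION) == 0);
    return (events != 0 && (events & ~WATCHABLE_EVENTS) == 0);
}

/* Hypothetical check: does a delivered mask carry an exception event? */
int
is_exception(int delivered)
{
    return ((delivered & FILE_EXCEPTION) != 0);
}
```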
typedef struct file_obj {
    timestruc_t atime; /* Access time from stat(2) */
    timestruc_t mtime; /* Modification time from stat(2) */
    timestruc_t ctime; /* Change time from stat(2) */
    char *name;        /* Null terminated file name */
} file_obj_t;
- To create a port where events will be delivered:
    port = port_create()
- To register or re-register a file monitor:
    port_associate(int port, PORT_SOURCE_FILE, (uintptr_t)&fobj, events, user)
  'events' specifies the event types requested (FILE_ACCESS, FILE_MODIFIED,
  FILE_ATTRIB). When re-registering, the events passed in replace the
  previously registered events.
- To de-register the file monitor:
    port_dissociate(int port, PORT_SOURCE_FILE, (uintptr_t)&fobj)
- To collect the events:
    port_get(port, port_event_t *pe, timespec_t *timeout)
    port_getn(port, port_event_t list[], uint_t max, uint_t *nget,
        timespec_t *timeout)
Refer to the man pages of the event ports interfaces for more details.
--------------------------------------------------------------------------------------------------
To activate monitoring (watching) of a file, the file needs to be registered.
Once an event has been delivered, the file monitor is disabled. It needs to
be re-registered to reactivate the monitor and receive further events.
To ensure that no events are missed between the time a file is processed and
the time a monitor (watch) is registered, timestamps are used. The
timestamps, collected from a stat(2) call before processing the file, need
to be passed in the 'file_obj_t' structure when registering. They are
compared with the current timestamps of the file or directory. If a
timestamp has changed, an event is generated immediately:
an mtime change results in a FILE_MODIFIED event,
a ctime change results in a FILE_ATTRIB event,
an atime change results in a FILE_ACCESS event.
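The comparison made at registration time can be sketched in portable C as
below. This is an illustration, not the implementation: the bit values are
placeholders, file_obj_t is re-declared with struct timespec in place of
timestruc_t, the function name pending_events() is made up, and masking the
result with the watched events is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Placeholder bit values for this sketch only. */
#define FILE_ACCESS    0x01
#define FILE_MODIFIED  0x02
#define FILE_ATTRIB    0x04

/* Stand-in for the proposed file_obj_t. */
typedef struct file_obj {
    struct timespec atime; /* Access time from stat(2) */
    struct timespec mtime; /* Modification time from stat(2) */
    struct timespec ctime; /* Change time from stat(2) */
    char *name;            /* Null terminated file name */
} file_obj_t;

static int
ts_differs(const struct timespec *a, const struct timespec *b)
{
    return (a->tv_sec != b->tv_sec || a->tv_nsec != b->tv_nsec);
}

/*
 * Hypothetical helper: compare the timestamps recorded in 'fobj' with
 * the current ones in 'cur' and return the events that would fire
 * immediately, limited to the 'watched' set (an assumption).
 */
int
pending_events(const file_obj_t *fobj, const struct stat *cur, int watched)
{
    int ev = 0;

    assert(fobj != NULL && cur != NULL);
    if (ts_differs(&fobj->atime, &cur->st_atim))
        ev |= FILE_ACCESS;
    if (ts_differs(&fobj->mtime, &cur->st_mtim))
        ev |= FILE_MODIFIED;
    if (ts_differs(&fobj->ctime, &cur->st_ctim))
        ev |= FILE_ATTRIB;
    return (ev & watched);
}
```

A result of 0 means nothing changed while the file was being processed, so
the monitor would simply be armed and wait for the next change.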
Disabling the monitor after an event is delivered enables proper
multithreaded (MT) programming with this API.
Example:
An MT application can have a pool of threads to process files. Any thread
from this pool should be able to collect events and process a file, but at
any time only one thread should be processing a given file. The following
code snippet can be executed by these threads. Only one thread will be able
to collect an event from a file at a time and proceed to process that file.
The threads processing events call wait_for_fileevents().
/*
 * To initiate watching a file, this function can be called
 * once. The file_obj_t structure is initialized with the file name.
 * The fobj pointer is passed as the user pointer to be returned
 * with the event.
 */
int
watchfile(int port, file_obj_t *fobj, int events) {
    struct stat sbuf;

    /* Take the timestamps before processing so no change is missed. */
    if (stat(fobj->name, &sbuf) == -1)
        return (-1);
    /* process_file() stands for the application's own processing. */
    process_file(fobj, events);
    fobj->atime = sbuf.st_atim;
    fobj->mtime = sbuf.st_mtim;
    fobj->ctime = sbuf.st_ctim;
    return (port_associate(port, PORT_SOURCE_FILE, (uintptr_t)fobj,
        events, fobj));
}
/*
 * Application threads processing the events call this function.
 * The file name is in the file_obj_t. The 'fobj' pointer is passed as the
 * 'user pointer' and is returned along with the event.
 */
void
wait_for_fileevents(int port, int events) {
    port_event_t pe;
    file_obj_t *fobj;

    while (1) {
        if (port_get(port, &pe, NULL) == -1)
            return;
        /*
         * Check for exception events. If there is none, process
         * the file and register it again.
         */
        if (!(pe.portev_events & FILE_EXCEPTION)) {
            fobj = (file_obj_t *)pe.portev_user;
            if (watchfile(port, fobj, events) == -1)
                return;
        }
    }
}
I have an implementation based on this that is being tested.
-Prakash.
Joe Shaw wrote:
[ Potential repost, I wasn't subscribed to the list before. -j ]
Hi,
I recently came across this thread from May about a file event
notification API in Solaris:
http://mail.opensolaris.org/pipermail/perf-discuss/2006-May/000540.html
And there was a fair amount of discussion about the Beagle desktop
search system in there. I'm the maintainer of Beagle and I'd love to
see it run well on Solaris. At present it has to continuously recrawl
the file system for changes, which is decidedly suboptimal.
I'm not sure how relevant that thread is anymore, can someone give an
update on the progress of the work?
Thanks,
Joe
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org