Hi, just came back from vacation, sorry for the delay.
On Sat, 2006-08-12 at 23:59 +0530, Suparna Bhattacharya wrote: > BTW, if anyone would like to be dropped off this growing cc list, please > let us know. > > On Fri, Aug 11, 2006 at 12:45:55PM -0700, Ulrich Drepper wrote: > > Sébastien Dugué wrote: > > > aio completion notification > > > > I looked over this now but I don't think I understand everything. Or I > > don't see how it all is integrated. And no, I'm not looking at the > > proposed glibc code since would mean being tainted. > > Oh, I didn't realise that. > I'll make an attempt to clarify parts that I understand based on what I > have gleaned from my reading of the code and intent, but hopefully Sebastien, > Ben, Zach et al will be able to pitch in for a more accurate and complete > picture. > > > > > > > > Details: > > > ------- > > > > > > A struct sigevent *aio_sigeventp is added to struct iocb in > > > include/linux/aio_abi.h > > > > > > An enum {IO_NOTIFY_SIGNAL = 0, IO_NOTIFY_THREAD_ID = 1} is added in > > > include/linux/aio.h: > > > > > > - IO_NOTIFY_SIGNAL means that the signal is to be sent to the > > > requesting thread > > > > > > - IO_NOTIFY_THREAD_ID means that the signal is to be sent to a > > > specifi thread. > > > > This has been proved to be sufficient in the timer code which basically > > has the same problem. But why do you need separate constants? We have > > the various SIGEV_* constants, among them SIGEV_THREAD_ID. Just use > > these constants for the values of ki_notify. > > > > I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not > part of the ABI, but only internal to the kernel implementation. I think > Zach had suggested inferring THREAD_ID notification if the pid specified > is not zero. But, I don't see why ->sigev_notify couldn't used directly > (just like the POSIX timers code does) thus doing away with the > new constants altogether. Sebestian/Laurent, do you recall? As I see it, those IO_NOTIFY_* constants are uneeded and we could use ->sigev_notify directly. I will change this so that we use the same mechanism as the POSIX timers code. > > > > > > The following fields are added to struct kiocb in include/linux/aio.h: > > > > > > - pid_t ki_pid: target of the signal > > > > > > - __u16 ki_signo: signal number > > > > > > - __u16 ki_notify: kind of notification, IO_NOTIFY_SIGNAL or > > > IO_NOTIFY_THREAD_ID > > > > > > - uid_t ki_uid, ki_euid: filled with the submitter credentials > > > > These two fields aren't needed for the POSIX interfaces. Where does the > > requirement come from? I don't say they should be removed, they might > > be useful, but if the costs are non-negligible then they could go away. > > I'm guessing they are being used for validation of permissions at the time > of sending the signal, but maybe saving the task pointer in the iocb instead > of the pid would suffice ? IIRC, Ben added these for that exact reason. Is this really needed? Ben? > > > > > > > > - check whether the submitting thread wants to be notified directly > > > (sigevent->sigev_notify_thread_id is 0) or wants the signal to be sent > > > to another thread. > > > In the latter case a check is made to assert that the target thread > > > is in the same thread group > > > > Is this really how it's implemented? This is not how it should be. > > Either a signal is sent to a specific thread in the same process (this > > is what SIGEV_THREAD_ID is for) or the signal is sent to a calling > > process. Sending a signal to the process means that from the kernel's > > POV any thread which doesn't have the signal blocked can receive it. > > The final decision is made by the kernel. There is no mechanism to send > > the signal to another process. > > The code seems to be set up to call specific_send_sig_info() in the case > of *_THREAD_ID , and __group_send_sig_info() otherwise. So I think the > intended behaviour is as you describe it should be (__group_send_sig_info > does the equivalent of sending a signal to the process and so any thread > which doesn't have signals blocked can receive it, while > specific_send_sig_info > sends it to a particular thread). > > But, I should really leave it to Sebestian to confirm that. That's right, but I think that part needs to be reworked to follow the same logic as the POSIX timers. > > > listio support > > > > > > > I really don't understand the kernel interface for this feature. > > I'm sorry this is confusing. This probably means that we need to > separate the external interface description more clearly and completely > from the internals. > > > > > > > > Details: > > > ------- > > > > > > An IOCB_CMD_GROUP is added to the IOCB_CMD enum in > > > include/linux/aio_abi.h > > > > > > A struct lio_event is added in include/linux/aio.h > > > > > > A struct lio_event *ki_lio is added to struct iocb in > > > include/linux/aio.h > > > > So you have a pointer in the structure for the individual requests. I > > assume you use the atomic counter to trigger the final delivery. I > > further assume that if lio_wait is set the calling thread is suspended > > until all requests are handled and that the final notification in this > > case means that thread gets woken. > > > > This is all fine. > > > > But how do you pass the requests to the kernel? If you have a new > > lio_listio-like syscall it'll be easy. But I haven't seen anything like > > this mentioned. > > > > The alternative is to pass the requests one-by-one in which case I don't > > see how you create the reference to the lio_listio control block. This > > approach seems to be slower. > > The way it works (and better ideas are welcome) is that, since the io_submit() > syscall already accepts an array of iocbs[], no new syscall was introduced. > To implement lio_listio, one has to set up such an array, with the first iocb > in the array having the special (new) grouping opcode of IOCB_CMD_GROUP which > specifies the sigev notification to be associated with group completion > (a NULL value of the sigev notification pointer would imply equivalent of > LIO_WAIT). The following iocbs in the array should correspond to the set of > listio aiocbs. Whenever it encounters an IOCB_CMD_GROUP iocb opcode, the > kernel would interpret all subsequent iocbs[] submitted in the same > io_submit() call to be associated with the same lio control block. > > Does that clarify ? > > Would an example help ? > > > > > If all requests are passed at once, do you have the equivalent of > > LIO_NOP entries? So far, LIO_NOP entries are pruned by the support library (libposix-aio) and never sent to the kernel. > > > > Good question - we do have an IOCB_CMD_NOOP defined, and I seem to even > recall a patch that implemented it, but am wondering if it ever got merged. > Ben/Zach ? > > > > > How can we support the extension where we wait for a number of requests > > which need not be all of them. I.e., I submit N requests and want to be > > notified when at least M (M <= N) notified. I am not yet clear about > > the actual semantics we should implement (e.g., do we send another > > notification after the first one?) but it's something which IMO should > > be taken into account in the design. > > > > My thought here was that it should be possible to include M as a parameter > to the IOCB_CMD_GROUP opcode iocb, and thus incorporated in the lio control > block ... then whatever semantics are agreed upon can be implemented. > > > > > Finally, and this is very important, does you code send out the > > individual requests notification and then in the end the lio_listio > > completion? I think Suparna wrote this is the case but I want to make sure. > > Sebestian, could you confirm ? If (and only if) the user did setup a sigevent for one or more individual requests then those requests completion will trigger a notification and in the end the list completion notification is sent. Otherwise, only the list completion notification is sent. -- ----------------------------------------------------- Sébastien Dugué BULL/FREC:B1-247 phone: (+33) 476 29 77 70 Bullcom: 229-7770 mailto:[EMAIL PROTECTED] Linux POSIX AIO: http://www.bullopensource.org/posix http://sourceforge.net/projects/paiol ----------------------------------------------------- - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html