It probably isn't worth spending a lot of your time on this, as there are 
ways for us to get around it, and frankly, we probably should anyway.

Set aside the details of that specific scenario. The issue really centers 
around progressing the event library while already inside an event handler. In 
the prior release series, this was allowed, although perhaps not intentionally.

It appears that the new warning plus error return was intended to ensure that 
people don't do this any more, and I can see some of the issues. I think there 
are ways to resolve those problems (e.g., locking only active events instead of 
the entire base), but they also involve (possibly substantial) overhead.

Remember that we didn't use to have separate event bases, so our code was 
written to work within a single base. While we can and will revise the code to 
use multiple bases, that poses its own issues - e.g., tracking which events 
are sitting on which bases so we know what to progress when necessary. In a very 
complex, widely dispersed code base such as OMPI, this can become difficult.

What we will likely do instead is a hybrid where we use only a couple of event 
bases, and then rework the code so that actual work is done in 
"progress_callback" functions. In other words, we loop the event lib, let each 
triggered event handler collect any messages onto appropriate message lists, 
leave the event handler, and then cycle through a list of registered functions 
that process any data on their respective lists. Thus, if those functions need 
to progress the event lib, they can do so from outside of an event handler.
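
To make that concrete, here is a rough sketch of the pattern in plain C. It is 
not OMPI code and it does not link against libevent: loop_once() is a 
hypothetical stand-in for a single pass such as event_base_loop(base, 
EVLOOP_ONCE), and every name in it (msg_list, progress_entry, 
on_command_event, process_commands, run_demo) is made up for illustration. 
The in_handler flag only mimics the reentrancy check so the sketch can show 
that the loop is never entered from inside a handler.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MSGS 16
#define N_EVENTS 3

/* Per-subsystem message list that event handlers append to. */
struct msg_list {
    int msgs[MAX_MSGS];
    size_t count;
};

/* A registered progress function and the list it drains. */
struct progress_entry {
    void (*progress_callback)(struct msg_list *list);
    struct msg_list *list;
};

static struct msg_list cmd_list;
static int in_handler;       /* nonzero while an event handler is running */
static int reentered;        /* loop entries attempted from inside a handler */
static int processed_total;  /* messages handled outside any handler */

/* Fake "incoming" events; a real build would get these from libevent. */
static const int pending_events[N_EVENTS] = { 10, 20, 30 };
static size_t next_event;

/* Event handler: queue the message and return; no real work here. */
static void on_command_event(int msg)
{
    if (cmd_list.count < MAX_MSGS)
        cmd_list.msgs[cmd_list.count++] = msg;
}

/* Hypothetical stand-in for one pass of the event library. */
static void loop_once(void)
{
    if (in_handler) {        /* mimic the reentrancy check */
        reentered++;
        return;
    }
    if (next_event < N_EVENTS) {
        in_handler = 1;
        on_command_event(pending_events[next_event++]);
        in_handler = 0;
    }
}

/* Progress function: does the real work and may progress the loop,
 * because it runs after the handler has returned. The list can grow
 * while it is being drained, so count is re-read on every iteration. */
static void process_commands(struct msg_list *list)
{
    assert(!in_handler);
    for (size_t i = 0; i < list->count; i++) {
        processed_total++;
        loop_once();         /* "wait" for a further event */
    }
    list->count = 0;
}

static struct progress_entry registered[] = {
    { process_commands, &cmd_list },
};

/* One progress cycle: loop the event lib, then run registered functions. */
static void progress(void)
{
    loop_once();
    for (size_t i = 0; i < sizeof registered / sizeof registered[0]; i++)
        registered[i].progress_callback(registered[i].list);
}

/* Drive the sketch until all fake events are consumed. */
int run_demo(void)
{
    next_event = 0;
    processed_total = 0;
    reentered = 0;
    cmd_list.count = 0;
    while (next_event < N_EVENTS)
        progress();
    progress();              /* drain anything still queued */
    return processed_total;
}
```

Calling run_demo() from a small main() processes all three fake messages, and 
the reentered counter stays at zero, since loop_once() is only ever called 
from progress() or from a progress function, never from a handler.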

So I think we have a way of making it work. I mainly wanted to confirm the 
change in behavior before we embark on the rewrite; I wasn't really asking 
libevent to change anything.

HTH
Ralph

On Oct 26, 2010, at 8:16 PM, Nick Mathewson wrote:

> On Sat, Oct 23, 2010 at 5:14 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> Hi folks
>> 
>> I successfully updated our libevent integration in Open MPI, but have 
>> encountered a problem with one use-case that used to work and now doesn't. 
>> Before proceeding to devise a fix, I just wanted to confirm that I 
>> accurately understand the issue.
>> 
>> The problem arises from this scenario:
> 
> Hi, Ralph!  I'm going over this again to try to figure out what to do.
> I think that the short term answer, since you're already shipping a
> patched libevent, and you aren't calling event_base_dispatch on a
> single base from more than one thread at once, is for you to remove the
> entire "if" block that checks for reentrant invocation, warns, and
> returns.  If the behavior you get now works for you, that's probably a
> fine workaround for now.  I am pretty sure that it was never actually
> planned to work as it does, but there's no sense in you rewriting your
> code for future 2.1 semantics until they are nailed down.  (And
> there's not much chance of the semantics of reentrant event_base_loop
> invocation getting settled in a 2.0 timeframe IMO).
> 
> 
> That said, I want to ask you a few questions about your use case to
> see if this is actually the best way for Libevent to do what you need,
> or if there's some other piece of functionality that could let you
> implement what you want more cleanly.
> 
>> 
>> 1. we receive a command via a message that we receive in a file descriptor 
>> event. We "push" the command message into a timer event (duration zero time) 
>> to help break a threading issue, and then return from the file descriptor 
>> event.
> 
> So I'm confused here.  You say "to break a threading issue", but in a
> later message you say "After all, we are running single-threaded".
> 
> Also, I'm assuming you know about event_active() and dummy events
> (fd=-1, events=0), and that you're using this zero-duration timeout
> trick for some other reason.  Why, and should there be a better way to
> do that?
> 
>> 2. the event library is called with LOOP_ONCE, causing the timer event to 
>> fire.
>> 
>> 3. from within the timer event, the command causes us to execute a procedure 
>> that results in us having to wait for another event to occur. We "block" in 
>> that position, running a loop that includes a call to progress the event 
>> library (i.e., a call to event_loop(LOOP_ONCE)).
> 
> But, why with the same event_base?  That's the part that confuses me.
> When you block in the callback invoked in 3, you stop executing _all_
> other active event callbacks that might be waiting to execute.  Then
> later the first time you call event_loop once, you run them.  Was
> that what you had in mind?  I am not getting the architecture here.
> Maybe some pseudocode would make me understand. :/
> 
> yrs,
> -- 
> Nick
> ***********************************************************************
> To unsubscribe, send an e-mail to majord...@freehaven.net with
> unsubscribe libevent-users    in the body.
