On 9/20/2010 12:00 PM, Michael <ri...@vianet.ca> wrote:
>>>>>> >>>>> I'm having a couple of issues with <AuthBy SQL>. Maybe it would be 
>>>>>> >>>>> considered a bug, I'm not sure.
>>>>>> >>>>>
>>>>>> >>>>> 1. the Timeout handling.
>>>>>> >>>>> ------------------------
>>>>>> >>>>>            
>>>>>>> >>>>>> From my testing, it appears that Radiator times out at this 
>>>>>>> >>>>>> value, but seems to retry the SQL query a second time, resulting 
>>>>>>> >>>>>> in another timeout period.
>>>>>>> >>>>>>              
>>>>>> >>>>> eg debug:
>>>>>> >>>>> Tue Sep 14 12:48:21 2010: DEBUG: Handling accounting with 
>>>>>> >>>>> Radius::AuthSQL
>>>>>> >>>>> Tue Sep 14 12:48:21 2010: DEBUG: do query is: 'insert into 
>>>>>> >>>>> `acct`<snip>
>>>>>> >>>>> Tue Sep 14 12:48:25 2010: ERR: do failed for 'insert into 
>>>>>> >>>>> `acct`<snip>   SQL Timeout
>>>>>> >>>>> Tue Sep 14 12:48:29 2010: ERR: do failed for 'insert into 
>>>>>> >>>>> `acct`<snip>   SQL Timeout
>>>>>> >>>>> Tue Sep 14 12:48:29 2010: DEBUG: AuthBy SQL result: IGNORE, 
>>>>>> >>>>> Database failure
>>>>>> >>>>>
>>>>>> >>>>> Timeout is set for 4 seconds...
>>>>>> >>>>> so, the query executed at 12:48:21, ERR timed out 4 seconds later, 
>>>>>> >>>>> appeared to retry but didn't log anything, and another ERR 
>>>>>> >>>>> timeout came 4 seconds after that.  That's 8 seconds, of course.  It 
>>>>>> >>>>> doubles the Timeout value.
>>>>>> >>>>>
>>>>>> >>>>> This is no good for me.  If I set my SQL Timeout value to 4 
>>>>>> >>>>> seconds and my NAS timeout to 5 seconds, I expect Radiator to 
>>>>>> >>>>> time out before my NAS retransmits.  My NAS will retry after 5 
>>>>>> >>>>> seconds because Radiator hasn't responded.  And Radiator hasn't 
>>>>>> >>>>> obeyed the Timeout, so it's still waiting at 8 seconds.  This 
>>>>>> >>>>> causes the same accounting packet to enter Radiator again, 
>>>>>> >>>>> causing another 8-second delay, and of course duplicate entries 
>>>>>> >>>>> in the accounting log, since I'm also using 
>>>>>> >>>>> AcctFailedLogFileName so the packet will eventually end up in the 
>>>>>> >>>>> SQL table.
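
[Editor's note: the timing arithmetic above can be sketched as follows. This is a
Python illustration of the behaviour described in the log, not Radiator's actual
code (which is Perl, in SqlDb.pm); the function and constant names are hypothetical.]

```python
# Sketch of the observed retry behaviour: each failed attempt blocks for
# the full Timeout before the next attempt (or the final IGNORE), so the
# worst-case wall time is attempts * Timeout.

def worst_case_sql_delay(sql_timeout: float, attempts: int = 2) -> float:
    """Total time Radiator appears to block when every attempt times out."""
    return attempts * sql_timeout

SQL_TIMEOUT = 4      # Timeout in the <AuthBy SQL> clause
NAS_RETRANSMIT = 5   # NAS retransmit interval

delay = worst_case_sql_delay(SQL_TIMEOUT)
print(delay)                   # 8 seconds, matching 12:48:21 -> 12:48:29 in the log
print(delay > NAS_RETRANSMIT)  # True: the NAS retransmits before Radiator replies
```

With one silent retry, any Timeout greater than half the NAS retransmit interval
guarantees a retransmitted (duplicate) accounting packet.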
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> 2. SQL Timeout issue #2.
>>>>>> >>>>> ------------------------
>>>>>> >>>>> Using the same debug example above, when the SQL query times out, 
>>>>>> >>>>> Radiator doesn't seem to use the FailureBackoffTime value. It only 
>>>>>> >>>>> seems to use FailureBackoffTime when there is a connection failure, 
>>>>>> >>>>> not a timeout.  So every query is still presented to the SQL server.  
>>>>>> >>>>> If the timeout is due to, let's say, a write lock, then when the 
>>>>>> >>>>> lock releases, all the queued insert statements are executed, 
>>>>>> >>>>> sometimes creating up to 10 duplicate accounting entries.
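
[Editor's note: the asymmetry Michael describes, backoff armed only on connect
failure, never on a statement timeout, can be sketched as below. This is a
hypothetical Python model for illustration, not Radiator's Perl code; the class
and return strings are invented.]

```python
# Minimal model: only a connection failure arms FailureBackoffTime, so a
# statement timeout leaves backoff_until unset and the next packet is
# presented to the (possibly locked) SQL server again.

class SqlSketch:
    def __init__(self, failure_backoff_time=600):
        self.failure_backoff_time = failure_backoff_time
        self.backoff_until = 0  # epoch seconds; 0 means "not backing off"

    def handle(self, connect_ok, query_ok, now):
        if now < self.backoff_until:
            return "IGNORE (backing off)"
        if not connect_ok:
            # Only this path arms FailureBackoffTime.
            self.backoff_until = now + self.failure_backoff_time
            return "IGNORE (connect failed, backoff armed)"
        if not query_ok:
            # A statement timeout does not arm any backoff.
            return "IGNORE (query timed out, no backoff)"
        return "OK"

db = SqlSketch()
print(db.handle(connect_ok=True, query_ok=False, now=0))  # timeout, no backoff armed
print(db.handle(connect_ok=True, query_ok=False, now=1))  # hits the server again
```

While the table is write-locked, every incoming packet follows the third branch
and queues another statement on the server, which is exactly the duplicate-entry
pileup described above.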
>> > Hello Michael -
>> >
>> > The behaviour you observe is in fact what the code does - the manual does 
>> > not correctly describe this behaviour.
>> >
>> > The manual has been amended for the next release.
>> >
>> > Thanks for letting us know.
>> >
>> > regards
>> >
>> > Hugh
>> >
> Can I suggest an option to disable this behavior? In my case, I would
> prefer Radiator to attempt the query only once and, when a timeout
> occurs, respect the FailureBackoffTime. If it doesn't, Radiator creates
> a very undesirable situation: for every packet, it continues to try the
> SQL server that timed out. It basically bottlenecks my whole RADIUS
> system, since all Radiator servers connect to the same accounting MySQL
> server, and all NASes eventually mark each RADIUS server "RADIUS_DEAD",
> at which point all authentication seems to stop.
>
> Mike

I just ran into this same problem; my DB got into a state where
DBI->connect was working fine but actual INSERTs were timing out, and
the non-observance of FailureBackoffTime in this situation resulted in
both of my RADIUS servers being effectively stalled for 10 minutes (one
INSERT Timeout at a time) until the DB issue was resolved.

I would like to second Michael's request for a way to alter this behavior.

It appears that right now SqlDb.pm has a single $self->{backoff_until}
timer that applies collectively to all configured DBSources (i.e. it is
set only when all DBSources fail DBI->connect in sequence, and when set
it causes none of them to be tried again in reconnect() until the set
time).  Would it perhaps make more sense that:

1. each configured DBSource gets its own individual backoff_until timer
that is set when that DBSource fails DBI->connect, and when set causes
that DBSource to be skipped in reconnect() until the set time.

2. individual statement timeouts, such as the one in SqlDb::do(), could
also set the backoff_until timer for the individual DBSource currently
in use.  If this is judged not to be desirable in the general case, it
could be controlled by a separate configuration parameter
("TimeoutBackoffTime", perhaps?).
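
[Editor's note: the two-point proposal above can be sketched as follows. Python
is used for illustration; the real code is Perl in SqlDb.pm. "TimeoutBackoffTime"
is the hypothetical parameter David suggests, not an existing Radiator option,
and all names below are invented.]

```python
# Sketch of per-DBSource backoff: each source has its own backoff_until,
# a connect failure backs off only that source (point 1), and a statement
# timeout can arm a separate, typically shorter, backoff (point 2).

class MultiSourceDb:
    def __init__(self, sources, failure_backoff=600, timeout_backoff=60):
        self.sources = sources
        self.failure_backoff = failure_backoff
        self.timeout_backoff = timeout_backoff          # proposed "TimeoutBackoffTime"
        self.backoff_until = {s: 0 for s in sources}    # per-source, not global

    def reconnect(self, now, try_connect):
        """Return the first usable source, skipping any that are backing off."""
        for s in self.sources:
            if now < self.backoff_until[s]:
                continue                                 # skip this source only
            if try_connect(s):
                return s
            # Point 1: a connect failure backs off just this source.
            self.backoff_until[s] = now + self.failure_backoff
        return None

    def note_statement_timeout(self, source, now):
        # Point 2: a statement timeout (e.g. in do()) also arms a backoff
        # for the source currently in use.
        self.backoff_until[source] = now + self.timeout_backoff

db = MultiSourceDb(["dbA", "dbB"])
db.reconnect(now=0, try_connect=lambda s: s == "dbB")  # returns "dbB"; dbA backs off
```

The design point is that a healthy secondary DBSource keeps serving packets
while only the failed one sits out its backoff window, instead of one global
timer parking all sources at once.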

I'm half tempted to try to implement this myself, but I'm not confident
that I fully understand all the potential repercussions for other parts
of Radiator, and I know I'm not in a good position to test it thoroughly.

Thanks,
David
_______________________________________________
radiator mailing list
radiator@open.com.au
http://www.open.com.au/mailman/listinfo/radiator
