On 9/20/2010 12:00 PM, Michael <ri...@vianet.ca> wrote:
>>> I'm having a couple of issues with <AuthBy SQL>. Maybe it would be
>>> considered a bug, I'm not sure.
>>>
>>> 1. The Timeout handling.
>>> ------------------------
>>>
>>> From my testing, it appears that Radiator times out at this value,
>>> but seems to retry the SQL query a second time, resulting in another
>>> timeout period.
>>>
>>> e.g. debug:
>>> Tue Sep 14 12:48:21 2010: DEBUG: Handling accounting with Radius::AuthSQL
>>> Tue Sep 14 12:48:21 2010: DEBUG: do query is: 'insert into `acct` <snip>
>>> Tue Sep 14 12:48:25 2010: ERR: do failed for 'insert into `acct` <snip> SQL Timeout
>>> Tue Sep 14 12:48:29 2010: ERR: do failed for 'insert into `acct` <snip> SQL Timeout
>>> Tue Sep 14 12:48:29 2010: DEBUG: AuthBy SQL result: IGNORE, Database failure
>>>
>>> Timeout is set to 4 seconds...
>>> So: the query executed at 12:48:21, an ERR timeout was logged 4
>>> seconds later, Radiator appeared to retry without logging anything,
>>> and another ERR timeout followed 4 seconds after that. That's 8
>>> seconds, of course: it doubles the Timeout value.
>>>
>>> This is no good for me. If I set my SQL Timeout to 4 seconds and my
>>> NAS timeout to 5 seconds, I expect Radiator to time out before my
>>> NAS retransmits. My NAS will retry after 5 seconds because Radiator
>>> hasn't responded, and Radiator, not having obeyed the Timeout, is
>>> still waiting out its 8 seconds. This causes the same accounting
>>> packet to enter Radiator again, causing another 8-second delay, and
>>> of course duplicate entries in the accounting logging, since I'm
>>> also using AcctFailedLogFileName so the packet will eventually end
>>> up in the SQL table.
>>>
>>> 2. SQL Timeout issue #2.
>>> ------------------------
>>> Using the same debug example above: when the SQL query times out, it
>>> doesn't seem to use the FailureBackoffTime value. It only seems to
>>> use FailureBackoffTime when there is a connection failure, not a
>>> timeout. So every query is still presented to the SQL server. If the
>>> timeout is due to, let's say, a write lock, then when the lock
>>> releases, all the queued insert statements are executed, sometimes
>>> creating up to 10 duplicate accounting entries.
>>
>> Hello Michael -
>>
>> The behaviour you observe is in fact what the code does - the manual
>> does not correctly describe this behaviour.
>>
>> The manual has been amended for the next release.
>>
>> Thanks for letting us know.
>>
>> regards
>>
>> Hugh
>
> Can I suggest an option to disable this behavior? In my case, I would
> prefer Radiator to allow only one timeout and, when a timeout occurs,
> to respect the FailureBackoffTime. If it doesn't, Radiator creates a
> very undesirable situation in which it keeps trying the timed-out SQL
> server for every packet. It basically bottlenecks my whole RADIUS
> system, since all my Radiator servers connect to the same accounting
> MySQL server; all the NASes eventually mark each RADIUS server
> "RADIUS_DEAD", and then all authentication seems to stop.
>
> Mike
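The timing arithmetic in Mike's report can be sketched as follows. This is a hypothetical illustration only (plain Python, not Radiator's Perl code); the constants mirror the configuration described above, where one undocumented silent retry doubles the effective blocking time past the NAS retransmit interval:

```python
# Hypothetical illustration of the observed behaviour, not Radiator code.
SQL_TIMEOUT = 4   # Radiator <AuthBy SQL> Timeout, in seconds
SILENT_RETRIES = 1  # the undocumented second attempt seen in the debug log
NAS_TIMEOUT = 5   # the NAS retransmits if no reply arrives within this

# Worst case, Radiator is blocked for the timeout once per attempt.
worst_case_block = SQL_TIMEOUT * (1 + SILENT_RETRIES)

print(worst_case_block)              # 8 seconds, matching 12:48:21 -> 12:48:29
print(worst_case_block > NAS_TIMEOUT)  # True: the NAS retransmits first,
                                       # so the same packet re-enters Radiator
```

Since the 8-second block exceeds the 5-second NAS timer, every timed-out accounting packet is retransmitted at least once, which is exactly the duplicate-entry effect described above.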
I just ran into this same problem: my DB got into a state where DBI->connect was working fine but actual INSERTs were timing out, and the non-observance of FailureBackoffTime in this situation resulted in both of my RADIUS servers being effectively stalled for 10 minutes (one INSERT timeout at a time) until the DB issue was resolved. I would like to second Michael's request for a way to alter this behavior.

It appears that right now SqlDb.pm has a single $self->{backoff_until} timer that applies collectively to all configured DBSources (i.e. it is set only when all DBSources fail DBI->connect in sequence, and when set it causes none of them to be tried again in reconnect() until the set time). Would it perhaps make more sense that:

1. each configured DBSource gets its own individual backoff_until timer
   that is set when that DBSource fails DBI->connect, and which, when
   set, causes that DBSource to be skipped in reconnect() until the set
   time; and

2. individual statement timeouts, such as the one in SqlDb::do(), could
   also set the backoff_until timer for the individual DBSource
   currently in use? If this is judged not to be desirable in the
   general case, it could be controlled by a separate configuration
   parameter ("TimeoutBackoffTime", perhaps?).

I'm half tempted to try to implement this myself, but I'm not confident that I fully understand all the potential repercussions for other parts of Radiator, and I know I'm not in a good position to test it thoroughly.

Thanks,
David

_______________________________________________
radiator mailing list
radiator@open.com.au
http://www.open.com.au/mailman/listinfo/radiator
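[Editor's note] David's per-source backoff proposal can be sketched roughly as below. This is a minimal hypothetical model in Python, not Radiator's actual SqlDb.pm; the class and method names (DBSource, pick_source, note_statement_timeout) and the "TimeoutBackoffTime" parameter are assumptions taken from the proposal, not an existing API:

```python
# Hypothetical sketch of per-DBSource backoff (not Radiator's SqlDb.pm).
# Each source keeps its own backoff_until timer, set both on connect
# failure and on statement timeout, so a stalled source is skipped
# instead of stalling every request.
import time

class DBSource:
    def __init__(self, name, failure_backoff=600, timeout_backoff=600):
        self.name = name
        self.failure_backoff = failure_backoff  # FailureBackoffTime analogue
        self.timeout_backoff = timeout_backoff  # proposed "TimeoutBackoffTime"
        self.backoff_until = 0                  # per-source, not global

    def available(self, now=None):
        now = now if now is not None else time.time()
        return now >= self.backoff_until

    def note_connect_failure(self, now=None):
        now = now if now is not None else time.time()
        self.backoff_until = now + self.failure_backoff

    def note_statement_timeout(self, now=None):
        # Proposal 2: a timeout in do() also backs off this one source.
        now = now if now is not None else time.time()
        self.backoff_until = now + self.timeout_backoff

def pick_source(sources, now=None):
    # reconnect() analogue: try only sources not currently backed off.
    for s in sources:
        if s.available(now):
            return s
    return None  # all sources backed off: fail fast instead of stalling

# An INSERT times out on the primary; the next request skips it.
a, b = DBSource("primary"), DBSource("secondary")
a.note_statement_timeout(now=1000)
chosen = pick_source([a, b], now=1001)
print(chosen.name)  # the secondary is tried while the primary backs off
```

The key difference from the current behaviour described in the thread is that a timed-out source is sidelined for its backoff period while other sources remain usable, rather than a single shared timer that only engages when every source fails DBI->connect.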