[Emu] Notes from the Interactive EAP implementation front

David Mitton Tue, 22 Sep 2009 14:27:51 -0700

Over the previous year, I was the Project Leader for a Windows Vista 
implementation of the RSA SecurID OTP EAP methods (EAP types 15 and 32, the 
latter being Protected OTP).   During this project we had a number of 
setbacks, and I achieved a new enlightenment about where some long-standing 
problems with our user-interactive EAP methods actually come from.


The primary goal of this project was to port our XP client to Vista and use the 
new Windows EAP API, EAPHost.
The first problem we found is that Vista EAPHost doesn't support RAS 
connections (PPP and VPN), only 802.1X connections. This has been "fixed" in 
Windows 7.   However, we needed to support those connection methods, and we 
shelved the EAPHost version in favor of upgrading the existing RasEap API 
implementation (which all Microsoft methods use anyway).

A major problem we have run into with SecurID EAP is timeouts by access points 
during the authentication process.
As I've discussed before 
(http://www.ietf.org/proceedings/66/slides/emu-4/sld1.htm) some access point 
implementations have a very short timer on EAP authentication turnaround.

A typical SecurID token generates a new One Time Password (OTP) value once 
every minute.  A row of indicators shows where the token is in the cycle, 
and some users prefer to wait for a new token code rather than start typing 
one and have it change before they are done.   Furthermore, if the user's 
token is more than 3 minutes out of sync with the server, they may also have 
to wait for a second code and enter it.   This leads to user input times 
ranging from easily a minute to possibly 3 minutes in total.   Independently, 
the system administrator may also force a PIN change (similar to a password 
change), and this process needs PIN input as well as another token value 
before authentication is granted.
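
To put rough numbers on this, here is a minimal sketch of the waiting-time 
arithmetic.  It assumes the 60-second code period described above; the 
function name and the scenario labels are illustrative only, and typing and 
reaction time are deliberately left out.

```python
TOKEN_PERIOD_S = 60  # the text says a new code appears about once a minute


def max_wait_for_codes(codes_needed, period_s=TOKEN_PERIOD_S):
    """Upper bound on seconds spent waiting for fresh token codes.

    Counts only waiting for the token display to roll over; typing and
    reaction time come on top of this.
    """
    if codes_needed < 0:
        raise ValueError("codes_needed must be non-negative")
    return codes_needed * period_s


# Hypothetical scenarios: plain login, resync (second code), and
# PIN change combined with resync (third code).
for label, codes in [("plain", 1), ("resync", 2), ("pin change + resync", 3)]:
    print(f"{label}: up to {max_wait_for_codes(codes)} s")
```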


Given our experience, both of our EAP methods have designed-in protocol 
messages to generate EAP keep-alive traffic while the user is interacting with 
their token.  Our next major problem was that Windows Vista changed the way 
that EAP user interfaces were supported, and our implementation of the 
keep-alive mechanism was utterly broken as a result.


In order to both display a dialog to the user, and send network keep-alive 
messages, we had to get two things going at once.  This is not trivially 
possible inside the Windows EAP environment as most of the code is of a 
single-threaded callback and return nature.  Our XP client would create an 
extra thread in the EAP UI environment, and support the user dialog from there. 
 When a keep-alive message was needed, it would return to the network code to 
send it.  And when the response returned, it would call back to the UI host 
code, which was still running in the same multithreaded server process, and 
re-establish a shared context.

Vista broke this by moving the UI host environment to a single-threaded COM 
apartment, and by terminating the entire process when it returns status to the 
network component.

I first tried to just do an end run around the Windows EAP UI and invoke my 
own UI process.  At first that worked just fine, until I let 60 seconds go by. 
Then the session was shut down by the Windows 802.1X state machine.  This is 
not even an inactivity timer.  It would kill the session after one minute even 
if it had sent a response to the server just seconds before.

Further tracing and an open support ticket revealed that the Windows 802.1X 
state machine would not let an authentication stay open for longer than 1 
minute unless it had a UI active.  If I had a UI active, it would allow 5 
such one-minute periods.  I asked, but was not offered any interfaces that 
could manipulate this timer or alter its sense of the UI state.  Microsoft's 
recommendation was to re-build our XP solution but using external processes 
instead of multi-threading their new UI infrastructure.
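
The difference between the timer described above and a true inactivity timer 
can be sketched as follows.  This is only a model of the observed behavior, 
not Windows code: the function names are made up, and the limits (60 s without 
a UI, 5 x 60 s with one) are the values reported in this note.

```python
def absolute_timer_expired(start_s, now_s, limit_s):
    """Expires limit_s after the authentication started, regardless of traffic."""
    return now_s - start_s >= limit_s


def inactivity_timer_expired(last_activity_s, now_s, limit_s):
    """Expires only if nothing has happened for limit_s seconds."""
    return now_s - last_activity_s >= limit_s


def observed_limit_s(ui_active):
    """Limits as reported above: 60 s without a UI, 5 x 60 s with one."""
    return 5 * 60 if ui_active else 60


# Authentication starts at t=0, a response is sent at t=55, we check at t=61.
start, last_response, now = 0, 55, 61
limit = observed_limit_s(ui_active=False)
print(absolute_timer_expired(start, now, limit))            # True: session is killed
print(inactivity_timer_expired(last_response, now, limit))  # False: would survive
```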

I ended up implementing a separate UI service process that ran the input 
dialogs.  This process was started from the EAP UI context when an 
authentication began, and would be contacted every keep-alive cycle, until the 
user input was complete.  Debugging the interprocess communication and state 
machines took a little bit more time.
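
The general shape of "collect user input in one place, send keep-alives from 
another" can be sketched like this.  It is not our client code: a Python 
thread stands in for the separate UI process, and get_user_input and 
send_keepalive are hypothetical callables supplied by the caller.

```python
import queue
import threading
import time


def run_with_keepalive(get_user_input, send_keepalive, interval_s):
    """Run get_user_input in a worker while sending keep-alives.

    Calls send_keepalive() each time interval_s passes without a result,
    and returns whatever get_user_input() returned.  Error handling for a
    failing worker is omitted to keep the sketch short.
    """
    result = queue.Queue(maxsize=1)
    worker = threading.Thread(
        target=lambda: result.put(get_user_input()), daemon=True
    )
    worker.start()
    while True:
        try:
            return result.get(timeout=interval_s)
        except queue.Empty:
            send_keepalive()  # no input yet; keep the EAP exchange alive


# Usage with stand-ins: a "user" who takes 0.35 s, keep-alive every 0.1 s.
sent = []


def slow_user():
    time.sleep(0.35)
    return "123456"


code = run_with_keepalive(slow_user, lambda: sent.append(time.monotonic()), 0.1)
print(code, len(sent))
```

The queue is the only shared state, which is roughly what the interprocess 
channel has to provide in the real thing.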


While I was trying to understand the issues involved in the client, I started 
experimenting with alternate approaches in the server.  The RasEap API offers 
two transmit message calls to the EAP server (sorry - Authenticator): 
EAPACTION_SendWithTimeoutInteractive, which was what the code was using, and 
EAPACTION_SendWithTimeout, which might have offered a way for the server to 
catch a timeout and act on it.  To get more information I sent Microsoft a 
support request asking them to explain the differences between the two.   The 
answer I got was that EAPACTION_SendWithTimeoutInteractive had a 30 second 
timeout, and EAPACTION_SendWithTimeout had a 60 second timeout.  What I was 
looking for was any way the server could catch the timeout and resend the 
previous message.   I noticed that the code to do that was there, but the 
event never happened.

So, I tried the SendWithTimeout to see if an extra 30 seconds would get me some 
more breathing room.  Instead the authentication failed immediately.   I 
finally busted out Kismet and Wireshark to find out WTF was going on.
I wanted to know how an internal EAP server call was affecting the client EAPOL 
AP authentication in the air.

I discovered that the Windows IAS server was using the API action code to set a 
Session-Timeout value in the RADIUS message carrying the EAP request message.  
For the SendWithTimeoutInteractive it was setting the value to 30.  For the 
SendWithTimeout it was setting it to 6, not 60 as I had been told.  
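
For anyone who wants to spot this in a trace, here is a minimal sketch of how 
the Session-Timeout attribute looks on the wire.  The attribute type (27) and 
the 6-byte type/length/value layout come from RFC 2865; the dictionary simply 
records the two values observed above, and the helper names are made up for 
this example.

```python
import struct

SESSION_TIMEOUT_TYPE = 27  # RADIUS Session-Timeout attribute type (RFC 2865)

# Values seen in the traces described above, keyed by RasEap action name.
OBSERVED_TIMEOUT_S = {
    "EAPACTION_SendWithTimeoutInteractive": 30,
    "EAPACTION_SendWithTimeout": 6,
}


def encode_session_timeout(seconds):
    """Encode Session-Timeout as type (1 byte), length (1 byte), value (4 bytes)."""
    return struct.pack("!BBI", SESSION_TIMEOUT_TYPE, 6, seconds)


def decode_session_timeout(attr):
    """Return the timeout in seconds from a 6-byte Session-Timeout attribute."""
    attr_type, length, seconds = struct.unpack("!BBI", attr)
    if attr_type != SESSION_TIMEOUT_TYPE or length != 6:
        raise ValueError("not a Session-Timeout attribute")
    return seconds


for action, seconds in OBSERVED_TIMEOUT_S.items():
    print(action, encode_session_timeout(seconds).hex())
```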

Furthermore, the access point I use, the Cisco Aironet 1230 [v12.3(8)JEC2] 
would take those timeout values and treat them as a session expiration timer, 
instead of a re-transmit timer.  If a message wasn't received in that period, 
it would abort the authentication.   I do not think this is what is intended by 
RFC 3580, sect 3.17, page 11:

   When sent in an Access-Challenge, this attribute represents the
   maximum number of seconds that an IEEE 802.1X Authenticator should
   wait for an EAP-Response before retransmitting.  
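
The two readings of that value can be contrasted with a small model.  This is 
a sketch only: neither function is taken from any AP's code, and 
max_retransmits is a hypothetical parameter, since the RFC text quoted above 
does not say how many times to retry.

```python
def retransmitting_authenticator(response_at_s, timeout_s, max_retransmits):
    """Resend the EAP-Request each time timeout_s elapses without a response.

    Returns (succeeded, retransmissions_sent).
    """
    deadline = timeout_s * (max_retransmits + 1)
    if response_at_s < deadline:
        return True, int(response_at_s // timeout_s)
    return False, max_retransmits


def aborting_authenticator(response_at_s, timeout_s):
    """Treat the same value as a hard limit, as the AP appeared to do."""
    return response_at_s < timeout_s


# A user who answers after 45 s, with Session-Timeout = 30.
print(retransmitting_authenticator(45, 30, max_retransmits=2))  # (True, 1)
print(aborting_authenticator(45, 30))                           # False
```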

An explanation of the 1200 AP's behavior was difficult to find.  
At one time I found a Meetinghouse website that mentioned that you should 
increase the "Session timeout" for interactive protocols, but that has since 
disappeared.   On my AP's management GUI, this particular parameter does not 
appear.   I was finally able to find in a Cisco command reference that recent 
versions of IOS let you extend the timeout default to a max of 120 seconds and 
(on some platforms) ignore the RADIUS Session-Timeout value.

  dot1x timeout supp-response 120 local

This command doesn't seem to have a GUI equivalent on my AP.

120 seconds, 2 minutes, is not as much as I would like, but this is only for 
one EAP message request/response.  Normally a user won't take that long for one 
step of the authentication.  I still don't know why it doesn't retransmit, or 
if it can at all.  The 802.1X specs are extremely vague as to what the values 
in this area should be, but it seems that retransmission would be highly 
desirable for better reliability while authenticating.


Eventually we were able to put the whole thing together.  There are Release 
Notes that discuss the issues and parameters that allow you to tune it as 
desired.   Our new EAP UI process can ride out the keep-alive cycles in its 
own execution context and it tries to keep the MS dialog transitions happy by 
generating clicks on the annoying “a response is required” dialogs or balloons. 
  An extra feature is a short-term username/identity cache to allow a quick 
and silent IdentityResponse if the first attempt didn't get the session 
started.
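
The identity cache idea can be sketched as a value with a time-to-live.  This 
is an illustration under assumptions, not the shipped code: the class name, 
the ttl_s parameter and the injectable clock are all made up for the example.

```python
import time


class IdentityCache:
    """Remember the last identity briefly so it can be resent silently."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so the expiry logic can be tested
        self._identity = None
        self._stored_at = None

    def store(self, identity):
        self._identity = identity
        self._stored_at = self._clock()

    def get(self):
        """Return the cached identity, or None if absent or expired."""
        if self._identity is None:
            return None
        if self._clock() - self._stored_at > self._ttl_s:
            self._identity = None
            return None
        return self._identity


# Usage with a fake clock so the example does not have to sleep.
now = [0.0]
cache = IdentityCache(ttl_s=30, clock=lambda: now[0])
cache.store("alice")
print(cache.get())  # within the TTL: answer without prompting
now[0] = 31.0
print(cache.get())  # expired: the user has to be prompted again
```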

I am now testing this on Windows 7, and have only found one EAP interface 
problem to date.


Design/Architecture problems I see:

- Windows 802.1X has a fixed timeout (5x60s),  there really should be some sort 
of override on a per-method basis somewhere.  A one size fits all timer seems 
rather presumptuous.

- The timeout should be sensitive to activity (or progress), not just time or 
UI state.
Our worst-case authentication could have both a PIN change and a Token resync, 
which would require 3 token codes (3 minutes), not including user typing and 
reaction time.

- The Windows EAP infrastructure doesn't easily support methods that need to do 
two things at once.
It would be useful to queue events or wake up threads based on timers.

- Access Points (Cisco in this case) should not be setting short timers for 
user interactive methods.
Human input really requires timeouts on the order of several minutes.

- Why is the AP killing sessions instead of retransmitting?

- Does any AP retransmit?  Why not?  It's in the 802.1X state machine.
The MS PPTP tunnel server does it frequently and my code tolerates it well.

- Note that given the AP behavior, if the RADIUS Server gave the EAP method 
authenticator a timeout event (which I've yet to observe), for a potential 
retransmission, the AP would not accept the message, because it's using the 
value for a different purpose.  

Another potential way to solve the keep-alive problem would have been for the 
client to use a retransmitted request (and the network context thread that 
receives and processes it) as a way to send a keep-alive message.   But given 
the AP and RADIUS behavior, I was not able to follow that path.


Dave Mitton,
RSA Security, Division of EMC
Bedford, MA


PS: I forgot about the EAP Notification message bug in Vista.   If you use the 
RasEap API, any EAP Notification messages sent to your method over a wireless 
connection are discarded silently, without a response to the authenticator. 
(For some reason RAS EAP connections actually give you the message and let you 
respond.)  If you have a server that sends Notification messages, your 
protocol will break at this point. On XP, even if the wireless method didn't 
receive the message, the underlying code would send a response.  There is now 
a patch for this, KB967802. Evidently EAPHost will let you get your message.