Re: [OPSAWG] RADIUS Extension, Getting Started

Alan DeKok Mon, 06 Jul 2020 13:17:06 -0700

On Jul 6, 2020, at 11:40 AM, Massameno, Dan <[email protected]> wrote:
> 
> Dear Ops and Management Area WG,
> 
> There have been a number of great suggestions on where to post this document 
> (Thanks Stefan!).   I'm now emailing [email protected] and cc'ing 
> [email protected].  
> 
> The draft is posted here... 
> https://tools.ietf.org/html/draft-massameno-radius-lb-00
> 
> Do I need an official IETF sponsor?  Would it help to try and get a vendor 
> interested in implementing the protocol?  Cisco is our primary vendor at Yale 
> University.  I was wondering if there is anyone on either of these working 
> groups that communicates with Cisco people on a regular basis?


  There are lots of Cisco people.  I'm sure some of those people have opinions.

  There are a few things ongoing here.  The RADEXT WG is still alive, but 
inactive.  There is a still an open discussion in the WG about previous 
documents which addressed shortcomings in the base RADIUS protocol.  The polite 
version is that the long-term RADEXT participants supported one draft.  Many 
people who had never been involved with RADEXT supported a competing draft.  
There was never any resolution.  The RADEXT WG was left in limbo.

  I'll avoid the subject of the best place for this draft.

  Overall, I would oppose this draft.  For the simple reason that brains belong 
in the RADIUS server, not in the NAS.  NAS vendors haven't even implemented the 
retransmission behaviours suggested in RFC 5080 Section 2.1, which is well over 
a decade old at this point.  If they can't do that, how can they implement an 
even more complex load-balancing system?

  Further, there is no standard in RADIUS for load-balancing with fail-over and 
fail-back.  Every NAS vendor and RADIUS vendor does something different.  While 
this document tries to address that issue, it does not address a number of key 
points.


  I do have a review, as follows.  Many of these issues given below are nits, 
and can be fixed by rewording the document, or moving to an approach which 
follows existing standards (RFC 6158, RFC 6929, RFC 8044).  However, for 
reasons given above, I don't think it's useful to move ahead with this document.


* Section 2 - Message Summary

   I recommend deleting this part.  The RADIUS packet format is well known.  
There's no need to repeat the definition here.  Just refer to RFC 5997.

* Section 2.2 - nitpick: 191 is not defined, so it's likely best to leave this 
attribute value as TBD, even if IANA is likely to choose 191.

  Further, RADIUS definitions use consistent names for attributes.  The names 
do not change when they're used in different packet types.  The attribute 
should just be called something like "Load-Balance-Status"

* Section 2.3.2 - the text here is simply wrong.  The Request-Authenticator is 
not "hashed with the identity material from the calling-station
   and the RADIUS shared secret."

   RFC 5997 Section 5 already mandates the use of Message-Authenticator with 
Status-Server.

  I suggest deleting 2.3.2 entirely.

* Section 3 - the definition of the LB-Request does not satisfy the 
requirements of RFC 6929 Section 5.2.  There is no reason to use a 16-bit data 
type here.

  https://tools.ietf.org/html/rfc6929#section-6.2

  RFC 8044 also defines data types which are appropriate for use in RADIUS.  
That RFC also suggests that there is no longer need for ASCII art in most 
cases, as attribute definitions can just use predefined data types.

  Also in general, reserved bits should go at the top of an integer attribute.  
Status bits should be in the lower bits.

* Section 4.1 redefines attribute 191 to contain completely different data.  
This definition violates the prohibition on polymorphic attributes given in RFC 
6159 Section 3.4

https://tools.ietf.org/html/rfc6158#section-3.4

  I suggest using two different attributes, one for signalling client to 
server, and another for responding server to client.

  The text in this section is also unclear:

Server_Info
     The String field is one or more server_info PDUs.  Each PDU defines
     the status of a single server and its defining characteristics.

  What does that mean?  Is the intent to carry the SVR-Record attributes inside 
of attribute 191?  If so, then using attribute 191 as both a 16-bit bitfield, 
and as a TLV carrier is very odd.  It's safe to say that nothing will ever 
support this.  I suggest using two different attrributes.

  I suggest simply relying on the existence of one or more SRV-Record 
attributes in the reply.  There is no need to repeat the LB-* attribute.

* Section 5 The SRV-Record TLV

  This section does not define any TLV attribute.  It's not clear what this 
means.

* Section 5.1 and 5.2

 It's not clear why two different attributes are needed.  Why not just use 
SRV-Record?

  And there's no need to repeat ASCII art.  RFC 8044 Section 3.13 already 
defines the TLV data type, and format.  You should just reference that.

* Section 5.3 defines a table of attributes, including attributes not 
previously seen in the document.

  I suggest moving the table of attributes to later in the document, *after* 
the relevant attributes have been defined.

* Section 6

 There's no need for ASCII art.  RFC 8044 Section 2.1.2 makes this clear.  Just 
reference the data types.

* Section 7.1

  The document says:

   ...  Under these
   Status-Server-LB rules the NAS MUST send all RADIUS messages relative
   to a particular session to the same RADIUS server to maximize the
   probability of a cache-hit.

  What then, is the utility of land balancing?  If one RADIUS server goes down, 
cannot the NAS send packets for a session to a different, but *equivalent* 
RADIUS server?  This recommendation seems to drastically increase the 
likelihood of losing accounting packets, which seems bad.

  In addition, the discussion of "caching" seems odd, and uncommon.  Databases 
do user replication, not RADIUS servers.  We should not expose implementation 
details in a standard document.

  i.e. the RADIUS servers should be treated as identical, and local balancing 
should be done on a packet by packet basis.

* Section 7.2

  The document says:

   The weight field specifies a relative weight for entries with the
   same priority.  Larger weights SHOULD be given a proportionately
   higher probability of being used for AAA services. 

  Ok, *how* is this to be done?  Specifying this behaviour seems important.  
Perhaps adding up all weights of a similar priority, and choosing a "W_i" at 
random, with probability W_i / Sum(W_0. ... W_n)

* Missing: discussion of shared secrets, authentication vs accounting, ports, 
TLS, TCP, etc.

  What shared secret is used for these different RADIUS servers?  The same one 
as used in the original Status-Server?

  What ports should the NAS send packets to?  Does this load balancing work for 
all packet types?

  What about TCP, UDP, TLS, etc.?  How does this proposal interact with dynamic 
discovery? (RFC 7585)

* Section 8

  The document says:

   RADIUS Server Load Balancing has many of the same security
   implications as the base RADIUS protocol.

  There are many additional security issues opened up by this new capability.  
They should be discussed.

* Section 10

  The document says:

   The Vendor Specified Attribute 26 may be used to encapsulate the LB-
   Request and LB-Response PDU where no vendor interoperability is
   required.

   It is very odd to put standards text inside of an "IANA considerations" 
section.  Further, this text appears to be confused about the purpose of the 
RADIUS attribute numbers.  These numbers are location-sensitive. i.e. vendor 
attribute "191" has no relation whatsoever to attribute "191" in the main 
RADIUS space.  Similarly, it has no relation to an attribute "191" from another 
vendor.

  This text is wrong and should be deleted.

Summary:

  Even if the various nits and and issues above were fixed, this proposal would 
have serious security issues.  Even if those security issues were addressed, I 
believe that load balancing is simply not appropriate for the NAS.  Even if it 
was appropriate for the NAS, the vendors have spoken: NAS implementations are 
simple, and server implementations are complex.

  As such, load balancing more properly belongs in the server.

  Alan DeKok.

_______________________________________________
OPSAWG mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/opsawg

Re: [OPSAWG] RADIUS Extension, Getting Started

Reply via email to