[nvo3] [Shepherding AD review] review of draft-ietf-nvo3-geneve-oam-12

Gunter van de Velde (Nokia) Mon, 25 Nov 2024 08:54:16 -0800

# Gunter Van de Velde, RTG AD, comments for draft-ietf-nvo3-geneve-oam-12

# the referenced line numbers are derived from the idnits tool:
https://author-tools.ietf.org/api/idnits?url=https://www.ietf.org/archive/id/draft-ietf-nvo3-geneve-oam-12.txt


# Many thanks for this write-up. Having proper OAM tools for Geneve is useful 
and operationally highly important. Thank you for this work.

# Many thanks for the shepherd write-up from Matthew Bocci and for the 
directorate reviews from Stig Venaas, Paul Kyzivat and Himanshu Shah.

#GENERIC COMMENTS
#================

## The document deals with "active OAM", while the title of the draft mentions 
only "OAM". Maybe the draft title can be corrected?

## When reading the document I got confused about the asserted difference 
between active OAM for Geneve and STAMP or BFD. What are the differences, and 
what is the reason for the existence of the current document? Could a small 
section be inserted that explains where the specific value of Active OAM lies 
and compares it with other mechanisms?

## Is active OAM a novel way to say ICMP ping (and ICMP trace)? Is there 
anything additional? Maybe this needs to be explicitly mentioned somewhere 
early in this draft, with references added to the technologies asserted to be 
utilized for active OAM. I also found that it is only at the very end (i.e. 
section 3) that the impact of ICMP and ICMPv6 with Geneve is discussed. This 
seems rather late in the document. If Active OAM is solely about ICMP(v6), then 
maybe a helicopter perspective earlier in the document would help readers 
understand the objectives better?

## As I read the document for the first time, I was unsure whether some 
sections are formal procedures or informational sections. This could have been 
due to my lesser experience with OAM protocols, so judge accordingly. Maybe 
making this more obvious for generalists like me could help make the document 
easier to process?

#DETAILED COMMENTS
#=================

15      Abstract
16
17         This document lists a set of general requirements for active OAM
18         protocols in the Geneve overlay network.  Based on these
19         requirements, the IP encapsulation of active Operations,
20         Administration, and Maintenance protocols in the Geneve protocol is
21         defined.  Considerations for using ICMP and UDP-based protocols are
22         discussed.

GV> What about the following alternative abstract:

"
Geneve (Generic Network Virtualization Encapsulation) is a flexible and 
extensible network virtualization overlay protocol designed to encapsulate 
network packets for transport across underlying physical networks. This 
document specifies the requirements and provides a framework for Operations, 
Administration, and Maintenance (OAM) in Geneve networks. It outlines the OAM 
functions necessary to monitor, diagnose, and troubleshoot Geneve overlay 
networks to ensure their proper operation and performance. The document aims to 
guide the implementation of OAM mechanisms within the Geneve protocol to 
support network operators in maintaining reliable and efficient virtualized 
network environments.
"

75      1.  Introduction
76
77         Geneve [RFC8926] is intended to support various scenarios of network
78         virtualization.  In addition to carrying multiple protocols, e.g.,
79         Ethernet, IPv4/IPv6, the Geneve message includes metadata.
80         Operations, Administration, and Maintenance (OAM) protocols support
81         fault management and performance monitoring functions necessary for
82         comprehensive network operation.  Active OAM protocols, as defined in
83         [RFC7799], use specially constructed packets that are injected into
84         the network.  To ensure that a performance metric or a detected
85         failure are related to a particular Geneve flow, it is critical that
86         these OAM test packets share fate with overlay data packets for that
87         flow when traversing the underlay network.
88
89         A set of general requirements for active OAM protocols in the Geneve
90         overlay network is listed in Section 2.  IP encapsulation conforms to
91         these requirements and is a suitable encapsulation of active OAM
92         protocols in a Geneve overlay network.  Active OAM in a Geneve
93         overlay network are exchanged between two Geneve tunnel endpoints,
94         which may be an NVE (Network Virtualization Edge) or another device
95         acting as a Geneve tunnel endpoint.  For simplicity, an NVE is used
96         to represent the Geneve tunnel endpoint.  Please refer to [RFC7365]
97         and [RFC8014] for detailed definitions and descriptions of an NVE.
98         The IP encapsulation of Geneve OAM defined in this document applies
99         to an overlay service by introducing a Management Virtual Network
100        Identifier (VNI) that could be used in combination with various
101        values of the Protocol Type field in the Geneve header, i.e.,
102        Ethertypes for IPv4 or IPv6.  The analysis and definition of other
103        types of OAM encapsulation in Geneve are outside the scope of this
104        document.

GV> What is unclear to me is how this aligns with ICMP, assuming active OAM is 
ICMP?

GV> idnits rewrite:

"
Geneve [RFC8926] is designed to support various scenarios of network 
virtualization. It encapsulates multiple protocols, such as Ethernet and 
IPv4/IPv6, and includes metadata within the Geneve message.

Operations, Administration, and Maintenance (OAM) protocols provide fault 
management and performance monitoring functions necessary for comprehensive 
network operation. Active OAM protocols, as defined in [RFC7799], utilize 
specially constructed packets injected into the network. To ensure that 
performance metrics or detected failures are accurately related to a particular 
Geneve flow, it is critical that these OAM test packets share fate with the 
overlay data packets of that flow when traversing the underlay network.

Section 2 of this document lists the general requirements for active OAM 
protocols in the Geneve overlay network. IP encapsulation meets these 
requirements and is suitable for encapsulating active OAM protocols within a 
Geneve overlay network. Active OAM messages in a Geneve overlay network are 
exchanged between two Geneve tunnel endpoints, which may be a Network 
Virtualization Edge (NVE) or another device acting as a Geneve tunnel endpoint. 
For simplicity, this document uses an NVE to represent the Geneve tunnel 
endpoint. Refer to [RFC7365] and [RFC8014] for detailed definitions and 
descriptions of an NVE.

The IP encapsulation of Geneve OAM defined in this document applies to an 
overlay service by introducing a Management Virtual Network Identifier (VNI), 
which can be used in combination with various values of the Protocol Type field 
in the Geneve header, such as Ethertypes for IPv4 or IPv6. The analysis and 
definition of other types of OAM encapsulation in Geneve are outside the scope 
of this document.
"

110        *  In-band OAM is an active OAM or hybrid OAM method ([RFC7799]) that
111           traverses the same set of links and interfaces receiving the same
112           QoS treatment as the monitored object, i.e., a Geneve tunnel as a
113           whole or a particular tenant flow within given Geneve tunnel.

GV> In a later section (section 2) this paragraph is used as the reference for 
the in-band need in Requirement#1. It is a bit unusual for a terminology 
section to act as a reference for a formal requirement. Maybe that should be 
documented in a different place.

GV> From a readability perspective, fixing some idnits:

"
In-band OAM is an active or hybrid OAM method, as defined in [RFC7799], that 
traverses the same set of links and interfaces and receives the same Quality of 
Service (QoS) treatment as the monitored object. In this context, the monitored 
object refers to either the Geneve tunnel as a whole or a specific tenant flow 
within a given Geneve tunnel.
"

123     1.1.3.  Acronyms
124
125        Geneve: Generic Network Virtualization Encapsulation
126
127        NVO3: Network Virtualization Overlays
128
129        OAM: Operations, Administration, and Maintenance
130
131        VNI: Virtual Network Identifier

GV> potentially missing acronyms: NVE, ICMP, ICMPv6
GV> more accurate NVO3: Network Virtualization over Layer 3

133     2.  Active OAM Protocols in Geneve Networks

GV> Would it be justified to rename this header to "Requirements for Active OAM 
Protocols in Geneve Networks"?

GV> What seems missing from this discussion is why only "Active" OAM is 
discussed and not "Passive". The draft title is "OAM for use in GENEVE", which 
potentially leaves room for both active and passive OAM. Is it possible to 
expand a little on this?

146           Requirement#1: Geneve OAM test packets MUST share the fate with
147           data traffic of the monitored Geneve tunnel, i.e., be in-band
148           (Section 1.1.1) with the monitored traffic, follow the same
149           overlay and transport path as packets with data payload, in the
150           forward direction, i.e. from ingress toward egress endpoint(s) of
151           the OAM test.

GV> rewrite correcting idnits:

"
Requirement 1: Geneve OAM test packets MUST share the same fate as the data 
traffic of the monitored Geneve tunnel. Specifically, the OAM test packets MUST 
be in-band (see Section 1.1.1) with the monitored traffic and follow the same 
overlay and transport path as packets carrying data payloads in the forward 
direction, that is, from the ingress toward the egress endpoint(s) of the OAM test.
"

153        An OAM protocol MAY be used to monitor the particular Geneve tunnel
154        as a whole.  In that case, test packets could be in-band relative to
155        a sub-set of tenant flows transported over the Geneve tunnel.  If the
156        goal is to monitor the condition experienced by the flow of a
157        particular tenant, the test packets MUST be in-band with that
158        specific flow in the Geneve tunnel.  Both scenarios are discussed in
159        detail in Section 2.1.

GV> What exactly does "Geneve tunnel as a whole" mean? Can this be accurately 
described or referenced?
Would the following describe what was intended:

"
An OAM protocol MAY be employed to monitor an entire Geneve tunnel. In this 
case, test packets could be in-band relative to a subset of tenant flows 
transported over the Geneve tunnel. If the goal is to monitor the conditions 
experienced by the flow of a particular tenant, the test packets MUST be 
in-band with that specific flow within the Geneve tunnel. Both scenarios are 
discussed in detail in Section 2.1.
"

161           Requirement#2: The encapsulation of OAM control messages and data
162           packets in the underlay network MUST be indistinguishable from
163           each other from an underlay network IP forwarding point of view.
164
165           Requirement#3: The presence of an OAM control message in the
166           Geneve packet MUST be unambiguously identifiable to Geneve
167           functionality, e.g., at endpoints of Geneve tunnels.
168
169           Requirement#4: OAM test packets MUST NOT be forwarded to a tenant
170           system.

GV> Fixing small idnits:

"
Requirement 2: The encapsulation of OAM control messages and data packets in 
the underlay network MUST be indistinguishable from each other from the 
underlay network IP forwarding point of view.

Requirement 3: The presence of an OAM control message in a Geneve packet MUST 
be unambiguously identifiable to Geneve functionality, such as at endpoints of 
Geneve tunnels.

Requirement 4: OAM test packets MUST NOT be forwarded to a tenant system.
"

172        A test packet generated by an active OAM protocol, either for a
173        defect detection or performance measurement, according to
174        Requirement#1, MUST be in-band (Section 1.1.1) with the tunnel or
175        data flow being monitored.  In an environment where multiple paths
176        through the domain are available, underlay transport nodes can be
177        programmed to use characteristic information to balance the load
178        across known paths.  It is essential that test packets follow the
179        same route, i.e., traverses the same set of nodes and links, as a
180        data packet of the monitored flow.  Thus, the following requirement
181        to support OAM packet fate-sharing with the data flow:
182
183           Requirement#5: It MUST be possible to express entropy for underlay
184           Equal Cost Multipath in the Geneve encapsulation of OAM packets.

GV> I am not sure why (Section 1.1.1) is referenced here as well. That section 
is a terminology section; it looks unusual as a justification for a formal 
procedure.

GV> idnits fixing:

"
A test packet generated by an active OAM protocol, whether for defect detection 
or performance measurement, MUST be in-band with the tunnel or data flow being 
monitored, as specified in Requirement 1. In environments where multiple paths 
through the domain are available, underlay transport nodes can be programmed to 
use characteristic information to balance the load across known paths. It is 
essential that test packets follow the same route (that is, traverse the same 
set of nodes and links) as a data packet of the monitored flow. Therefore, the 
following requirement supports OAM packet fate-sharing with the data flow:

Requirement 5: It MUST be possible to express entropy for underlay Equal-Cost 
Multipath in the Geneve encapsulation of OAM packets.
"
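
To make Requirement 5 concrete, here is a minimal sketch of one way entropy 
could be expressed: deriving the outer UDP source port from a hash of the 
monitored flow's 5-tuple, in the spirit of the source-port guidance in RFC 8926. 
The helper name, the CRC32 hash, and the example addresses are assumptions for 
illustration only; the draft does not prescribe them.

```python
import zlib

# Ephemeral port range that RFC 8926 recommends for the outer UDP source port.
PORT_MIN, PORT_MAX = 49152, 65535


def entropy_source_port(src_ip, dst_ip, proto, src_port, dst_port):
    """Map an inner flow 5-tuple to an outer UDP source port (hypothetical helper)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    # CRC32 is only a stand-in hash; real NVEs choose their own function.
    return PORT_MIN + zlib.crc32(key) % (PORT_MAX - PORT_MIN + 1)


# An OAM test packet that reuses the monitored flow's tuple gets the same
# outer source port, so underlay ECMP hashing treats both packets alike.
data_port = entropy_source_port("10.0.0.1", "10.0.0.2", 6, 40000, 443)
oam_port = entropy_source_port("10.0.0.1", "10.0.0.2", 6, 40000, 443)
```

Because the port is a pure function of the tuple, the test packet and the data 
packet present identical entropy to the underlay.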

186     2.1.  Defect Detection and Troubleshooting in Geneve Network with Active
187           OAM
188
189        This section considers two scenarios where active OAM is used to
190        detect and localize defects in a Geneve network.  Figure 1 presents
191        an example of a Geneve domain.
192
193            +--------+                                             +--------+
194            | Tenant +--+                                     +----| Tenant |
195            | VNI 28 |  |                                     |    | VNI 35 |
196            +--------+  |          ................           |    +--------+
197                        |  +----+  .              .  +----+   |
198                        |  | NVE|--.              .--| NVE|   |
199                        +--| A  |  .              .  | B  |---+
200                           +----+  .              .  +----+
201                           /       .              .
202                          /        .     Geneve   .
203            +--------+   /         .    Network   .
204            | Tenant +--+          .              .
205            | VNI 35 |             .              .
206            +--------+             ................
207                                          |
208                                        +----+
209                                        | NVE|
210                                        | C  |
211                                        +----+
212                                          |
213                                          |
214                                =====================
215                                  |               |
216                              +--------+      +--------+
217                              | Tenant |      | Tenant |
218                              | VNI 28 |      | VNI 35 |
219                              +--------+      +--------+
220
221                    Figure 1: An example of a Geneve domain
222
223        In the first case, consider when a communication problem between
224        Network Virtualization Edge (NVE) device A and NVE C exists.  Upon
225        the investigation, the operator discovers that the forwarding in the
226        IP underlay network is working accordingly.  Still, the Geneve
227        connection is unstable for all NVE A and NVE C tenants.  Detection,
228        troubleshooting, and localization of the problem can be done
229        regardless of the VNI value.
230
231        In the second case, traffic on VNI 35 between NVE A and NVE B has no
232        problems, as on VNI 28 between NVE A and NVE C.  But traffic on VNI
233        35 between NVE A and NVE C experiences problems, for example,
234        excessive packet loss.
235
236        The first case can be detected and investigated using any VNI value,
237        whether it connects tenant systems or not; however, to conform to
238        Requirement#4 (Section 2) OAM test packets SHOULD be transmitted on a
239        VNI that doesn't have any tenants.  Such a Geneve tunnel is dedicated
240        to carrying only control and management data between the tunnel
241        endpoints, hence it is referred to as a Geneve control channel and
242        that VNI is referred to as the Management VNI.  A configured VNI MAY
243        be used to identify the control channel, but it is RECOMMENDED that
244        the default value 1 be used as the Management VNI.  Encapsulation of
245        test packets using the Management VNI is discussed in Section 2.2.
246
247        The control channel of a Geneve tunnel MUST NOT carry tenant data.
248        As no tenants are connected using the control channel, a system that
249        supports this specification, MUST NOT forward a packet received over
250        the control channel to any tenant.  A packet received over the
251        control channel MUST be forwarded if and only if it is sent onto the
252        control channel of the concatenated Geneve tunnel.  Else, it MUST be
253        terminated locally.  The Management VNI SHOULD be terminated on the
254        tenant-facing side of the Geneve encapsulation/decapsulation
255        functionality, not the DC-network-facing side (per definitions in
256        Section 4 of [RFC8014]) so that Geneve encap/decap functionality is
257        included in its scope.  This approach causes an active OAM packet,
258        e.g., an ICMP echo request, to be decapsulated in the same fashion as
259        any other received Geneve packet.  In this example, the resulting
260        ICMP packet is handed to NVE's local management functionality for the
261        processing which generates an ICMP echo reply.  The ICMP echo reply
262        is encapsulated in Geneve as specified in Section 2.2. for forwarding
263        back to the NVE that sent the echo request.  One advantage of this
264        approach is that a repeated ping test could detect an intermittent
265        problem in Geneve encap/decap hardware, which would not be tested if
266        the Management VNI were handled as a "special case" at the DC-
267        network-facing interface.
268
269        The second case is when a test packet is transmitted using the VNI
270        value associated with the monitored service flow.  By doing that, the
271        test packet experiences network treatment as the tenant's packets.
272        Details of that use case are outside the scope of this specification.

GV> Clarification: Are the two scenarios discussed here examples, or are they 
formal procedures?

GV> The text usually references the requirements together with their section. 
Is that needed? There are just a few requirements. Maybe, at least within this 
document, they could simply be referred to as Requirement 1, 2, 3 and 4?

GV> I got confused by reading "but it is RECOMMENDED that the default value 1 
be used as the Management VNI". Why is that? Is it specified in Geneve that 
"1" is the recommended control channel? If yes, maybe add a reference.

GV> It is written that "The control channel of a Geneve tunnel MUST NOT carry 
tenant data." While this seems rather intuitive, is there a normative 
reference? Or is this a procedure specified by this document, above and beyond 
the Geneve encapsulation specification?

GV> Note that RFC8014 uses the terminology "Facing the Tenant System" and 
"Facing the Data-Center Network". The current text in this document uses 
different terminology, i.e. "DC-network-facing side" and "tenant-facing side".

GV> It is written "so that Geneve encap/decap functionality is included in its 
scope". I am not sure what this means exactly; it could be that I am confused 
about the paragraph. Is the complete section an example, or is it intended as a 
formal procedure? I am also not sure what the 'its' in "included in its scope" 
refers to.

GV> Lines 269-272 detail a use case that is outside of scope. What is the 
purpose of this paragraph within this document? I am sure we can think of many 
more potential use cases that are outside the scope of this document.
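
As a reading aid for the control-channel rules quoted above (draft lines 
247-253), the following is a minimal sketch of the decision an NVE would make 
on a received packet. The `Action` names, the `classify` function, and the 
`onward_control_channel` flag are hypothetical; only the default Management 
VNI value of 1 comes from the draft text.

```python
from enum import Enum

MANAGEMENT_VNI = 1  # default the draft recommends; a configured value MAY be used


class Action(Enum):
    TERMINATE_LOCALLY = "terminate locally"
    FORWARD_ON_CONTROL_CHANNEL = "forward on control channel"
    DELIVER_TO_TENANT = "deliver to tenant"


def classify(vni, onward_control_channel=False, management_vni=MANAGEMENT_VNI):
    """Decide what to do with a decapsulated Geneve packet (hypothetical helper)."""
    if vni != management_vni:
        # Ordinary tenant traffic; not covered by the control-channel rules.
        return Action.DELIVER_TO_TENANT
    if onward_control_channel:
        # Forwarded if and only if it goes onto the control channel of the
        # concatenated Geneve tunnel.
        return Action.FORWARD_ON_CONTROL_CHANNEL
    # Otherwise terminated locally; never handed to a tenant.
    return Action.TERMINATE_LOCALLY
```

Note that no branch for the Management VNI can return `DELIVER_TO_TENANT`, 
which is how the sketch reflects Requirement#4.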

274     2.2.  OAM Encapsulation in Geneve
275
276        Active OAM over a Management VNI in the Geneve network uses an IP
277        encapsulation.  Protocols such as BFD [RFC5880] and STAMP [RFC8762]
278        use UDP transport.  The destination UDP port number in the inner UDP
279        header (Figure 2) identifies the OAM protocol.  This approach is
280        well-known and has been used, for example, in MPLS networks
281        [RFC8029].  To use IP encapsulation for an active OAM protocol, the
282        Protocol Type field of the Geneve header MUST be set to the IPv4
283        (0x0800) or IPv6 (0x86DD) value.

GV> This section could use some detail on how UDP, BFD, STAMP, and active OAM 
relate to one another.
For example, is the assumption that active OAM is different from STAMP or BFD, 
or is it the same?

GV> Maybe Figure 2 from RFC9521 could be added to detail BFD over Geneve?
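
To illustrate the header fields under discussion, here is a minimal sketch that 
packs the 8-byte Geneve base header from RFC 8926 with the Management VNI and 
an IPv4 Protocol Type. The function name is hypothetical, and the O and C bits 
are left as parameters because the quoted draft text does not state their 
values; no options are included.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD


def geneve_header(vni, protocol_type, o_bit=0, c_bit=0):
    """Pack the Geneve base header (RFC 8926) with no options (hypothetical helper)."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    ver_optlen = 0                       # Ver = 0 (2 bits), Opt Len = 0 (6 bits)
    flags = (o_bit << 7) | (c_bit << 6)  # O bit, C bit, then 6 reserved bits
    vni_rsvd = vni << 8                  # 24-bit VNI followed by 8 reserved bits
    return struct.pack("!BBHI", ver_optlen, flags, protocol_type, vni_rsvd)


# Management VNI 1 carrying an IPv4 payload, as in Section 2.2 of the draft.
header = geneve_header(vni=1, protocol_type=ETHERTYPE_IPV4)
```

The inner IP/UDP packet (for example BFD or STAMP) would follow these eight 
bytes directly.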

318           Destination IP: The IP address MUST be set to the loopback address
319           127.0.0.1/32 for IPv4, or the loopback address ::1/128 for IPv6
320           [RFC4291].

GV> RFC6890 specifies 127.0.0.0/8 as the loopback range. Why constrain this to 
the exact /32? 
Would it not be more logical to state that the address MUST be within the 
loopback prefix 127.0.0.0/8 and SHOULD be 127.0.0.1/32? 
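
A minimal sketch of the relaxed check suggested here, using Python's standard 
`ipaddress` module: any address in 127.0.0.0/8 (or ::1 for IPv6) is acceptable, 
and 127.0.0.1 or ::1 is flagged as the preferred value. The function name and 
the two-flag return value are assumptions for illustration.

```python
import ipaddress

IPV4_LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")
IPV6_LOOPBACK = ipaddress.ip_address("::1")
PREFERRED = {ipaddress.ip_address("127.0.0.1"), IPV6_LOOPBACK}


def check_oam_destination(addr):
    """Return (acceptable, preferred) for an inner destination address."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        acceptable = ip in IPV4_LOOPBACK_NET  # the proposed MUST
    else:
        acceptable = ip == IPV6_LOOPBACK      # IPv6 has a single loopback address
    return acceptable, ip in PREFERRED        # second flag is the proposed SHOULD
```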

324           TTL or Hop Limit: MUST be set to 255 per [RFC5082].

GV> Is the TTL/Hop Limit not set to something different when an IP trace is 
used?

GV> I am confused about what security GTSM is supposed to provide here.
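
For context on the TTL question: under RFC 5082 (GTSM) the sender uses a 
TTL/Hop Limit of 255 and the receiver discards packets whose value shows more 
hops than expected. A minimal sketch of that receive-side check follows, 
assuming the inner IP header is not decremented between the two NVEs; 
`gtsm_accept` and `max_hops` are hypothetical names.

```python
def gtsm_accept(received_ttl, max_hops=1):
    """Accept only packets that crossed at most max_hops hops (max_hops=1: adjacent)."""
    # Each forwarding router decrements the TTL by one, so a packet from an
    # adjacent sender still carries 255 when it arrives.
    return received_ttl >= 255 - (max_hops - 1)
```

A trace-style probe that deliberately starts with a small TTL would fail this 
check, which is the tension the first question above points at.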

326     3.  Echo Request and Echo Reply in Geneve Tunnel
327
328        ICMP and ICMPv6 ([RFC0792] and [RFC4443] respectively) provide
329        required on-demand defect detection and failure localization.  ICMP
330        control messages immediately follow the inner IP header encapsulated
331        in Geneve.  ICMP extensions for Geneve networks use mechanisms
332        defined in [RFC4884].

GV> In summary, is the intent of this document to provide a formal procedure 
for transporting ICMP echo request/reply over a Geneve tunnel?


Many thanks again for this document,

Kind Regards,
Gunter Van de Velde,
RTG AD
