[rtgwg] Re: Request for Another Round of Directorate Review for draft-ietf-rtgwg-net2cloud-problem-statement

Chongfeng Xie Thu, 23 Jan 2025 20:13:02 -0800

Hi rtgwg WG,

Using VPNs to connect to the resources of third-party cloud service providers 
is a typical provisioning method for operators offering services to 
enterprises. This document links multiple individual IETF drafts, providing a 
holistic view of the potential mitigation approaches. I think the document is 
valuable because it gives a comprehensive problem statement covering the 
networking challenges that enterprises face when integrating existing VPN 
infrastructures with third-party cloud data centers (Cloud DCs). It is 
especially useful for telecom service providers, so I hope it can be published 
soon.


I also have two minor suggestions for the authors:

1. Section 3.4 covers "Network Issues for 5G Edge Clouds and Mitigation 
Methods".
    In our practice, Edge Clouds are also deployed in wireline metro networks, 
not only in 5G networks, so is it possible to expand the scope of this case?

2. Section 3.6 discusses "NAT Practices for Accessing Cloud Services".
    I think IPv6 is one approach, maybe the most fundamental one, to solving 
NAT issues when accessing cloud services, so can the authors add one sentence 
on this to the section?

Best regards
Chongfeng


 
From: Yingzhen Qu
Date: 2025-01-21 02:09
To: Linda Dunbar
CC: rtgwg-chairs; [email protected]
Subject: [rtgwg] Re: Request for Another Round of Directorate Review for 
draft-ietf-rtgwg-net2cloud-problem-statement
Hi Linda,

Let's allow the WG some time to review and comment on the latest version.

Also please make sure you address the comments from the IESG reviews. A couple 
of examples below (not a complete list):
https://mailarchive.ietf.org/arch/msg/rtgwg/zJpp9slvS6IcViOh5HQmVqwXmzQ/
https://mailarchive.ietf.org/arch/msg/rtgwg/XVkwbzg-OrdLlwTaFIxUIa3taiQ/

Thanks,
Yingzhen


On Fri, Jan 17, 2025 at 9:32 AM Linda Dunbar <[email protected]> wrote:
Yingzhen and Jeff, 
The IESG returned draft-ietf-rtgwg-net2cloud-problem-statement-41 to the 
working group for further review and revisions. The document has since been 
updated to address the comments raised during the IESG review: 
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-net2cloud-problem-statement/ 
To ensure that the latest revision fully resolves the concerns raised, I would 
like to request another round of directorate review. This will help confirm 
that all feedback has been adequately addressed before proceeding further in 
the publication process.
Please let me know if this can be arranged and if there are any additional 
steps required from my end.
Thank you very much, 
Linda 
 
 
From: James Guichard <[email protected]> 
Sent: Thursday, September 19, 2024 11:07 AM
To: Linda Dunbar <[email protected]>; jmh.direct 
<[email protected]>
Subject: Re: Net2cloud problem statement
 
Yes, that is the whole point. The WG needs to have that discussion. However, I 
would talk with the chairs first to see how they want to handle it.
 
Jim
 
From: Linda Dunbar <[email protected]>
Date: Thursday, September 19, 2024 at 2:06 PM
To: James Guichard <[email protected]>, jmh.direct 
<[email protected]>
Subject: RE: Net2cloud problem statement
Jim, 
 
Since the document has been returned to the WG, is it okay for me to ask the 
WG whether the proposed resolutions address your reasons for returning it? 
 
Linda 
 
From: James Guichard <[email protected]> 
Sent: Thursday, September 19, 2024 11:00 AM
To: Linda Dunbar <[email protected]>; jmh.direct 
<[email protected]>
Subject: Re: Net2cloud problem statement
 
I don’t think that is necessary as the document is no longer under IESG 
evaluation.
 
Jim
 
From: Linda Dunbar <[email protected]>
Date: Thursday, September 19, 2024 at 1:59 PM
To: James Guichard <[email protected]>, jmh.direct 
<[email protected]>
Subject: RE: Net2cloud problem statement
Jim, 
 
Should I still send the reply to John to show that his comments & suggestions 
are not ignored? 
 
Linda
 
From: James Guichard <[email protected]> 
Sent: Thursday, September 19, 2024 10:29 AM
To: Linda Dunbar <[email protected]>; jmh.direct 
<[email protected]>
Subject: Re: Net2cloud problem statement
 
Hi Linda,
 
I have returned the document to the WG with comments. There are some 
fundamental issues that the WG needs to consider before trying to put the 
document through the IESG again. 
 
Thanks!
 
Jim
 
From: Linda Dunbar <[email protected]>
Date: Thursday, September 19, 2024 at 1:27 PM
To: James Guichard <[email protected]>, jmh.direct 
<[email protected]>
Subject: RE: Net2cloud problem statement
Jim, 
 
Thank you very much for the advice. 
 
Could you please review the following replies to ensure they adequately address 
John's comments and concerns? If not, I would appreciate any advice on how to 
better alleviate his concerns. 
 
Linda
 
----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------
 
Much of this document seems to be a high-level outline of particular commercial
offerings, which among other problems, will not age well. Other parts outline
challenges that are already solved, using existing IETF technologies or general
remarks about best practices for operating networks. Yet other parts provide
brief sketches of other SDOs' technologies or architectures. Overall, I don't
think this is a valuable document for the IETF to be publishing as part of the
RFC series, and as such I expect to eventually ballot Abstain.
 
[Linda] This document is intended as an informational RFC, aimed at providing a 
comprehensive understanding of the challenges and mitigation methods (developed 
by IETF) for enterprises connecting their existing VPNs to cloud networks. 
While it acknowledges related work from other SDOs as references, its focus 
remains on addressing specific concerns within the IETF’s scope. The individual 
IETF solution drafts derived from this document focus on smaller, specific 
aspects of enterprise-cloud connectivity. However, this document links those 
aspects together, providing a holistic view of the entire problem space.
 
 
I do, however, have a few concerns about the document which warrant a
DISCUSSion, first.
 
## DISCUSS
 
### This isn't a requirements document; I think that should be made clearer
 
 
Sometimes the IETF publishes requirements documents, which when issued as RFCs
are seen as having some standing to establish that a given technology must be
developed or advanced. The present document introduces itself as a problem
statement document, but Section 6 is called "requirements".
 
[Linda] Would removing Section 6 (the requirements section) address your concern?
 
My concern arises because throughout the document there are pointers to places
in the IETF (WGs, drafts) where there is work in progress. I would prefer to
avoid any ambiguity down the road, as to whether these citations are just for
the information of the reader as examples, or something more. I'm open to
solutions, but perhaps something like this, as a final paragraph of the
introduction?
 
[Linda] The citations of IETF drafts throughout the document are intended 
purely for informational purposes, providing examples and context for the 
reader.
 
 
NEW:
   This document provides references to IETF working groups and Internet
   Drafts that relate to the subject. These references are provided as
   examples and for the information of the reader, and should not be
   interpreted as requiring the adoption or implementation of any
   particular solution. Certain high-level requirements are presented
   in Section 6; these requirements are agnostic as to what solutions
   should fulfill them.
[Linda] Thank you for the suggestion. We can add your proposed wording to 
Section 1 (Introduction)
 
 
To be clear, my concern is that the document can easily be read as privileging
a certain set of solutions. Those might be the best solutions, I don't know,
but I don't think it is the place of a problem statement or requirements
document to mandate solutions.
 
[Linda]  Do you think adding the following sentence to the Introduction section 
can address your concern? 
“While this document discusses potential mitigation practices, it does not aim 
to prescribe or mandate specific solutions; rather, it presents various options 
that may address the identified challenges, allowing further exploration and 
consideration of alternative approaches.”
 
### Inscrutable paragraph in Section 3.1
 
2. Section 3.1 includes the following paragraph:
 
     - A Cloud DC GW typically has multiple eBGP sessions with various
        clients and sets a route limit for each one. Therefore, on-
        premises data center gateways with eBGP sessions to the Cloud
        DC GW should configure default routes and filter out as many
        routes as possible, replacing them with a default route in
        their eBGP advertisements. This approach minimizes the number
        of routes exchanged with the Cloud DC eBGP peers.
 
I simply can't understand what this paragraph is telling me to do. This would
be partly remedied -- and the document improved overall -- if there were an
earlier section providing a reference model and defining terms such as "Cloud
DC GW", and illustrating the flow of routing information between elements.
Since there is no such model, and since the prose quoted isn't clear, the
reader is left to use their imagination, which is the opposite of what we
strive for in our RFCs.
 
I would suggest a rewrite but I can't discern even enough of your intent to
offer one, I'm sorry. I guess my imagination has failed me.
 
[Linda] Do you think the following rewrite is clearer? 
 
“- A Cloud DC GW typically establishes multiple eBGP sessions with many clients. 
Each session is configured with a maximum number of routes it can handle. To 
avoid exceeding this limit, which could lead to the Cloud DC GW dropping 
routes, on-premises data center gateways should simplify their route 
advertisements by filtering unnecessary routes and using a default route 
instead. This practice minimizes the volume of routing information exchanged 
between on-premises data centers and Cloud DCs, thereby avoiding the route 
drops that occur when a client exceeds its configured maximum.”
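 
As a rough illustration of the rewritten paragraph (not part of the draft 
text), here is a minimal Python sketch. The function name 
`summarize_advertisements`, the `must_advertise` set, and the limit value are 
hypothetical; the sketch only shows how replacing non-essential prefixes with a 
default route keeps the advertisement count under a per-session maximum.

```python
import ipaddress

DEFAULT_ROUTE = ipaddress.ip_network("0.0.0.0/0")

def summarize_advertisements(prefixes, must_advertise, max_routes):
    """Return the routes an on-premises gateway would advertise.

    Hypothetical policy: keep only the prefixes the peer must see
    explicitly and cover everything else with a single default route.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    required = {ipaddress.ip_network(p) for p in must_advertise}
    keep = [n for n in nets if n in required]
    # Add the default route only if something was actually filtered out.
    routes = keep + ([DEFAULT_ROUTE] if len(keep) < len(nets) else [])
    if len(routes) > max_routes:
        raise ValueError("advertisement still exceeds the per-session limit")
    return routes

# Four on-premises prefixes, one of which must stay visible, limit of 2 routes.
routes = summarize_advertisements(
    ["10.1.0.0/24", "10.1.1.0/24", "10.2.0.0/16", "192.0.2.0/24"],
    must_advertise={"192.0.2.0/24"},
    max_routes=2,
)
print([str(r) for r in routes])
```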
 
### Section 3.2, no IGP
 
   As described in [RFC7938], a Cloud DC might not have an IGP to route
   around link/node failures within its domain.
 
Are you saying that because there's no IGP the Cloud DC can't route around
failures? Surely not, this is the opposite of what RFC 7938 describes. But it's
sure what it sounds like.
 
[Linda] No, the intent is to say that when an IGP is not running, the Cloud DC 
uses other methods to detect faults. 
 
How about changing the paragraph to the following? 
 
Old: 
“As described in [RFC7938], a Cloud DC might not have an IGP to route around 
link/node failures within its domain. When a site failure happens, the Cloud DC 
GW visible to clients is running fine; therefore, the site failure is not 
detectable by the clients using Bidirectional Forwarding Detection 
(BFD)[RFC5880].”  
 
New: 
“As described in [RFC7938], a Cloud DC may not run an IGP within its domain; 
instead, it relies on internal methods to detect and report faults, which 
differ from standardized protocols like BFD or an IGP. In the event of a site 
failure, while the Cloud DC GW visible to clients continues to operate 
normally, the failure remains undetected by clients relying on BFD [RFC5880]. 
Since BFD is not running within the Cloud DC, the GW cannot simply extend or 
concatenate BFD sessions to external peers.”
 
 
 
   When a site failure
   happens, the Cloud DC GW visible to clients is running fine;
   therefore, the site failure is not detectable by the clients using
   Bidirectional Forwarding Detection (BFD)[RFC5880].
 
This doesn't make any sense to me. Again, perhaps a reference model showing the
relationship of a "Cloud DC GW", a "site", where BFD would be running, etc,
might have helped.
 
[Linda] Is the proposed new text above clearer? 
 
----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------
 
## COMMENT
 
### Section 3.1, Capability Mismatch
 
I don't understand what this means:
 
   Capability
   mismatch can cause BGP sessions not being adequately established.
 
The "mitigation practices" basically amount to "follow the relevant standards".
Is the quoted text trying to say something like "implementations that have bugs
or don't follow the standards may not work right"? Generally, we don't need an
RFC to say that, it's akin to the classic "MUST NOT write bugs".
 
[Linda] (by the co-author from Azure:) Cloud DCs often peer with a 
significantly larger number of enterprises, some of which may lack sufficient 
BGP expertise. As a result, they frequently encounter BGP peering errors, 
including capability mismatches, unwanted route leaks, missing Keepalives, and 
session resets.  Simply saying "follow IETF standards" is not enough. For 
example, when a Cloud GW receives inbound routes exceeding the maximum routes 
from a peer, the current practice is to generate out-of-band alerts (e.g., 
Syslog entries) via the management system or to terminate the BGP session. 
Those practices are not preferred. A more operation-friendly approach would be 
for peers to reduce the number of routes they are advertising, for example in 
response to a "route threshold crossing" alert mechanism. 
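 
To make the "route threshold crossing" idea concrete, here is a minimal Python 
sketch. It is only an illustration under stated assumptions: the function name 
`check_route_threshold`, the 80% warning level, and the three state names are 
hypothetical and do not describe any existing BGP implementation or the draft 
itself. The point is that a warning raised before the maximum is reached gives 
the peer a chance to reduce its advertisements before the session is affected.

```python
def check_route_threshold(received_routes, max_routes, warn_percent=80):
    """Classify a peer's received-route count against its configured maximum.

    Returns "ok", "warning" (threshold crossed, peer should reduce its
    advertisements), or "exceeded" (maximum passed). All names and the
    default warning level are assumptions made for this sketch.
    """
    if max_routes <= 0:
        raise ValueError("max_routes must be positive")
    if received_routes > max_routes:
        return "exceeded"
    # Integer arithmetic avoids floating-point surprises at the boundary.
    if received_routes * 100 >= warn_percent * max_routes:
        return "warning"
    return "ok"

for count in (50, 80, 101):
    print(count, check_route_threshold(count, max_routes=100))
```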
 
 
### Section 3.2, Huge number... problem
 
   When a site failure occurs, many services can be impacted. When the
   impacted services' IP prefixes in a Cloud DC are not aggregated
   nicely, which is common, one single site failure can trigger a huge
   number of BGP UPDATE messages. There are proposals, such as
   [METADATA-PATH], to enhance BGP advertisements to address this
   problem.
 
Is there some supporting evidence that the O(N) nature of BGP convergence is a
"problem" in this context? I mean, sure, O(1) is nicer than O(N), but there are
many O(N) operations we choose not to optimize because they don't need
optimizing. I haven't seen evidence presented that convinces me this needs
optimizing.
 
[Linda] It is more about the huge number of BGP UPDATE messages triggered by a 
POD losing power or suffering a fiber cut. 
 
Rather than debate this point, one possible way to address it would be to
reword in some more factual way, such as,
 
NEW:
   When a site failure occurs, many services can be impacted. When the
   impacted services' IP prefixes in a Cloud DC are not aggregated
   nicely, which is common, one single site failure can trigger multiple
   BGP UPDATE messages. There are proposals, such as
   [METADATA-PATH], to enhance BGP advertisements to reduce the number
   of messages required.
 
[Linda] Thank you for the suggested wording. We can change per your suggestion. 
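 
The effect of prefix aggregation on the number of prefixes that must be 
withdrawn after a site failure can be sketched with Python's standard 
`ipaddress` module. This is an illustration only: the function name 
`prefixes_to_withdraw` and the example prefixes (documentation ranges) are 
hypothetical, and the sketch counts prefixes rather than BGP UPDATE messages, 
since a single UPDATE can carry several withdrawn routes.

```python
import ipaddress

def prefixes_to_withdraw(service_prefixes):
    """Collapse the failed site's service prefixes into the fewest covering
    prefixes; the result is what would need to be withdrawn if the site's
    routes were advertised in aggregated form."""
    nets = [ipaddress.ip_network(p) for p in service_prefixes]
    return list(ipaddress.collapse_addresses(nets))

# Contiguous allocations aggregate into a single /24.
contiguous = ["198.51.100.0/26", "198.51.100.64/26",
              "198.51.100.128/26", "198.51.100.192/26"]
# Scattered allocations cannot be aggregated at all.
scattered = ["198.51.100.0/26", "203.0.113.64/26", "192.0.2.128/26"]

print(len(prefixes_to_withdraw(contiguous)))
print(len(prefixes_to_withdraw(scattered)))
```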
 
 
### Section 3.4, UEs can move
 
   Here are some network problems with connecting to the services in
   the 5G Edge Clouds:
...
       3) Source (UEs) can ingress from different LDN Ingress routers
          due to mobility.
 
How is that a "problem"?
 
[Linda] This is a routing problem when the ingress router changes for the same 
service. Do you think the following wording is better? 
“Due to user mobility, sources (UEs) can ingress from different LDN Ingress 
routers, presenting a routing challenge.”
 
### Section 6, IPSec requirement
 
     - Should support scalable IPsec key management among all nodes
        involved in DC interconnect schemes.
 
But you don't say that it's a requirement for a solution to be IPSec-based at
all. For a solution that isn't IPSec-based, this requirement is moot. Perhaps,
 
NEW:
     - Should support scalable IPsec key management among all nodes
        involved in DC interconnect schemes, if IPSec is used as
        a VPN technology.
 
[Linda] Thank you very much for the suggested wording. We will change per your 
suggestion. 
 
 
### Section 6, AZ
 
     - Should support traffic steering to distribute loads across
        regions/AZs based on performance/availability of workloads in
 
You've never defined "AZ". Please do, or remove.
 
[Linda] AZ stands for Availability Zone. 
 
 
### Section 7, anti-DDoS
 
        a) Potential DDoS (Distributed Denial of Service) attack to the
        ports facing the untrusted network (e.g., the public internet),
        which may propagate to the cloud edge resources. To mitigate
        such security risk, it is necessary for the ports facing
        internet to enable Anti-DDoS features.
 
Can you be specific about what "anti-DDoS features" are? You make it sound as
though there's some way to configure "port xyz1/2 no ddos" and the problem goes
away. To my knowledge, such "anti-DDoS features" don't exist. If they do,
please cite examples. If they don't, something about this needs to change;
minimally, delete the "to mitigate" sentence.
 
[Linda] Do you think adding the following details on Anti-DDoS practices 
would be sufficient? 
“There are many Anti-DDoS features to consider. Some examples include: Rate 
Limiting, Access Control Lists (ACLs), Deep Packet Inspection (DPI), 
Blackholing and Sinkholing (which route malicious traffic to a non-existent IP 
address or a system that safely absorbs or analyzes the traffic), Traffic 
Scrubbing, and Geo-IP Blocking.”
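 
Of the practices listed, rate limiting is the simplest to illustrate. The 
following is a minimal token-bucket sketch in Python, given only as an example 
of the general technique; the class name `TokenBucket` and its parameters are 
hypothetical and do not correspond to any particular vendor feature. 
Timestamps are passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Token-bucket rate limiter: allows bursts up to `burst` packets and a
    sustained rate of `rate_per_sec` packets per second."""

    def __init__(self, rate_per_sec, burst):
        self.rate = float(rate_per_sec)
        self.burst = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the last refill, in seconds

    def allow(self, now):
        """Return True if a packet arriving at time `now` may pass."""
        if now > self.last:
            # Refill in proportion to elapsed time, capped at the burst size.
            refill = (now - self.last) * self.rate
            self.tokens = min(self.burst, self.tokens + refill)
            self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate_per_sec=1, burst=2)
# Three packets at t=0 followed by one packet a second later.
print([bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0)])
```

A limiter like this bounds the traffic accepted on a port but does not by 
itself distinguish legitimate traffic from attack traffic.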
 
 
 
From: James Guichard <[email protected]> 
Sent: Thursday, September 19, 2024 5:42 AM
To: jmh.direct <[email protected]>
Cc: Linda Dunbar <[email protected]>
Subject: Re: Net2cloud problem statement
 
One further thing, and this worried me when I did my review: I think that all 
requirements need to be taken out of the document, and it should be refocused 
on the problem statement only. Once all the reviews are in, we should probably 
have a call to discuss.
 
Jim
 
From: James Guichard <[email protected]>
Date: Thursday, September 19, 2024 at 8:26 AM
To: jmh.direct <[email protected]>
Cc: Linda Dunbar <[email protected]>
Subject: Re: Net2cloud problem statement
Hi Joel,
 
I think he raises good points, and in fact I raised the same ones during my 
review. I think if the authors provide some clarity on his DISCUSS points, they 
should be able to overcome his objections. If he abstains because he does not 
think the document is valuable, then that is fine.
 
Jim
 
From: jmh.direct <[email protected]>
Date: Wednesday, September 18, 2024 at 10:38 PM
To: James Guichard <[email protected]>
Cc: Linda Dunbar <[email protected]>
Subject: Net2cloud problem statement
I am not at all sure what to make of John and Paul's concerns.  Suggestions?
Thanks,
Joel
 
 
 

