John Scudder has entered the following ballot position for draft-ietf-rtgwg-net2cloud-problem-statement-41: Discuss
When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ for more information about how to handle DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-net2cloud-problem-statement/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Much of this document seems to be a high-level outline of particular commercial offerings, which, among other problems, will not age well. Other parts outline challenges that are already solved, using existing IETF technologies or general remarks about best practices for operating networks. Yet other parts provide brief sketches of other SDOs' technologies or architectures.

Overall, I don't think this is a valuable document for the IETF to be publishing as part of the RFC series, and as such I expect to eventually ballot Abstain. I do, however, have a few concerns about the document which warrant a DISCUSSion, first.

## DISCUSS

### This isn't a requirements document; I think that should be made clearer

Sometimes the IETF publishes requirements documents, which when issued as RFCs are seen as having some standing to establish that a given technology must be developed or advanced. The present document introduces itself as a problem statement document, but Section 6 is called "requirements". My concern arises because throughout the document there are pointers to places in the IETF (WGs, drafts) where there is work in progress. I would prefer to avoid any ambiguity down the road, as to whether these citations are just for the information of the reader as examples, or something more.
I'm open to solutions, but perhaps something like this, as a final paragraph of the introduction?

NEW:

   This document provides references to IETF working groups and
   Internet Drafts that relate to the subject. These references are
   provided as examples and for the information of the reader, and
   should not be interpreted as requiring the adoption or
   implementation of any particular solution. Certain high-level
   requirements are presented in Section 6; these requirements are
   agnostic as to what solutions should fulfill them.

To be clear, my concern is that the document can easily be read as privileging a certain set of solutions. Those might be the best solutions, I don't know, but I don't think it is the place of a problem statement or requirements document to mandate solutions.

### Inscrutable paragraph in Section 3.1

Section 3.1 includes the following paragraph:

   - A Cloud DC GW typically has multiple eBGP sessions with various
     clients and sets a route limit for each one. Therefore,
     on-premises data center gateways with eBGP sessions to the Cloud
     DC GW should configure default routes and filter out as many
     routes as possible, replacing them with a default route in their
     eBGP advertisements. This approach minimizes the number of
     routes exchanged with the Cloud DC eBGP peers.

I simply can't understand what this paragraph is telling me to do. This would be partly remedied -- and the document improved overall -- if there were an earlier section providing a reference model and defining terms such as "Cloud DC GW", and illustrating the flow of routing information between elements. Since there is no such model, and since the prose quoted isn't clear, the reader is left to use their imagination, which is the opposite of what we strive for in our RFCs. I would suggest a rewrite but I can't discern even enough of your intent to offer one, I'm sorry. I guess my imagination has failed me.
### Section 3.2, no IGP

   As described in [RFC7938], a Cloud DC might not have an IGP to
   route around link/node failures within its domain.

Are you saying that because there's no IGP the Cloud DC can't route around failures? Surely not, this is the opposite of what RFC 7938 describes. But it's sure what it sounds like.

   When a site failure happens, the Cloud DC GW visible to clients is
   running fine; therefore, the site failure is not detectable by the
   clients using Bidirectional Forwarding Detection (BFD) [RFC5880].

This doesn't make any sense to me. Again, perhaps a reference model showing the relationship of a "Cloud DC GW", a "site", where BFD would be running, etc., might have helped.

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

## COMMENT

### Section 3.1, Capability Mismatch

I don't understand what this means:

   Capability mismatch can cause BGP sessions not being adequately
   established.

The "mitigation practices" basically amount to "follow the relevant standards". Is the quoted text trying to say something like "implementations that have bugs or don't follow the standards may not work right"? Generally, we don't need an RFC to say that; it's akin to the classic "MUST NOT write bugs".

### Section 3.2, Huge number... problem

   When a site failure occurs, many services can be impacted. When
   the impacted services' IP prefixes in a Cloud DC are not
   aggregated nicely, which is common, one single site failure can
   trigger a huge number of BGP UPDATE messages. There are proposals,
   such as [METADATA-PATH], to enhance BGP advertisements to address
   this problem.

Is there some supporting evidence that the O(N) nature of BGP convergence is a "problem" in this context? I mean, sure, O(1) is nicer than O(N), but there are many O(N) operations we choose not to optimize because they don't need optimizing. I haven't seen evidence presented that convinces me this needs optimizing.
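(For concreteness, here is a toy sketch of the O(N)-vs-O(1) point using Python's ipaddress module; the prefix space and counts are my invention, not from the draft.)

```python
import ipaddress

# Hypothetical site failure: every /24 service prefix under 10.5.0.0/16
# becomes unreachable at once (numbers invented for illustration).
failed = list(ipaddress.ip_network("10.5.0.0/16").subnets(new_prefix=24))

# Prefixes "not aggregated nicely": each one is withdrawn individually,
# i.e. O(N) BGP UPDATE messages.
print(len(failed))  # 256 withdrawals

# The same reachability loss expressed after aggregation: O(1).
aggregated = list(ipaddress.collapse_addresses(failed))
print(len(aggregated), aggregated[0])  # 1 withdrawal: 10.5.0.0/16
```

That is, aggregation already collapses the worst case when the address plan allows it; the open question is whether the residual O(N) cases are painful enough to warrant protocol work.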
Rather than debate this point, one possible way to address it would be to reword in some more factual way, such as:

NEW:

   When a site failure occurs, many services can be impacted. When
   the impacted services' IP prefixes in a Cloud DC are not
   aggregated nicely, which is common, one single site failure can
   trigger multiple BGP UPDATE messages. There are proposals, such as
   [METADATA-PATH], to enhance BGP advertisements to reduce the
   number of messages required.

### Section 3.4, UEs can move

   Here are some network problems with connecting to the services in
   the 5G Edge Clouds:
   ...
   3) Source (UEs) can ingress from different LDN Ingress routers due
      to mobility.

How is that a "problem"?

### Section 6, IPsec requirement

   - Should support scalable IPsec key management among all nodes
     involved in DC interconnect schemes.

But you don't say that it's a requirement for a solution to be IPsec-based at all. For a solution that isn't IPsec-based, this requirement is moot. Perhaps:

NEW:

   - Should support scalable IPsec key management among all nodes
     involved in DC interconnect schemes, if IPsec is used as a VPN
     technology.

### Section 6, AZ

   - Should support traffic steering to distribute loads across
     regions/AZs based on performance/availability of workloads in

You've never defined "AZ". Please do, or remove.

### Section 7, anti-DDoS

   a) Potential DDoS (Distributed Denial of Service) attack to the
      ports facing the untrusted network (e.g., the public internet),
      which may propagate to the cloud edge resources. To mitigate
      such security risk, it is necessary for the ports facing
      internet to enable Anti-DDoS features.

Can you be specific about what "anti-DDoS features" are? You make it sound as though there's some way to configure "port xyz1/2 no ddos" and the problem goes away. To my knowledge, such "anti-DDoS features" don't exist. If they do, please cite examples. If they don't, something about this needs to change; minimally, delete the "to mitigate" sentence.
_______________________________________________
rtgwg mailing list -- rtgwg@ietf.org
To unsubscribe send an email to rtgwg-le...@ietf.org