John Scudder has entered the following ballot position for draft-ietf-rtgwg-net2cloud-problem-statement-41: Discuss
When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ for more information about how to handle DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-net2cloud-problem-statement/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Much of this document seems to be a high-level outline of particular commercial offerings, which, among other problems, will not age well. Other parts outline challenges that are already solved, using existing IETF technologies or general remarks about best practices for operating networks. Yet other parts provide brief sketches of other SDOs' technologies or architectures.

Overall, I don't think this is a valuable document for the IETF to be publishing as part of the RFC series, and as such I expect to eventually ballot Abstain. I do, however, have a few concerns about the document which warrant a DISCUSSion, first.

## DISCUSS

### This isn't a requirements document; I think that should be made clearer

Sometimes the IETF publishes requirements documents, which when issued as RFCs are seen as having some standing to establish that a given technology must be developed or advanced. The present document introduces itself as a problem statement document, but Section 6 is called "requirements". My concern arises because throughout the document there are pointers to places in the IETF (WGs, drafts) where there is work in progress. I would prefer to avoid any ambiguity down the road, as to whether these citations are just for the information of the reader as examples, or something more.
I'm open to solutions, but perhaps something like this, as a final paragraph of the introduction?

NEW:

   This document provides references to IETF working groups and
   Internet Drafts that relate to the subject. These references are
   provided as examples and for the information of the reader, and
   should not be interpreted as requiring the adoption or
   implementation of any particular solution. Certain high-level
   requirements are presented in Section 6; these requirements are
   agnostic as to what solutions should fulfill them.

To be clear, my concern is that the document can easily be read as privileging a certain set of solutions. Those might be the best solutions, I don't know, but I don't think it is the place of a problem statement or requirements document to mandate solutions.

### Inscrutable paragraph in Section 3.1

Section 3.1 includes the following paragraph:

   - A Cloud DC GW typically has multiple eBGP sessions with various
     clients and sets a route limit for each one. Therefore,
     on-premises data center gateways with eBGP sessions to the Cloud
     DC GW should configure default routes and filter out as many
     routes as possible, replacing them with a default route in their
     eBGP advertisements. This approach minimizes the number of
     routes exchanged with the Cloud DC eBGP peers.

I simply can't understand what this paragraph is telling me to do. This would be partly remedied -- and the document improved overall -- if there were an earlier section providing a reference model and defining terms such as "Cloud DC GW", and illustrating the flow of routing information between elements. Since there is no such model, and since the prose quoted isn't clear, the reader is left to use their imagination, which is the opposite of what we strive for in our RFCs. I would suggest a rewrite but I can't discern even enough of your intent to offer one, I'm sorry. I guess my imagination has failed me.
### Section 3.2, no IGP

   As described in [RFC7938], a Cloud DC might not have an IGP to
   route around link/node failures within its domain.

Are you saying that because there's no IGP the Cloud DC can't route around failures? Surely not, this is the opposite of what RFC 7938 describes. But it's sure what it sounds like.

   When a site failure happens, the Cloud DC GW visible to clients is
   running fine; therefore, the site failure is not detectable by the
   clients using Bidirectional Forwarding Detection (BFD) [RFC5880].

This doesn't make any sense to me. Again, perhaps a reference model showing the relationship of a "Cloud DC GW", a "site", where BFD would be running, etc., might have helped.

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

## COMMENT

### Section 3.1, Capability Mismatch

I don't understand what this means:

   Capability mismatch can cause BGP sessions not being adequately
   established.

The "mitigation practices" basically amount to "follow the relevant standards". Is the quoted text trying to say something like "implementations that have bugs or don't follow the standards may not work right"? Generally, we don't need an RFC to say that; it's akin to the classic "MUST NOT write bugs".

### Section 3.2, Huge number... problem

   When a site failure occurs, many services can be impacted. When
   the impacted services' IP prefixes in a Cloud DC are not
   aggregated nicely, which is common, one single site failure can
   trigger a huge number of BGP UPDATE messages. There are proposals,
   such as [METADATA-PATH], to enhance BGP advertisements to address
   this problem.

Is there some supporting evidence that the O(N) nature of BGP convergence is a "problem" in this context? I mean, sure, O(1) is nicer than O(N), but there are many O(N) operations we choose not to optimize because they don't need optimizing. I haven't seen evidence presented that convinces me this needs optimizing.
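(For concreteness, here is a toy sketch of the O(N)-vs-O(1) point using Python's ipaddress module; the prefix space and counts are my invention, not from the draft.)

```python
import ipaddress

# Hypothetical site failure: every /24 service prefix under 10.5.0.0/16
# becomes unreachable at once (numbers invented for illustration).
failed = list(ipaddress.ip_network("10.5.0.0/16").subnets(new_prefix=24))

# Prefixes "not aggregated nicely": each one is withdrawn individually,
# i.e. O(N) BGP UPDATE messages.
print(len(failed))  # 256 withdrawals

# The same reachability loss expressed after aggregation: O(1).
aggregated = list(ipaddress.collapse_addresses(failed))
print(len(aggregated), aggregated[0])  # 1 withdrawal: 10.5.0.0/16
```

That is, aggregation already collapses the worst case when the address plan allows it; the open question is whether the residual O(N) cases are painful enough to warrant protocol work.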
Rather than debate this point, one possible way to address it would be to reword in some more factual way, such as:

NEW:

   When a site failure occurs, many services can be impacted. When
   the impacted services' IP prefixes in a Cloud DC are not
   aggregated nicely, which is common, one single site failure can
   trigger multiple BGP UPDATE messages. There are proposals, such as
   [METADATA-PATH], to enhance BGP advertisements to reduce the
   number of messages required.

### Section 3.4, UEs can move

   Here are some network problems with connecting to the services in
   the 5G Edge Clouds:
   ...
   3) Source (UEs) can ingress from different LDN Ingress routers due
      to mobility.

How is that a "problem"?

### Section 6, IPsec requirement

   - Should support scalable IPsec key management among all nodes
     involved in DC interconnect schemes.

But you don't say that it's a requirement for a solution to be IPsec-based at all. For a solution that isn't IPsec-based, this requirement is moot. Perhaps:

NEW:

   - Should support scalable IPsec key management among all nodes
     involved in DC interconnect schemes, if IPsec is used as a VPN
     technology.

### Section 6, AZ

   - Should support traffic steering to distribute loads across
     regions/AZs based on performance/availability of workloads in

You've never defined "AZ". Please do, or remove.

### Section 7, anti-DDoS

   a) Potential DDoS (Distributed Denial of Service) attack to the
      ports facing the untrusted network (e.g., the public internet),
      which may propagate to the cloud edge resources. To mitigate
      such security risk, it is necessary for the ports facing
      internet to enable Anti-DDoS features.

Can you be specific about what "anti-DDoS features" are? You make it sound as though there's some way to configure "port xyz1/2 no ddos" and the problem goes away. To my knowledge, such "anti-DDoS features" don't exist. If they do, please cite examples. If they don't, something about this needs to change; minimally, delete the "to mitigate" sentence.
_______________________________________________
rtgwg mailing list -- rtgwg@ietf.org
To unsubscribe send an email to rtgwg-le...@ietf.org