Gunter Van de Velde has entered the following ballot position for draft-ietf-rtgwg-net2cloud-problem-statement-41: Discuss
When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ for more information about how to handle DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-net2cloud-problem-statement/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

# Gunter Van de Velde, RTG AD, comments for draft-ietf-rtgwg-net2cloud-problem-statement-41

# Thanks for writing up this work to make cloud DCs more mainstream and connected to enterprises, and for opening the discussion on the routing aspects.

# Please find below the blocking DISCUSS observations I noted while processing the draft, followed by some non-blocking comments.

#DISCUSS
#=======

# I support the DISCUSS positions from John and Paul: (1) requirements do not belong in a use-case document, and (2) the document contains information that will not age well.

# [DISCUSS1] Section 3 is not complete from the perspective of the routing issues involved in connecting to Cloud DCs. Connecting to cloud data centers presents various routing challenges, including scalability, security, latency, routing policy consistency, and multi-cloud complexity. Enterprises need to carefully plan and manage their routing architecture to ensure reliable, efficient, and secure connections between on-premises infrastructure and cloud data centers. Solutions like dedicated connections, BGP security enhancements, and dynamic routing policies can help mitigate some of these challenges, but they also add complexity to the overall network architecture. I believe that a use-case document should address, or at least position, all of these. When the focus is on a small subset, the bigger picture may be lost.
Connecting to and between Cloud DCs is a multi-dimensional, complex, routing-aware problem space. See my note [DISCUSS1] below.

# [DISCUSS2] The coverage of Cloud DC implications in the Security Considerations is not complete. There are many aspects to consider; see note [DISCUSS2] below. Topics include, for example, Encryption of Data in Transit, Authentication and Access Control, Secure Routing Protocols, Network Segmentation, Data Encryption at Rest, Visibility and Monitoring, DDoS Protection, Firewalls and Security Groups, Zero Trust Security Model, Compliance and Regulatory Considerations, Network Access Control, Patch Management and Vulnerability Scanning, Distributed Workloads and Traffic Control, and an Incident Response Plan.

# [DISCUSS3] This document describes vendor-specific Cloud DC products and their explicit behaviors. IETF documents should be vendor agnostic, especially when very specific behaviors are documented. Vendor behavior will change over time, making the information provided in the draft stale, outdated, and potentially harmful to the referenced (and unaware) Cloud DC vendors.

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

#DETAILED COMMENTS
#=================
## classified as [minor] and [major]

72    3. Issues and Mitigation Methods of Connecting to Cloud DCs.......4
73       3.1. Increased BGP Peering Errors and Mitigation Methods.......4
74       3.2. Site Failures and Methods to Minimize Impacts.............6
75       3.3. Limitations of DNS-based Cloud DC Location Selection......6
76       3.4. Network Issues for 5G Edge Clouds and Mitigation Methods..7
77       3.5. DNS Practices for Hybrid Workloads........................8
78       3.6. NAT Practices for Accessing Cloud Services................9
79       3.7. Cloud Discovery Practices................................10
80    4. Dynamic Connecting Enterprise Sites with Cloud DCs............10
81       4.1. Sites to Cloud DC........................................11
82       4.2. Inter-Cloud Connection...................................13
83       4.3. Extending Private VPNs to Hybrid Cloud DCs...............14
84    5. Methods to Scale IPsec Tunnels to Cloud DCs...................15
85       5.1. Scale IPsec Tunnels Management...........................16
86       5.2. CPEs Interconnection Over the Public Internet............16
.....

98    1. Introduction
99       With the advent of widely available Cloud data centers (DCs)
100      providing services in various geographic locations and advanced
101      tools for monitoring and predicting application behaviors, it is
102      tempting for enterprises to instantiate applications and workloads
103      in Cloud DCs. Some enterprises prefer specific applications to be
104      located close to the end users accessing these services, as the
105      proximity can improve end-to-end latency. In addition, applications
106      and workloads in Cloud DCs can be shut down or moved along with end
107      users in motion thereby modifying the networking connection of
108      subsequently relocated applications and workloads.
109      Cloud services are typically on-demand and designed to be scalable,
110      highly available, and billed based on usage. Most Cloud Operators
111      offer various network functions, such as virtual Firewall services,
112      virtual private clouds services, and virtual Private Branch eXchange
113      (PBX) services, including voice and video conferencing systems. A
114      Cloud DC is a shared infrastructure that hosts services for multiple
115      customers.
116      This document describes the network-related problems enterprises
117      face at the time of writing this document when interconnecting their
118      branch offices with dynamic workloads in Cloud DCs and the
119      mitigation practices to get around those problems.

[major] Cloud data centers offer numerous benefits, but they also have several downsides or challenges that organizations need to consider.
While cloud data centers offer scalability, flexibility, and cost-efficiency, organizations must weigh these benefits against potential downsides such as security risks, unpredictable costs, limited control, and regulatory compliance challenges. It is not just about network access to Cloud DCs. Some of the suggested key downsides are not strictly routing-technical in nature, while others are, and the latter have not been adequately addressed in Section 3. Including these issues in the document, with an explicit indication of which are in scope and which are out of scope, would give readers greater clarity and a better understanding of the problem space being discussed:

1. Security and Privacy Concerns:
   * Data Breaches: Storing sensitive data in cloud environments increases the risk of data breaches, as cloud data centers are prime targets for cyberattacks. Organizations may face unauthorized access if cloud security is compromised.
   * Shared Responsibility: In cloud environments, security is a shared responsibility between the cloud provider and the customer. Misconfigurations or failures on either side can lead to vulnerabilities.
   * Data Sovereignty: Data stored in cloud data centers may be subject to the laws and regulations of the country where the data center is located, which can lead to compliance issues regarding data privacy.
2. Downtime and Availability:
   * Service Outages: Even the most reliable cloud providers can experience downtime, which can disrupt service for organizations relying on cloud infrastructure. High availability is typically guaranteed, but 100% uptime is rarely achieved.
   * Network Latency: Cloud data centers are remote, so applications that require low-latency performance might face challenges, especially if the data center is far from end users.
3. Cost Management:
   * Unpredictable Costs: While cloud services are often marketed as cost-effective, costs can quickly add up if resources are not properly managed. Unexpected charges for data egress, scaling, or additional services can lead to budget overruns.
   * Long-Term Costs: Over time, running workloads in the cloud might be more expensive than on-premises solutions, particularly for organizations with steady and predictable workloads.
4. Lack of Control:
   * Limited Customization: Cloud services typically offer standardized environments, which may limit an organization’s ability to customize infrastructure or configurations to meet specific needs. This lack of control can be problematic for highly specialized applications.
   * Vendor Lock-In: Many cloud providers offer proprietary services that can make it difficult or costly to migrate to another provider or to move workloads back on-premises.
5. Data Transfer and Performance Issues:
   * Data Transfer Costs: Transferring large volumes of data to and from the cloud can be expensive and time-consuming, particularly when dealing with bandwidth limitations or the cost of data egress.
   * Performance Variability: In multi-tenant cloud environments, performance can fluctuate depending on the overall resource usage of other clients. This can impact critical workloads if performance varies unexpectedly.
6. Compliance and Legal Issues:
   * Regulatory Compliance: Organizations in highly regulated industries (e.g., healthcare, finance) must ensure that their use of cloud services complies with specific regulations such as GDPR, HIPAA, or PCI-DSS. Ensuring compliance can be complicated by the global nature of cloud data centers.
   * Data Jurisdiction: Storing data in foreign cloud data centers might expose organizations to jurisdictional issues, where the data becomes subject to foreign laws and regulations.
7. Dependence on Internet Connectivity:
   * Connectivity Issues: Cloud services require reliable internet access. If an organization experiences internet outages or slow connectivity, access to cloud-hosted applications and data may be compromised, impacting productivity.
8. Complexity in Hybrid Environments:
   * Integration Challenges: Managing hybrid cloud environments (where some resources are in the cloud and others on-premises) can be complex, especially when it comes to data synchronization, security policies, and monitoring.

150   SD-WAN   An overlay connectivity service that optimizes transport
151            of IP Packets over one or more Underlay Connectivity
152            Services by recognizing applications (Application Flows)
153            and determining forwarding behavior by applying Policies
154            to them. [MEF-70.1]

[major] The definition does not state that SD-WAN stands for "Software-Defined Wide Area Network". Maybe the following could be added to describe what SD-WAN is:

   "SD-WAN (Software-Defined Wide Area Network) is a networking technology that simplifies the management and operation of a wide area network (WAN) by decoupling the network hardware from its control mechanism. It allows enterprises to securely and efficiently connect users to applications, particularly across multiple branch locations, data centers, and cloud environments."

156   VPC:     A Virtual Private Cloud is a virtual network dedicated
157            to one client account. It is logically isolated from
158            other virtual networks in a Cloud DC. Each client can
159            launch his/her desired resources, such as compute,
160            storage, or network functions into his/her VPC. At the
161            time of writing this document, most Cloud operators'
162            VPCs only support private addresses, some support IPv4
163            only, others support IPv4/IPv6 dual stack.

[minor] A simpler proposal to describe a VPC:

   "A VPC (Virtual Private Cloud) is a secure, isolated segment of a public cloud, where users can deploy and manage resources such as virtual machines, databases, and applications.
   VPCs offer the flexibility of using the public cloud's infrastructure while providing more control over networking and security."

165   3. Issues and Mitigation Methods of Connecting to Cloud DCs

167      This section identifies some high-level problems that the IETF could
168      address, especially within the Routing area. Other Cloud DC problems
169      (e.g., managing cloud spending) are out of the scope of this
170      document.

[DISCUSS1] Connecting to cloud data centers presents various routing challenges, including scalability, security, latency, routing policy consistency, and multi-cloud complexity. Enterprises need to carefully plan and manage their routing architecture to ensure reliable, efficient, and secure connections between on-premises infrastructure and cloud data centers. Solutions like dedicated connections, BGP security enhancements, and dynamic routing policies can help mitigate some of these challenges, but they also add complexity to the overall network architecture. Not all of these high-level enterprise concerns are addressed in draft-ietf-rtgwg-net2cloud-problem-statement-41.

Key routing issues of interest to enterprises when connecting to cloud data centers:

1. Latency and Path Optimization:
   * Suboptimal Routing: Traffic between on-premises data centers and cloud providers may traverse multiple ISPs or intermediary networks, leading to increased latency. Default internet paths are not always optimal, which can degrade performance for latency-sensitive applications.
   * Traffic Engineering: Enterprises may struggle to optimize routes for specific applications. This can be critical when performance demands are high, such as low latency for real-time applications.
2. Multi-Cloud and Hybrid Cloud Connectivity:
   * Inter-Cloud Routing Complexity: Routing between multiple cloud providers (multi-cloud) or between on-premises environments and the cloud (hybrid cloud) is challenging. Each cloud provider may use different routing policies, protocols, and architectures, complicating consistent policy enforcement and efficient routing across different environments.
   * Vendor-Specific Routing Mechanisms: Cloud providers like AWS, Microsoft Azure, and Google Cloud have their own proprietary routing mechanisms, such as AWS Transit Gateway or Azure Virtual WAN. Managing routing across different clouds requires expertise in each platform’s unique setup.
3. BGP Complexity:
   * BGP Configuration: Enterprises often use the Border Gateway Protocol (BGP) to connect their on-premises networks with cloud DCs. However, configuring BGP for efficient and secure communication can be complex, especially when dealing with cloud providers’ route limitations, filtering, and peering configurations.
   * BGP Route Convergence: After a network topology change, BGP may take time to converge on a new optimal route, which can cause temporary routing loops or black holes, leading to downtime or degraded performance.
   * BGP Security: Routing security issues such as BGP hijacking can be a concern. If BGP is not properly secured, attackers can manipulate routes, potentially intercepting or redirecting traffic between an enterprise and a cloud data center.
4. Overlapping IP Addresses:
   * IP Address Conflicts: When connecting multiple cloud environments or integrating with on-premises networks, organizations may encounter overlapping private IP address spaces (e.g., two networks using the same RFC 1918 address space). This creates routing conflicts and requires address translation (e.g., NAT) or careful IP address planning.
   * NAT Complexity: Network Address Translation (NAT) is often used to resolve overlapping IP addresses, but it adds complexity to routing and makes troubleshooting connectivity issues more difficult.
5. Routing Scalability:
   * Large Route Tables: Cloud environments often host a large number of subnets, virtual machines (VMs), and applications, which results in significant route table growth. On-premises routers may struggle to handle the large number of routes advertised by cloud data centers.
   * Route Aggregation: Route aggregation is essential for managing large routing tables, but improper aggregation can lead to suboptimal routing or create security issues by allowing unintended access to broader network segments.
6. East-West Traffic Optimization:
   * East-West Traffic Challenges: Modern cloud workloads often involve significant east-west traffic (i.e., traffic between different applications or services within the cloud). Efficiently routing this traffic between cloud regions, or between an on-premises data center and the cloud, can be challenging, especially if cross-region bandwidth or routing constraints exist.
7. Latency and Bandwidth Considerations:
   * Performance Over Public Internet: Connecting to a cloud DC over the public internet introduces unpredictable latency and limited control over the routing path. Enterprises may use dedicated connectivity solutions like AWS Direct Connect or Azure ExpressRoute to avoid the public internet and achieve more predictable performance, but these solutions come with additional cost and complexity.
   * Bandwidth Costs: Cloud providers often charge for egress traffic (traffic leaving the cloud data center). Suboptimal routing can increase data transfer costs if traffic is unnecessarily routed through expensive paths.
8. Route Propagation and Policy Enforcement:
   * Consistent Route Propagation: Propagating routes between an on-premises network and a cloud data center can be inconsistent, especially when complex routing policies are used. Enterprises need to carefully manage route redistribution between different routing domains (e.g., BGP on-premises and cloud provider proprietary routing).
   * Policy Control: Implementing consistent routing policies (e.g., security, load balancing, and traffic engineering policies) across cloud and on-premises environments can be challenging due to the different tools and mechanisms used by cloud providers.
9. Routing Security:
   * Securing Routing Information: When using BGP to connect to cloud data centers, securing routing information is crucial. BGP hijacking and route leaks can redirect traffic, whether maliciously or accidentally. Organizations need to implement security measures like BGP authentication, RPKI (Resource Public Key Infrastructure), and route filtering to prevent unauthorized route advertisements.
   * Encryption and Privacy: Data traveling between an enterprise and the cloud may need encryption to protect against eavesdropping. Implementing encrypted tunnels (e.g., IPsec VPNs) can add complexity to the routing setup.
10. Failover and High Availability:
   * Redundancy and Failover: Ensuring high availability in cloud connectivity involves setting up redundant links and implementing fast failover mechanisms so that traffic is re-routed quickly in the event of a link failure. However, configuring effective failover paths that meet performance and cost requirements can be complex, especially across different clouds or between cloud and on-premises environments.
   * Dynamic Failover: In hybrid environments, ensuring that routes change dynamically during failover can be difficult due to the different routing protocols, or the static routes, used in cloud environments.
11. Geographic Routing and Data Residency:
   * Compliance and Regulation: Enterprises may face legal and regulatory challenges regarding where data is routed. For instance, data residency requirements (e.g., GDPR) may mandate that certain data be routed or stored only within specific geographical regions. Ensuring that routing policies comply with these regulations across cloud and on-premises environments can be a complex issue.
   * Geographic Load Balancing: Routing traffic to cloud data centers in different regions to optimize for performance or compliance requires careful planning and monitoring.

743   7. Security Considerations
744
745      The security issues in terms of networking to Cloud DCs include

[DISCUSS2] Enterprises connecting to cloud data centers must address a wide range of security concerns, from ensuring encrypted communications and controlling access to securing routing protocols and complying with regulatory requirements. By employing robust encryption, strong access controls, comprehensive monitoring, and segmentation strategies, organizations can mitigate risks and securely connect their on-premises infrastructure to cloud environments. Additionally, leveraging the security tools and services provided by cloud vendors can help ensure that the network and data remain protected. The Security Considerations section should examine these aspects to provide a holistic security overview. While not all of them have a direct impact on routing, or should even be standardized, it is important for enterprises to have a secure and robust Cloud DC experience.

1. Encryption of Data in Transit:
   * End-to-End Encryption: Data traveling between on-premises infrastructure and the cloud should be encrypted to protect against interception and eavesdropping. Common methods include IPsec VPNs and SSL/TLS. Private connectivity options like AWS Direct Connect or Azure ExpressRoute provide dedicated connections to the cloud that avoid the public internet, but a dedicated connection does not necessarily encrypt traffic.
   * Encrypted Tunnels: Secure tunnels (e.g., IPsec, SSL/TLS, or GRE protected by IPsec) can be used to ensure data confidentiality and integrity during transmission; GRE on its own provides neither. Encryption helps mitigate man-in-the-middle attacks.
2. Authentication and Access Control:
   * Strong Authentication Mechanisms: Employ strong multi-factor authentication (MFA) for accessing both on-premises and cloud resources. Implement VPN access control to ensure that only authorized users and devices can establish connections to cloud environments.
   * Identity and Access Management (IAM): Use IAM policies to control who can access resources in the cloud. Ensure that IAM roles are tightly controlled and that users and applications only have the minimum permissions they need (principle of least privilege).
3. Secure Routing Protocols:
   * BGP Security: If using the Border Gateway Protocol (BGP) to connect to cloud services, protect the routing protocol by implementing BGP session authentication (for example, using TCP-AO) and route filtering to prevent unauthorized or incorrect routing information from being accepted.
   * Route Filtering: Control which routes are propagated between on-premises networks and the cloud to prevent route leaks, which could expose sensitive routes to external parties or misdirect traffic.
   * RPKI (Resource Public Key Infrastructure): Consider using RPKI-based route origin validation to mitigate BGP hijacking by checking that the origin AS of an advertised route is authorized to originate that prefix. Origin validation alone does not detect manipulation of the rest of the AS path.
4. Network Segmentation:
   * Isolating Traffic: Use Virtual Private Clouds (VPCs) and subnet segmentation to isolate traffic between different departments, workloads, or tenants. This ensures that sensitive data is not exposed to unauthorized users within the same cloud environment.
   * Private Connectivity: Use private connectivity options (e.g., AWS Direct Connect, Azure ExpressRoute) to avoid sending sensitive data over the public internet, reducing the risk of exposure to attacks.
5. Data Encryption at Rest:
   * Cloud Data Encryption: Ensure that data stored in the cloud is encrypted at rest. Many cloud providers offer encryption services (e.g., AWS Key Management Service, Azure Key Vault) to manage encryption keys securely. Consider using customer-managed keys for additional control over encryption processes.
   * Compliance with Encryption Standards: Ensure that encryption protocols comply with industry standards and regulatory requirements (e.g., AES-256 encryption for sensitive data).
6. Visibility and Monitoring:
   * Traffic Monitoring: Use tools like cloud network traffic analyzers or intrusion detection systems to monitor traffic between on-premises infrastructure and cloud environments. Detect anomalous behavior or unauthorized access attempts by maintaining visibility into network traffic.
   * Logging and Auditing: Enable comprehensive logging of all access and configuration changes in both on-premises and cloud environments. Cloud providers often offer logging services like AWS CloudTrail or Azure Monitor to track user activity and help detect security breaches.
   * Threat Detection and Response: Deploy security tools that offer threat detection, real-time monitoring, and automated response. Solutions like SIEM (Security Information and Event Management) systems can help correlate events across the hybrid cloud to detect security incidents.
7. DDoS Protection:
   * Distributed Denial of Service (DDoS) Protection: Cloud data centers can be targets of DDoS attacks, which can disrupt network services. Cloud providers offer DDoS mitigation services (e.g., AWS Shield, Azure DDoS Protection) that can protect both the cloud environment and the connection to on-premises infrastructure.
   * Rate Limiting: Implement rate limiting and other traffic control mechanisms to prevent network saturation during potential attacks.
8. Firewalls and Security Groups:
   * Network Firewalls: Use firewalls to control traffic flowing between on-premises networks and cloud environments. Cloud providers offer virtual firewalls that can be configured to enforce strict access controls.
   * Security Groups: Implement security groups and network ACLs (Access Control Lists) to control inbound and outbound traffic at the VPC or subnet level. These mechanisms should be used to restrict access to only those IP addresses or protocols that are necessary.
9. Zero Trust Security Model:
   * Zero Trust: Adopt a Zero Trust model that assumes no network (internal or external) is automatically trusted. Every access request should be verified, and users, devices, and applications should be authenticated before being allowed access to resources.
   * Microsegmentation: Use microsegmentation to further isolate workloads within the cloud, ensuring that even if an attacker gains access to one part of the network, they cannot easily move laterally.
10. Compliance and Regulatory Considerations:
   * Data Sovereignty and Residency: Ensure compliance with data sovereignty laws (e.g., GDPR) by enforcing routing policies that keep sensitive data within specified geographical regions.
   * Encryption for Compliance: Encrypt sensitive data both in transit and at rest to meet regulatory requirements like HIPAA, PCI-DSS, or GDPR. Cloud providers often offer compliance certification, but it is important to ensure that the proper configurations are in place.
   * Auditing and Reporting: Regularly audit the security posture of the hybrid cloud environment to ensure ongoing compliance with security standards and regulations.
11. Network Access Control:
   * VPN Access: Use VPN gateways to securely connect on-premises networks to cloud environments, encrypting traffic between the two endpoints.
   * Multi-Factor Authentication (MFA): Implement MFA for users and administrators accessing cloud resources remotely to add an extra layer of security.
12. Patch Management and Vulnerability Scanning:
   * Patch Cloud Resources: Ensure that virtual machines, containers, and other cloud resources are regularly patched to protect against vulnerabilities. Leverage automated tools for patch management across both on-premises and cloud environments.
   * Vulnerability Scanning: Regularly scan cloud environments for vulnerabilities and misconfigurations that could be exploited by attackers.
13. Distributed Workloads and Traffic Control:
   * Load Balancing: Use cloud-based load balancers to evenly distribute traffic across multiple servers and data centers, reducing the risk of congestion or single points of failure.
   * Content Delivery Networks (CDNs): Use CDNs to distribute content closer to users, reducing latency and improving performance while also offering security benefits such as DDoS protection and content encryption.
14. Incident Response Plan:
   * Develop a Cloud-Specific Incident Response Plan: Ensure that the organization's incident response plan accounts for both on-premises and cloud environments. This includes identifying responsibilities, communication channels, and the tools needed to detect, investigate, and respond to security incidents.
   * Automated Responses: Consider automating certain responses, such as shutting down suspicious instances, revoking access, or blocking traffic, based on pre-defined security rules.

_______________________________________________
rtgwg mailing list -- rtgwg@ietf.org
To unsubscribe send an email to rtgwg-le...@ietf.org
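As an illustration of the RPKI recommendation in [DISCUSS2] item 3 (Secure Routing Protocols), here is a minimal sketch of the route origin validation decision defined in RFC 6811. A route is NotFound if no Validated ROA Payload (VRP) covers its prefix. It is Valid if some covering VRP has the same origin AS and a maxLength at least as long as the route's prefix length. Otherwise, it is Invalid. This is a non-authoritative Python sketch that assumes the VRPs have already been obtained from an RPKI validating cache (for example, via the RPKI-to-Router protocol). The names `Vrp`, `Validity`, and `validate` are hypothetical, and the prefixes and AS numbers are documentation values. Note that only the origin AS is checked, not the AS path.

```python
"""Minimal sketch of BGP route origin validation following RFC 6811.

Assumption: the VRP list was already obtained from an RPKI validating
cache; fetching VRPs and validating ROAs are out of scope here.
"""
from dataclasses import dataclass
from enum import Enum
from ipaddress import IPv4Network, IPv6Network, ip_network
from typing import Iterable, Optional, Union

Prefix = Union[IPv4Network, IPv6Network]


class Validity(Enum):
    VALID = "Valid"
    INVALID = "Invalid"
    NOT_FOUND = "NotFound"


@dataclass(frozen=True)
class Vrp:
    """Hypothetical container for a Validated ROA Payload."""
    prefix: Prefix
    max_length: int
    asn: int


def covers(vrp: Vrp, route: Prefix) -> bool:
    """True if the route prefix is equal to or more specific than the VRP prefix."""
    # Compare address families first: subnet_of() raises TypeError on mixed families.
    return route.version == vrp.prefix.version and route.subnet_of(vrp.prefix)


def validate(route_prefix: str, origin_as: Optional[int], vrps: Iterable[Vrp]) -> Validity:
    """Return the RFC 6811 validation state of a (prefix, origin AS) pair.

    origin_as=None stands for an origin of NONE (e.g., the AS path ends
    in an AS_SET), which can never match a VRP.
    """
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        if not covers(vrp, route):
            continue
        covered = True
        # A covering VRP matches when the origin AS is equal and the route is
        # no more specific than maxLength. AS 0 VRPs (RFC 6483) never match.
        if (
            origin_as is not None
            and vrp.asn != 0
            and origin_as == vrp.asn
            and route.prefixlen <= vrp.max_length
        ):
            return Validity.VALID
    # Covered by at least one VRP but matched by none means Invalid.
    return Validity.INVALID if covered else Validity.NOT_FOUND


if __name__ == "__main__":
    sample_vrps = [Vrp(ip_network("192.0.2.0/24"), 24, 64496)]
    print(validate("192.0.2.0/24", 64496, sample_vrps).value)     # Valid
    print(validate("192.0.2.128/25", 64496, sample_vrps).value)   # Invalid: longer than maxLength
    print(validate("198.51.100.0/24", 64496, sample_vrps).value)  # NotFound: no covering VRP
```

The sketch returns only the validation state. What a BGP speaker does with an Invalid route (for example, rejecting it) is a local policy decision, which is also where the route filtering mentioned in the same item would apply.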