Our research explores the design of data centers, develops systems to monitor cloud computing infrastructure, and seeks to understand the reliability and manageability of hyperscalers.
-
Quadrant: A Cloud-Deployable NF Virtualization Platform
Jianfeng Wang, Tamás Lévai, Zhuojin Li, and 3 more authors
In SoCC ’22: Proceedings of the ACM Symposium on Cloud Computing 2022
Network Functions (NFs) now touch a significant fraction of Internet traffic. The hope has been that software-based NF Virtualization (NFV) would enable rapid development of new NFs by vendors and leverage the power and economics of commodity computing infrastructure for NF deployment. To date, no cloud NFV systems achieve NF chaining, isolation, SLO-adherence, and scaling together with existing cloud computing infrastructure and abstractions, all while achieving generality, speed, and ease of deployment; these properties are taken for granted in other cloud contexts but unavailable for NF processing. We present Quadrant, an efficient and secure cloud-deployable NFV system, and show that Quadrant’s approach of adapting existing cloud infrastructure to support packet processing can achieve NF chaining, isolation, generality, and performance in NFV. Quadrant reuses common cloud infrastructure such as Kubernetes, cloud functions, the Linux kernel, NIC hardware, and switches. It enables easy NFV deployment while delivering up to double the performance per core compared to the state of the art.
-
Optimal Oblivious Routing for Structured Networks
Sucha Supittayapornpong, Pooria Namyar, Mingyang Zhang, and 2 more authors
In IEEE INFOCOM 2022 - IEEE Conference on Computer Communications 2022
Oblivious routing distributes traffic from sources to destinations along predefined routes, using rules that are independent of traffic demands. While finding optimal oblivious routing is intractable for general topologies, we show that it is tractable for structured topologies often used in datacenter networks. To achieve this, we apply graph automorphism and prove that an optimal automorphism-invariant solution exists. This result reduces the search space to automorphism-invariant solutions. We design an iterative algorithm to obtain such a solution by alternating between two linear programs. The first program finds an automorphism-invariant solution based on representative variables and constraints, making the problem tractable. The second program generates adversarial demands to ensure the final result satisfies all possible demands. Since constructing the representative variables and constraints is a combinatorial problem, we design polynomial-time algorithms for the construction. We evaluate the proposed iterative algorithm in terms of throughput performance, scalability, and generality over three potential applications. The algorithm (i) improves throughput by up to 87.5% over a heuristic algorithm for partially deployed FatTree, (ii) scales to FatClique with a thousand switches, and (iii) is applicable to a general structured topology with non-uniform link capacity and server distribution.
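To make the abstract's terms concrete, here is a minimal Python sketch, entirely our own and not the paper's algorithm, of how a fixed oblivious routing can be scored. The topology, the path-splitting table `routing`, and all helper names are hypothetical. The routing assigns each source-destination pair fixed path fractions without looking at demand; `max_utilization` reports the most heavily loaded link for one demand, and `worst_case_utilization` takes the worst over a small hand-picked list of demands, a brute-force stand-in for the adversarial linear program the abstract describes.

```python
from collections import defaultdict

def link_loads(routing, demand):
    """Load on each directed link when `demand` is carried by the fixed `routing`."""
    loads = defaultdict(float)
    for (src, dst), volume in demand.items():
        # The split over paths comes from `routing` alone; it never depends on `volume`.
        for path, fraction in routing[(src, dst)]:
            for u, v in zip(path, path[1:]):
                loads[(u, v)] += volume * fraction
    return loads

def max_utilization(routing, demand, capacity):
    """Highest load/capacity ratio over all links (its reciprocal is how far the demand could be scaled)."""
    loads = link_loads(routing, demand)
    return max((load / capacity[link] for link, load in loads.items()), default=0.0)

def worst_case_utilization(routing, candidate_demands, capacity):
    """Worst case over a finite list of demands: a brute-force stand-in, not the paper's adversarial LP."""
    return max(max_utilization(routing, d, capacity) for d in candidate_demands)

# Hypothetical toy topology: hosts A and B connected through two spines, S1 and S2.
links = [("A", "S1"), ("S1", "B"), ("A", "S2"), ("S2", "B")]
capacity = {}
for u, v in links:
    capacity[(u, v)] = capacity[(v, u)] = 10.0

# Demand-independent routing: every flow is split evenly across both spines.
routing = {
    ("A", "B"): [(("A", "S1", "B"), 0.5), (("A", "S2", "B"), 0.5)],
    ("B", "A"): [(("B", "S1", "A"), 0.5), (("B", "S2", "A"), 0.5)],
}

demands = [{("A", "B"): 10.0}, {("A", "B"): 4.0, ("B", "A"): 16.0}]
print(max_utilization(routing, demands[0], capacity))      # 0.5
print(worst_case_utilization(routing, demands, capacity))  # 0.8
```

Swapping S1 and S2 maps this toy topology onto itself and leaves the even split unchanged, a tiny instance of the automorphism invariance the abstract relies on. The paper's contribution is finding the optimal such routing for large structured topologies, which this sketch does not attempt.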
-
CloudCluster: Unearthing the Functional Structure of a Cloud Service
Weiwu Pang, Sourav Panda, Jehangir Amjad, and 2 more authors
In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22) 2022
In their quest to provide customers with good tools to manage cloud services, cloud providers are hampered by having very little visibility into cloud service functionality; a provider often only knows where VMs of a service are placed, how the virtual networks are configured, how VMs are provisioned, and how VMs communicate with each other. In this paper, we show that, using the VM-to-VM traffic matrix, we can unearth the functional structure of a cloud service and use it to aid cloud service management. Leveraging the observation that cloud services use well-known design patterns for scaling (e.g., replication, communication locality), we show that clustering the VM-to-VM traffic matrix yields the functional structure of the cloud service. Our clustering algorithm, CloudCluster, must overcome challenges imposed by scale (cloud services contain tens of thousands of VMs) and must be robust to orders-of-magnitude variability in traffic volume and to measurement noise. To do this, CloudCluster uses a novel combination of feature scaling, dimensionality reduction, and hierarchical clustering to achieve clustering with over 92% homogeneity and completeness. We show that CloudCluster can be used to explore opportunities to reduce cost for customers and to identify anomalous traffic and potential misconfigurations.
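The clustering idea can be illustrated with a toy Python sketch under our own assumptions: feature scaling is taken to be log compression followed by per-row L2 normalization, distances are cosine distances, and clusters are merged by average linkage until a caller-chosen `num_clusters` remains. It omits the dimensionality-reduction step and is not CloudCluster's actual implementation; the traffic matrix, VM roles, and function names are invented for illustration.

```python
import math

def scale_rows(matrix):
    """Feature scaling (assumed form): log-compress volumes, then L2-normalize each VM's row.

    This keeps the pattern of which peers a VM talks to while discarding how much
    traffic it sends, so busy and quiet replicas of the same tier look alike.
    """
    scaled = []
    for row in matrix:
        logged = [math.log1p(x) for x in row]
        norm = math.sqrt(sum(v * v for v in logged)) or 1.0  # avoid dividing an all-zero row by 0
        scaled.append([v / norm for v in logged])
    return scaled

def cosine_distance(a, b):
    # Rows are unit length (or all zero), so the dot product is the cosine similarity.
    return 1.0 - sum(x * y for x, y in zip(a, b))

def average_linkage(features, num_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the smallest mean pairwise distance."""
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(cosine_distance(features[a], features[b])
                            for a in clusters[i] for b in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i stays valid
    return sorted(sorted(c) for c in clusters)

# Hypothetical 7-VM service: load balancer (0), frontends (1, 2), app servers (3, 4), databases (5, 6).
# Frontend 2 carries 50x the traffic of frontend 1 but talks to the same peers.
traffic = [
    [0,     1000, 50000, 0,    0,    0,  0],
    [1000,  0,    0,     100,  100,  0,  0],
    [50000, 0,    0,     5000, 5000, 0,  0],
    [0,     100,  5000,  0,    0,    20, 20],
    [0,     100,  5000,  0,    0,    20, 20],
    [0,     0,    0,     20,   20,   0,  0],
    [0,     0,    0,     20,   20,   0,  0],
]

print(average_linkage(scale_rows(traffic), num_clusters=4))
# [[0], [1, 2], [3, 4], [5, 6]]
```

Because each row is normalized, the busy frontend 2 lands with frontend 1 despite its much larger volume. A real pipeline would also need to choose the cut point rather than take `num_clusters` as given, and this naive loop scales poorly (cubic in the number of VMs or worse), so it is only suitable for toy inputs.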
-
Gemini: Practical Reconfigurable Datacenter Networks with Topology and Traffic Engineering
Mingyang Zhang, Jianan Zhang, Rui Wang, and 3 more authors
CoRR 2021
-
A throughput-centric view of the performance of datacenter topologies
Pooria Namyar, Sucha Supittayapornpong, Mingyang Zhang, and 2 more authors
In ACM SIGCOMM 2021 Conference, Virtual Event, USA, August 23-27, 2021
-
Towards Highly Available Clos-Based WAN Routers
Sucha Supittayapornpong, Barath Raghavan, and Ramesh Govindan
2019
-
Understanding Lifecycle Management Complexity of Datacenter Topologies
Mingyang Zhang, Radhika Niranjan Mysore, Sucha Supittayapornpong, and 1 more author
In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI) 2019
-
Evolve or Die: High-Availability Design Principles Drawn from Google’s Network Infrastructure
Ramesh Govindan, Ina Minei, Mahesh Kallahalla, and 2 more authors
In Proceedings of the ACM Conference of the Special Interest Group on Data Communication (SIGCOMM ’16) 2016