OCP 2025 Talk on Capacity-Aware Adaptive Routing

I recently spoke at OCP 2025 about Meta’s innovative solution that introduces remote-capacity-awareness to adaptive routing, which helps improve network performance for AI training. Here is a brief overview of the talk:

As backend networks continue to evolve and grow exponentially, Meta has been working on optimizing the routing design for its next-generation backend network architecture. Our current Adaptive Routing Scheme, which relies on local load awareness, has limitations when applied to our larger topologies. The presence of parallel links and the use of shallow-buffer switches can lead to congestion and decreased performance, particularly when remote link failures reduce capacity without affecting reachability.

To address these challenges, we have developed a new approach that combines remote capacity information with local-congestion-aware spray. This solution dynamically adjusts the load balancing scheme based on end-to-end network capacity, distributing traffic more efficiently and minimizing the impact of link failures on network performance. We also developed algorithms for efficiently merging ARS groups and leveraged non-load-aware ECMP spray groups to address scaling challenges.
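To make the idea concrete, here is a minimal sketch of how remote capacity could be combined with local congestion when choosing a next hop. This is only an illustration and not the design presented in the talk: the `NextHop` fields, the function names, and the scoring rule (local queue depth divided by a capacity weight) are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class NextHop:
    name: str
    # Hypothetical field: usable capacity beyond this hop toward the destination.
    remote_capacity_gbps: float
    # Hypothetical field: current egress queue depth on the local port.
    local_queue_bytes: int


def capacity_weights(hops):
    """Normalize remote capacity into per-hop weights that sum to 1."""
    total = sum(h.remote_capacity_gbps for h in hops)
    if total <= 0:
        raise ValueError("no next hop has usable remote capacity")
    return {h.name: h.remote_capacity_gbps / total for h in hops}


def pick_next_hop(hops):
    """Pick the hop whose local queue is smallest relative to its capacity weight."""
    weights = capacity_weights(hops)
    # Hops with no remaining remote capacity are excluded entirely.
    candidates = [h for h in hops if weights[h.name] > 0]
    return min(candidates, key=lambda h: h.local_queue_bytes / weights[h.name])


# Hop C has lost half of its remote capacity but is still reachable.
hops = [
    NextHop("A", remote_capacity_gbps=400, local_queue_bytes=1000),
    NextHop("B", remote_capacity_gbps=400, local_queue_bytes=1200),
    NextHop("C", remote_capacity_gbps=200, local_queue_bytes=800),
]

local_only = min(hops, key=lambda h: h.local_queue_bytes)
capacity_aware = pick_next_hop(hops)
print(local_only.name, capacity_aware.name)
```

In this example, a purely local choice picks C because its queue is the shortest, while the capacity-aware choice picks A because C's reduced remote capacity makes its queue look deeper in relative terms.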

A recording of the full talk is available on YouTube.

Paper on Meta's hybrid DC SDN controller was accepted by SIGCOMM 2025!

Super excited to share that our paper on Meta’s hybrid DC SDN controller was accepted by SIGCOMM 2025! In this paper, we highlight the challenges of planning and executing live migrations in large-scale data center networks (DCNs) using BGP, and share our novel hybrid solution that enables centralized route planning (i.e., SDN) while maintaining distributed enforcement (i.e., BGP).

Paper and more details will be posted after camera-ready. Looking forward to attending SIGCOMM in Portugal this year, meeting old friends and making new ones!

Update (10-23-2025): The paper is available here. The full SIGCOMM talk is available on YouTube.

Successfully defended!

Today I successfully defended my dissertation “Enabling and Improving Centralized Control in Network and Cyber-Physical Systems: An Application-Driven Approach”!

I will be joining Facebook@NYC in the fall as a Research Scientist working on data center routing, control, and analysis.

Paper on traffic migration was accepted by CoNEXT 2019!

Our paper on generalizing traffic migration for different kinds of physical and virtual network functions in different use cases was accepted by CoNEXT 2019! This year there were 190 submissions and 32 acceptances.

Paper will be posted after camera-ready. Looking forward to meeting with other networking folks and an exciting conference!

In Memory of Professor Bi

This post is dedicated to Professor Bi, who recently passed away. I was extremely fortunate to have had the opportunity to join Professor Bi’s lab as an undergraduate research assistant in my junior year. Back then I was a newbie to research, but Professor Bi saw my motivation and potential and decided to put me on two of his most important projects, led by his postdocs. Professor Bi gave me his full support and guidance during my stay, and we eventually managed to submit two papers before I graduated. The experience I had at Tsinghua with Professor Bi was invaluable and the biggest reason I decided to, and was able to, become a PhD student.

Professor Bi worked extremely hard, much harder than any of his students. Yet he was never overly critical of or harsh on his students. I vividly remember the time I fell asleep during a group meeting because I had pulled an all-nighter. Professor Bi was neither angry nor offended; rather, he felt sorry and urged me to get some rest. Now that he is no longer with us, I hope he can rest in peace. I am forever grateful and I will miss you, 毕老师.