This talk argues that the historical trend toward ever-faster network port speeds in Ethernet networks is increasingly in conflict with building efficient, massive-scale data centers. To support these data centers, the industry needs to fix port speed at a technology “sweet spot” of around 200 Gbps and instead scale networks by increasing the number of ports (the radix) on NICs and switches. When combined with novel transport protocols that “spray” packets over all possible paths between source and destination, networks built from these high-radix components will deliver much higher throughput, higher reliability, lower cost, and lower latency than traditional networks containing three or more tiers of switches. These remarks apply not only to large-scale networks for training AI models, where there is a single transport layer and no requirement for strict packet ordering, but also to more general networks where multiple protocols must be supported and where packet order does matter. They are also general in that they apply not only to networks built with Ethernet, but to any packet-switching technology.
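To make the spraying claim concrete, here is a toy simulation (not from the talk; the path count, flow count, and flow sizes are illustrative assumptions). It compares per-flow ECMP hashing, which pins every packet of a flow to one path, with per-packet spraying over all available paths:

```python
import random

random.seed(7)
PATHS = 16  # parallel paths between source and destination (assumed)
flows = [random.randint(1, 1000) for _ in range(200)]  # packets per flow (synthetic)

# Per-flow ECMP: a hash pins all packets of a flow to one path,
# modeled here as a uniformly random path choice per flow.
ecmp = [0] * PATHS
for pkts in flows:
    ecmp[random.randrange(PATHS)] += pkts

# Per-packet spraying: each packet takes the next path round-robin,
# so the load on any two paths differs by at most one packet.
spray = [0] * PATHS
rr = 0
for pkts in flows:
    for _ in range(pkts):
        spray[rr] += 1
        rr = (rr + 1) % PATHS

mean = sum(flows) / PATHS
print(f"ECMP  max/mean link load: {max(ecmp) / mean:.2f}")
print(f"Spray max/mean link load: {max(spray) / mean:.2f}")
```

With spraying, the hottest link carries essentially the mean load; with flow hashing, unlucky collisions of large flows overload some links while others sit partly idle, which is why spraying raises achievable throughput (at the cost of tolerating out-of-order delivery).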
There are two fundamental reasons why this trend needs to be curtailed. First, the cross-sectional bandwidth of packet switches and NICs is necessarily limited by the technology available at any given point in time; increasing port speed therefore proportionately decreases radix, which in turn means that networks need more layers to support a given scale. More layers are more costly and less reliable: they require more power, add latency, and make it harder to solve the congestion- and error-control problems that become increasingly important at scale. Second, regardless of the transmission medium used for connection, signals ultimately originate and terminate in electronics whose bandwidth is inherently limited by the speed of transistors; continually increasing port speed stresses implementations to the breaking point at every level, from the physical to the logical, while providing little to no benefit. At the physical layer, implementors are driven to exotic technologies to achieve the requisite gain-bandwidth product; at the logical layer, designers struggle to implement logic in which each port can receive a packet every clock cycle. Pushing port speeds beyond what the technology can support will result in more expensive and less reliable components.
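The radix-versus-tiers arithmetic can be sketched as follows (the 51.2 Tbps ASIC bandwidth and the folded-Clos host-count formula are illustrative assumptions, not figures from the talk):

```python
# Fixed switch-ASIC cross-sectional bandwidth (assumed figure for illustration).
ASIC_BW_GBPS = 51_200

def radix(port_speed_gbps: int) -> int:
    """Ports per switch: fixed ASIC bandwidth divided by per-port speed."""
    return ASIC_BW_GBPS // port_speed_gbps

def hosts(k: int, levels: int) -> int:
    """End hosts supported by a non-blocking L-level folded Clos of radix-k switches."""
    return 2 * (k // 2) ** levels

for speed in (800, 200):
    k = radix(speed)
    print(f"{speed} Gbps ports -> radix {k}: "
          f"2 tiers = {hosts(k, 2):,} hosts, 3 tiers = {hosts(k, 3):,} hosts")
```

Quartering the port speed quadruples the radix: under these assumptions a two-tier 200 Gbps fabric (32,768 hosts) approaches the scale of a three-tier 800 Gbps fabric (65,536 hosts) while eliminating an entire switching layer, along with its cost, power, and latency.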
Dr. Pradeep Sindhu is an industry visionary currently focused on data processing innovations at Microsoft. Sindhu, who co-founded Fungible Inc. and served as its CEO and CTO, is credited with inventing the Data Processing Unit (DPU), which revolutionized storage system efficiency. He is also the founder of Juniper Networks, where he led the development of all major products that shaped the future of networking infrastructure. Dr. Sindhu’s contributions have redefined networking hardware and software, driving advances that impact cloud computing and AI infrastructure.