Connectivity for More Than Cat Videos: Why Global Networks Are the Backbone of AI

Image Credit: PavelMuravev/BigStockPhoto.com

Artificial Intelligence (AI) is now essential for business value creation, with many leading tech companies relying on AI to please the stock market and power their transformation initiatives. While there’s so much focus on the chipsets and data centers enabling AI, what’s network connectivity’s role?

AI must overcome many challenges to meet its immense data needs, and those needs translate into heightened requirements at both the network core and the edge. With such high stakes, the underlying connectivity infrastructure enabling AI cannot be the weakest link. Network backbone providers will play a critical role in AI’s future by ensuring that the high-capacity bandwidth, low latency and resiliency that AI applications require remain uncompromised. While hyperscalers are busy building for their next moonshot, enterprises should also start futureproofing their networks in anticipation of AI’s demands.

Bird’s eye view: scenarios from a carrier perspective

The future of AI is still unwritten. But it’s clear that tech companies and their enterprise customers are investing significantly in it. In 2024, hyperscalers (including Alphabet, Amazon and Meta) are forecast to invest $200 billion (up 45% from last year) in data centers, hardware and other technologies required to deploy generative AI models. Moreover, the data center industry is as busy as ever, with total capacity in North America and Europe expected to double in the next three years. These data centers must train models on ever-more specific questions and ever-larger datasets, requiring connectivity that links data centers and serves end-user and application requests.

However large these requirements become, they will represent a significant change from today’s networks. Let’s examine the two main network use cases in AI – training the models (“learning”) and putting them to use (“inference”) – and dive deeper to understand how the network backbone enables both. In each scenario, AI applications require low latency, high-capacity bandwidth and resiliency at the network underlay to function optimally – but the magnitude of the demands, and the requisite solutions to serve them, may differ.

Scenario one: model learning drives explosive growth between core data centers and cloud

AI learning is highly computationally intensive, with extreme spikes in workloads. It also relies on large datasets to enable meaningful training, meaning that collecting and replicating this data is key. Meanwhile, power and data center colocation space are becoming scarce, with companies struggling to build out their data centers at pace. Sustainable power is even more constrained, and rarely aligned with today’s data center hubs. Hence, workloads must be distributed across clusters of data centers, requiring higher capacity across the network underlay. Scaling for these massive capacity requirements may take different forms. Whatever form it takes, operators must build resiliency into the underlying network, providing companies with reliable data transfer via multiple protected paths between data centers.
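To give a sense of scale, here is a minimal back-of-the-envelope sketch in Python. The dataset size and replication window are purely illustrative assumptions, not figures from this article:

```python
# Hypothetical sketch: sustained rate needed to replicate a training
# dataset between two data centers within a fixed window.
# DATASET_BYTES and WINDOW_HOURS are illustrative assumptions.

DATASET_BYTES = 5e15   # assumed: a 5 PB training corpus
WINDOW_HOURS = 24      # assumed: the copy must land within one day

required_bps = DATASET_BYTES * 8 / (WINDOW_HOURS * 3600)
print(f"Required sustained rate: {required_bps / 1e9:.0f} Gb/s")
# -> roughly 463 Gb/s, more than a single 400G link can carry
```

Even under these modest assumptions, one link is not enough – which is why the scaling approaches below, and the protected paths beneath them, matter.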

For enterprises, traditional Wide Area Networks (WANs) are neither sufficient nor cost-efficient. Instead, high-bandwidth Ethernet can address most learning use cases. With leading carriers already three years into 400GE implementation, and their backbone networks carrying hundreds of terabits per second in peak traffic, Ethernet services can scale to the tremendous capacities that AI’s bandwidth requirements demand. Ethernet’s protected nature, combined with Class-of-Service (CoS) at high bandwidths and deterministic routing over fixed paths, means it’s also optimized for reliability and fault tolerance.
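The gap between a legacy WAN and high-bandwidth Ethernet is easy to quantify. A minimal sketch, reusing the assumed 5 PB corpus from above:

```python
# Hypothetical sketch: transfer time for the same assumed 5 PB dataset
# over a legacy 10G WAN link versus a 400GE service.

DATASET_BITS = 5e15 * 8  # assumed 5 PB corpus, in bits

for label, gbps in [("Legacy 10G WAN", 10), ("400GE service", 400)]:
    seconds = DATASET_BITS / (gbps * 1e9)
    print(f"{label}: {seconds / 86400:.1f} days")
# -> about 46.3 days at 10 Gb/s versus 1.2 days at 400 Gb/s
```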

However, this will not be sufficient for hyperscalers and new AI tech companies. In hyperscaler use cases, the main solutions are Nx400G, managed optical networks and dark fiber. Nx400G (multiple 400G Ethernet links in parallel) provides high-capacity links that carry massive data flows reliably across the network. Economies of scale and the need for control will also increase demand for dark fiber. Managed optical networks are a hybrid of the two: a network service provider builds to a bespoke location for a single customer, extending the network to underserved locations purpose-built for AI/ML workloads. This approach lets hyperscalers focus on their core business operations while relying on specialized providers to ensure optimal performance and reliability.
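A simple way to reason about Nx400G sizing is to require that the bundle still carries the offered load after losing one member link. A minimal sketch, with an assumed demand figure:

```python
import math

# Hypothetical sketch: sizing an Nx400G bundle so the loss of one
# member link still carries the offered load. OFFERED_LOAD_GBPS is
# an illustrative assumption.

LINK_GBPS = 400
OFFERED_LOAD_GBPS = 2000  # assumed steady-state inter-DC demand

# Survive a single failure: (n_links - 1) * LINK_GBPS >= offered load
n_links = math.ceil(OFFERED_LOAD_GBPS / LINK_GBPS) + 1

print(f"{n_links} x {LINK_GBPS}G installed "
      f"({n_links * LINK_GBPS} Gb/s total, "
      f"{(n_links - 1) * LINK_GBPS} Gb/s after one link failure)")
```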

Scenario two: inference powering the edge to new heights

Inference at the edge is end-user focused, enabling companies to leverage pre-trained AI models to process user requests. Inference is less computationally intensive and power-hungry than AI learning. However, IT teams want to perform inference closer to the edge of the network to reduce latency and improve end-user performance. This is similar to how content delivery networks (CDNs) currently optimize access to video, gaming and websites globally by localizing content, constantly synchronizing to distribute the latest blockbuster content.
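Fiber propagation delay alone shows why proximity matters. A minimal sketch using the common rule of thumb of roughly 5 microseconds per kilometer of fiber; the distances are illustrative assumptions:

```python
# Hypothetical sketch: round-trip propagation delay for inference served
# from a nearby edge node versus a distant core data center. Ignores
# queuing, serialization and compute time; distances are assumptions.

US_PER_KM_ONE_WAY = 5.0  # light in fiber covers ~200 km per millisecond

def rtt_ms(distance_km: float) -> float:
    """Round-trip fiber propagation delay in milliseconds."""
    return 2 * distance_km * US_PER_KM_ONE_WAY / 1000

for label, km in [("Edge node, 50 km away", 50),
                  ("Core data center, 2,000 km away", 2000)]:
    print(f"{label}: {rtt_ms(km):.1f} ms RTT")
# -> 0.5 ms versus 20.0 ms before any model compute time is added
```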

This content is consumed by end-users through the Internet. Today, CDNs largely rely on Tier 1 Internet providers to ensure reach into the 70,000+ networks that comprise the Internet. Connectivity for AI applications must be optimized similarly to CDNs, but the demands are heightened because AI is less cacheable, more business-critical and more latency-sensitive. It’s clear that AI’s bandwidth requirements may increase exponentially in the future. Compared to today’s single text-string queries, users may submit a voice-based request, an image for editing or a sentence describing a desired piece of code, and may receive a new piece of video content or fully-operational software in return. Moreover, edge nodes must also communicate back to the core for various purposes, including synchronization, retrieving the latest trained model, sharing their own learnings or asking the core model to compute requests they have not yet been trained to handle.
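The cacheability point can be made concrete: backbone load is roughly total demand multiplied by the cache miss rate. A minimal sketch with assumed hit ratios:

```python
# Hypothetical sketch: why low cacheability raises backbone load.
# Backbone traffic ~= total edge demand * (1 - cache hit ratio).
# The demand figure and hit ratios are illustrative assumptions.

TOTAL_EDGE_DEMAND_GBPS = 1000  # assumed aggregate user-facing demand

for label, hit_ratio in [("Typical CDN video", 0.95),
                         ("AI inference (barely cacheable)", 0.10)]:
    backbone_gbps = TOTAL_EDGE_DEMAND_GBPS * (1 - hit_ratio)
    print(f"{label}: {backbone_gbps:.0f} Gb/s hauled back to the core")
# -> 50 Gb/s versus 900 Gb/s for the same user-facing demand
```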

In most cases, Internet connectivity continues to be the obvious choice for distributing AI applications to end users, highlighting the role of backbone networks in this scenario. Enterprise buyers may also utilize Ethernet backbone services over the same port, mixing public Internet with private, performance-optimized connectivity to their own data center and cloud core. Scalability is critical here so networks can dynamically allocate bandwidth according to AI’s real-time peaks. This is an end-to-end game: unless your Internet provider is well-peered, with sufficient edge capacity and deep connectivity into local providers, your critical traffic may get stuck. For cloud connectivity, network providers also need high-capacity network-to-network interfaces (NNIs), as upgrades beyond today’s 10G connections are well past due.

Enterprise AI: mirroring hyperscaler requirements

In both scenarios, enterprises must carefully evaluate how they collect, process and secure their data, as this is their “gold” in maximizing AI. Because enterprises rely on hybrid and multi-cloud environments, they must reexamine their corporate connectivity architecture. With Internet and cloud security now largely serving the hybrid work environment, the next horizon lies between the data center core and the edge. Therefore, enterprises should prioritize high-quality Internet connectivity underpinned by high capacities and deep, uncongested peering, along with scalable Ethernet or wavelength backbones that provide direct onramps to their data center and cloud infrastructure. Finally, enterprises must remember that their WAN estates will face challenges similar to those of hyperscalers, but on a smaller scale. Still, they will require high-capacity network solutions that can handle distributed AI workloads if they wish to reap the benefits of emerging technologies.

Understanding the underlay’s role in each scenario

AI enablement is the driving force behind many companies’ current valuations and investment prospects. As a result, innovation of the underlying network is essential to realize AI’s potential at every level of the technology industry. While AI’s exact impact on networking is uncertain, its connectivity requirements are familiar. These possible scenarios show that AI is not “one-size-fits-all,” meaning companies must carefully evaluate their choice of connectivity as they seek to capitalize on AI’s unique benefits. For network backbone providers, AI creates a call to action to provide scalable bandwidth, reliability, performance and reach to keep pace with AI’s networking requirements across diverse industries and scenarios.

Author

Johan Ottosson is Vice President of Strategy at Arelion. As an experienced strategist and manager, Johan led the development of growth strategies, M&A and long-term corporate strategic planning across various industries before zeroing in on communications and technology. His experience has taught him that innovation is defined by those on the front lines working with customers or engineering the network, not by organizational charts.
