 Thursday, June 1, 2023
 Jensen Huang is the man of the moment, and his megastar status was certainly reflected at the opening keynote at COMPUTEX 2023 on Monday. He stood for almost two hours talking to an in-person audience of about 3,500 people.
 “Entertaining” is probably the best word for the way he delivered his talk, and the room full of tech nerds, computing ecosystem partners, media representatives and analysts warmed to it. Huang interspersed his speech with comments in his native Taiwanese to the delight of the audience, and he even got the audience to sing along, karaoke style, to an AI-generated theme song for his keynote.
 His message was clear: AI, and particularly generative AI, is the next big thing, with nearly every industry wanting it and potentially benefiting from it. It just so happens that Nvidia is in the right place at the right time to deliver the underpinning hardware and software that generative AI will need to function, while also heading for a market value of a trillion dollars on the back of this massive global interest in AI.
 The sustainability angle was also prominent: Huang made sure the audience understood that GPUs deliver more performance than CPUs when running the large language models (LLMs) behind AI, at lower cost and using less power. For example, he said $10 million buys 48 GPU servers that would consume 3.2 GWh to deliver 44 LLMs, while the same money buys 960 CPU servers that would consume 11 GWh to deliver just 1 LLM.
 Huang repeated this in many ways, and with different parameters, to emphasize the point. He said the data-center equation is very complicated, and the key is therefore to create dense computers and not big ones. He quipped, “The more you buy, the more you save,” referring to the figures he shared about saving money and power by buying more GPUs.
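 The $10 million comparison above can be worked through as simple ratios. The sketch below is only an illustration of that arithmetic: the class and function names are hypothetical, and the only inputs are the server counts, energy figures and LLM counts quoted in the keynote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOption:
    """One side of the $10 million comparison quoted in the keynote."""
    name: str
    servers: int
    energy_gwh: float      # energy figure quoted for the whole fleet
    llms_delivered: float  # LLM workload delivered, as quoted

    @property
    def energy_per_llm_gwh(self) -> float:
        return self.energy_gwh / self.llms_delivered


def compare(gpu: ServerOption, cpu: ServerOption) -> dict:
    """Return how the two options differ for the same spend."""
    return {
        "llm_ratio": gpu.llms_delivered / cpu.llms_delivered,
        "energy_ratio": cpu.energy_gwh / gpu.energy_gwh,
        "energy_per_llm_ratio": cpu.energy_per_llm_gwh / gpu.energy_per_llm_gwh,
        "servers_ratio": cpu.servers / gpu.servers,
    }


# Figures as quoted in the keynote for a $10 million budget.
CPU = ServerOption("CPU", servers=960, energy_gwh=11.0, llms_delivered=1.0)
GPU = ServerOption("GPU", servers=48, energy_gwh=3.2, llms_delivered=44.0)
RATIOS = compare(GPU, CPU)

if __name__ == "__main__":
    for key, value in RATIOS.items():
        print(f"{key}: {value:.2f}")
```

 On these figures, the GPU option delivers 44 times the LLM work with 20 times fewer servers and about 3.4 times less total energy, which works out to roughly 150 times less energy per LLM.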
 If Steve Jobs was the poster CEO of the mobile phone revolution, Jensen Huang is the poster CEO of the AI “revolution.” His speech was littered with phrases like “We are extending the frontiers of AI” and “We’d like to bring generative AI to every data center.”
 “Accelerated computing and AI mark a reinvention of computing,” he asserted. “We’re now at the tipping point of a new computing era with accelerated computing and AI that’s been embraced by almost every computing and cloud company in the world.”
 Huang noted that 40,000 large companies and 15,000 startups now use Nvidia technologies, with 25 million downloads of CUDA software in the last year alone.
 There was much information packed into the keynote, ranging from new AI supercomputers, modular reference architectures and network fabrics to digital smart factories and autonomous mobile robotics (AMR).
 Large-memory, 1-exaflop AI supercomputer
 For sheer performance, Huang detailed the DGX GH200, a new class of large-memory AI supercomputer built on Grace Hopper superchips. It uses NVLink interconnect technology with the NVLink switch system to combine 256 GH200 superchips, allowing them to perform as a single GPU. This provides 1 exaflop of performance and 144 terabytes of shared memory, nearly 500× more memory than the previous-generation Nvidia DGX A100, which was introduced in 2020.
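 The memory figures can be sanity-checked with a few lines of arithmetic. The sketch below rests on two assumptions that are not stated in the article: that a terabyte is counted as 1,024 GB, and that the DGX A100 baseline is the original 2020 configuration with 320 GB of GPU memory. The constant and function names are hypothetical.

```python
SUPERCHIPS = 256           # GH200 superchips in one DGX GH200 (from the article)
SHARED_MEMORY_TB = 144     # shared memory in one DGX GH200 (from the article)
GB_PER_TB = 1024           # assumption: binary units
DGX_A100_MEMORY_GB = 320   # assumption: 2020 DGX A100 with 8 x 40 GB GPUs


def memory_per_superchip_gb(total_tb: float, chips: int) -> float:
    """Average share of the pooled memory contributed by each superchip."""
    return total_tb * GB_PER_TB / chips


def memory_ratio(total_tb: float, baseline_gb: float) -> float:
    """How many times larger the pooled memory is than the baseline system."""
    return total_tb * GB_PER_TB / baseline_gb


PER_CHIP_GB = memory_per_superchip_gb(SHARED_MEMORY_TB, SUPERCHIPS)
RATIO_VS_A100 = memory_ratio(SHARED_MEMORY_TB, DGX_A100_MEMORY_GB)

if __name__ == "__main__":
    print(f"Memory per superchip: {PER_CHIP_GB:.0f} GB")   # 576 GB
    print(f"Ratio vs DGX A100:    {RATIO_VS_A100:.1f}x")   # 460.8x
```

 Under these assumptions each superchip contributes about 576 GB, and the pooled memory is roughly 460 times that of the assumed baseline, which is in the neighborhood of the "nearly 500×" quoted. With decimal units the results shift slightly, to 562.5 GB and 450×.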
 “Generative AI, large language models and recommender systems are the digital engines of the modern economy,” Huang said. “DGX GH200 AI supercomputers integrate Nvidia’s most advanced accelerated computing and networking technologies to expand the frontier of AI.”
 The GH200 superchips eliminate the need for a traditional CPU-to-GPU PCIe connection by combining an Arm-based Grace CPU with an H100 Tensor Core GPU in the same package, using NVLink-C2C chip interconnects. This is said to increase the bandwidth between GPU and CPU by 7× compared with the latest PCIe technology, slash interconnect power consumption by more than 5× and provide a 600GB Hopper architecture GPU building block for DGX GH200 supercomputers.
 The DGX GH200 is the first supercomputer to pair Grace Hopper superchips with the NVLink switch system, a new interconnect that enables all GPUs in a DGX GH200 system to work together as one. The previous-generation system only provided for eight GPUs to be combined with NVLink as one GPU without compromising performance. This architecture provides 48× more NVLink bandwidth than the previous generation, delivering the power of a massive AI supercomputer with the simplicity of programming a single GPU.
 Huang said Google Cloud, Meta and Microsoft are the first to gain access to the DGX GH200 to explore its capabilities for generative AI workloads. Nvidia also intends to provide the DGX GH200 design as a blueprint to cloud service providers and other hyperscalers so they can further customize it for their infrastructure.
 Improving efficiency of Ethernet-based AI
 On the networking side, Huang announced Spectrum-X, an accelerated networking platform designed to improve the performance and efficiency of Ethernet-based AI clouds. This is built on networking innovations powered by the tight coupling of the Nvidia Spectrum-4 Ethernet switch with the BlueField-3 DPU, said to achieve 1.7× better overall AI performance and power efficiency, along with consistent, predictable performance in multi-tenant environments. Nvidia acceleration software and software development kits (SDKs) allow developers to build software-defined, cloud-native AI applications.
 Dell Technologies, Lenovo and Supermicro are already using Spectrum-X.
 As a blueprint and testbed for Spectrum-X reference designs, Nvidia is also building Israel-1, a hyperscale generative AI supercomputer to be deployed in its Israeli data center on Dell PowerEdge XE9680 servers based on the HGX H100 eight-GPU platform, BlueField-3 DPUs and Spectrum-4 switches.
 The Spectrum-X networking platform can be used in various AI applications, uses fully standards-based Ethernet and is interoperable with Ethernet-based stacks. It enhances multi-tenancy with performance isolation to ensure tenants’ AI workloads perform optimally and consistently. It also offers better AI performance visibility, as it can identify performance bottlenecks, and it features completely automated fabric validation.
 The Ethernet platform enables 256 200Gb/s ports connected by a single switch, or 16,000 ports in a two-tier, leaf-spine topology to support the growth and expansion of AI clouds while maintaining high levels of performance and minimizing network latency.
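 The article does not say how the 16,000-port figure is reached. As a rough illustration of how port counts scale in a two-tier leaf-spine fabric, the sketch below uses a purely hypothetical configuration: leaves with 128 downlinks at 200Gb/s and 64 uplinks at 400Gb/s, and spines with 128 ports at 400Gb/s. These port splits and the function names are assumptions, not details from the keynote.

```python
def is_nonblocking(down_ports: int, down_gbps: int,
                   up_ports: int, up_gbps: int) -> bool:
    """A leaf is non-blocking if uplink capacity covers downlink capacity."""
    return up_ports * up_gbps >= down_ports * down_gbps


def two_tier_endpoint_ports(leaf_down_ports: int, spine_ports: int) -> int:
    """Endpoint ports in a leaf-spine fabric where every leaf has one
    uplink to every spine, so the leaf count is capped by the number of
    ports on a spine."""
    max_leaves = spine_ports
    return max_leaves * leaf_down_ports


# Hypothetical configuration (an assumption, not from the article).
LEAF_DOWN_PORTS, LEAF_DOWN_GBPS = 128, 200
LEAF_UP_PORTS, LEAF_UP_GBPS = 64, 400
SPINE_PORTS = 128

NONBLOCKING = is_nonblocking(LEAF_DOWN_PORTS, LEAF_DOWN_GBPS,
                             LEAF_UP_PORTS, LEAF_UP_GBPS)
ENDPOINT_PORTS = two_tier_endpoint_ports(LEAF_DOWN_PORTS, SPINE_PORTS)

if __name__ == "__main__":
    print(f"Non-blocking leaves: {NONBLOCKING}")
    print(f"200Gb/s endpoint ports: {ENDPOINT_PORTS}")
```

 With these assumed numbers, 128 leaves of 128 downlinks each give 16,384 endpoint ports, which is consistent with a figure rounded to 16,000. Other splits between downlinks and uplinks would give different totals.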
 Modular reference architecture for diversity of accelerated computing needs
 The main message of the keynote concerned the diversity of accelerated computing needs, and the fact that data centers increasingly need to control costs while meeting requirements for both growing compute capabilities and lower carbon emissions.
 To address the diversity of applications, Huang unveiled the Nvidia MGX server specification, which provides system manufacturers with a modular reference architecture to quickly and cost-effectively build more than 100 server variations to suit a wide range of AI, high-performance computing (HPC) and Omniverse applications. The modular design of MGX gives system manufacturers the ability to meet each customer’s unique budget more effectively, as well as their power delivery, thermal design and mechanical requirements.
 With MGX, manufacturers start with a basic system architecture optimized for accelerated computing for their server chassis. They then select their GPU, DPU and CPU. Design variations can address unique workloads, such as HPC, data science, LLMs, edge computing, graphics and video, enterprise AI, and design and simulation. Multiple tasks like AI training and 5G can be handled on a single machine, while upgrades to future hardware generations can be “frictionless.” MGX can also be easily integrated into cloud and enterprise data centers.
 Huang said ASRock Rack, ASUS, GIGABYTE, Pegatron, QCT and Supermicro will adopt MGX, which can slash development costs by up to three-quarters and reduce development time by two-thirds—to just six months. QCT and Supermicro will be the first to market, with MGX designs appearing in August.
 Supermicro’s ARS-221GL-NR system, announced at Computex 2023, will include the Grace CPU superchip, while QCT’s S74G-2U system, also announced at the show, will use the GH200 Grace Hopper Superchip. Additionally, SoftBank plans to roll out multiple hyperscale data centers across Japan and use MGX to dynamically allocate GPU resources between generative AI and 5G applications.
                             
 By: DocMemory
 Copyright © 2023 CST, Inc. All Rights Reserved