With ISC's annual high-performance supercomputing conference kicking off this week, Intel is one of several vendors making announcements timed for the event. As the crown jewels of the company's HPC product portfolio have been released over the past few months, Intel doesn't have any major silicon announcements to make alongside this year's show, and unfortunately, Aurora isn't up and running just yet to make an attempt at the Top500 list. So, after a tumultuous year so far that has seen significant changes to Intel's GPU roadmap in particular, the company is using the backdrop of ISC to regroup and chart a fresh roadmap for HPC customers.
Most notably, Intel is using this opportunity to further explain some of the hardware development decisions the company has made this year. This includes Intel's pivot on Falcon Shores, transforming it from an XPU into a pure GPU design, as well as a few more high-level details on what will eventually become Intel's next HPC-class GPU. While Intel would be perfectly happy to keep selling CPUs, the company has realigned itself (and continues to do so) to serve a diverse market where its high-end customers need more than just CPUs.
CPU Roadmap: Emerald Rapids and Granite Rapids Xeons In Progress
As noted earlier, Intel is not announcing any new silicon today in any part of its HPC portfolio. Therefore, Intel's latest HPC roadmap is essentially a condensed version of its latest data center roadmap, which was first presented to investors at the end of March. HPC is, after all, a subset of the data center market, so the HPC roadmap reflects that.
I won't go into Intel's CPU roadmap too deeply here, as we just covered it a few months ago, but the company is once again reiterating the rapid cadence it intends to maintain for its Xeon products over the next 18 months. Sapphire Rapids is just a few months out from its launch, but Intel intends to have its successor on the same platform, Emerald Rapids, ready to ship in the fourth quarter. Meanwhile, Granite Rapids, Intel's first P-core Xeon on the Intel 3 process, will ship with its new platform in 2024. Granite will also be Intel's first product to support higher-bandwidth MCR DIMM memory, which was similarly demonstrated in March.
Notably, despite ISC's HPC-centric crowd, Intel has yet to announce a successor to the current-generation HBM-equipped Sapphire Rapids Xeon, which the company sells as the Xeon Max series. Intel is quite proud of the part, pointing out that it's the only x86 processor with HBM every chance it gets, and it's a key component of the Aurora supercomputer. We had expected its successor to fold into Falcon Shores back when that part was an XPU, but since Falcon Shores became a GPU, there has been no further sign of where another HBM Xeon would land on Intel's roadmap.
Meanwhile, Intel is looking forward to demonstrating to the ISC audience the performance benefits of having so much high-bandwidth memory on-package with the CPU cores, particularly before AMD releases its EPYC Genoa-X processors with their oversized 1GB+ L3 caches. To that end, Intel has published several new benchmarks comparing the Xeon Max series processors to EPYC 7000 and 9000 chips, which, as vendor benchmarks, I won't go into here, but you can find them in the gallery below.
Gallery: Intel ISC 2023 Xeon
GPU Roadmap Today: Ponte Vecchio Now Available, Additional SKUs in the Coming Months
The GPU counterpart to Sapphire Rapids with HBM for HPC audiences is Intel's Data Center GPU Max series, also known as Ponte Vecchio. The massively tiled chip is still unlike any other GPU on the market, and Intel's IFS foundry arm is proud to point out to potential customers that it can reliably assemble one of the most advanced chips on the market, with nearly four dozen chiplets that must be placed perfectly to bring the whole thing together.
Ponte Vecchio has had a long and grueling development cycle for both Intel and its customers, so the company is taking a victory lap at ISC to celebrate the achievement. Of course, Ponte Vecchio is just the beginning of Intel's HPC GPU efforts, not the end. Accordingly, Intel is still building out its oneAPI software stack and tooling ecosystem to support the hardware, mindful of the fact that it needs a strong software ecosystem to compete with rival NVIDIA and to capitalize on AMD's current shortcomings.
Despite being almost a generation behind, Intel surprisingly has some benchmarks comparing Ponte Vecchio to the new H100 accelerators based on NVIDIA's Hopper architecture. With that said, these pit Intel's high-end OAM-based modules against H100 PCIe cards; so cherry-picking aside, it remains to be seen how things would look in a proper apples-to-apples hardware comparison.
Gallery: Intel ISC 2023 Data Center GPU Max
Speaking of OAM modules, Intel is using the show to announce a new universal 8-way baseboard (UBB) for Ponte Vecchio. Joining Intel's existing 4-way UBB, the x8 UBB will allow eight Data Center GPU Max modules to be placed on a single server board, similar to what NVIDIA does with its HGX carrier boards. If Intel wants to compete head-to-head with NVIDIA and capture some of the HPC GPU market, this is yet another area where it needs to match NVIDIA's hardware offerings. So far, Supermicro and Inspur have signed up to ship servers using the new x8 UBB, and if all goes well, they shouldn't be Intel's only customers.
Along with the UBB announcement, Intel is also providing, for the first time, a detailed month-by-month roadmap for Data Center GPU Max product availability. With Intel having nearly fulfilled its Aurora order, the first parts have already been softly available to select customers, but now we can see where things stand in a bit more detail. According to that roadmap, OEMs should be ready to start shipping 4-way GPU systems in June, while 8-way systems will follow a month behind in July. Meanwhile, OEM systems using the PCIe version of Ponte Vecchio, the Data Center GPU Max 1100, will be available in July. Finally, a detuned version of Ponte Vecchio for "different markets" (read: China) will be available in the fourth quarter of this year. Details on this part are still scarce, but it will have reduced I/O bandwidth to meet US export requirements.
GPU Roadmap Tomorrow: All Roads Lead to Falcon Shores
Looking beyond the current Data Center GPU Max (Ponte Vecchio) series, the next GPU in the pipeline for Intel's HPC customers is Falcon Shores. As we detailed in March, Falcon Shores will take on a significantly different role in life than Intel initially intended, following the cancellation of Rialto Bridge, the direct successor to Ponte Vecchio. Rather than being Intel's first CPU + GPU combination product, a flexible XPU that could use a mix of CPU and GPU tiles, Falcon Shores will now be a pure GPU product. Unfortunately, it has also picked up a year-long delay in the process, pushing it to 2025, meaning that Intel's HPC GPU lineup will remain Ponte Vecchio-based for the next couple of years.
The cancellation of Rialto Bridge and the elimination of the Falcon Shores XPU created a lot of consternation in the media and the HPC community, so Intel is using this moment to get its messaging in order, both in terms of why it pivoted on Falcon Shores and exactly what that will entail.
The short story is that Intel decided it had misjudged the timing of its first XPU, and that Falcon Shores as an XPU would have been premature. In Intel's collective opinion, because such products offer a fixed ratio of CPU cores to GPU cores (relative to the number of tiles used), they are best suited for workloads that closely match that hardware allocation.
And what are those workloads? Well, that turns out to be the hundred-billion-transistor question. Intel expected the market to be more stable than it actually has been (i.e., it has been more dynamic than Intel expected), which Intel believes makes an XPU, with its fixed ratios, harder to match to workloads and more difficult to sell to customers. As a result, Intel backtracked on its integration plans, leading to the all-GPU Falcon Shores.
Now, with that said, Intel is making it clear that it's not scrapping the idea of an XPU entirely; it's just that Falcon Shores in 2024/2025 isn't the right time for it. So Intel is also confirming that it will be developing a tile-based XPU as a future post-Falcon Shores product (possibly as the Falcon Shores successor?). There aren't any more details on this XPU future than that, but for now, Intel still wants to get to CPU/GPU integration as soon as it deems the workloads and the market ready. It also means that Intel is effectively ceding the mixed CPU-GPU accelerator market to AMD (and to a lesser extent, NVIDIA) for at least a few more years, so make of Intel's official justification for delaying its own XPU what you will.
As for the all-GPU Falcon Shores, Intel is sharing just a bit more about the design and features of its next-gen HPC GPU. As you'd expect from a project that began life as a tiled product, Falcon Shores remains a chiplet-based design. While it's unclear what kinds of chiplets Intel will use (whether they will be homogeneous GPU tiles or not), they will be paired with HBM3 memory and what Intel calls "I/O designed to scale". In light of Intel's decision to delay XPUs, this is how it will provide a flexible CPU-to-GPU ratio for its HPC customers in the tried-and-true way: add as many GPUs to your system as you need.
Falcon Shores will also support Ethernet switching as a standard feature, which will be an important component in supporting the kind of very large fabrics that customers are building into their supercomputers today. And since these parts will be discrete GPUs, Intel will embrace CXL to provide additional functionality to system designers and programmers. Given the timeframe, CXL 3.0 functionality is a safe bet, with features like P2P DMA and advanced fabric support going hand-in-hand with what the HPC market has been building toward.
And with a few more years of experience behind it by that point, Intel hopes to be able to leverage oneAPI even further, especially since it will need help from software to abstract away the CPU-GPU I/O gap that the XPU version of Falcon Shores would have closed in hardware.
Gallery: Intel ISC 2023 Falcon Shores
Aurora Update: Over 10,000 Blades Delivered, Additional Specs Released
Lastly, Intel is also offering an update on Aurora, its Sapphire Rapids with HBM + Ponte Vecchio-based supercomputer for Argonne National Laboratory. The product of two delayed processors, Aurora is itself a delayed system that Intel has been working to get caught up on. In terms of the hardware itself, the light at the end of the tunnel is in sight, as Intel is finalizing delivery of Aurora's compute blades.
To date, Intel has delivered over 10,000 blades for Aurora, very close to the system's expected final count of 10,624 nodes. Unfortunately, delivered and installed are not quite the same thing here; so while Argonne has much of the hardware on hand, Aurora isn't quite ready to make a run at the Top500 supercomputer list, leaving the AMD-based Frontier system to hold the top spot for another six months.
On the plus side, with Aurora hardware shipments nearly complete, Intel is finally releasing a more detailed rundown of Aurora's hardware specs. This includes not only the number of nodes and the CPUs and GPUs within them, but also the various amounts of memory and storage available to the supercomputer.
With 2 CPUs and 6 GPUs in each node, the fully assembled Aurora will consist of 21,248 Sapphire Rapids CPUs and 63,744 Ponte Vecchio GPUs, and, as previously disclosed, peak system performance is expected to be greater than 2 ExaFLOPS of FP64 compute. Besides the 128GB of HBM on each GPU and the 64GB of HBM on each CPU, there is a further 1TB of DDR5 memory installed in each node. The greatest bandwidth will come from the GPUs' HBM, at an aggregate 208.9 PB/second, though even the "slow" DDR5 still delivers an aggregate 5.95 PB/second.
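Intel's headline totals follow directly from the per-node configuration, and working backwards from the aggregate bandwidth figures also yields plausible per-device numbers. As a quick sanity check, here is that arithmetic in a few lines of Python; the per-GPU and per-node bandwidth values at the end are my own back-of-the-envelope estimates derived from Intel's aggregates, not figures Intel has published:

```python
# Aurora totals derived from the per-node configuration Intel disclosed.
NODES = 10_624
CPUS_PER_NODE, GPUS_PER_NODE = 2, 6

cpus = NODES * CPUS_PER_NODE   # 21,248 Sapphire Rapids CPUs
gpus = NODES * GPUS_PER_NODE   # 63,744 Ponte Vecchio GPUs
assert cpus == 21_248 and gpus == 63_744

# Aggregate memory capacity (using 1 PB = 1e6 GB, 1 TB = 1,024 GB here).
gpu_hbm_pb = gpus * 128 / 1e6      # ~8.2 PB of GPU HBM
cpu_hbm_pb = cpus * 64 / 1e6       # ~1.4 PB of CPU HBM
ddr5_pb = NODES * 1_024 / 1e6      # ~10.9 PB of DDR5

# Derived (unofficial) per-device bandwidth, from the aggregate figures:
hbm_per_gpu_tbps = 208.9e3 / gpus      # ~3.3 TB/s of HBM bandwidth per GPU
ddr5_per_node_gbps = 5.95e6 / NODES    # ~560 GB/s of DDR5 bandwidth per node

print(cpus, gpus)                          # 21248 63744
print(f"{hbm_per_gpu_tbps:.2f} TB/s/GPU")  # 3.28 TB/s/GPU
```

The derived ~3.3 TB/s of HBM bandwidth per GPU is consistent with what would be expected from a 128GB HBM2e-class part, which lends some confidence that the aggregate figures are straight multiplications of the per-node specs.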
And since no supercomputer announcement would be complete without some mention of AI, Intel and Argonne are also developing a large language model/generative AI system for use on Aurora, which for now they are calling Generative AI for Science. The model will be developed specifically for scientific use, and Intel expects it to be a 1 trillion parameter model (which would place it between GPT-3 and GPT-4 in size). The expectation is that Aurora will be used for both the training and the inference of this model, though in the latter case presumably only on a fraction of the system, given the much lower hardware requirements of inference.
At this point, Aurora remains on schedule to launch this year. Besides entering production use, Intel expects Aurora to be submitted to the Top500 list in time for its November update, at which point it is expected to become the most powerful supercomputer in the world.
Gallery: Intel ISC 2023 Aurora
Gallery: Intel ISC 2023 Press Deck