AMD Instinct MI200: Multi-chip GPUs with up to 47.9 TFLOPS, 128 GB and 560 W

AMD has introduced the next generation of Instinct accelerators for high-performance computing (HPC): the Instinct MI200 series. It is based on CDNA 2, the second generation of the CDNA architecture, which carries AMD's multi-chip approach over from its CPUs to its GPUs. Two GPU dies on a single package are meant to deliver record performance.

CDNA vs. RDNA

CDNA is a pure compute GPU architecture that completely dispenses with so-called fixed-function hardware; it therefore has no texture units, rasterizers or even display outputs. The Infinity Cache that RDNA 2 introduced in gaming GPUs is also missing. Instead, fast HBM2 memory sits directly on the package on a wide memory interface, and there are a few other significant differences. The biggest one is the multi-chip approach.

CDNA 2 is a multi-chip architecture

With CDNA 2, AMD is porting the multi-chip approach introduced with Zen for CPUs into the GPU segment and all rumours indicate that RDNA 3 will follow the same path with gaming graphics cards.

For the Instinct MI200, which will initially be available in three SKUs (MI250X, MI250 and MI210), this means in concrete terms: on the gigantic package, eight HBM2e stacks are arranged not around one but around two GPU dies, which are still based on Graphics Core Next (GCN). The third die rumoured for RDNA 3, which is supposed to contain the Infinity Cache, is not used in CDNA 2. The two GPU dies communicate with each other via an "Ultra High Bandwidth Die Interconnect".

Each CDNA 2 die contains 110 CUs, i.e. 7,040 shaders, and consists of 29 billion transistors (Navi 21: 26.8 billion), for a total of up to 220 CUs, 14,080 shaders and 58 billion transistors per package. It is not known whether this is the full configuration of the die. The fact that Arcturus, the first-generation CDNA GPU, already offered 120 CUs suggests a slightly cut-down design.
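The package totals follow directly from doubling the per-die figures. The short sketch below reproduces them; it assumes 64 shaders per compute unit, which is what the quoted 110 CUs and 7,040 shaders imply, and the function name is purely illustrative.

```python
# Per-die figures quoted in the article for CDNA 2 (Aldebaran).
CUS_PER_DIE = 110
TRANSISTORS_PER_DIE_BN = 29.0
DIES_PER_PACKAGE = 2

# Assumption: 64 shaders per compute unit (110 CUs <-> 7,040 shaders).
SHADERS_PER_CU = 64


def package_totals(cus_per_die, transistors_bn, dies, shaders_per_cu=SHADERS_PER_CU):
    """Return (total CUs, total shaders, total transistors in billions)."""
    total_cus = cus_per_die * dies
    return total_cus, total_cus * shaders_per_cu, transistors_bn * dies


cus, shaders, transistors = package_totals(CUS_PER_DIE, TRANSISTORS_PER_DIE_BN, DIES_PER_PACKAGE)
print(f"{cus} CUs, {shaders:,} shaders, {transistors:g} billion transistors")
# prints "220 CUs, 14,080 shaders, 58 billion transistors"
```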

Faster matrix cores and faster memory

AMD has significantly expanded the GPU's ability to process matrices, as used in particular in AI applications: up to 880 second-generation matrix cores are available, and they now also support FP64.
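Dividing the 880 matrix cores by the 220 compute units of the full package gives the per-CU count. The article does not state this figure, so treat it as a derived value rather than an AMD specification.

```python
# Figures from the article: up to 880 matrix cores across up to 220 CUs.
MATRIX_CORES = 880
TOTAL_CUS = 220

# Derived value (not quoted by AMD in the article): matrix cores per CU.
per_cu, remainder = divmod(MATRIX_CORES, TOTAL_CUS)
print(f"{per_cu} matrix cores per CU (remainder {remainder})")
# prints "4 matrix cores per CU (remainder 0)"
```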

With HBM2e, Aldebaran also uses faster memory. At a clock frequency of 1.6 GHz instead of 1.2 GHz on an interface that is now 8,192 bits wide, it achieves a memory bandwidth of 3.2 TB/s. That is more than the Infinity Cache in Navi 21 achieves (2.0 TB/s). The connection between memory and GPU is also new: the so-called 2.5D Elevated Fan-Out Bridge (EFB) simplifies packaging and lowers its cost, because the bridge chip no longer sits in a specially adapted base substrate but in an additional layer placed on top of it. This approach also scales better.
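The bandwidth figures can be checked from bus width and clock. The sketch below assumes that HBM transfers data on both clock edges (so 1.6 GHz means 3.2 Gbit/s per pin) and that each HBM stack has a 1,024-bit interface; the function name is illustrative. It also reproduces the MI100's 1.2 TB/s from four stacks at 1.2 GHz.

```python
# Assumption: each HBM stack provides a 1,024-bit interface.
BITS_PER_STACK = 1024


def hbm_bandwidth_tbs(bus_width_bits: int, clock_ghz: float) -> float:
    """Peak bandwidth in decimal TB/s.

    Assumption: double data rate, i.e. two transfers per clock cycle per pin.
    """
    gbit_per_pin = clock_ghz * 2                      # DDR: 2 transfers/clock
    gbytes_per_s = bus_width_bits * gbit_per_pin / 8  # bits -> bytes
    return gbytes_per_s / 1000                        # GB/s -> TB/s


mi250x = hbm_bandwidth_tbs(8 * BITS_PER_STACK, 1.6)  # 8 stacks, 8,192 bits
mi100 = hbm_bandwidth_tbs(4 * BITS_PER_STACK, 1.2)   # 4 stacks, 4,096 bits
print(f"MI250X: {mi250x:.2f} TB/s, MI100: {mi100:.2f} TB/s")
# prints "MI250X: 3.28 TB/s, MI100: 1.23 TB/s"
```

AMD's rounded figures of 3.2 TB/s and 1.2 TB/s correspond to these peak values.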

AMD's first 6 nm chip

The new multi-chip GPU, codenamed "Aldebaran", is AMD's first chip manufactured by TSMC on the 6 nm process. RDNA 3 (5 nm and 6 nm) and the Rembrandt APU, which will combine Zen 3 cores with an RDNA 2 GPU, are also expected to use this process in 2022. Zen 3+ on 6 nm has been mentioned as well, but so far only in the rumour mill.

Instinct MI250X and MI250 in OAM design

As the first two products based on CDNA 2, AMD today presents the Instinct MI250X and the Instinct MI250. Both come in the OCP Accelerator Module (OAM) form factor standardized by the Open Compute Project. Nvidia, by contrast, uses a proprietary form factor for the A100.

Up to 560 watts TDP

The MI250X and MI250 differ only in the number of active compute units; the maximum clock rate and maximum board power are identical at 1,700 MHz and 560 watts respectively. The 560 watts require water cooling in the server rack; with passive cooling, both are limited to 500 watts. AMD does not say what clock rate is reached in that case.

Both SKUs use 128 GB of HBM2e clocked at 1.6 GHz. Since AMD speaks of "up to 128 GB" for the series as a whole, variants with less memory may follow.

| | Instinct MI200 | Instinct MI200 | Instinct MI100 |
|---|---|---|---|
| Model | MI250X | MI250 | MI100 |
| Key data | | | |
| Architecture | CDNA 2 | CDNA 2 | CDNA |
| Process | "6 nm" | "6 nm" | 7 nm TSMC FinFET |
| Compute units | 220 | 208 | 120 |
| Shaders | 14,080 | 13,312 | 7,680 |
| Max. clock | 1,700 MHz | 1,700 MHz | 1,502 MHz |
| Memory | 128 GB HBM2e, 8,192-bit (1.6 GHz, 3.2 TB/s) | 128 GB HBM2e, 8,192-bit (1.6 GHz, 3.2 TB/s) | 32 GB HBM2, 4,096-bit (1.2 GHz, 1.2 TB/s) |
| Infinity Fabric links | 8 | 6 | 3 |
| Max. board power | 500 W (passive) or 560 W (water cooling) | 500 W (passive) or 560 W (water cooling) | 300 W (passive) |
| Performance* | | | |
| FP64 / FP32 vector | 47.9 / 47.9 TFLOPS | 45.3 / 45.3 TFLOPS | 11.5 / 23.1 TFLOPS |
| FP64 / FP32 matrix | 95.7 / 95.7 TFLOPS | 90.5 / 90.5 TFLOPS | – / 46.1 TFLOPS |

* MI200 at 560 watts. The key data and performance values of the MI200 in PCIe plug-in card format (MI210) are not yet known.
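The vector figures in the table are consistent with a simple peak-throughput formula: compute units × 64 lanes × 2 FLOPs per fused multiply-add × clock. The sketch below assumes that model (it is not an AMD-published formula) and reproduces the MI200 FP64/FP32 vector values and the MI100 FP32 vector value; the MI100 runs FP64 at half that rate, and the matrix figures are twice the vector rate.

```python
def peak_tflops(cus: int, clock_mhz: float, lanes_per_cu: int = 64, flops_per_lane: int = 2) -> float:
    """Peak vector throughput in TFLOPS.

    Assumption: each CU issues `lanes_per_cu` fused multiply-adds per clock,
    and one FMA counts as 2 floating-point operations.
    """
    return cus * lanes_per_cu * flops_per_lane * clock_mhz * 1e6 / 1e12


# (compute units, max. clock in MHz) taken from the table
gpus = {"MI250X": (220, 1700), "MI250": (208, 1700), "MI100": (120, 1502)}

for name, (cus, clock) in gpus.items():
    print(f"{name}: {peak_tflops(cus, clock):.1f} TFLOPS")
# prints
# MI250X: 47.9 TFLOPS
# MI250: 45.3 TFLOPS
# MI100: 23.1 TFLOPS
```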

The promised performance is enormous

Based on its own benchmarks of the Instinct MI250X with a 560-watt TDP, AMD expects more than double the performance of the MI100 in classic vector workloads at single precision (FP32). Since power consumption rises by "only" 87 percent, efficiency improves. The gain is even clearer at double precision (FP64), which the new GPU executes at the same rate as FP32 for the first time: here, performance increases by more than a factor of four. Matrix calculations are also up by a factor of two compared with the MI100, and FP64 matrix operations are possible for the first time with the MI200.
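The ratios quoted above can be reproduced from the table. Note that this sketch uses the peak specification values and maximum board power, not AMD's benchmark results, so "per watt" here is only a rough peak-efficiency comparison.

```python
# Peak figures from the table (TFLOPS) and max. board power (watts).
MI250X = {"fp32_vector": 47.9, "fp64_vector": 47.9, "fp32_matrix": 95.7, "watts": 560}
MI100 = {"fp32_vector": 23.1, "fp64_vector": 11.5, "fp32_matrix": 46.1, "watts": 300}


def ratio(metric: str) -> float:
    """MI250X value divided by MI100 value for the given metric."""
    return MI250X[metric] / MI100[metric]


power_increase_pct = (ratio("watts") - 1) * 100
print(f"Power: +{power_increase_pct:.0f} %")
for metric in ("fp32_vector", "fp64_vector", "fp32_matrix"):
    speedup = ratio(metric)
    # Peak performance per watt relative to the MI100 (spec values, not measured)
    perf_per_watt = speedup / ratio("watts")
    print(f"{metric}: x{speedup:.2f} speed, x{perf_per_watt:.2f} per watt")
# prints
# Power: +87 %
# fp32_vector: x2.07 speed, x1.11 per watt
# fp64_vector: x4.17 speed, x2.23 per watt
# fp32_matrix: x2.08 speed, x1.11 per watt
```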

However, overshadowing the MI100 is only AMD's second priority. The entire presentation of the new series makes it clear that the real target is Nvidia's A100, which AMD wants to beat in HPC workloads and, thanks to brute multi-GPU performance, even in applications such as AI training, although Nvidia's Ampere offers better-optimized units for such tasks in its Tensor Cores.

The first batches are for Frontier

AMD Instinct MI200 is shipping now, but initially to only one customer: HPE. The manufacturer is currently installing the Frontier supercomputer in the US, which is set to become the first exascale supercomputer outside of China. The first scientists are expected to be able to use the system at Oak Ridge National Laboratory in 2022.

Instinct MI210: Coming soon as a plug-in card

AMD has also announced the Instinct MI210, but has not yet presented it in detail. Like the MI100, it comes in the classic PCIe plug-in card format (PCIe 4.0) and is due "soon", in early 2022. AMD has not revealed any key data yet. It is quite possible that the "up to 128 GB HBM2e" refers to this model, meaning that the MI210 comes with less memory, among other differences.

In the evening, AMD also officially presented its new "Milan-X" Epyc CPUs, which offer 768 MB of L3 cache thanks to stacked 3D V-Cache chips.