AMD patents a chiplet GPU design quite unlike Nvidia's and Intel's

Something to look forward to: AMD has published its first patent on chiplet GPU designs. In typical AMD fashion, it is trying not to rock the boat. Chiplet GPUs are only just beginning to emerge. Intel has been forthright about its development process and has confirmed the use of chiplets in its first-generation discrete GPUs. Nvidia, while coy about specifics, has published numerous research papers on the subject. AMD was the last holdout, which only adds to the intrigue.

Chiplets, as the name implies, are smaller, less complex chips that are meant to work together in more powerful processors. They are arguably the inevitable future for all high-performance components and, in some cases, the successful present; AMD's use of chiplet CPU designs has been brilliant.

In the new patent, dated December 31, AMD outlines a chiplet design fashioned to mimic a monolithic design as closely as possible. Its hypothetical model uses two chiplets connected by a high-speed passive interposer called a crosslink.

The crosslink sits between the L2 cache and the L3 cache in the memory hierarchy. Everything beneath it, such as the cores, the L1 cache and the L2 cache, is aware of its separation from the other chiplet. Everything above it, including the L3 cache and the GDDR memory, is shared between the chiplets.

This design is advantageous because it is conventional. AMD claims that compute units can access low-level memory on other chiplets almost as fast as they can access local low-level memory. If that proves true, software will not need to be updated.

The same cannot be said of Intel's and Nvidia's designs. Intel intends to use two new technologies, EMIB (embedded multi-die interconnect bridge) and Foveros. The latter is an active interposer that uses through-silicon vias, something AMD explicitly states it will not use. With Intel's design, the GPU can house a system-accessible cache that powers a new memory fabric.

Nvidia has not disclosed everything, but it has indicated a few directions it might pursue. A 2017 research paper describes a four-chiplet design and an architecture that is both NUMA-aware (non-uniform memory access) and locality-aware. It also experiments with a new L1.5 cache, which exclusively holds remote data accesses and is bypassed during local memory accesses.
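To make the L1.5 idea concrete, here is a minimal sketch of a cache that stores only remote data and is skipped for local accesses. It is an illustration of the policy as described above, not Nvidia's design: the class name `L15Cache`, the `home_module` and `owner_module` parameters, and the dictionary standing in for memory are all assumptions, and capacity limits and eviction are left out.

```python
class L15Cache:
    """Toy remote-only cache: local reads bypass it, remote reads are cached.

    Hypothetical model for illustration; no capacity limit or eviction.
    """

    def __init__(self, home_module: int):
        self.home_module = home_module  # the GPU module this cache belongs to
        self.lines = {}                 # address -> cached remote value
        self.bypasses = 0               # local reads that skipped the cache
        self.hits = 0                   # remote reads served from the cache
        self.remote_fetches = 0         # remote reads that went to another module

    def read(self, address: int, owner_module: int, memory: dict) -> int:
        # Local data: bypass the L1.5 entirely and never allocate a line.
        if owner_module == self.home_module:
            self.bypasses += 1
            return memory[address]
        # Remote data already cached: serve it without crossing modules.
        if address in self.lines:
            self.hits += 1
            return self.lines[address]
        # Remote miss: fetch from the owning module and keep a copy.
        self.remote_fetches += 1
        value = memory[address]
        self.lines[address] = value
        return value


if __name__ == "__main__":
    memory = {0x10: 7, 0x20: 9}
    cache = L15Cache(home_module=0)
    cache.read(0x10, owner_module=0, memory=memory)  # local, bypassed
    cache.read(0x20, owner_module=1, memory=memory)  # remote, fetched
    cache.read(0x20, owner_module=1, memory=memory)  # remote, cache hit
    print(cache.bypasses, cache.remote_fetches, cache.hits)
```

In this toy model, only the first read of a remote address pays the cross-module cost, while local reads never occupy space in the cache.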

AMD’s approach may sound the least imaginative, but it also sounds practical. And if history has proven anything, it’s that developer friendliness is a big advantage.

Below are additional diagrams from the patent.

Figure 2 is a cross-sectional view descending from two chiplets to the circuit board. The two chiplets (106-1 and 106-2) are stacked vertically on the passive crosslink (118) and use dedicated conductor structures to access the crosslink's traces (206) and then communicate with each other. Conductor structures not attached to the crosslink (204) connect to the circuit board for power and other signalling.

Figure 3 shows the cache hierarchy. WGPs (workgroup processors) (302), which are collections of shader cores, and GFXs (fixed-function units) (304), which are dedicated single-purpose processors, connect directly to a channel's L1 cache (306). Each chiplet contains multiple L2 cache banks (308) that are individually addressable and also coherent within a single chiplet. Each chiplet also contains multiple L3 cache banks (310) that are coherent across the entire GPU.

The GDF (graphics data fabric) (314) connects the L1 cache banks to the L2 cache banks. The SDF (scalable data fabric) (316) combines the L2 cache banks and connects them to the crosslink (118). The crosslink connects to the SDFs on all the chiplets, as well as the L3 cache banks on all the chiplets. The GDDR memory lanes (written as Memory PHY) (312) connect to the L3 cache banks.

For example, if a WGP on one chiplet required data from a GDDR bank on another chiplet, the data would be sent to an L3 cache bank, then over the crosslink to an SDF, then to an L2 bank and finally through a GDF to an L1 bank.
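The hop sequence in that example can be written out as a short sketch. This is only a model of the description above, not anything taken from the patent: the function names `read_path` and `hop_kinds` are hypothetical, and the code assumes that all traffic between an SDF and an L3 bank crosses the crosslink, as the Figure 3 description suggests.

```python
def read_path(requesting_chiplet: int, memory_chiplet: int) -> list[str]:
    """List the hops from a GDDR bank to the WGP that asked for the data.

    Assumption: every L3 bank is reached through the crosslink, whether it
    sits on the requesting chiplet or on another one.
    """
    return [
        f"GDDR (chiplet {memory_chiplet})",
        f"L3 bank (chiplet {memory_chiplet})",
        "crosslink",
        f"SDF (chiplet {requesting_chiplet})",
        f"L2 bank (chiplet {requesting_chiplet})",
        f"GDF (chiplet {requesting_chiplet})",
        f"L1 bank (chiplet {requesting_chiplet})",
        f"WGP (chiplet {requesting_chiplet})",
    ]


def hop_kinds(path: list[str]) -> list[str]:
    """Drop the chiplet labels so two paths can be compared by component type."""
    return [hop.split(" (")[0] for hop in path]


if __name__ == "__main__":
    for hop in read_path(requesting_chiplet=0, memory_chiplet=1):
        print(hop)
```

Under that assumption, `hop_kinds(read_path(0, 0))` and `hop_kinds(read_path(0, 1))` are identical: a local read and a remote read pass through the same kinds of components, which is one way to picture why the design can look monolithic to software.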

Figure 4 is a bird's-eye view of one chiplet. It shows more accurately the potential locations and scales of the various components. The HBX controller (404) manages the crosslink, to which the chiplet is connected by HBX PHY (406) conductors. The small square in the lower-left corner (408) is a potential additional connection to the crosslink for attaching more chiplets.
