A mysterious NVIDIA GPU referred to as GPU-N, which could perhaps be the first glimpse at the next-gen Hopper GH100 chip, has been uncovered in a new research paper published by the green team (as spotted by Twitter user Redfire).
NVIDIA Research Paper Talks ‘GPU-N’ With MCM Design & 8576 Cores, Could This Be Next-Gen Hopper GH100?
The research paper ‘GPU Domain Specialization via Composable On-Package Architecture’ describes a next-generation GPU design as the most practical solution for maximizing low-precision matrix math throughput to boost Deep Learning performance. The ‘GPU-N’ and its respective COPA designs are discussed along with their possible specifications and simulated performance results.
The ‘GPU-N’ is reported to feature 134 SM units (vs 104 SM units of the A100). This makes up a total of 8576 cores, a 24% increase over the current Ampere A100 solution. The chip has been measured at 1.4 GHz, the same theoretical clock speed as the Ampere A100 and Volta V100 (not to be confused with the final clocks). Other specifications include a 60 MB L2 cache, a 50% increase over Ampere A100, and DRAM bandwidth of 2.68 TB/s that can scale up to 6.3 TB/s. The HBM2e DRAM capacity is 100 GB and can be expanded up to 233 GB with the COPA implementations. It is configured around a 6144-bit bus interface at clock speeds of 3.5 Gbps.
| Configuration | NVIDIA V100 | NVIDIA A100 | GPU-N |
| --- | --- | --- | --- |
| GPU frequency (GHz) | 1.4 | 1.4 | 1.4 |
| FP16 (TFLOPS) | 125 | 312 | 779 |
| L2 cache (MB) | 6 | 40 | 60 |
| DRAM BW (GB/s) | 900 | 1,555 | 2,687 |
| DRAM Capacity (GB) | 16 | 40 | 100 |
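The ~2.68 TB/s bandwidth figure follows directly from the bus width and per-pin speed quoted in the paper. A quick back-of-the-envelope check (figures from the paper, not final NVIDIA specs):

```python
# Peak DRAM bandwidth = bus width (bits) x per-pin data rate (Gbps) / 8 bits-per-byte
bus_width_bits = 6144   # HBM2e interface quoted for GPU-N
data_rate_gbps = 3.5    # per-pin speed quoted in the paper

bandwidth_gb_s = bus_width_bits * data_rate_gbps / 8
print(f"{bandwidth_gb_s:.0f} GB/s (~{bandwidth_gb_s / 1000:.2f} TB/s)")
# 6144 * 3.5 / 8 = 2688 GB/s, i.e. the ~2.68 TB/s the paper lists
```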
Coming to the performance numbers, the ‘GPU-N’ (presumably Hopper GH100) produces 24.2 TFLOPs of FP32 (a 24% increase over A100) and 779 TFLOPs of FP16 (a 2.5x increase over A100), which sounds really close to the 3x gains that were rumored for GH100 over A100. Compared to AMD’s CDNA 2 ‘Aldebaran’ GPU on the Instinct MI250X accelerator, the FP32 performance is less than half (95.7 TFLOPs vs 24.2 TFLOPs) but the FP16 performance is 2.15x higher.
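Those throughput figures line up with the usual cores × 2 FLOPs/clock × clock formula. A minimal sketch, assuming the ~1.41 GHz A100-style boost clock rather than the paper's theoretical 1.4 GHz:

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Peak throughput for FMA units (1 fused multiply-add = 2 FLOPs)."""
    return cores * flops_per_clock * clock_ghz / 1000  # TFLOPs

gpu_n_cores = 134 * 64  # 134 SMs x 64 FP32 cores per SM = 8576
print(peak_tflops(gpu_n_cores, 1.41))  # ~24.2 TFLOPs FP32, matching the paper
print(peak_tflops(6912, 1.41))         # ~19.5 TFLOPs FP32 (A100, for reference)
```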
From previous information, we know that NVIDIA’s H100 accelerator would be based on an MCM solution and use TSMC’s 5nm process node. Hopper is supposed to have two next-gen GPU modules, so we are looking at 288 SM units in total. We cannot give a rundown on the core count yet since we don’t know the number of cores featured in each SM, but if it sticks to 64 cores per SM, then we get 18,432 cores, which is 2.25x more than the full GA100 GPU configuration. NVIDIA could also leverage more FP64, FP16 & Tensor cores within its Hopper GPU, which would drive up performance immensely. And that is going to be a necessity to rival Intel’s Ponte Vecchio, which is expected to feature 1:1 FP64.
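Under that 64-cores-per-SM assumption, the dual-module arithmetic works out as follows (a sketch of the math above, not confirmed specs):

```python
sms_per_module = 144  # full GH100 die (rumored)
modules = 2           # MCM design with two GPU modules
cores_per_sm = 64     # assumed, same as GA100

total_sms = sms_per_module * modules      # 288 SMs across both modules
total_cores = total_sms * cores_per_sm    # 18,432 cores
full_ga100_cores = 128 * 64               # full GA100 die: 8192 cores
print(total_cores, total_cores / full_ga100_cores)  # 18432 cores, 2.25x GA100
```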
It is possible that the final configuration will come with 134 of the 144 SM units enabled on each GPU module and, as such, we are likely looking at a single GH100 die in action. But it is unlikely that NVIDIA would reach the same FP32 or FP64 FLOPs as the MI200’s without using GPU Sparsity.
But NVIDIA may possibly have a secret weapon up its sleeve, and that would be the COPA-based GPU implementation of Hopper. NVIDIA talks about two Domain-Specialized COPA-GPUs based on next-generation architecture, one for the HPC and one for the DL segment. The HPC variant features a very standard approach consisting of an MCM GPU design and the respective HBM/MC+HBM (IO) chiplets, but the DL variant is where things start to get interesting. The DL variant houses a huge cache on an entirely separate die that is interconnected with the GPU modules.
| Architecture | LLC Capacity | DRAM BW | DRAM Capacity |
| --- | --- | --- | --- |
Various variants have been outlined with up to 960 / 1920 GB of LLC (Last-Level Cache), HBM2e DRAM capacities of up to 233 GB, and bandwidth of up to 6.3 TB/s. These are all theoretical, but given that NVIDIA has discussed them already, we may possibly see a Hopper variant with such a design during the full unveil at GTC 2022.
NVIDIA Hopper GH100 ‘Preliminary Specs’:
| NVIDIA Tesla Graphics Card | Tesla K40 | Tesla M40 | Tesla P100 (PCI-Express) | Tesla P100 (SXM2) | Tesla V100 (SXM2) | NVIDIA A100 (SXM4) | NVIDIA H100 (SMX4?) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GA100 (Ampere) | GH100 (Hopper) |
| Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 54.2 Billion | TBD |
| GPU Die Size | 551 mm2 | 601 mm2 | 610 mm2 | 610 mm2 | 815 mm2 | 826 mm2 | TBD |
| SMs | 15 | 24 | 56 | 56 | 80 | 108 | 134 (Per Module) |
| FP32 CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64? |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 32? |
| FP32 CUDA Cores | 2880 | 3072 | 3584 | 3584 | 5120 | 6912 | 8576 (Per Module) |
| FP64 CUDA Cores | 960 | 96 | 1792 | 1792 | 2560 | 3456 | 4288 (Per Module)? |
| Boost Clock | 875 MHz | 1114 MHz | 1329 MHz | 1480 MHz | 1530 MHz | 1410 MHz | ~1400 MHz |
| TOPs (DNN/AI) | N/A | N/A | N/A | N/A | 125 TOPs | 1248 TOPs<br>2496 TOPs with Sparsity | TBD |
| FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 30.4 TFLOPs | 312 TFLOPs<br>624 TFLOPs with Sparsity | 779 TFLOPs (Per Module)?<br>1558 TFLOPs with Sparsity (Per Module)? |
| FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 15.7 TFLOPs | 19.4 TFLOPs<br>156 TFLOPs with Sparsity | 24.2 TFLOPs (Per Module)?<br>193.6 TFLOPs with Sparsity? |
| FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.80 TFLOPs | 19.5 TFLOPs<br>(9.7 TFLOPs standard) | 24.2 TFLOPs (Per Module)?<br>(12.1 TFLOPs standard)? |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 6144-bit HBM2e | 6144-bit HBM2e |
| Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s<br>12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | Up To 40 GB HBM2 @ 1.6 TB/s<br>Up To 80 GB HBM2 @ 1.6 TB/s | Up To 100 GB HBM2e @ 3.5 Gbps |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 40960 KB | 81920 KB |
The post Mysterious NVIDIA ‘GPU-N’ Could Be Next-Gen Hopper GH100 In Disguise With 134 SMs, 8576 Cores & 2.68 TB/s Bandwidth, Simulated Performance Benchmarks Shown by Hassan Mujtaba appeared first on Wccftech.