DFModel Design Space Exploration Results

Fig. 1: For GPT3 1T, heatmaps showing throughput utilization, cost efficiency, and power efficiency for a complete design space of four accelerators, five interconnection topologies, and four combinations of memory/interconnect technologies.
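Each heatmap sweeps the same design space: 4 accelerators × 5 interconnection topologies × 4 memory/interconnect combinations, or 4 × 5 × 4 = 80 design points per workload, assuming the space is the full Cartesian product. The minimal Python sketch below enumerates that space and arranges a per-point metric into a 2-D grid. The labels (`accel_0`, `topo_0`, `mem_net_0`) and the row/column layout are hypothetical placeholders, because the captions give only the counts and not the names or plot layout.

```python
from itertools import product

# Hypothetical placeholder labels: the captions state only the counts
# (4 accelerators, 5 topologies, 4 memory/interconnect combinations),
# not the actual names used in DFModel.
ACCELERATORS = [f"accel_{i}" for i in range(4)]
TOPOLOGIES = [f"topo_{i}" for i in range(5)]
MEM_NET_COMBOS = [f"mem_net_{i}" for i in range(4)]


def enumerate_design_space():
    """Return every (accelerator, topology, memory/interconnect) triple."""
    return list(product(ACCELERATORS, TOPOLOGIES, MEM_NET_COMBOS))


def heatmap_grid(metric):
    """Arrange a per-design-point metric into a 2-D grid.

    Rows: one per (accelerator, memory/interconnect) pair (4 * 4 = 16).
    Columns: one per topology (5).
    This layout is an assumption, not necessarily the one in the figures.
    """
    return [
        [metric(acc, topo, mem) for topo in TOPOLOGIES]
        for acc in ACCELERATORS
        for mem in MEM_NET_COMBOS
    ]


if __name__ == "__main__":
    points = enumerate_design_space()
    print(len(points))  # 80 design points
    grid = heatmap_grid(lambda a, t, m: 0.0)  # dummy metric for illustration
    print(len(grid), len(grid[0]))  # 16 rows, 5 columns
```

Passing a real metric function (for example, one that looks up a modeled throughput utilization for each triple) in place of the dummy lambda would fill the grid with values ready for plotting.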

Fig. 2: For GPT3 1T, latency breakdown of each design point in the complete design space of four accelerators, five interconnection topologies, and four combinations of memory/interconnect technologies.
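A latency breakdown splits each design point's end-to-end latency into components. The minimal sketch below converts such a breakdown into per-component shares and reports the dominant component. It assumes the components do not overlap, so they sum to the total; the component names (`compute`, `memory`, `network`) and the latency values are hypothetical placeholders, not DFModel outputs.

```python
# Hypothetical component names: the captions do not list which
# categories DFModel's latency breakdown actually uses.
def latency_breakdown(components):
    """Return the total latency and each component's fraction of it.

    Assumes non-overlapping components, so the total is their sum.
    """
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total latency must be positive")
    shares = {name: t / total for name, t in components.items()}
    return total, shares


def dominant_component(components):
    """Return the name of the component contributing the most latency."""
    return max(components, key=components.get)


if __name__ == "__main__":
    # Made-up illustrative latencies (seconds) for a single design point.
    point = {"compute": 0.6, "memory": 0.3, "network": 0.1}
    total, shares = latency_breakdown(point)
    print(round(total, 6), dominant_component(point))
    for name, share in shares.items():
        print(f"{name}: {share:.0%}")
```

If the modeled components can overlap in time (for example, communication hidden behind compute), the shares would instead need to be computed against the measured end-to-end latency rather than the simple sum.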

Fig. 3: For DLRM 793B, heatmaps showing throughput utilization, cost efficiency, and power efficiency for a complete design space of four accelerators, five interconnection topologies, and four combinations of memory/interconnect technologies.

Fig. 4: For DLRM 793B, latency breakdown of each design point in the complete design space of four accelerators, five interconnection topologies, and four combinations of memory/interconnect technologies.

Fig. 5: For 5M x 5M matrix HPL, heatmaps showing throughput utilization, cost efficiency, and power efficiency for a complete design space of four accelerators, five interconnection topologies, and four combinations of memory/interconnect technologies.

Fig. 6: For 5M x 5M matrix HPL, latency breakdown of each design point in the complete design space of four accelerators, five interconnection topologies, and four combinations of memory/interconnect technologies.

Fig. 7: For 1T-point FFT, heatmaps showing throughput utilization, cost efficiency, and power efficiency for a complete design space of four accelerators, five interconnection topologies, and four combinations of memory/interconnect technologies.

Fig. 8: For 1T-point FFT, latency breakdown of each design point in the complete design space of four accelerators, five interconnection topologies, and four combinations of memory/interconnect technologies.
