Microarchitecture Primer – I6500 Scales From Single CPU To Manycore Megasystem
M.Diamant - 12.May.2019
Imagine a modern, high-performance 64-bit CPU running at high frequency, with the full feature set you would expect from a top-of-the-line CPU:
Superscalar multithreaded 64-bit CPU
128-bit SIMD / FPU processing unit
Full HW virtualization
Programmable Memory Management
… and much more.
Now imagine that this CPU can actually perform as four virtual CPUs, with up to four instruction threads running in parallel over a two-way symmetrical superscalar architecture. In other words, on every cycle the scheduler issues two new instructions (one into each pipeline) from two active threads of the available four. This lets you run incredibly fast, not only because of the strong single-thread performance of the i6500, but also because, being fed from four threads, the pipelines virtually never rest, bringing your IPC (average instructions per cycle) very close to the theoretical maximum of 2.0 for a two-way superscalar CPU.
Of course, once you consider that the instruction bonding feature occasionally “digests” more than one load and/or store instruction in one cycle, that 128-bit SIMD can process several data words in a single cycle, and that the FPU may be working in parallel with the main pipelines, your effective IPC can peak well above the already impressive 2.0 rate.
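To make the "pipelines virtually never rest" argument concrete, here is a minimal sketch of a toy issue model. It is not a model of the real i6500 scheduler: the per-thread readiness probability `p_ready`, the independence between threads, and the function name `expected_ipc` are all assumptions introduced for illustration, and the model ignores instruction bonding, SIMD and the FPU, so it can never exceed 2.0.

```python
from math import comb

ISSUE_WIDTH = 2  # two pipelines, one instruction issued into each per cycle


def expected_ipc(threads: int, p_ready: float, issue_width: int = ISSUE_WIDTH) -> float:
    """Expected instructions issued per cycle in a toy model.

    Assumptions (hypothetical, not taken from i6500 documentation):
    - on any cycle, each hardware thread independently has an instruction
      ready with probability p_ready (stalls, cache misses, etc. lower it);
    - each ready thread fills at most one issue slot per cycle.
    """
    total = 0.0
    for ready in range(threads + 1):
        # Binomial probability that exactly `ready` threads can issue.
        prob = comb(threads, ready) * p_ready**ready * (1 - p_ready) ** (threads - ready)
        # The scheduler can issue at most `issue_width` instructions per cycle.
        total += prob * min(issue_width, ready)
    return total


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "threads ->", round(expected_ipc(n, 0.5), 3), "IPC")
```

Under these assumptions, with each thread ready only half of the time, one thread averages 0.5 IPC, two threads 1.0, and four threads 1.625: adding threads pushes the average toward the 2.0 ceiling even though no single thread got any faster.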
Next, let's cluster together six such CPUs, or 24 virtual processors, into a powerful CPU subsystem. Such a system can run 24 workloads in parallel without the time-consuming, OS-style context switch (i.e., swapping thread contexts and data in and out) that is needed when a hardware multithreading engine is not available. In the i6500, the context can switch on every clock cycle WITHOUT swap-outs or any other form of processing delay.
Each such CPU can be configured individually, implementing the ‘Inside’ part of the Heterogeneous Inside and Out vision of the i6500. We will talk about the ‘Out’ part in a few paragraphs. Thus, you may configure your CPUs uniformly for a balanced mesh design, or give individual CPUs unique configurations so that different workloads are allocated to CPUs tailored for them upfront.
And now for the Grand Finale of our CPU adventure: imagine you can coherently connect 64 such clusters, or 384 CPUs, or 1536 virtual 64-bit processors, into an amazing CPU megasystem that delivers huge performance and crunches your workloads at an accelerated pace.
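The scaling arithmetic above is easy to check. The following is a small sketch that derives the CPU and virtual-processor counts from the three figures quoted in this post (64 clusters, 6 CPUs per cluster, 4 threads per CPU); the `Topology` class and its field names are hypothetical helpers, not part of any MIPS tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical description of a system built from the figures in this post."""

    clusters: int = 64          # coherently connected clusters
    cpus_per_cluster: int = 6   # CPUs in each cluster
    threads_per_cpu: int = 4    # hardware threads (virtual processors) per CPU

    @property
    def cpus(self) -> int:
        return self.clusters * self.cpus_per_cluster

    @property
    def virtual_processors(self) -> int:
        return self.cpus * self.threads_per_cpu


if __name__ == "__main__":
    single_cluster = Topology(clusters=1)
    megasystem = Topology()
    print(single_cluster.cpus, single_cluster.virtual_processors)
    print(megasystem.cpus, megasystem.virtual_processors)
```

A single cluster works out to 6 CPUs and 24 virtual processors, and the full 64-cluster configuration to 384 CPUs and 1536 virtual processors, matching the numbers quoted above.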
You may also connect IO-coherent streams and other Wave or third-party IP to our coherent network, thus realizing the ‘Out’ part of the Heterogeneous Inside and Out vision mentioned earlier.
This is the power of the i6500! Scalable from a single CPU to huge 64-bit CPU systems, alone or together with other processing IP, for uniform or highly varying workloads, and with stunning performance.
By choosing MIPS, you will be working with a dedicated partner and team that is here with you and for you, supporting you all along the way.
For more details, start here: https://www.mips.com/products/warrior/i-class-i6500-multiprocessor-core/ ... and then continue with a visit by your local MIPS team to tell you more and answer any questions you may have. Drop me a note when you are ready, and we’ll be with you in no time.
MIPS CPU cores – Designed to Scale.