For decades, the starting point for compute architectures was the
processor. In the future, it likely will be the DRAM architecture.
Dynamic random access memory always has played a big role in
computing. Since IBM’s Robert Dennard invented DRAM back in 1966, it has
become the gold standard for off-chip memory. It’s fast, cheap,
reliable, and at least until about 20nm, it has scaled quite nicely.
There is plenty of debate about what comes next on the actual DRAM
roadmap, whether that is sub-20nm DRAM or 3D DRAM. DRAM makers are under
constant pressure to shrink features and increase density, but there
are limits. That helps explain why there is no DDR5 on the horizon, and
why LPDDR5 is the last in line for mobile devices.
All of this ties directly into compute architectures, where the next
shift may be less about the process used to create the memory than where
the memory is placed, how it is packaged, and whether a smaller form
factor is useful.
There are several options on the table in this area. The first, the
Hybrid Memory Cube (HMC), packs up to eight DRAM chips on top of a logic
layer, all connected with through-silicon vias and microbumps. This is
an efficient packaging approach, and it has been proven to be
significantly faster than the dual in-line memory modules (DIMMs) found
in most computers. But it’s also proprietary and may
never achieve the kinds of economies of scale that DRAM is known for.
HMC was introduced in 2011, but systems using these chips didn’t
start rolling out commercially until last year. The problem for HMC is
that the second generation of high-bandwidth memory, a rival approach,
also began rolling out last year. HBM likewise packs up to eight DRAM
chips and connects them to the processor using a silicon interposer. HBM
has a couple of important advantages, though. First, it is a JEDEC
standard. And second, there are currently two commercial sources for
these chips—SK Hynix and Samsung.
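The bandwidth advantage of these stacked approaches comes largely from interface width rather than raw clock speed. A rough back-of-the-envelope comparison, using representative figures rather than any vendor's datasheet numbers, shows why a wide stack sitting on an interposer so easily outruns a conventional DIMM:

    # Representative, not vendor-specific, figures for peak bandwidth.
    def peak_bandwidth_gbs(bus_width_bits, gbits_per_pin):
        # Peak GB/s = interface width (bits) x per-pin rate (Gb/s) / 8
        return bus_width_bits * gbits_per_pin / 8

    ddr4_dimm = peak_bandwidth_gbs(64, 3.2)     # 64-bit channel at 3.2 Gb/s -> ~25.6 GB/s
    hbm2_stack = peak_bandwidth_gbs(1024, 2.0)  # 1,024-bit interface at 2 Gb/s -> ~256 GB/s

    print("DDR4 DIMM : %.1f GB/s" % ddr4_dimm)
    print("HBM2 stack: %.1f GB/s" % hbm2_stack)

A DIMM is stuck with a narrow channel because its signals have to cross a socket and centimeters of motherboard trace; a stack can afford a 1,024-bit interface because its connections run through microns of silicon.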
A third approach, which Rambus is exploring, is to put DRAM on a
single card that can be shared by racks of servers in a data center. The
goal, as with the other memory approaches, is to limit the distance
that huge amounts of data have to travel back and forth to be
processed. This approach shows particular merit in the cloud world,
where huge data centers are wrestling with exactly that problem.
The key in all of these approaches is understanding that it isn’t the
processor that is the bottleneck in compute performance anymore. It’s
the movement of data between one or more processor cores and
memory. Processor cores, regardless of whether they are CPUs, GPUs, MPUs
or even DSPs, generally run fast enough for most applications if there
is an open path to memory. Just turning up the clock speed on processors
doesn’t necessarily improve performance, and the energy costs are
significant. Those costs can be measured in data center operating
expenses and mobile device battery life.
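A simple roofline-style estimate makes this concrete. The peak compute and DRAM bandwidth figures below are assumptions chosen for illustration, not measurements of any particular processor:

    # Assumed, illustrative figures -- not any specific part.
    peak_flops = 1.0e12       # 1 TFLOP/s of raw compute
    dram_bandwidth = 25.6e9   # 25.6 GB/s, roughly one DDR4-3200 channel

    # A streaming kernel such as y[i] = a*x[i] + y[i] performs 2 floating-point
    # operations for every 24 bytes moved (read x, read y, write y).
    flops_per_byte = 2.0 / 24.0

    attainable = min(peak_flops, flops_per_byte * dram_bandwidth)
    print("attainable: %.1f GFLOP/s of a %.0f GFLOP/s peak"
          % (attainable / 1e9, peak_flops / 1e9))
    # Roughly 2 GFLOP/s. The cores spend most of their time waiting on DRAM,
    # and a faster clock would only make them wait faster.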
The two big knobs for boosting performance are more efficient
software (a subject for another story) and faster movement of data in
and out of memory. While multiple levels of embedded SRAM help improve
processor performance for some basic functionality, the real heavy
lifting on the memory side will continue to involve DRAM for the
foreseeable future. That requires a change in memory packaging and I/O,
but in the future it also will become a driver for new packaging
approaches for entire systems, from the SoC all the way up to the end
system format.
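To put a number on how much the data-access pattern matters, the short sketch below (plain NumPy, purely illustrative) compares a contiguous pass over a large array with a strided pass that touches only one element per 64-byte cache line. The strided version discards most of every DRAM burst, and the per-element cost rises accordingly:

    import time
    import numpy as np

    n = 1 << 24                  # 16M doubles, 128 MB -- far larger than any cache
    a = np.random.rand(n)

    t0 = time.perf_counter()
    a.sum()                      # contiguous: every byte fetched from DRAM is used
    t1 = time.perf_counter()

    stride = 8                   # 8 doubles = 64 bytes = one cache line per element
    t2 = time.perf_counter()
    a[::stride].sum()            # strided: most of each fetched line is wasted
    t3 = time.perf_counter()

    print("contiguous: %.2f ns per element" % ((t1 - t0) / n * 1e9))
    print("strided:    %.2f ns per element" % ((t3 - t2) / (n // stride) * 1e9))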
New memory types will come along to fill in the spaces between SRAM
and DRAM—notably MRAM, ReRAM and 3D XPoint—but there will always be a
need for a more efficient DRAM configuration. What will change is that
entire chip architectures will begin to wrap around memories rather than
processors, softening the impact of what arguably is one of the biggest
shifts in the history of computing.
http://semiengineering.com/its-all-about-dram/