AMD’s CTO talks about how to achieve more performance per watt and how chip architectures are changing.
Mark Papermaster, chief technology officer at Advanced Micro
Devices, sat down with Semiconductor Engineering to discuss how to keep
improving performance per watt, new packaging options, and the
increasing focus on customization for specific tasks. What follows are
excerpts of that conversation.
SE: As we get more into the IoT and we have to deal with more
data, not to mention cars where data needs to be processed and
moved very quickly, the focus seems to be shifting back to performance.
Are people demanding performance, with the understanding that they don’t want
power to go up?
Papermaster: That’s one thing that is actually
fundamental. It’s about performance per watt, performance at a given
energy level. It affects everything from PCs and datacenters to IoT
devices and phones. The faster you get a task done, the more performance
you have, and as soon as that task is done, you can drop back to a near-zero
state of energy dissipation. The more efficient the processing you implement
in your design, the more you improve your energy efficiency.
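As a rough sketch of the race-to-idle arithmetic Papermaster is describing, consider the energy used over a fixed window of time; all of the power and time figures below are purely illustrative, not AMD data.

```python
# Illustrative race-to-idle arithmetic (hypothetical numbers, not AMD figures).
# Finishing a task faster can cost less total energy, because the chip can
# drop to a near-zero idle state as soon as the work is done.

def task_energy(active_power_w, task_seconds, idle_power_w, window_seconds):
    """Energy (joules) consumed over a fixed time window: run, then idle."""
    idle_seconds = window_seconds - task_seconds
    return active_power_w * task_seconds + idle_power_w * idle_seconds

WINDOW = 10.0  # seconds we observe the system for

slow = task_energy(active_power_w=10.0, task_seconds=8.0, idle_power_w=0.5, window_seconds=WINDOW)
fast = task_energy(active_power_w=15.0, task_seconds=4.0, idle_power_w=0.5, window_seconds=WINDOW)

print(f"slower core: {slow:.1f} J")   # 10*8 + 0.5*2 = 81.0 J
print(f"faster core: {fast:.1f} J")   # 15*4 + 0.5*6 = 63.0 J
```

Even though the faster run burns more power while active, it spends more of the window idle, so the total energy for the task is lower.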
SE: But a general-purpose big processor is not necessarily what you need everywhere. How does that change things?
Papermaster: For sure you have to tailor processing
for your task. If you look at AMD’s lineup, we have a range of computing
capabilities. Just in processors we have a range of low-power,
energy-optimized cores—so they are going to have less area, typically be
less expensive—all the way up to adding more CPUs and leveraging
parallelism and more efficiency in the cores to tackle the more
demanding tasks. All of them are designed to be energy-efficient. The
amount of ‘oomph’ you put into that CPU core changes, depending on the
application you are targeting with that processor.
SE: How does your architecture change as you go forward, given it is getting harder to stay on Moore’s Law?
Papermaster:
Moore’s Law
hasn’t gone away. I call it ‘Moore’s Law Plus’. Moore’s Law was about
doubling performance and keeping your cost and energy dissipation the
same. It’s an economic statement. That economic demand from our
customers remains. We are going about it two ways. First, it is in the
design itself. You have to make the designs, from an architecture
standpoint, more and more efficient going forward. We just designed a
brand new CPU core, Zen, from the ground up. We actually started this
effort in late 2012, so we’ve been working on it for four years. It
takes four years to get a brand new x86, high-performance CPU done. We
are right on track. It’s a very modern core and very efficient in terms
of driving that performance per watt of energy, and it’s very scalable.
We also designed it to work very well with accelerators, like our GPUs.
You can add more CPUs if you need to get more work done, and you can
connect to GPUs,
FPGAs, or other accelerators.
SE: What’s different in Zen versus what you were doing in the past?
Papermaster: When we looked at Zen, we decided to
make a change. We had a power-optimized set of processors for the low
end. We had a very high-performance set of processors for the mid- and
high-end ranges. In Zen, we wanted a new and modern core in every
respect, meaning it can handle a range of workloads. It has high
throughput, energy efficiency and floating point efficiency. It can
scale from low-end applications to high-performance applications. That
is done with both design and process. Design is microarchitecture,
attacking every element of the execution units, of the cache subsystem,
of the scheduling, every aspect to ensure you are removing bottlenecks.
Technology is twofold. We’ve leveraged the new 14nm
finFET
technology. The scalability you have with finFETs spans quite a large
range because they have very little leakage. When you turn off your
clocks—when you are not doing active work—you can get very close to nil
energy, and leakage is lower than previous technologies. Yet as you turn
on your clocks and accelerate your workloads, you get very fast
performance per watt.
SE: Let’s look at throughput and how you achieve that. How do
you move data internally and externally at a higher rate of speed than
what you were doing in the past?
Papermaster: With any microprocessor, it’s about
designing a balanced machine. You have to look at the demand internally
of all of your execution units. You have to look at the amount of
bandwidth that you need and how you optimize bandwidth and latency. How
big is your pipe feeding those engines? How fast can you move data in
and out of those engines? That was the core principle behind the Zen CPU
design. That extends outside as well as you interconnect to the rest of
the world. It’s the same thing on memory and I/O. You need enough
bandwidth and pipes to optimize your latency to ensure you don’t create
bottlenecks.
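A generic, roofline-style way to reason about that balance between execution demand and the size of the pipe feeding it is sketched below; the peak-compute and bandwidth numbers are made up for illustration and are not Zen specifications.

```python
# Roofline-style check of whether an engine is fed adequately
# (generic reasoning, illustrative numbers, not AMD specifications).

def attainable_gflops(peak_gflops, mem_bw_gbs, flops_per_byte):
    """Peak is only reachable if memory can feed the execution units."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)

peak = 500.0      # GFLOP/s the execution units could sustain
bw = 50.0         # GB/s the memory pipe can deliver
for intensity in (1.0, 4.0, 16.0):   # FLOPs performed per byte fetched
    print(intensity, attainable_gflops(peak, bw, intensity))
# 1 FLOP/byte  ->  50 GFLOP/s (bandwidth-bound: the pipe is the bottleneck)
# 4 FLOP/byte  -> 200 GFLOP/s
# 16 FLOP/byte -> 500 GFLOP/s (compute-bound: the engine is the bottleneck)
```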
SE: What else did you have to do?
Papermaster: We looked at what we could do to speed
up both, ensuring no bottlenecks in terms of the execution flow. We’ve
improved the micro-op cache, the efficiency of getting those
instructions into the pipe. We’ve also made a number of efficiency
improvements, reducing the number of cycles instructions spend in our
execution units. In terms of
memory and feeding it, we’ve optimized our cache subsystem. We stepped back and looked at where the workloads are.
SE: How do you reduce the number of cycles? Is that embedded software or external software?
Papermaster: No, it’s low-level. It’s our Zen
designers rolling up their sleeves and bringing creativity to optimize
how many clock ticks/cycles they can get an instruction completed in.
It’s hardcore engineering of pipelining your microprocessor.
SE: What do you get for that in terms of performance?
Papermaster: We set a target of having a 40%
instructions-per-clock (IPC) improvement over the previous generation. We are
shipping Excavator today, which is the previous core that we have in our
AMD products. When Zen comes out in early 2017, it is going to have a
40% improvement. The only way you can get that is to use a combination
of every aspect of the design, of feeding the engine, of optimizing the
engine itself and improving the throughput to the engine. Those are the
three key elements in terms of how you get improvements. Anyone who has
been around microprocessor design for a while will say it is not rocket
science. They’re right, but those are the levers. It’s about breaking
it down into dozens and dozens of specific changes you drive into a
design.
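To put the 40% figure in context, here is the simple arithmetic behind an instructions-per-clock uplift at a fixed clock; the workload size and frequency below are hypothetical, only the 40% target comes from the interview.

```python
# What a 40% instructions-per-clock (IPC) uplift means at a fixed clock
# (simple arithmetic; the figures besides the 40% target are made up).

def runtime_seconds(instructions, ipc, freq_ghz):
    cycles = instructions / ipc
    return cycles / (freq_ghz * 1e9)

instructions = 1e12          # hypothetical workload size
freq = 3.0                   # GHz, held constant for the comparison
old_ipc = 1.0
new_ipc = old_ipc * 1.40     # the stated 40% IPC improvement

t_old = runtime_seconds(instructions, old_ipc, freq)
t_new = runtime_seconds(instructions, new_ipc, freq)
print(f"speedup at the same clock: {t_old / t_new:.2f}x")   # 1.40x
```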
SE: So what’s changed on the software side?
Papermaster: We’re committed to open source software. If you look at our microprocessor, we have an
LLVM
open source compiler to optimize the performance you get out of the
CPU. When you look at accelerators, GPUs, we took our stack and put it in
open source. If you go to
www.gpuopen.com you’ll see the software and the tools it takes to accelerate using our Radeon technology.
SE: Particularly in the GPU space, you’ve started employing some 2.5D packaging. Where does that fit into everything?
Papermaster: We rolled out our R9 Fury, which is our Radeon product with
2.5D
technology. With R9 Fury and Fury X, we’ve brought memory in closer.
Leveraging that packaging technology, we take stacks of high-bandwidth
memory and bring it right on the same silicon carrier that the GPU chip
resides on. That drastically reduces the time it takes to get at memory,
to suck data in from memory and put it back in memory, and it saves
tremendous energy. As you move that data around, it’s a very short
connection over silicon rather than driving off that dGPU (discrete
graphics processing unit) into a separate memory unit.
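A quick calculation with publicly reported first-generation HBM figures (a 1024-bit interface per stack at an effective 1 Gb/s per pin) shows why that wide, short on-interposer connection pays off.

```python
# Per-stack and total bandwidth of first-generation HBM, using publicly
# reported figures: a 1024-bit interface per stack at ~1 Gb/s per pin.

bits_per_stack = 1024
gbps_per_pin = 1.0
stacks = 4                                   # the four-stack configuration used on Fury X

gb_per_s_per_stack = bits_per_stack * gbps_per_pin / 8
total = gb_per_s_per_stack * stacks
print(gb_per_s_per_stack, total)             # 128.0 GB/s per stack, 512.0 GB/s total
```

The width of the interface, which is only practical over a silicon interposer, is what delivers the bandwidth at a modest clock and therefore at lower I/O energy per bit.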
SE: Any plans to add that kind of architecture into the CPU side?
Papermaster: We see
HBM
having expanded applications in the future. The biggest driver of that
is the HBM cost coming down. It works great today on the high end of our
discrete graphics line, and as the cost comes down, you’ll see the
range of applications grow.
SE: Is the cost in the memory or the interposer? Where are you seeing the problem?
Papermaster: Both. Costs come down generation by
generation over any technology as it matures. As the volumes go up, HBM
costs will go down. The same holds for the interposer. As the manufacturing volumes go up and the
OSAT (outsourced semiconductor assembly and test) industry gains more expertise in the packaging techniques, costs will go down there, as well.
SE: What do you see as the evolution of advanced packaging?
Papermaster:
Packaging
and integration and how you put different solutions together will keep
us on a Moore’s Law pace of performance increase, doubling every 18 to
24 months. It’s a fundamental enabler. It did start with 2.5D. Looking
forward, you’ll see 3D integration, where you can stack more complex
devices—active over active devices. You will see new types of organic
packaging coming out, with very dense interconnects, allowing multi-chip
connections at lower cost points. This will spur the ability to mix and
match different CPUs, GPUs, accelerators, and different technology
nodes. When you get that kind of heterogeneous implementation of engines
with cost-effective integration, you’re going to make big gains in
performance efficiency.
SE: You also potentially can get to market much quicker with a customized solution than what you can do now, right?
Papermaster: Sure. When you have monolithic
integration on a single die, each element that goes in that monolithic
silicon has to be all created on that single development schedule, all
optimized on that new piece of silicon. In a space like mobile, you have
to do that to be able to hit the cost point and the massive scale.
Think about smartphones, tablets, and low-end PCs with high volume. My
sense is that those will stay monolithic. But as you move up the value
chain and you need more tailoring, it creates a lot of options for
customers to create very optimized and innovative solutions.
SE: It’s really the world turned upside down. It used to be that the
high-value solutions were the ones that were high-volume. Now it
seems like we are getting higher value as we move into customized
or semi-customized solutions.
Papermaster: There are new trends driving a new era
of computing. On the compute side, there is big data analytics, the
raw number crunching that you need in the mega data centers to
provide businesses with the information they need. Then there’s
the visualization side with virtual and augmented reality requiring
incredible rendering to be able to create new environments and analyze
data in different ways or create new markets. You’ll see whole new areas
of applications of this immersive technology. Both the number crunching
side, the analytics to be able to handle all this data, and then the
visualization side all require high performance. They will require
technology to be put together in different ways than it has been to date.
SE: That needs flexibility, correct?
Papermaster: It does. Think back to the mobile phone
era. We started out with a number of mobile phones, but it wasn’t quite
taking off. You had initial smartphones and then there was an
application world that was set up. Apple innovated with dropping
applications on the iPhone and then others followed. It drove the
explosion of applications. The same thing will happen with these new
areas. It will start off targeted, but as the software and the
applications start to grow, you’ll see the hardware matching those use
cases. Gaming and entertainment may be one form factor. Medical may be
another set of targeted form factors of the technology.
SE: What about security? How do you approach this?
Papermaster: We look at security in a very
straightforward way. Security is fundamental if users are going to adopt the
technology. It is a must. Our technology has to have a very strong
bedrock of security. We’ve approached this in a way that has leveraged
our experience with game consoles. We’ve worked with the game console
providers with semi-custom technology. Thinking about their business,
you have to protect the titles. We’ve had to become very good at
security. We have partnered with ARM and embedded TrustZone in every one
of our microprocessors and graphics processors. It’s an ARM processor
with a TrustZone implementation, with AMD elements around that
architecture, our own cryptography and our own carefully designed
controllability access technology. From the very moment you boot that
engine and allow access to elements on the chip design, it’s in a
controlled, secure environment with controlled access.
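A minimal sketch of the root-of-trust idea behind that kind of controlled boot flow, in which each stage verifies the next before releasing control, is shown below; this is a generic illustration, not AMD's or ARM's TrustZone implementation, and the firmware image and digest are hypothetical.

```python
# Generic root-of-trust sketch: only hand over control if the next-stage
# image matches a digest held in an (assumed immutable) root of trust.

import hashlib

# Digest assumed to be burned into an immutable root of trust (hypothetical value).
TRUSTED_DIGEST = hashlib.sha256(b"firmware-image-v1").hexdigest()

def verify_and_boot(image: bytes) -> bool:
    """Release the rest of the system only if the image matches the root of trust."""
    measured = hashlib.sha256(image).hexdigest()
    if measured != TRUSTED_DIGEST:
        return False            # refuse to boot an unauthenticated image
    # ... hand control to the verified image, which verifies the next stage ...
    return True

print(verify_and_boot(b"firmware-image-v1"))   # True
print(verify_and_boot(b"tampered-image"))      # False
```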
SE: ARM has moved from TrustZone to the chain of trust concept.
Papermaster: That’s an ecosystem beyond devices and
AMD ties into any ecosystem. The trust and security that we build into
our microprocessor is extensible from any consumer device, consumer or
commercial PC, right up to server and network applications. We provide
full security within any of our microprocessor elements and create an
application interface that we can tie into the kind of other ecosystems
that are out there. Again, it’s our commitment to partnering and open
standards.
SE: We’ve been following that for quite a while, but no matter how good it gets, hackers still break in.
Papermaster: You can’t compromise when you encrypt
data, so our philosophy is based upon secure and authenticated access,
and our customers deciding when they want to be in a secure
environment. When they do, their data can be encrypted.
SE: Going back to scalability, how do you get there? Is the
architecture scalable? Is it more chips that you are adding? More cores?
Papermaster: Scalability is about how you architect
your design, from the inception of design. We designed Zen to be high
performance and energy efficient. It’s in our test phases nearing our
ship date at the beginning of 2017. From there, we design in scalability at
the outset. We have a long history of being able to scale microprocessor
cores, so we built on that history. We tuned up our HyperTransport even
further and have outstanding scalability as you add cores. As you look
at how we connect to the rest of the world through I/O, we have a very
robust I/O history across our designs. Scalability and connectivity are
key elements of the system design.
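A generic Amdahl's-law model, using an assumed parallel fraction rather than any AMD measurement, illustrates why the interconnect and the serial portions of a workload get so much attention as cores are added.

```python
# Generic Amdahl's-law view of multi-core scaling
# (illustrative model, not AMD measurements).

def amdahl_speedup(cores, parallel_fraction):
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

for cores in (2, 4, 8, 16, 32):
    print(cores, round(amdahl_speedup(cores, parallel_fraction=0.95), 2))
# 2 -> 1.9, 4 -> 3.48, 8 -> 5.93, 16 -> 9.14, 32 -> 12.55
# The serial/communication fraction, not the core count, quickly becomes
# the limit, which is why the interconnect and I/O matter so much.
```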
SE: Do you solely use bulk CMOS, or are you looking at some new materials?
Papermaster: These are bulk
CMOS
designs. We work closely with the foundries. When you look at their
roadmaps in the long term, you’ll see bulk CMOS. You can see tinkering
with the metallurgy and device structure, and you’ll see compound
structures needed to continue device scalability. All that is coming. It
is still largely a bulk CMOS approach, but you are going to see a lot
of innovation.
SE: The slowdown of Moore’s Law seems to have triggered more
creativity than ever. It’s not a question anymore of just shrinking.
It’s now about turning all these knobs at once, right?
Papermaster: Yes. Gone are the days of mapping to the
next technology node, knowing that you’ll stay ahead of the competitive
curve. It is architecture and design, along with process technology,
along with having innovative ability to connect heterogeneous solutions
together.
SE: Does it get to the point that one architecture doesn’t
fit all? With our moves to lots more vertical markets, like medical, the
current architecture may not apply. Do you start doing multiple
iterations that you didn’t have to do in the past?
Papermaster: There’s a limit on innovative architecture, and it’s
software. Time and time again, we’ve seen someone come up with a
better way and a whole new architecture to go out and solve a problem. The
problem is that when you have that widget, there’s no software to run
on it. It’s an immense task to get that software ecosystem in place. So,
we’re leveraging that x86 ecosystem on our CPUs. It’s tried and true,
with a massive installed base. Our view is that you really need to work
closely with the software ecosystem to allow them to unlock the full
potential of our architectures.
SE: That’s been one of the big problems: software has been one step
too far removed. Hardware and software are developed separately, and
then they try to bridge the two. If they are developed together, you
can have immense improvements in terms of performance and efficiency.
Papermaster: We looked right up front at the
software ecosystem that we are targeting and we worked back from that to
make sure we are bringing value to that ecosystem and partnering with
that ecosystem.
SE: It’s partly hardware defining software, but also software defining hardware?
Papermaster: Absolutely. The days when you could be off in a corner
and devise a better solution without integrating with the ecosystem
are gone.
SE: Are you working on the Linaro side, as well?
Papermaster: We do. We are engaged with Linaro, and we introduced the
A1100 ARM processor, an 8-core ARM device that we offer in our product
mix today. Our view is that we welcome competition from an ISA
standpoint. We are focused with Zen on returning high performance with
x86, our heritage. But the reason we put out the A1100 and are watching
that space is that if ARM takes off, it is not hard for us to pivot and
add that to our product portfolio as well. We are focused on x86, and
we’re watching the ARM space, engaged with Linaro and other consortiums
as well.
SE: What does the sale of ARM to SoftBank do to you and your relationship to ARM?
Papermaster: We don’t anticipate a change.
SE: When you look out, what worries you most about where you are going next and what is coming in the future?
Papermaster: I’m not worried about the future. I’m a battle-scarred
veteran. I’ve been through many times where the pundits said the ‘end
is near’. The end of semiconductor scaling was foretold, yet we
continue to see advancements in semiconductor capability. You hear
about the end of innovation in terms of compute
engines. I see no end in sight in terms of driving innovation into our
CPU and GPU engines.
http://semiengineering.com/the-zen-of-processor-design/