
“Moore’s Law Is Really Dead: What Next?” at ACM Turing 50 Celebration


– So now, another very noncontroversial topic: Moore’s law is really dead, what’s next? The next panel is focusing on what will be the aftermath of the phenomenon of Moore’s law. We’re very honored, and the moderator for this discussion is a computer scientist in the form of the president of Stanford University, John Hennessy. John, you and your panel, please come up. – Whatever order you like is fine. So what do we mean when we say Moore’s law
is dead? Do we mean that transistors will never get
faster, will never get more dense? It really helps to go back, and look at Moore’s
original projection. In 1965, he wrote the first paper that actually
projected that semiconductor density would increase every year. In 1975, he modified it to talk about an increase
every two years. And in fact, that rate, that exponential growth
rate was maintained for roughly the next 25 years after his revised paper. And then Moore’s law began perhaps what we
might call dying or slowing down anyway. We went from a doubling every two years to
roughly a doubling every three years sometime between 2000 and 2005, and more recently we’ve
been close to a doubling every four years. So we’re slowing down. We’re reaching the end of silicon technology
as we’ve known it but there’s another key factor here that rarely gets talked about
except among people who are friends with electrons as our good friend Chuck Thacker was, and
that’s what’s called Dennard scaling. Dennard scaling is a property that says that
as the devices get smaller, their energy consumption also drops at the same rate. What that meant was for many years, you could
get the same square millimeter of silicon to consume the same energy, which made it
very easy. Dennard scaling actually ended before Moore’s law did, and it’s been nonoperational for nearly 10 or 15 years, and that’s created another problem, created this so-called era of dark silicon. We all turned quickly to multi-core. We thought that was gonna solve all the problems ’cause we couldn’t build faster uniprocessors, and lo and behold, along comes the end of
Dennard scaling, meaning that more and more processors cannot run faster because they
burn up. Of course we’ve all built and relied on this
tremendous hardware improvement. It’s made it possible to do things like deep
neural networks. It’s made it possible to build software that
uses layer after layer and software reuse, and still not worry a great deal about the
hardware efficiency. The hardware just kept getting faster and
faster. Now we’re in a different era. Perhaps we’re entering an era where dark silicon
will mean the dark age for computer science. Perhaps it will mean we will have to rethink
the way we program or rethink the way we build machines. To address this problem, we have a great panel
here today. To my left, Doug Burger, former professor
at UT, Austin, now a distinguished engineer at Microsoft working on accelerating computing
in the cloud. Norm Jouppi, one of the original MIPS team
members some 30 years ago, and then spent time at DECWRL, HP, and is now at Google working
on processing in the TPU arena. Butler Lampson, Turing laureate who’s probably
done more work on machines that changed our lives than just about anybody I know, including
inventing the modern, what we think of as the personal computer. Butler is now a technical fellow at IBM, at
Microsoft, I’m sorry. There’s another merger coming up here, and
also an adjunct faculty professor at MIT. And finally, Margaret Martonosi, a chair professor
at Princeton who was recently a Jefferson Science Fellow in the US Department of State,
and her work is focused on power-efficient systems, a critical issue for this. So what I’ve been asked is each panelist has
to make a brief opening statement just to get the ball rolling. If anybody has any robust objections to any
panelist’s statements, they can jump on them right away. Then I have a few questions, and the audience
is the critical factor in any great panel so we’re expecting you to ask definitely challenging
questions here, and the students will be passing out index cards. Let’s start with you, Doug. – Okay. First of all, I’m more of a stickler for the
definition of Moore’s law as we are discussing today. Moore’s law was about a rate. It got adjusted once about 40 years ago, and
if I thought that it would get adjusted again, and keep going at that rate for a while, I’d
say that Moore’s law is alive and well but the precise definition is about a rate, and
I think we’re kind of in the end game of a predictable rate, and there are only so many
generations left before we hit really hard atomic limits, and that number is not very
many, and the costs will grow quickly. What does it mean for a 50-year exponential
to end? We’ve been in this exponential for our entire
lives. An analogy I like to draw is global warming,
right? We’re just in a new regime that’s gonna be
transformational, and we know it’s there. It’s like a slow-moving force but we can’t
attribute any one event to it. So a major acquisition by a semiconductor
company, a shift in architecture, consolidation, is that because of the failure of Dennard
scaling, or Moore’s law being dead or dying, depending on where we are in the rate? I think we don’t know, but those oscillations are going to increase in amplitude, and I think 20 years from now, the industry will
be unrecognizable compared to what it is today because of those oscillations, those computing
stacks, architectures, languages will look very different, at least if you have high-performance needs. So what should we do? I have six approaches. I’m not gonna go through them in detail ’cause
I don’t have time. – [John] We have 45 minutes prepared. – That’s right, that’s right. Cut me down 90%. All right. So I’ll list three directions forward. I call them the obvious, the ugly, and the
evolution, and then three new directions, the smart, the crazy, and the wild. Okay. So the obvious is to improve performance within
our current paradigms so mining the fat out of the software stacks, layers of interpretation
as John said earlier today, get our general-purpose processors running faster. They’re less than 1% efficient. A floating-point operation on a modern processor
is 30 picojoules, and the instruction overhead to do that is 10 nanojoules, a factor of roughly 300 difference, so I think there’s a lot of opportunity left there, but we’re not really focused on it as a community. But that’s within our current paradigm.
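A quick back-of-the-envelope on the figures quoted above; the 30 picojoules and 10 nanojoules are Doug’s numbers, and only the division is added here:

\frac{E_{\text{instruction}}}{E_{\text{FLOP}}} \approx \frac{10\ \text{nJ}}{30\ \text{pJ}} = \frac{10{,}000\ \text{pJ}}{30\ \text{pJ}} \approx 330

which is the roughly factor-of-300 gap between the cost of the arithmetic itself and the cost of issuing the instruction that performs it.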
Okay, so the ugly is, I think, an inexorable trend towards domain-specific architectures and stacks, so we’re gonna create frankensystems
with just many, many stacks of different silicon, different languages, okay, and it’s gonna
be ugly but the industry is big enough now, and these are important enough problems that
we can accommodate that. So that’s gonna happen but it’s gonna be ugly. And then there is an evolution. I think we’re moving to new architectures,
and some of my current work has been towards what I like to call spatial computing. It’s not a new idea. I didn’t invent the term but CPUs are really
temporal computing. You have a small working set of data, and
you stream instructions through those, and if you’re changing that working set out, you’re
really slow. Spatial computing is the transpose of that. You fix the instructions down, and you stream
data through, and that’s one reason we made such a big investment in FPGAs at Microsoft
’cause you could put down these functions or instructions, and then stream data through
at line rate, and that happens in lots of places in the cloud, so that’s, I think, a new paradigm. It’s hard and it’s ugly and the languages aren’t there yet, but I think there’ll be a lot more of that, and FPGAs are useful ’cause you can actually change the function while things are evolving rapidly.
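A rough sketch, in Python rather than hardware, of the temporal-versus-spatial distinction being drawn here; the two stage functions are invented purely for illustration and are not taken from the Microsoft FPGA work:

# Illustrative sketch only: contrasting "temporal" and "spatial" styles of computing.

def temporal(data, program):
    # CPU-style (temporal): hold a small working set of data and stream
    # the instructions past it, paying per-instruction overhead at every step.
    for x in data:
        for op in program:
            x = op(x)
        yield x

def spatial(data, stages):
    # FPGA-style (spatial): lay the functions down once as a fixed pipeline
    # and stream the data through them.
    stream = iter(data)
    for stage in stages:
        stream = map(stage, stream)
    yield from stream

# Same answer either way; what differs is whether the instructions or the
# data stay fixed while the other streams past.
stages = [lambda x: x + 1, lambda x: x * 2]
assert list(temporal(range(4), stages)) == list(spatial(range(4), stages)) == [2, 4, 6, 8]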
Okay, so then going to the three new paradigms. I gave a talk a few years ago at Microsoft’s Faculty Summit: what’s the long-term opportunity? So one is neural or AI, deep neural networks. I hate working in hot areas ’cause there’s
too many smart people. This one feels real so I’m violating my rule
because I think it’s really, really important. There’s something deep and fundamental here,
and we don’t really understand it, and so the CPU performance gains are going like this,
deep network requirements are going like this for performance, and that gap is why we’re
seeing this Cambrian explosion of new architectures, some of which are famously enormous, and I
think one really interesting thing in that space is that we’re doing vector computing,
we’re doing matrix vector multiply. You can reduce precision. You can make those more efficient. You can benefit from the silicon scaling. That’s gonna play out over the next three or four years, and what comes after that, I think, is a really important question. There are huge gaps, many, many orders of
magnitude between that and the brain, so that’s gonna be a fascinating, fascinating time. So that’s the smart, and then there is the crazy. The crazy is quantum. I think the challenge there is the algorithms. I think we’ll build them. Can we solve really important problems with
it? Is that a general-purpose thing or a very
specialized thing? I think that’s TBD, and it really relies on
the theorists and the algorithm people to figure that out. And then for the wild, I think this is the
last point, it’s programmable biology. That’s something that really needs computer
architecture and computer science thoughts behind it so protein pathways, gene expressions
leading to protein pathways. There is an architecture there that we don’t
fully understand. Understanding is very sparse. Getting a handle on that will be really important. I think that’s the longest term one but that’s
much longer term. So I guess to conclude, I think I’m at time,
Moore’s law has given us a free ride in performance with existing paradigms for five decades and
more, and that free ride is just about over so we’re entering a wild, messy, disruptive
time, and it sounds like a lot of fun. – Margaret, why don’t you take it next? – Sure, thanks. So I wanted to use my time to tell most of
you who are not circuits and architecture people why you need to care, and you need
to care because it’s gonna be a wild time, and it’s not just gonna be the hardware people
who are having a wild time here. I think that’s very important. We’ve actually already seen the start of this. Over the past 10 to 15 years, we saw this
upsurge in the use of on-chip parallelism, which dramatically changed software already, and will change it much more as we adopt more parallelism within applications. And then the second wave has been the adoption
of heterogeneous specialized accelerators on chips, which again sort of pushes things
in a new direction, and dramatically changes software so there’s something else that’s
roughly 50 years old besides Moore’s law and me, and that is the instruction set architecture. So for 50 years, the deal was that Moore’s law was delivering transistors relatively
regularly, and hardware people were working hard for you to create faster and faster processors
that sat underneath a durable abstraction layer, a hardware-software contract such that
the software above it could get relatively free performance improvements with relatively
few changes, relatively little need for porting or tuning. That has changed. So as one contrast, about 25, 30 years ago, processor chips hit a previous spike in power and power density, but we were able to shift technologies. That power spike is what drove the CMOS adoption to a large degree, and software people didn’t really see that shift, as far as I can tell, at all, whereas today’s shift is quite different because we don’t have good alternatives queued up and ready to splice in. So inside our computer systems today, we’re
increasingly likely to have many ISAs present, and this is true whether you’re in mobile
or cloud or anything in-between. Your phone typically has maybe six ISAs on
the processor chip, six different processor languages that are being spoken in there,
and many accelerators. Half the area is accelerators that have no
durable instruction set architecture at all, so what that means is we know how to build
the hardware, we haven’t come up with good new ways to program it. In particular, when we move from one implementation
to another, an awful lot of the software has to change. I call it the post-ISA or post-CPU era not
because we’re done with ISAs or CPUs but because they don’t have the durable abstraction, overarching
abstraction powers that they used to have. The amount of reworking varies but it’s often
pretty broad rewriting of software for new mixes of CPU, GPU, and accelerators, sometimes
shielded by libraries or APIs but someone’s dealing with it, and it’s not fun, and that
brings me to the second big issue, which is about correctness and verification. We all know how hard it is to get a computer
system right, both the hardware and the software. We’re increasingly worried about security
and reliability as well, and the current technology trends are gonna make this much worse because
we’re building systems that are more heterogeneous, more complex. Major software changes are happening more
often as we rewrite things to port between things that don’t have good abstraction layers. And last thing is that because things are
changing quickly, and because we’re experimenting with new technologies, we don’t have well-specified
enduring interfaces against which to verify, against which to check correctness, check
security, and so that is gonna create a situation where we’re building systems that are harder
to keep correct and secure, and yet wanting more and more that we do so. The final aspect of that is, aside from FPGAs,
a lot of the other specialized hardware that we’re using is baking functionality into hardware
in a way that makes it very hard to push out a patch when you find a bug so that’ll be
yet another issue for the security and correctness and verification space. And the third thing I’ll stress, and maybe
this is where Doug and I differ is I think that application-driven approaches are kinda
cool, and not necessarily gross or Frankenstein or whatever you call them. – [Doug] Ugly. – I think it’s an interesting opportunity
where instead of the 1975 layering where you have the architects, the compiler people,
the applications people with these horizontal layers that we’ve often drawn, I think we’re
gonna flip it, have these domain-specific slices through those layers where there are
gonna be DNN people who are very good at reaching down and understanding how the hardware implies
something about what they should be doing up high and vice versa. So we need to be exploring design processes
that handle that flip well, domain-specific languages, and so forth, and we need to be
training students who have good skills up and down that stack as well because if we’re
gonna shift everything about our field, and then not think about how the curriculum and
the pedagogies should shift, then I think we’re gonna be in real trouble so a main takeaway
is this technology trend is not super new although the end game of it is new and somewhat
mysterious still, it’s definitely not something the hardware folks are gonna solve under the
covers. I think that’s already been the case. It reminds me of the frog. There is the story about if you put a frog
in boiling water, it will jump out but if you put a frog in lukewarm water, and gradually
turn up heat, supposedly it stays still. I don’t know. I haven’t experimented. But I feel like all the software people of
the world, you’re that frog. You’re sitting in water, and it’s warm enough. Thanks. – All right. Great, Margaret. Norm? – Okay, we all feel so much better after that. – [John] Besides, I like my frog sauteed rather
than boiled. – Hardware people are okay. They’re turning up the heat. – That’s a good one. Yeah. So when I saw the title of the panel, Moore’s
law is really dead, it reminded me of the Monty Python pet shop skit with the parrot: he’s only resting, pining, and lots of other interpretations. I think we’ve still got a few more years, but
the way I think about Moore’s law is in terms of order notation, and so if you think about making something more precise in one dimension, that seems to turn out to be a linear increase in cost, but for many years we were getting a free lunch, ’cause with optical lithography we were getting that N factor in two dimensions, so we were getting N-squared, and we had Dennard scaling, which was another factor of N in terms of device performance and power efficiency, and so we were getting this N-cubed for the price of N. That’s really a good deal.
But now some of those factors are starting to go away, and cost will be a limiting factor, I think, before device physics will be, but it’s neck and neck, it’s kinda close. So right now, what’s happening in optical lithography is, if you want to make something finer by a factor of N, you have to have each mask be finer by a factor of N, and you have to have N times as many masks, so your N-squared just went away, because you’re paying for twice as many masks for 2X scaling, and each mask has to be 2X more fine.
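Schematically, the order-of-growth argument here, using only the factors Norm names (two-dimensional density, Dennard scaling, mask cost); this is a rough summary, not his exact formulation:

\text{then: benefit} \sim \underbrace{N^2}_{\text{2-D density}} \times \underbrace{N}_{\text{Dennard}} = N^3 \quad \text{for cost} \sim N, \qquad \text{now: benefit} \sim N^2 \quad \text{for cost} \sim \underbrace{N}_{\text{finer masks}} \times \underbrace{N}_{\text{more masks}} = N^2

which is the sense in which the N-squared advantage just went away.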
If you think about that, there are many different applications, and so Moore’s law is not gonna be a binary event. You can’t say Moore’s law ended June 22nd of this year or something like that; instead, it’s gonna depend on the application, so some
applications benefit much more from transistors than others. A classical thing from computer architecture is that if you scale cache size by a factor of N, it reduces the miss rate by roughly a factor of the square root of N.
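In symbols, the rule of thumb being cited here (an empirical approximation, not an exact law):

\text{miss rate} \propto \frac{1}{\sqrt{\text{cache size}}}

so, for example, a cache four times larger only roughly halves the misses, which is a poor return if the extra transistors come at a high price.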
If you’re dealing with something where you’re not getting a lot of transistors, and you’re having to pay a high price for them, then just having bigger and bigger caches is probably not a good thing. Domain-specific architectures, I consider
them a work of art myself but they’re really good, at least the ones that we’ve done so
far are really good at using transistors, and so if you have an N-squared increase in transistors,
you can get N-squared in value out of them, and of course just like any other system,
you can waste the transistors doing needless operations but you do get that increase in
computational capability. So I think we’re gonna be seeing a lot more
domain-specific architectures going forward. If you think about other fields, and how they’ve
matured, if you look at aviation, in 1970, the 747-100 could carry 500 passengers at
almost the speed of sound, and nowadays, we have things like the Dreamliner, which are
much more power efficient. They have bigger windows, all these other
things but it’s not dramatically different but the efficiency is much better, and so
I’m hopeful that we’ll be able to refine both the software and the hardware. I think we’ve gotten a little sloppy in the
hardware design. Just racing and keeping up with Moore’s law,
and I remember the very careful designs we did when we didn’t have many transistors,
and they are much more efficient than what we have now so I think both on the hardware
and the software side, there is room for mining that efficiency for a long time. – Butler? – I’m not really a hardware person although
I have done some hardware design. I’m here as a substitute for Chuck Thacker
who, as most of you probably know, died a few weeks ago, but fortunately I’ve been working
with a group at MIT to try to figure out what the consequences are gonna be for computing
as Moore’s law tapers off, and our slogan is there’s plenty of room at the top. This is a play on Richard Feynman’s famous
lecture, There’s Plenty of Room at the Bottom, which he gave in 1959 to the American Physical
Society where he predicted most of the things that have happened in electronics and nanotechnology
up to now. So what does it mean? There is software, there is algorithms, and
there’s hardware that make up the computing stack, and there’s room in all three of these
above the level of the devices. It’s not gonna be as good as Moore’s law because
in the days of Moore’s law, you get more and faster transistors down at the bottom, and
everybody up the stack could benefit without having to do anything ’cause the changes were
not visible functionally. You just got better performance at the same
price, and that’s not gonna be true anymore. As a result, progress is gonna be much more
sporadic, much more opportunistic, and much more bounded than it was in the case of Moore’s
law, and also as several other people have said, the changes are gonna be much more visible
throughout the software stack, which is gonna have very serious consequences for the way
things get developed. So on the software side, we know there’s a
lot of software bloat ’cause we’ve been getting bloat for many decades. It occurred to us that there is an interesting
way to look at it: the theorists like to understand problems in terms of reductions. If you have one NP-complete problem, you can
show that another problem is NP-complete by showing how if you could solve the second
problem, then you could solve the first one too, which you already know you can’t do. So software people do reductions too; another name for it is software reuse. Instead of writing a program to solve some
problem from scratch, you write a program that solves the problem using some already
existing piece of software, which is usually cheaper and a more reliable thing to do on
the development side but it’s definitely gonna consume more computing cycles and more memory,
and when you stack these things up 10 or 20 levels deep, which we definitely do nowadays,
there’s a huge amount of bloat and a lot of scope for getting rid of it at a price of
course in development cost. Algorithms, history tells us that at least
in many domains, the improvements in performance produced by better algorithms have been quite
comparable to the improvements in performance produced by Moore’s law but of course the
algorithms are quite problem-specific, and it’s also the case that improvement tapers
off after a while, often because there is some upper bound on how good the performance
can get that you can actually prove. Hardware, the iron laws of physics as they’re
currently manifested in hardware tell you that if you want to get the maximum amount
of performance out of the transistors that you can put on a chip, you have to have the
parallelism and the locality of the computation exposed in such a way that the hardware can
take advantage of it, and historically we’ve not been very good at doing that. The other avenue for exploiting advances in
hardware is specialization, also known as domain-specific architectures, and it’s pretty
clear that there are very important domains in which you can get a factor of 100, that
way at least. Final thing I want to say is a consequence
of the fact that changes are not gonna be invisible anymore is that you need a strategy
for propagating the consequences of the changes through the software stack. At the highest level, the only strategy I
know for doing that is what I like to call big components, that is, you make the big
thing that consists of a bunch of hardware and a bunch of software, and you give it a
very stable interface, and then inside there, you can innovate much more rapidly than you
could, you can make changes up and down the stack inside of the big component much more
rapidly than you could if the consequences of changes at a low level had to propagate
through an entire uncontrolled software ecosystem. So a good example of that I think is the way
Google has been doing the TensorFlow thing where they’ve defined a high-level interface,
and they don’t give the users of the chip access to the chip itself, they only give
access to this high-level interface, and inside of that, they can evolve things much more
rapidly. So to sum it all up, we think there really
is plenty of room at the top both in software, in algorithms, and in hardware, and it’s gonna
take a big-component architecture to exploit these opportunities but I don’t think there’s
any doubt that there are several orders of magnitude more performance that can be gained
by pursuing these ideas. – Good. Thank you, Butler. So let me start by saying if anybody wants
to go out on a limb, and declare whether or not there is a silver bullet solution here
or any of these ideas, whether it’s die-stacking or CrossPoint Technology or quantum, are they
about to solve the problem for us, and we just have to hold our breath until we get
to that magic point? – I did want to say that I know nothing or
much about device physics in spite of having been a student of physics in my youth. I don’t actually believe that Moore’s law
is dead. There’s lots of physical phenomena, it seems
to me, that have the potential to make it possible to have higher and higher performance
computing devices for a long time to come, but it’s not gonna be silicon and CMOS, and up to now, silicon and CMOS has definitely been the place to put your money, so people
have not worked really hard on these other things, it seems to me, but that, I’m just
talking off the top of my head. I don’t actually know anything except that
there are all these phenomena. There’s atoms and spin and all kinds of cool
stuff out there. – What do you think, Norm? – You’ve thought about some of these things,
and Margaret I know has done some work on quantum. – Yeah, Doug and I were talking earlier about how much fun we’re both having at work, and we didn’t think we’d be having this much fun a decade ago because there are so many different novel things to look at, and they can all play a part in the overall solution. It’s not a silver bullet. There’s a lot of things that we should be
investigating and benefiting from. – You were so sad 10 years ago. I’m sorry. I can see that you’re having fun. – What about quantum, Margaret? I know you’ve thought some about this issue
as well. – I’ll say something about quantum but I wanted
to react a little bit to what Butler said. Moore’s law, as we all know, is not a law
of physics, it’s a business or economic law. – [John] It’s empirical observation, yeah. – It’s about will there be reasons to keep
pouring money into a process to maintain a doubling, and so while there may be physics
still to tap, the question is at what cost will it come, and can we find physics that
reaps enough revenue benefits to warrant the investments so that’s that. You prompted me for quantum, and it’s been
close to 10 years since I wrote my first paper on quantum, but I kept it a little bit secret, on the side, because architects traditionally work on things that are very real, and quantum, 10 years ago, was kind of a weird thing to even work a little bit on, and
so the good news is that I think quantum computing today, from a physics perspective, is amazingly
close to being real. There are folks at Google and other companies
who say that a 50 to 100-qubit machine in terms of the physics will be available in
the next year or two. There are even people who say that they will
write programs for that 50 or 100-qubit machine that will show speedup over classical, which
is the so-called quantum supremacy or quantum advantage point so that the physics side of
the story for very narrow problems is starting to become credible in a way that wasn’t true
for many years. The harder part is that the applications that
one could run on a 50 or 100-qubit machine aren’t there. There is a huge gap between the number of
qubits you need to build viable, interesting applications using the quantum algorithms
that we currently have, and the number of qubits that the physicists will be able to
build into reliable operational systems any time soon so that’s one issue. There is this gap between qubit counts that
we might want and qubit counts that we’ll get any time soon. – Will it still be good enough to wreck public
key cryptography? – The likely outcome that many people feel
is that we’ll come up with quantum-resistant crypto, and make that somewhat moot. It’ll be a while before it can wreck public
key crypto. That needs more qubits from what most people
say. The second thing is it won’t be general, it’ll
be in a coprocessor role but many of the things we’re talking about are coprocessor roles,
and I guess the third thing, going back to this law-of-physics point, is Moore’s law was
amazingly providential in a way that we came up with many, many intermediate ways of making
money off of transistors that caused that doubling cycle to sustain itself over 50 years,
and the people who are thinking hard about how to create Moore’s law for quantum computing
are having trouble thinking about what might be those intermediate points. When you get
past the kind of Sputnik or moonshot kind of initial bragging rights on 50 to 100 qubits,
and you try to move from there to 10,000 or a million qubits, there is an awful lot of money and engineering that will need to go into those phases, and the path isn’t clear
for who would pay, and for what applications, and there’s another panel tomorrow morning
so they may have more answers, yeah. – Norm, you touched on the cost issue with
respect to lithography but there are lots of other cost issues, and part of what’s causing
this slowdown is the cost of fabs is just going up by leaps and bounds, and the number
of fabs in the world is shrinking dramatically as a result. Do you think that this becomes the actual
barrier that simply you can’t afford to build many advanced fabrication capabilities? – Doug was mentioning consolidation earlier
but yeah, I think as long as we have three or four stable players, I think there will
be good competition, and like with the new iPhone every year, they have to have a better
chip in it, and so there’s a lot of market pressure to come up with the next best thing. – There’s also a lot of price pressure on
the cost of that iPhone, so they’ve got to also be able to build it. So Doug alluded to the franken architectures
as a way to describe this meshing together. It’s clearly the case I think as several of
you pointed out that a heterogeneous computing model is certainly more complicated from the
viewpoint of verification, from the viewpoint of software. Is this going to fall as a giant burden on
programmers or are we going to come up with some magical software technology to sweep
that away? What do you think, Butler? – If it’s not gonna fall on the programmers,
where is it gonna fall? – Yeah. Maybe there is some magical technology that
can do the mapping somehow or extract. Is there? – The Fortran of heterogeneous computing? – Yeah, something or some high-level thing
that can extract the structure. To some extent, TensorFlow is a step in this
direction for a limited range of applications, right? – I think all of these things are for limited
ranges of applications. The whole nature of heterogeneous, of domain-specific
architectures, they’re for limited domains. That’s what it’s all about. – Right. So then where is the burden of, “I’ve got
this giant piece of software “with multiple levels in the stack, lots of software reuse
“but somewhere in there are some pieces “that can be sped up”? – It’s the big component story. You have to find inside it, something that
you can wrap in a stable interface, and if you can’t do that, then all you can do is
proofs of concept. You can’t do anything real. That’s an absolute requirement. – Right. So does this mean that the range of potential
applications will be rather limited? Doug, you started by mentioning deep neural
networks. That’s obvious one. GPUs is another. Are there lots of these domains out there? – [Butler] Deep neural network is not an application
yet. – No, it’s a domain. Yes, domain. – I think one important thing with DNNs, and
Norm, you should comment here too, is that they are surprisingly general like they have
moved into lots of different domains, not the same algorithm but the class. Back when people were doing speech and vision
and this and that, they were all different classes of algorithms that had been tweaked over the years, and the deep networks just kinda swept through and replaced them whole hog, so there’s something very, at the risk of making a pun, deep here, and we don’t yet
have the von Neumann architecture for deep learning. If we can find that or come up with that,
that’s a huge direction forward because this is a very general thing. I think that’s why there is so much energy
in it. Yes, there’s hype, but also a momentum kind of feel right
now. – Yeah, I think it is one of the biggest nuggets
in domain-specific architecture area ’cause it can do all those different things but I
think your question, John, was actually similar to the original Moore’s law paper, where Moore said that he didn’t think there was a market for a calculator chip or this or that chip, but he could build a microprocessor, and the microprocessor could be programmed to do all those different applications. So some application areas or domains will
benefit from this, and others might not so I think we’re gonna get more inequality in
terms of application speedup. And then I think another thing, going back
to what Margaret was saying with all the different components, is that it’s also easy to get Amdahl’s law bottlenecks between the different components, so it’s a system architect kind of problem as well.
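For reference, the bound being invoked is the standard Amdahl’s law formula, nothing specific to this panel: if an accelerated component handles a fraction f of the work and speeds it up by a factor s, then

\text{overall speedup} = \frac{1}{(1 - f) + f / s}

so the unaccelerated (1 - f) portion, including the glue between components, quickly becomes the bottleneck.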
– [Butler] Let’s hear it for architecture. – So it is a system problem, and so then the
question becomes where is the bottleneck in this? Is it the programmer’s problem? And we’re coming from an era where basically
people just wrote code. Look at how much energy went into taking the
x86 architecture, and making faster and faster and faster versions of it so we didn’t have
to touch the software at all. – One thing I’d like to comment on, hearkening
back to the previous panel, is there was a lot of talk about data preservation, and well-specified
interfaces, and ways of describing, and sorry to toot the architects’ horn more, but I think we did
that, right? – [Butler] Better than anyone else. – Better than anyone else. x86 is what? 47 years old now. – And it’s unbelievably complicated but it’s
actually kind of well-specified. – And it still executes. – So the failure of code to run year after
year is because of other parts of the system we didn’t specify to that same degree, the
operating system, the I/O interfaces, and so forth. So this notion of specifying interfaces well,
and then building around them is a very powerful one. – Unfortunately, programmers hate it. Yeah. But they’re gonna have to learn better. There’s gonna be no alternative. – If you want to get faster, you’re gonna
have to do it. All right, let’s take a few of these questions
’cause some of them are quite provocative. This is a great question for you, Doug. Are FPGAs a fundamentally important computing
device or just a crutch for companies and engineers without the courage or skill to
build custom chips? Okay, A-plus question here. – I want to come back with something witty
but my mind is just blank. I draw a graph sometimes when I give talks
on this. On one axis is the rate of change of the algorithm,
and the other axis is the proportion of workloads in your cloud device family that the accelerator
can benefit, and if you look in the cloud, you have tens of millions of customers. It’s just an incredibly general-purpose thing, and even the big properties don’t run on more than one or two percent of your servers, and then for
the big online services like Bing and Google Search, they’re changing weekly or monthly
so that’s just really tough, and I think that’s even too fast for the FPGAs but you can at
least get a handle on some of that, and upgrade your algorithms so I do think there is something
general there but it’s still far too hard to program, and we’re making progress, and
the effort is coming down but the barrier is still too high to make it really general. – So they really do get flexibility. Norm, what do you think? ‘Cause you’ve gone the custom route, right? In the FPGA. – Yeah. So I think the custom can tackle those big
nuggets but the tail is basically wagging the dog in many of these data centers so I
think the FPGAs can be really useful there as almost like a microprocessor for those
applications, and something that can be programmed differently. – Maybe just to follow up on that. There are three segments to that curve. There is the stuff that’s running at really large scale and stable. You harden that. There is the stuff that’s changing too fast
or is too small-scale to justify the NRE, to put on an FPGA, and better tools bring
that down, and then there is the stuff in the middle where the economics work so those
three buckets are sort of changing in size, and so we’ll see what happens. – All right. This is a perceptive question. Margaret spoke about turning the traditional stack of horizontal layers on end, but my experience, says this questioner, over the past several
decades is that fewer software engineers have the requisite deep vertical background that
this would require. Haven’t we been teaching and moving in the
wrong direction given this change that’s upon us? – Aside from our esteemed moderator, I’m actually
the only academic here. By the way, don’t believe the hard copy brochures
that say I’m at Google. I’m not at Google. I’m at Princeton. I like Google but we’re just friends. So I think it is. We need to tell the story of these verticals
in a way that lets students see the impact of the full set of systems design challenges. They see the application layer just fine. That’s all around them, and that’s what they’re
drawn to, and that’s great. It’s wonderful to see. But they need to know that cloud computing
doesn’t run on actual clouds. They need to know there is hardware under
there that someone has built, and I think sometimes in some departments, they’ve lost
track of what’s supporting this massive revolution. – It’s extremely difficult to keep it on track. MIT tried very hard for a long time, and then
they’re gradually giving it up because you just can’t get the students to pay attention. – But isn’t this partly ’cause our field has
exploded by leaps and bounds? You can’t imagine having a student that doesn’t
have some exposure to machine learning now as part of their undergraduate curriculum,
and everything they’ve got to learn has just blown up by leaps and bounds. How do we get them enough knowledge about
the lower levels of the system including the software levels so that they have a better
understanding, and still get it done in four years, and have them graduate rather than
drop out ’cause we burn them out? – I think we have some nice examples of textbooks
and classes that merge some hardware and software into a single systems-oriented class. I think that’s one avenue. Another avenue is that as they’re learning
about DNNs or other more application-focused topics where they’re flocking to, making sure
that there is enough of the underlying support systems built into those classes as well. – One question here is what is the impact
of Moore’s law on Frasier’s law, which I wasn’t sure what Frasier’s law is but luckily it’s
defined, which says that the cost of computation drops by a factor of 10 every five years. So are we gonna see the computation improvements
in terms of cost of hardware? We are used to hardware dropping at least
in cost or getting faster for the same cost. Are we gonna see that slow down and end? – I think it depends on the application area. Some application areas, I think it’ll continue
for the next decade but other applications, it’s gonna be very slow and very small progress. – I guess I don’t need my cellphone to run
faster because I can’t speak any faster. – Video is really good at sucking up the transistors. – [John] Video sucks up. – Also I think, this isn’t the right way of
saying it. What people actually care about is that the
cost of running the application they care about is dropping, and that means the improvements
in software, improvements in algorithms as well as improvement in hardware can all contribute
to that, and that was kind of the point of the story about there’s plenty of room at
the top. – So talk about this efficiency issue, Butler. I think if you look at a large software system
running on a modern piece of hardware, whether it’s in the cloud or on a big server, the
inefficiency is spread all over the place. There is certainly inefficiency at the top
with multiple levels of software, especially if they’re written in a scripting language or something else, but there is lots of inefficiency in the underlying hardware, and in the exact
interface between that hardware and software. Do we have to go on an expedition to mine
that inefficiency out piece by piece in order to really get the kind of performance we need? – I don’t think we really know except in certain
fairly specialized domains, the motivation has not been there to really dig into this
question. The most that people typically have been willing
to do is to try to make better compilers but I think we have a lot of experience by now
that tells us that’s by no means sufficient. – No, it’s a hard problem. We haven’t made the quantum leap that we thought
we might get in compiler technology. – And we definitely haven’t dug into it seriously
in my view. – And it’s also easy to lose a lot of performance
in large-scale distributed systems so yeah. – Although that’s certainly a domain where
the limitations of Moore’s law are by no means so compelling. There has been so much fat in the hardware
and low levels of software that run distributed systems. For the communication part of distributed systems, we’re gradually learning how to take that fat out, but I think there is still a lot of opportunity
there. – Here we have a question. Current commodity architectures are very problematic
from a time-predictability point of view. The worst-case execution time can be considerably worse than the best-case or expected time. Is there any hope that the changes to come
will enable better time predictability in terms of computation? – [Butler] It depends on the application. – This is where specialization really makes
a difference. You can build a very deterministic specialized
pipeline for an application, get great performance predictability, but you lose generality, and you spend all this time building it. All those caches that we like to complain about give us generality, and then they just make
predictability terrible. If you could merge those two and have great
predictability and great general-purpose performance, you’d be in great shape but that’s a Holy
Grail that doesn’t exist. – Predictability often involves designing
for a tail. – Yes. – Architects like to make the common case
fast. – Caches work great when they work great,
and when they don’t, it’s a disaster, right? It’s the classical kind of problem. – But in some domain-specific architectures
like GPUs and even in the TPU, we don’t have caches. – Right, right. New CMOS-compatible devices are coming, for example TFETs and 3D stacks, so the hypothesis
from this question is that the evolution will be gradual. There’ll be no big crisis. No big changes will be visible. Is this simply too optimistic a viewpoint? – Yes. – Or is it the silver bullet maybe? – No. – Too optimistic, Norm? – I think some of those things will work out
but it’s gonna be a long painful process. If you think about when different gate dielectrics
were introduced, they thought it was gonna be at like 90 nanometers, and it wasn’t until
like 45 or something because there were reliability problems. We’re seeing the same things in new nonvolatile memory technologies. People thought they had it down, and then
the error rates were higher, and endurance wasn’t as good as they thought, and so these
new technologies are often more difficult than they appear. – We haven’t talked at all about DRAMs and
memories but DRAMs are really near the end of their lifetime as we know them, right? They’re really near the end. The prediction for the next DRAM generation just got shoved out another year before it’s ready, and there is no path after that next revolution
in DRAMs so we’re not gonna have memory capacity. How are we gonna deal with that? It seems like we’ve used DRAMs to hide a lot
of sins in terms of the amount of memory we use. What happens when you don’t get any improvements
in DRAMs? – It unbalances the architecture, and so it’s
yet another thing where these things are improving at different rates, and that will contort
systems, so it’s a full employment act for system architects. You need to balance. You need to re-architect the system. – Blast it. I’ve a little bit lost track of what’s been happening the last year or two, but there’s no doubt that if you look at technologies like
flash, they were originally deployed to store pictures in cameras, and the interfaces that were provided to the basic technology were poorly suited for computing. Because the camera market was much bigger
than the computer market for flash initially, it’s been a fairly long, slow process to fix
that. My belief is it’s still the case that we by
no means have the best possible interfaces to the flash, so it may well be that, given the situation with DRAM, it’ll be okay to just treat DRAM as a terabyte-sized cache, and integrate
it much better than we currently do with the next couple of levels up, which don’t have
the same gigantic gap that we used to have between DRAM and disk. – Yeah, and speaking of silver bullets, I
think vertical-NAND is the closest thing that we have to a silver bullet because flash was
supposed to stop scaling at 20 nanometers, and now we’ve got 64-layer vertical-NAND flash
that’s coming on the market. – So you’re betting on vertical-NAND flash
rather than some kind of CrossPoint Technology? – I think you need to put your money on lots
of different things. – [Butler] And then you’ll take what you can
get. – Yeah. – [Doug] Put your money on every number. – Yeah, exactly. – [Butler] Then the house wins, right? – [Norm] House always wins. – But your winning are less when you win. Is there a place for portable high-level languages
in the world of domain-specific architectures, and if so, what might those languages look
like? – Yes. I think there is some really interesting work
pushing on domain-specific languages. The Frankencamera work at Stanford is one
nice example of that, and automatic compilation down to different and diverse hardware platforms. I think the best hope is actually to go for
the domain specificity at the language level, and then have it manage the heterogeneity
that’s under the covers down there. So I think that’s a huge opportunity going
forward to have the applications be specified in something that’s high level, and somewhat
agnostic to lots of hardware or software, or what kind of hardware or software, and
yet specific enough to the application that the compilers can actually get some traction
on mapping it to hardware. – The canonical example of success in this
domain has been query languages for databases over the last 30 years. We should all aspire to do as well as that. – It’s possible that some of the interfaces
that we think about won’t be languages in a way we thought about them in the past but
more environments, interface environments over which everything is compiled. – Butler, what about moving up the software
stack? It seems to me there has been this tension
between functionality and performance or efficiency, however you want to think of it, and for the
past 30 years, functionality has triumphed. Functionality has even trumped correctness
often. More important to get it out there even if
it doesn’t work quite right. – The slogan is worse is better. – Yeah, exactly. Do you think this will change? Or will there really be a lot of pressure
on the companies to get new functionality out more than to get it to work efficiently
or to get it to work correctly? – It depends on the application. I like to say there are two kinds of software,
which I call precise and approximate. Precise software has a spec whether or not
it’s written down carefully, and if you don’t satisfy the spec, the customer is unhappy. Approximate software has no spec. There is no spec for Facebook or Google search. It just doesn’t make sense to think in those
terms. It’s not that one kind of, computer scientists
tend to think that of course precise software is better for obvious reasons. My view is both kinds are just fine, but it’s
very important to know which kind you’re writing because if you are writing approximate software,
and you think it’s precise, you’re gonna do a huge amount of engineering that your customers
are not gonna appreciate, and the other way around, your customers are gonna be pissed
about the fact that the software doesn’t work but the reason, the whole reason the web was
such a success is that it doesn’t have to work. And it doesn’t work. My personal experience is when you click on
a link, there is at least a one or two or maybe 5% chance that the wrong thing happens,
and it’s also true that if you click on it again, there’s maybe a 30 or 40 or maybe an
80 or 90% chance that it’ll work the second time, but on the whole, the thing definitely
doesn’t work, which is not a criticism. You think it’s a criticism but it isn’t. – But wait a minute. I get to my bank account. – That’s a particular application of the web
that’s been done. I’m talking about the web as ordinary people
experience it. They don’t distinguish the internet part,
the HTTP part, the server code part, the yada yada ya. You’re distinguishing those things. The bank ain’t gonna be much aware of it too. – Where does AI software fit in this? – It depends on the application. – Okay, good. If it’s a self-driving car, that’d be precise
about knowing where it is and the other parts of the road. – Your bank account looks like this. – We’re talking about the software that’s
precise. Excel is precise software. People are very upset if the numbers are wrong. – It’s hard to predict where your software,
where your systems are gonna get used over time. So for example, underneath the covers of the
IV machines in your hospital, there’s typically some Windows XP running. Let that sink in, right? Did they intend for that to be on one side
of the precise, non-precise line when they wrote it, and when they shipped it? – Windows is definitely precise software. It doesn’t mean that it always does the right
thing but it does have specs, and people get upset when the specs aren’t satisfied. Windows is definitely precise. – [John] It’s supposed to be precise software. – No no, this is a way of thinking. Does the software have a spec, and does the
customer care about the spec? It has nothing to do with how good a job you
did at building the software. – Good. – I hold no brief for Windows XP by the way. I had nothing to do with it. – I know. – But it’s an easy thing to smite that, and
I don’t think that’s particularly sensible. – It’s unsupported. To have an IV machine in today’s hospital
running– – Whose fault is that? – But that’s my point is that people are glomming
these things out of other things– – Sure, of course. Yeah, of course. Software lives a lot longer than anybody ever
thought it would live, right? – My favorite story is once upon a time, there
was a 370 that was running in 360 mode. The 360 was running in 7090 emulation mode. The 7090 was running in 704 mode. The 704 was running a program emulating an IBM 650. And the IBM 650 program was emulating a CPC, a card program calculator. And down in the bowels of this thing, simulated cards were flowing through the simulated calculator, much faster than the 150 cards per minute that ever flowed through a real one. – Yeah, and it was precise too. – So this is both horrible and amazing. – [Doug] Right, it is amazing. It is amazing. It is amazing. – All right. So we’ve said here that domain-specific architectures
may be a big part of the way forward. Can any of you identify a small number, say
three applications, three domains, that would account for a significant amount, say 30%,
of the world’s computing load? – The GPUs probably already do that. – They’re not 30% of the world’s– – If you’re measuring floating-point operations
executed, they probably are. – Floating-point operations they could do or floating-point operations they actually do? – Even actually do. There’s a hundred million Sony and Microsoft
gaming consoles out there. Many of them being used pretty heavily. They’re doing a lot of floating-point. It isn’t a very sensibly posed question. As is demonstrated by the fact that– – Does the person who asked this question
want to raise their hand, please? – As is demonstrated. But I just got to reinterpret it plausibly. – All right, that went fast. I thought it was a pretty good question actually. – Even when you don’t like my interpretation. – No, your interpretation is okay. – From a user perspective, aside from web
browsing, I don’t think so but if you look in the cloud as an example, if you just take
software-defined networking, the processing you’re doing, if you’re running software on a CPU to follow those protocols and rewrite those flows, it’s an enormous amount of computation. – Yeah. I think if you take your cellphone apart,
I’ll bet you there are more cycles in your cellphone devoted to running cellular network
and running WiFi than there are even in the GPU. – There are special image processing pipelines. There are a lot of different ISAs in there. – Right, right. – That’s already hidden. – For the camera, same thing. Camera, right? You’re taking a video with your cellphone, you’re
doing enormous numbers of operations. – The depressing point about it is kind of
the low-hanging fruit is already gone. – It’s already done. That’s maybe the interesting thing, yeah. – Software-defined networking is an interesting
example of DSAs in reverse. Things that used to be done in hardware– – Yeah, are being done in software. And now there are big policies moving into
software. – Yeah, that’s the wheel of reincarnation. – The wheel in the data center keeps on turning. – All right. Let’s see. I need another highly provocative one. All right. If a big part of the future of information
technology is the cloud, and if the future of computer architecture is domain-specific
architecture, will server chips of the future be more likely to be designed by cloud companies
or chip vendors? Given that we’re turning the entire stack
on its side, are we gonna re-verticalize the computer industry? After we basically had a vertical computer industry and turned it into a horizontal one, are we gonna re-verticalize? – I think the answer is yes. A little bit of both, right? These domain-specific architectures connect
to servers, and so they use traditional server CPUs. It’s very expensive to offload the whole problem. There is a lot of random code, and little
things that have to be taken care of so I think there is a place for both. I think it’s a flowering, this Cambrian explosion
where you get more and more diversity so some will be done by traditional, the silicon vendors,
and others by cloud providers. – But will Google design its own RISC-V chips,
and stop buying x86 chips? – You can’t say. – One thing you can say is that we consolidated on a few processor vendors in some ways for ISA reasons, and so if x86 doesn’t dominate, if ISAs aren’t the reason to buy a particular company’s chip, then you could imagine things spreading out over more different companies. – [Butler] That’s the whole idea behind RISC-V,
right? – Then I guess the counterargument is there
is a whole lot of specialized expertise let’s say in the chip design component, portion
of the job, and that concentrating that in a single company, which supplies multiple
vendors then is more cost-efficient in the long term. – Maybe that company has got a foundry. – Yeah. Then it’s just a foundry, and it doesn’t add
any value in terms of higher level silicon. – For example, the architecture teams at Intel
and other microprocessor vendors are very large for the reason you just mentioned: because of diverse applications, they studied many, many different ones, and they created a lot of complexity at the system level, and so getting that experience is important for building high-performance microprocessors. – It’s incredibly hard and very expensive,
and yet the server market is consolidating on cloud vendors, which are getting very large,
and so how those two forces play out remains to be seen. – Okay. Since we have about a minute left, I want
everybody’s final robust and aggressive thought about the future so that the audience goes
out full of energy. Doug. – You put me on the spot. Frankenstein’s monster was misunderstood. That’s my rebuttal to Margaret. I don’t see it as a bad thing. I think we’re gonna see the neural stuff really
blow up. I think the hype is at risk of being at the top of the hype curve, but I think it’s real. – [John] Norm. – And I think domain-specific architectures
are works of art, whether they’re ASIC or FPGA-based, and the party is not over yet. The parrot is not dead. – [John] Butler? – You already heard my final thought. There’s plenty of room at the top.
all, and in a world where we’re generating data at vast rates that are still exponential,
we need to come up with better storage technologies as well. We talked a lot about compute, a little bit
about memory. The storage thing is fascinating. – [Butler] DNA will save us. – Hopefully. – Okay. Thank you all. Are we out of time? We’re out of time, yes. Thank you all for your attention. Nice panel. – I took away 15 minutes of our scheduled
time.
