https://cleantechnica.com/2019/06/15/teslas-new-hw3-self-driving-computer-its-a-beast-cleantechnica-deep-dive/
Tesla’s New HW3 Self-Driving Computer — It’s A Beast (CleanTechnica Deep Dive)
June 15, 2019

[Images:
https://cleantechnica.com/files/2019/04/tesla-full-self-driving-autopilot-hw3-chip-samsung-perceive-.png
https://cleantechnica.com/files/2019/06/Flash-and-RAM.jpg
https://cleantechnica.com/files/2019/06/Micron.jpg
https://cleantechnica.com/files/2019/06/Processor-Die-Sizes.jpg
https://cleantechnica.com/files/2019/06/Tesla-Die-Labeled-V3.jpg
https://cleantechnica.com/files/2019/06/[email protected] (NVIDIA DRIVE AGX Pegasus)
https://cleantechnica.com/files/2019/06/Reductions-in-SoC.jpg

Tweet: https://twitter.com/elonmusk/status/1140061487160909824
CleanTechnica @cleantechnica: "We get this right, @ElonMusk? Any corrections? #Tesla"
Elon Musk @elonmusk: "Good analysis. Both computers will be used & sync ~20 times per second. This is a long time to a computer. Like a twin-engine plane, use both engines to max for normal operation, but can safely operate on just one." (8:00 PM - Jun 15, 2019)

Tweet: https://twitter.com/engineers_feed/status/1135053398150131713
World of Engineering @engineers_feed (embedded video): "Gotta love zero tolerance machining" (12:19 AM - Jun 2, 2019)
]

A month ago, Tesla revealed several secrets regarding the new chip the
Silicon Valley company has designed for full self-driving capability.
Nonetheless, some of the people making that presentation may have failed to
take into account that not everyone is fully literate in microprocessor
design and engineering. I don’t fall into that category either, but I have
been a computer enthusiast for quite some time and know a few things that
might help me pick out some of the highlights, point out why they are so
exciting, and further communicate how Tesla really is way ahead of the
competition. I also have some theories about what this new chip can lead to.

Fair warning: this article is still quite technical, but I do try to
explain the key points in plain English as well.

What’s on the Board?
First, we can see a general image of the board. The board has complete
redundancy, meaning that any system on the board can fail and the computer
will continue operating as if nothing happened. On the right side of the
board is where all the cameras plug in, and on the left side is where the
power supply connects — as well as some input and output connectors. In the
middle of the board, the stars of the show: the 2 processors (note that
"processor" is only a semi-accurate description, as you will see later).
Tesla uses 2 processors for redundancy and for cross-referencing the
results, not to increase performance.

Update: Correction from Elon Musk — there is a performance benefit from
having 2 processors (see his tweet quoted at the top of this article).

Below and a bit to the left of the processors (marked light blue) are the
flash memory chips that store the operating system. The capacity of each
chip is unknown at the moment, but considering that nowadays you can buy a
micro-SD card with a capacity of 500GB, it could potentially be quite large.

On the left and right side of each processor, you see 4 LPDDR4 chips (marked
green). The processor is fabricated by Samsung, which has mistakenly led
some people to believe that the RAM is also from Samsung, but that is
actually not the case.

If you take a really close look at the chips, you can see a little logo.
Samsung does not put a logo like that on its RAM chips, but Micron does, and
the logo Micron puts on its chips very closely matches what we see in this
image. Micron also happens to manufacture LPDDR4 RAM and even has a product
line targeting the automotive industry.

So, while the exact size of the DRAM is currently unknown, we do have a
probable range: somewhere between 4GB and 24GB. For some perspective, LPDDR4
is faster than the DDR4 currently used in desktops and laptops, and it is
also what smartphones currently use. LPDDR4 chips are a type of DRAM, which
is how they will be referred to for the rest of the article.

Once you take off the heat spreader, the die is revealed to us, and Tesla
has told us a lot about it. The size of the die is around 260 mm². To put
that in perspective, the processor in an iPhone is around 80–120 mm², an
Intel laptop/desktop processor die is around 180 mm², NVIDIA's Xavier chip
die is 350 mm², and the chips on dedicated graphics cards range from 400 to
800 mm².

The Guts (or Brains) of Tesla’s SoC
First let’s clear up a huge misconception. I called Tesla’s chip a processor
earlier, but that is not completely accurate. It is actually a full
system-on-a-chip (SoC). Tesla put a CPU, a GPU, a neural network processor,
and a bunch of other things you probably didn’t even know existed onto this
single chip.

Tesla explains the whole process by somewhat following the path that the
data from the cameras take. First, the data come in through what is labeled
as "Input" at a maximum rate of 2.5 billion pixels per second, which roughly
equates to 21 Full HD 1080p screens at 60 frames per second. This is a hell
of a lot more data than the currently installed sensors create. The data
then travel into the DRAM we discussed earlier, which is one of the first
and main bottlenecks of the chip since it is the slowest component. Then the
data go back into the chip and through the image signal processor, which can
process 1 billion pixels per second (roughly 8 Full HD 1080p screens at 60
frames per second). This part of the chip turns the raw RGB data from the
camera sensors into data that is actually useful, in addition to enhancing
the tone and removing noise.
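
If you want to sanity-check those screen-count figures, here is a quick
back-of-the-envelope calculation in Python. The only assumption here is
mine: that one "screen" means 1920×1080 pixels refreshed 60 times per
second.

    # Sanity check of the quoted pixel rates.
    # Assumption (mine): one "screen" = 1920x1080 pixels at 60 fps.
    FULL_HD_PIXELS = 1920 * 1080                       # ~2.07 million pixels/frame
    PIXELS_PER_SCREEN_PER_SEC = FULL_HD_PIXELS * 60    # ~124.4 million pixels/s

    input_rate = 2.5e9   # pixels/s, the SoC's quoted input limit
    isp_rate = 1.0e9     # pixels/s, the quoted ISP throughput

    print(input_rate / PIXELS_PER_SCREEN_PER_SEC)      # ~20.1 screens
    print(isp_rate / PIXELS_PER_SCREEN_PER_SEC)        # ~8.0 screens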

Then we finally get to the most interesting part of the whole chip, the
neural network processor, or NPU. The first step in the process is that the
data get stored in the SRAM array. Now, a lot of people, even ones who know
a bit about computer components, may be wondering, "What on earth is SRAM?"
Well, the closest comparison would be the shared L3 cache you will find on
your computer's processor. What does all of this mean, though? It means
storage that is really fast but also expensive. Right now, Intel's largest
L3 cache is 45MB (the maximum was 16MB until 2010 and 24MB until 2014). Most
consumer laptop and desktop processors have between 8MB and 12MB of L3
cache. Tesla's neural network processor has a whopping total of 64MB of
SRAM, divided into two 32MB segments to support the two neural network
processors. Tesla considers its large SRAM capacity to be one of its biggest
advantages over any other kind of chip it could have potentially used.

This might actually be enough memory to store, render, and process a single
frame from all cameras and sensor inputs combined, but because the frames
are not bad-quality JPEGs but instead large, enhanced, lossless frames, it
probably isn't. Keep in mind that if the cameras do indeed work at 60 frames
per second and one combined frame fills the SRAM, that would equal 3.84
gigabytes of data processed per second. Since a single frame is probably
larger, I don't even want to venture a guess at how many gigabytes per
second this is, but I do know it's less than 68 gigabytes.
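
For those wondering where the 3.84 figure comes from, here is the same
estimate spelled out in Python. The assumptions are mine: one combined frame
fills the full 64MB of SRAM, 1MB = 10^6 bytes, and the cameras run at 60
frames per second.

    # Back-of-the-envelope estimate of the SRAM frame throughput.
    # Assumptions (mine): one combined frame fills all 64 MB of SRAM,
    # refreshed at an assumed 60 fps camera rate.
    sram_bytes = 64e6          # total NPU SRAM, bytes
    fps = 60                   # assumed camera frame rate

    print(sram_bytes * fps / 1e9)    # 3.84 GB/s

    # The 68 GB/s DRAM bandwidth (next paragraph) caps how large a
    # combined frame could be streamed in: at 60 fps, ~1.13 GB/frame.
    print(68e9 / fps / 1e9)          # ~1.13 GB per frame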

All the data travel through the primary corridors/hallways of the chip, also
known as the "Network on a Chip" or "NoC" (painted blue in the image), and
to and from the LPDDR4 DRAM, a link that has a bandwidth of 68 gigabytes per
second. Tesla indicated during the Autonomy Day presentation that this is
enough but could be better, and from that we gather that Tesla will likely
improve it in its next-generation product. Right now, it's not totally clear
whether the bottleneck is the bandwidth of the DRAM or the amount of SRAM.

The neural network processor is an incredibly powerful tool. A lot of the
data go through it, but some computational tasks have not yet been adapted
to run on a neural network processor or are not suitable for that kind of
processor. This is where the GPU comes in. The GPU in this chip has (per
Tesla) modest performance, runs at 1 GHz, and is capable of 600 GFLOPS.
Tesla indicated that the GPU currently performs some post-processing tasks,
which could potentially include the creation of pictures and videos that are
understandable to humans. However, from the way Tesla described the role of
the GPU in its presentation, expect the next iteration of the chip to have a
much smaller GPU.

There are also some general-purpose processing tasks unsuitable for the
neural processor, and those are done by the CPU. The way Tesla explained it,
there are 12 ARM Cortex A72 64-bit CPU cores in the chip running at 2.2 GHz,
although a more accurate description would be to say that there are three
4-core CPUs in there. Tesla's choice of ARM's Cortex A72 architecture is a
bit puzzling, however. Cortex A72 is an architecture from 2015. Since then,
the A73, A75, and, a few days ago, even the A77 architectures have been
released. Elon and team explained it by saying that this was what was
available when they started designing the chip 2 years ago. Perhaps this was
also a cheaper option for Tesla, which would make sense if multithreaded
performance is more important to Tesla than single-task performance, hence
the inclusion of 3 older processors rather than one or two newer or more
powerful ones. Multithreading usually requires a bit more programming work
to distribute tasks properly (see the sketch below), but hey, this is Tesla
we're talking about — it's probably a piece of cake for the company. In any
case, the CPU performance of this chip is 2.5 times higher than what Tesla
had in the previous version, HW2.
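
To illustrate what "distributing tasks properly" means in practice, here is
a minimal Python sketch of fanning independent jobs out across a fixed pool
of workers. This is purely my illustration of the general technique, not
Tesla's code; the 12 workers simply mirror the 12 CPU cores described above,
and postprocess is a hypothetical stand-in for any general-purpose task.

    # Minimal work-distribution sketch (my illustration, not Tesla's
    # code): spread independent tasks across 12 workers, mirroring
    # the 12 Cortex A72 cores described above.
    from concurrent.futures import ProcessPoolExecutor

    def postprocess(frame_id: int) -> str:
        # Hypothetical stand-in for a general-purpose task that is
        # unsuitable for the NPU.
        return f"frame {frame_id} processed"

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=12) as pool:
            # map() hands each frame_id to the next free worker.
            for result in pool.map(postprocess, range(24)):
                print(result)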

NVIDIA Feels the Need to Save Face
That was a lot of technical talk, so let's take a short break and I'll show
you something funny. After Tesla's Autonomy Day, NVIDIA published a new blog
entry complimenting Tesla for "raising the bar for self-driving."
Immediately after that, NVIDIA tried to save face by patting itself on the
back with useless metrics of comparison.

Tesla’s HW2 is powered by an NVIDIA Xavier chip that can do 21 to 30 TOPS
(tera operations per second). Tesla’s new HW3 chip can do 144 TOPS.

Tesla in its presentation stated that NVIDIA's Xavier chip is capable of 21
TOPS. NVIDIA tried to correct Tesla in its blog, saying that it's actually
30 TOPS instead of 21. The thing is, NVIDIA's Xavier chip is built for
multiple purposes and tries its best to conform to the requirements of
multiple potential clients. Thus, the chip doesn't have a neural network
processor, but it can successfully simulate one using software and some of
its deep learning–focused hardware. When Tesla said 21 TOPS, that was the
result it got by running its neural networks through that simulated setup on
the chip's GPU. Tesla's benchmark asks a very simple question: "How many
TOPS can our software reach on this hardware?" That is an entirely different
question from how many TOPS the hardware can produce when running software
written specifically to fully utilize it. Theoretically, if the chip were
given some other task in another scenario, it might be able to reach that 30
TOPS figure, but that is a pretty useless metric in this context.
Nonetheless, it's sensible that NVIDIA would like to set the record straight
for other customers or potential customers.

Worth remembering is that, when benchmarking a complex piece of software, it
is all about the performance that specific software can realize. This is why
the best hardware is not always the hardware with the highest theoretical
performance.

In the past, we used to have only a general-purpose processor with a
numerical co-processor. Then we got the graphical co-processor, and now the
neural-net co-processor. Ironically, in this case, the CPU is more of a
co-processor to the neural processing unit. Basically, what Tesla did was
create a specialized processor that is way better at an extremely specific
task but would suck at general-purpose processing. So, yeah, the only game
this chip is good at is running through the roads in the matrix we all live
in — but it's really good at that.

To further defend its pride, NVIDIA stated that when you combine Xavier with
a powerful GPU in the company's DRIVE AGX Pegasus product, you can achieve
160 TOPS. If Tesla, for its purposes, could again only utilize 70% of that
due to the need to simulate the neural network processor, that translates to
a maximum of 112 TOPS, and wastes a lot of power. NVIDIA also went on to say
that the DRIVE AGX Pegasus can reach 320 TOPS by stacking two units in
parallel, but that is unrealistic for this particular application.
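
That 70% figure is simply the ratio implied by the numbers above (Tesla's
measured 21 TOPS against Xavier's rated 30 TOPS), applied to Pegasus. A
quick Python check:

    # Utilization math from the paragraphs above. The 70% figure is
    # derived from the article's own numbers, not from NVIDIA or Tesla.
    tesla_measured = 21     # TOPS Tesla reached on Xavier
    xavier_rated = 30       # TOPS NVIDIA claims for Xavier

    utilization = tesla_measured / xavier_rated       # 0.7

    pegasus_rated = 160     # TOPS, NVIDIA DRIVE AGX Pegasus
    print(pegasus_rated * utilization)                # 112.0 effective TOPS

    hw3 = 144               # TOPS, Tesla HW3
    print(hw3 / (pegasus_rated * utilization))        # ~1.29x in HW3's favor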

When we talk about Internet speed, we care not only about the speed but also
about the latency/response time. Tesla in this case already complains about
the latency of information reaching the chip from the DRAM, which is right
next to it. The latency of data traveling between multiple loosely
interconnected chips over a flimsy NVLink cable would be totally
unacceptable.

Also, that doesn't take into account that an electric car is powered by a
battery, not a nuclear reactor, and the amount of electricity you would need
to power this 4-chip solution would drain your battery before you even reach
the highway. Efficiency really is key here.

NVIDIA's solutions focus more on combining multiple chips for performance.
It is stuck in its marketing need for more cores, better CPUs, and better
GPUs, connecting them with NVLink rather than building chips for a specific
use case. This is great for companies trying to perfect software or
universities working on a project, but it is not efficient enough for
real-world applications.

Hardware 4
So, there you go — that is Tesla's hardware version 3. What, then, can we
expect for hardware version 4? Right now, all we know is that it will be
aimed at further improving safety. The only thing that really tells us is
that it will not be focused on teaching an old car new tricks, but that
doesn't mean it won't include some of that, too. Here is my list of
potential changes and improvements HW4 could have, ranked from most likely
to most speculative:

Tesla will most likely use a newer CPU version; based on when Tesla started
designing the new architecture, that will likely be the Cortex A75. The
increased processing power gives Tesla the opportunity to save power and
space on the chip, making room for more important components.

Tesla may upgrade to LPDDR5, which would result in a significant speed
increase and a reduction in power consumption. However, if the HW4 chip is
already deep in the design process, or to keep costs down, Tesla may go with
LPDDR4X instead. By using a lower voltage, LPDDR4X saves power, and it can
still deliver a speed increase if multiple chips are used in parallel,
although that configuration would not save power compared to HW3. Either
choice would represent an overall improvement over HW3.

Further improved neural processing units with even more SRAM.

Depending on whether the processing capacity of the chip can handle the full
resolution and frame rate the cameras are capable of, Tesla's HW4 might come
with new cameras and sensors with a higher resolution and maybe even a
higher frame rate. Higher resolution images are critical, as more detail
will help the computer identify objects more accurately and at greater
distances.

An upgraded image signal processor (ISP). Tesla wanted to make its chip as
cheap and as powerful as possible. That's why there is a large disconnect in
HW3 between what the chip's input is capable of handling and what the ISP is
capable of handling; hence the need for a beefier or secondary ISP,
depending on which solution requires less power, takes less space, or costs
less.

A smaller GPU. One of the reasons there is still a moderate GPU in the HW3
SoC is that not all of the processing tasks have been transferred to the
neural network yet. Including a moderate GPU may have been a shortcut for
Tesla to give its programmers enough time to re-allocate any remaining GPU
processing tasks to either the NPU or the CPU. Eliminating the GPU entirely
might not be possible; however, a smaller GPU with a smaller footprint on
the SoC means less NoC, leaving budget and room for more critical components
like more SRAM.

Conclusion
Tesla's HW3 computer is an absolute beast. It can handle 7 times as many
frames and has 7 times larger neural nets, and as was said in the
presentation, "There are a lot of ways you can spend that." Being a computer
tech enthusiast, watching Tesla's Autonomy Day presentation was better than
going to Disneyland. When it comes to achieving Full Self-Driving
capability, the first step is having your priorities straight, and Tesla
certainly does.

There are a few points that have not been stressed enough, and this leads to
people underestimating or not understanding why Tesla is actually leading in
the race to develop fully autonomous vehicles — leading by a significant
margin. There is a really good analogy for this. All the other manufacturers
that are now starting to make EVs have some advanced tech, but they still
have not been able to beat Tesla's Model S from 2012, and that's just on the
EV side of things, not even considering the computer/software/UI side of
things. The reason competing EV tech hasn't caught up is simple: vertical
integration.

Let me dumb it down slightly to explain: Imagine you are a manufacturer and
you need to build a website. You could go to one of those platforms where
you drag and drop some widgets, pages, and solutions and type in some text,
or you could have a whole team of dedicated programmers build a professional
website. Traditional automakers are trying to do the former with electric
cars and self-driving. They are ordering LEGOs from different companies and
hoping they fit together. Where they do not fit, they simply use a knife to
carve one LEGO to make it fit the other. Tesla, on the other hand, is a lot
more like the zero-tolerance machining you see in the second tweet quoted at
the top of this article.

With Tesla's new HW3 computer, everything is made to fit like a glove,
almost as well as what you see in that tweet. Elon Musk has said that Full
Self-Driving only really makes sense in an electric car, and he is right. To
put it more precisely, it isn't worth doing for an internal combustion
engine car. The lack of instant torque makes self-driving less effective and
less safe when it comes to avoiding crashes and handling slippery and icy
road conditions, something we will dive into in an upcoming article.

The most important reason, however, is that investing resources in a
self-driving solution for a dying and soon-to-be-extinct product category
like a gas car is just dumb, plain and simple.

When making a self-driving solution for an electric car, power efficiency
might be the second most important metric after safety, and it is currently
not getting nearly enough consideration (or, at least, not effectively so)
from any other automaker or chipmaker. This is yet another reason why Tesla
is light years ahead.
[© cleantechnica.com]




For EVLN EV-newswire posts use:
 http://evdl.org/archive/


{brucedp.neocities.org}
