A raw unbridled opinion on the up and coming XBox 360 and PS3’s

I found this raw and unbridled opinion on the up and coming XBox 360 and PS3’s on the wibbly wobbly web. It does not bode well for microsoft and sony who’se penny pinching in the design of their up and coming next gen consoles renders them somewhat neutered.

Click more for the full article.

http://www.anandtech.com/video/showdoc.aspx?i=2461

Microsoft’s Xbox 360 & Sony’s PlayStation 3 – Examples of Poor CPU
Performance

Date: June 29th, 2005
Author: Anand Lal Shimpi

“In our last article we had a fairly open-ended discussion about many of the
challenges facing both of the recently announced next-generation game
consoles. We discussed misconceptions about the Cell processor and its
ability to accelerate physics calculations, as well as touched on the GPUs
of both platforms. In the end, both the Xbox 360 and the PlayStation 3 are
much closer competitors than you would think based on first impressions.

The Xbox 360’s Xenon CPU features more general purpose cores than the
PlayStation 3 (3 vs. 1), however game developers will most likely only be
using one of those cores for the majority of their calculations, leveling
the playing field considerably.

The Cell processor derives much of its power from its array of 7 SPEs
(Synergistic Processing Elements), however as we discovered in our last
article, their purpose is far more specialized than we had thought.
Speaking with Epic Games’ head developer, Tim Sweeney, he provided a much
more balanced view of what sorts of tasks could take advantage of the Cell’s
SPE array.

The GPUs of the next-generation platforms also proved to be quite
interesting. In Part I we speculated as to the true nature of NVIDIA’s RSX
in the PS3, concluding that it’s quite likely little more than a higher
clocked G70 GPU. We will expand on that discussion a bit more in this
article. We also looked at Xenos, the Xbox 360’s GPU and characterized it
as equivalent to a very flexible 24-pipe R420. Despite the inclusion of the
10MB of embedded DRAM, Xenos and RSX ended up being quite similar in our
expectations for performance; and that pretty much summarized all of our
findings – the two consoles, although implementing very different
architectures, ended up being so very similar.

So we’ve concluded that the two platforms will probably end up performing
very similarly, but there was one very important element excluded from the
first article: a comparison to present-day PC architectures. The reason a
comparison to PC architectures is important is because it provides an
evaluation point to gauge the expected performance of these next-generation
consoles. We’ve heard countless times that these new consoles would offer
better gaming performance than anything we’ve had on the PC, or anything we
would have for a matter of years. Now it’s time to actually put those
claims to the test, and that’s exactly what we did.

Speaking under conditions of anonymity with real world game developers who
have had first hand experience writing code for both the Xbox 360 and
PlayStation 3 hardware (and dev kits where applicable), we asked them for
nothing more than their brutal honesty. What did they think of these new
consoles? Are they really outfitted with the PC-eclipsing performance we’ve
been lead to believe they have? The answer is actually quite frequently
found in history; as with anything, you get what you pay for.

Learning from Generation X
The original Xbox console marked a very important step in the evolution of
gaming consoles – it was the first console that was little more than a
Windows PC.

It featured a 733MHz Pentium III processor with a 128KB L2 cache, paired up
with a modified version of NVIDIA’s nForce chipset (modified to support
Intel’s Pentium III bus instead of the Athlon XP it was designed for). The
nForce chipset featured an integrated GPU, codenamed the NV2A, offering
performance very similar to that of a GeForce3. The system had a 5X PC DVD
drive and an 8GB IDE hard drive, and all of the controllers interfaced to
the console using USB cables with a proprietary connector.

For the most part, game developers were quite pleased with the original
Xbox. It offered them a much more powerful CPU, GPU and overall platform
than anything had before. But as time went on, there were definitely
limitations that developers ran into with the first Xbox.

One of the biggest limitations ended up being the meager 64MB of memory that
the system shipped with. Developers had asked for 128MB and the motherboard
even had positions silk screened for an additional 64MB, but in an attempt
to control costs the final console only shipped with 64MB of memory.

The next problem is that the NV2A GPU ended up not having the fill rate and
memory bandwidth necessary to drive high resolutions, which kept the Xbox
from being used as a HD console.

Although Intel outfitted the original Xbox with a Pentium III/Celeron hybrid
in order to improve performance yet maintain its low cost, at 733MHz that
quickly became a performance bottleneck for more complex games after the
console’s introduction.

The combination of GPU and CPU limitations made 30 fps a frame rate target
for many games, while simpler titles were able to run at 60 fps. Split
screen play on Halo would even stutter below 30 fps depending on what was
happening on screen, and that was just a first-generation title. More
experience with the Xbox brought creative solutions to the limitations of
the console, but clearly most game developers had a wish list of things they
would have liked to have seen in the Xbox successor. Similar complaints
were levied against the PlayStation 2, but in some cases they were more
extreme (e.g. its 4MB frame buffer).

Given that consoles are generally evolutionary, taking lessons learned in
previous generations and delivering what the game developers want in order
to create the next-generation of titles, it isn’t a surprise to see that a
number of these problems are fixed in the Xbox 360 and PlayStation 3.

One of the most important changes with the new consoles is that system
memory has been bumped from 64MB on the original Xbox to a whopping 512MB on
both the Xbox 360 and the PlayStation 3. For the Xbox, that’s a factor of 8
increase, and over 12x the total memory present on the PlayStation 2.

The other important improvement with the next-generation of consoles is that
the GPUs have been improved tremendously. With 6 – 12 month product cycles,
it’s no surprise that in the past 4 years GPUs have become much more
powerful. By far the biggest upgrade these new consoles will offer, from a
graphics standpoint, is the ability to support HD resolutions.

There are obviously other, less-performance oriented improvements such as
wireless controllers and more ubiquitous multi-channel sound support. And
with Sony’s PlayStation 3, disc capacity goes up thanks to their embracing
the Blu-ray standard.

But then we come to the issue of the CPUs in these next-generation
consoles, and the level of improvement they offer. Both the Xbox 360 and
the PlayStation 3 offer multi-core CPUs to supposedly usher in a new era of
improved game physics and reality. Unfortunately, as we have found out, the
desire to bring multi-core CPUs to these consoles was made a reality at the
expense of performance in a very big way.

Problems with the Architecture
At the heart of both the Xenon and Cell processors is IBM’s custom PowerPC
based core. We’ve discussed this core in our previous articles, but it is
best characterized as being quite simple. The core itself is a very narrow
2-issue in-order execution core, featuring a 64KB L1 cache (32K
instruction/32K data) and either a 1MB or 512KB L2 cache (for Xenon or Cell,
respectively). Supporting SMT, the core can execute two threads
simultaneously similar to a Hyper Threading enabled Pentium 4. The Xenon
CPU is made up of three of these cores, while Cell features just one.

Each individual core is extremely small, making the 3-core Xenon CPU in the
Xbox 360 smaller than a single core 90nm Pentium 4. While we don’t have
exact die sizes, we’ve heard that the number is around 1/2 the size of the
90nm Prescott die.

IBM’s pitch to Microsoft was based on the peak theoretical floating point
performance-per-dollar that the Xenon CPU would offer, and given Microsoft’s
focus on cost savings with the Xbox 360, they took the bait.

While Microsoft and Sony have been childishly playing this flops-war,
comparing the 1 TFLOPs processing power of the Xenon CPU to the 2 TFLOPs
processing power of the Cell, the real-world performance war has already
been lost.

Right now, from what we’ve heard, the real-world performance of the Xenon
CPU is about twice that of the 733MHz processor in the first Xbox.
Considering that this CPU is supposed to power the Xbox 360 for the next 4 –
5 years, it’s nothing short of disappointing. To put it in perspective,
floating point multiplies are apparently 1/3 as fast on Xenon as on a
Pentium 4.

The reason for the poor performance? The very narrow 2-issue in-order core
also happens to be very deeply pipelined, apparently with a branch predictor
that’s not the best in the business. In the end, you get what you pay for,
and with such a small core, it’s no surprise that performance isn’t anywhere
near the Athlon 64 or Pentium 4 class.

The Cell processor doesn’t get off the hook just because it only uses a
single one of these horribly slow cores; the SPE array ends up being fairly
useless in the majority of situations, making it little more than a waste of
die space.

We mentioned before that collision detection is able to be accelerated on
the SPEs of Cell, despite being fairly branch heavy. The lack of a branch
predictor in the SPEs apparently isn’t that big of a deal, since most
collision detection branches are basically random and can’t be predicted
even with the best branch predictor. So not having a branch predictor doesn’t
hurt, what does hurt however is the very small amount of local memory
available to each SPE. In order to access main memory, the SPE places a DMA
request on the bus (or the PPE can initiate the DMA request) and waits for
it to be fulfilled. From those that have had experience with the PS3
development kits, this access takes far too long to be used in many real
world scenarios. It is the small amount of local memory that each SPE has
access to that limits the SPEs from being able to work on more than a
handful of tasks. While physics acceleration is an important one, there are
many more tasks that can’t be accelerated by the SPEs because of the memory
limitation.

The other point that has been made is that even if you can offload some of
the physics calculations to the SPE array, the Cell’s PPE ends up being a
pretty big bottleneck thanks to its overall lackluster performance. It’s
akin to having an extremely fast GPU but without a fast CPU to pair it up
with.

What About Multithreading?
We of course asked the obvious question: would game developers rather have 3
slow general purpose cores, or one of those cores paired with an array of
specialized SPEs? The response was unanimous, everyone we have spoken to
would rather take the general purpose core approach.

Citing everything from ease of programming to the limitations of the SPEs we
mentioned previously, the Xbox 360 appears to be the more developer-friendly
of the two platforms according to the cross-platform developers we’ve spoken
to. Despite being more developer-friendly, the Xenon CPU is still not what
developers wanted.

The most ironic bit of it all is that according to developers, if either
manufacturer had decided to use an Athlon 64 or a Pentium D in their
next-gen console, they would be significantly ahead of the competition in
terms of CPU performance.

While the developers we’ve spoken to agree that heavily multithreaded game
engines are the future, that future won’t really take form for another 3 – 5
years. Even Microsoft admitted to us that all developers are focusing on
having, at most, one or two threads of execution for the game engine
itself – not the four or six threads that the Xbox 360 was designed for.

Even when games become more aggressive with their multithreading, targeting
2 – 4 threads, most of the work will still be done in a single thread. It
won’t be until the next step in multithreaded architectures where that
single thread gets broken down even further, and by that time we’ll be
talking about Xbox 720 and PlayStation 4. In the end, the more
multithreaded nature of these new console CPUs doesn’t help paint much of a
brighter performance picture – multithreaded or not, game developers are not
pleased with the performance of these CPUs.

What about all those Flops?
The one statement that we heard over and over again was that Microsoft was
sold on the peak theoretical performance of the Xenon CPU. Ever since the
announcement of the Xbox 360 and PS3 hardware, people have been set on
comparing Microsoft’s figure of 1 trillion floating point operations per
second to Sony’s figure of 2 trillion floating point operations per second
(TFLOPs). Any AnandTech reader should know for a fact that these numbers
are meaningless, but just in case you need some reasoning for why, let’s
look at the facts.

First and foremost, a floating point operation can be anything; it can be
adding two floating point numbers together, or it can be performing a dot
product on two floating point numbers, it can even be just calculating the
complement of a fp number. Anything that is executed on a FPU is fair game
to be called a floating point operation.

Secondly, both floating point power numbers refer to the whole system, CPU
and GPU. Obviously a GPU’s floating point processing power doesn’t mean
anything if you’re trying to run general purpose code on it and vice versa.
As we’ve seen from the graphics market, characterizing GPU performance in
terms of generic floating point operations per second is far from the full
performance story.

Third, when a manufacturer is talking about peak floating point performance
there are a few things that they aren’t taking into account. Being able to
process billions of operations per second depends on actually being able to
have that many floating point operations to work on. That means that you
have to have enough bandwidth to keep the FPUs fed, no mispredicted
branches, no cache misses and the right structure of code to make sure that
all of the FPUs can be fed at all times so they can execute at their peak
rates. We already know that’s not the case as game developers have already
told us that the Xenon CPU isn’t even in the same realm of performance as
the Pentium 4 or Athlon 64. Not to mention that the requirements for
hitting peak theoretical performance are always ridiculous; caches are only
so big and thus there will come a time where a request to main memory is
needed, and you can expect that request to be fulfilled in a few hundred
clock cycles, where no floating point operations will be happening at all.

So while there may be some extreme cases where the Xenon CPU can hit its
peak performance, it sure isn’t happening in any real world code.

The Cell processor is no different; given that its PPE is identical to one
of the PowerPC cores in Xenon, it must derive its floating point performance
superiority from its array of SPEs. So what’s the issue with 218 GFLOPs
number (2 TFLOPs for the whole system)? Well, from what we’ve heard, game
developers are finding that they can’t use the SPEs for a lot of tasks. So
in the end, it doesn’t matter what peak theoretical performance of Cell’s
SPE array is, if those SPEs aren’t being used all the time.

Another way to look at this comparison of flops is to look at integer add
latencies on the Pentium 4 vs. the Athlon 64. The Pentium 4 has two double
pumped ALUs, each capable of performing two add operations per clock, that’s
a total of 4 add operations per clock; so we could say that a 3.8GHz Pentium
4 can perform 15.2 billion operations per second. The Athlon 64 has three
ALUs each capable of executing an add every clock; so a 2.8GHz Athlon 64
can perform 8.4 billion operations per second. By this silly console
marketing logic, the Pentium 4 would be almost twice as fast as the Athlon
64, and a multi-core Pentium 4 would be faster than a multi-core Athlon 64.
Any AnandTech reader should know that’s hardly the case. No code is
composed entirely of add instructions, and even if it were, eventually the
Pentium 4 and Athlon 64 will have to go out to main memory for data, and
when they do, the Athlon 64 has a much lower latency access to memory than
the P4. In the end, despite what these horribly concocted numbers may lead
you to believe, they say absolutely nothing about performance. The exact
same situation exists with the CPUs of the next-generation consoles; don’t
fall for it.

Why did Sony/MS do it?
For Sony, it doesn’t take much to see that the Cell processor is eerily
similar to the Emotion Engine in the PlayStation 2, at least conceptually.
Sony clearly has an idea of what direction they would like to go in, and it
doesn’t happen to be one that’s aligned with much of the rest of the
industry. Sony’s past successes have really come, not because of the
hardware, but because of the developers and their PSX/PS2 exclusive titles.
A single hot title can ship hundreds of millions of consoles, and by our
count, Sony has had many more of those than Microsoft had with the first
Xbox.

Sony shipped around 4 times as many PlayStation 2 consoles as Microsoft did
Xboxes, regardless of the hardware platform, a game developer won’t turn
down working with the PS2 – the install base is just that attractive. So
for Sony, the Cell processor may be strange and even undesirable for game
developers, but the developers will come regardless.

The real surprise was Microsoft; with the first Xbox, Microsoft listened
very closely to the wants and desires of game developers. This time around,
despite what has been said publicly, the Xbox 360’s CPU architecture wasn’t
what game developers had asked for.

They wanted a multi-core CPU, but not such a significant step back in single
threaded performance. When AMD and Intel moved to multi-core designs, they
did so at the expense of a few hundred MHz in clock speed, not by taking a
step back in architecture.

We suspect that a big part of Microsoft’s decision to go with the Xenon core
was because of its extremely small size. A smaller die means lower system
costs, and if Microsoft indeed launches the Xbox 360 at $299 the Xenon CPU
will be a big reason why that was made possible.

Another contributing factor may be the fact that Microsoft wanted to own the
IP of the silicon that went into the Xbox 360. We seriously doubt that
either AMD or Intel would be willing to grant them the right to make Pentium
4 or Athlon 64 CPUs, so it may have been that IBM was the only partner
willing to work with Microsoft’s terms and only with this one specific core.

Regardless of the reasoning, not a single developer we’ve spoken to thinks
that it was the right decision.

The Saving Grace: The GPUs
Although both manufacturers royally screwed up their CPUs, all developers
have agreed that they are quite pleased with the GPU power of the
next-generation consoles.

First, let’s talk about NVIDIA’s RSX in the PlayStation 3. We discussed the
possibility of RSX offloading vertex processing onto the Cell processor, but
more and more it seems that isn’t the case. It looks like the RSX will
basically be a 90nm G70 with Turbo Cache running at 550MHz, and the
performance will be quite good.

One option we didn’t discuss in the last article, was that the G70 GPU may
feature a number of disabled shader pipes already to improve yield. The
move to 90nm may allow for those pipes to be enabled and thus allowing for
another scenario where the RSX offers higher performance at the same
transistor count as the present-day G70. Sony may be hesitant to reveal the
actual number of pixel and vertex pipes in the RSX because honestly they
won’t know until a few months before mass production what their final yields
will be.

Despite strong performance and support for 1080p, a large number of
developers are targeting 720p for their PS3 titles and won’t support 1080p.
Those that are simply porting current-generation games over will have no
problems running at 1080p, but anyone working on a truly next-generation
title won’t have the fill rate necessary to render at 1080p.

Another interesting point is that despite its lack of “free 4X AA” like the
Xbox 360, in some cases it won’t matter. Titles that use longer pixel
shader programs end up being bound by pixel shader performance rather than
memory bandwidth, so the performance difference between no AA and 2X/4X AA
may end up being quite small. Not all titles will push the RSX to the
limits however, and those titles will definitely see a performance drop with
AA enabled. In the end, whether the RSX’s lack of embedded DRAM matters
will be entirely dependent on the game engine being developed for the
platform. Games that make more extensive use of long pixel shaders will see
less of an impact with AA enabled than those that are more texture bound.
Game developers are all over the map on this one, so it wouldn’t be fair to
characterize all of the games as falling into one category or another.

ATI’s Xenos GPU is also looking pretty good and most are expecting
performance to be very similar to the RSX, but real world support for this
won’t be ready for another couple of months. Developers have just recently
received more final Xbox 360 hardware, and gauging performance of the actual
Xenos GPU compared to the R420 based solutions in the G5 development kits
will take some time. Since the original dev kits offered significantly
lower performance, developers will need a bit of time to figure out what
realistic limits the Xenos GPU will have.

Final Words
Just because these CPUs and GPUs are in a console doesn’t mean that we
should throw away years of knowledge from the PC industry – performance
doesn’t come out of thin air, and peak performance is almost never achieved.
Clever marketing however, will always try to fool the consumer.

And that’s what we have here today, with the Xbox 360 and PlayStation 3.
Both consoles are marketed to be much more powerful than they actually are,
and from talking to numerous game developers it seems that the real world
performance of these platforms isn’t anywhere near what it was supposed to
be.

It looks like significant advancements in game physics won’t happen on
consoles for another 4 or 5 years, although it may happen with PC games much
before that.

It’s not all bad news however; the good news is that both GPUs are quite
possibly the most promising part of the new consoles. With the performance
that we have seen from NVIDIA’s G70, we have very high expectations for the
360 and PS3. The ability to finally run at HD resolutions in all games will
bring a much needed element to console gaming.

And let’s not forget all of the other improvements to these next-generation
game consoles. The CPUs, despite being relatively lackluster, will still be
faster than their predecessors and increased system memory will give
developers more breathing room. Then there are other improvements such as
wireless controllers, better online play and updated game engines that will
contribute to an overall better gaming experience.

In the end, performance could be better, the consoles aren’t what they could
have been had the powers at be made some different decisions. While they
will bring better quality games to market and will be better than their
predecessors, it doesn’t look like they will be the end of PC gaming any
more than the Xbox and PS2 were when they were launched. The two markets
will continue to coexist, with consoles being much easier to deal with, and
PCs offering some performance-derived advantages.

With much more powerful CPUs and, in the near future, more powerful GPUs,
the PC paired with the right developers should be able to bring about that
revolution in game physics and graphics we’ve been hoping for. Consoles
will help accelerate the transition to multithreaded gaming, but it looks
like it will take PC developers to bring about real change in things like
game physics, AI and other non-visual elements of gaming. ”