gzader / Member


The future of computing

These were originally posted over in the OCU. They're some thoughts I've had brewing. It's a little wandering and hasn't been edited much, so... be warned. PS: Sorry about the formatting, it's a GameSpot thing...

AM2 is what you have now,

AM2+ includes: HyperTransport 3.0.

I don't know pin counts or anything else, but HT3.0 is for sure. Interestingly enough, with DDR3 dropped from the spec, HT3.0 might be the only difference.

HT3.0 runs at 2.6GHz and offers over 40GB/second of bandwidth. Just perfect for those multi-core adventures. This represents nearly a doubling of the bandwidth of what's shipping today.
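As a quick sanity check on that 40GB/second figure, here's the back-of-envelope math. I'm assuming a 32-bit link with double-data-rate signaling and counting both directions, which is the usual way aggregate HT bandwidth gets quoted:

```python
# Back-of-envelope HyperTransport 3.0 bandwidth.
# Assumptions: 32-bit link, 2.6GHz clock, double-data-rate
# signaling (2 transfers per clock), both directions counted.
clock_hz = 2.6e9            # HT3.0 link clock
transfers_per_clock = 2     # DDR signaling
link_bytes = 4              # 32-bit link width
directions = 2              # HT links are bidirectional

bandwidth = clock_hz * transfers_per_clock * link_bytes * directions
print(bandwidth / 1e9)      # → 41.6 (GB/s aggregate)
```

That lands right at the "over 40GB/second" mark, so the quoted number looks like the aggregate bidirectional figure.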

Why is that important? Keep watching...


Next step in my thinking...

When intel's dual cores came out, it was said that a dual core was 90% faster than a single core alone. Now that intel's quad cores are coming out, it's said that they're 70% faster than a dual core alone.

So: a 10% loss of performance when we add a second core, and a 30% loss when we go from two cores to four. Under the current trend, when the octo-core comes out (and it will, it's already on the roadmap even if it's not being talked about yet, and I'm sure AMD is sweating over its own too) it will only be 30% faster than a single quad core. That's a 70% loss against ideal scaling.
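To make the trend concrete, here's the arithmetic as a sketch. The 1.9x and 1.7x multipliers are the figures quoted above; the 1.3x octo-core multiplier is the extrapolation being described, not a measured or announced number:

```python
# Multi-core scaling trend from the figures above:
# dual = 1.9x a single core, quad = 1.7x a dual,
# octo = 1.3x a quad (extrapolated, not measured).
single = 1.0
dual = single * 1.9   # "90% faster than a single core"
quad = dual * 1.7     # "70% faster than a dual core"
octo = quad * 1.3     # extrapolated next step in the trend

print(round(dual, 2), round(quad, 2), round(octo, 2))  # → 1.9 3.23 4.2
print(round(octo / 8.0, 2))  # → 0.52 (fraction of an ideal 8x speedup)
```

So under this trend an eight-core chip delivers barely half of its theoretical throughput, which is the starvation argument in numbers.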

Why the performance loss? The cores are getting starved. There's not enough bandwidth to memory to keep the cores fed. If you have enough cache on the chip that your app or function can fit inside a core with little need to touch the outside system, it's going to run at full speed and it's going to run dang fast. BUT as more and more cores all try to talk on that tiny little FSB, they get greater and greater contention and finally spend more and more time sitting idle, waiting for data across the FSB.

The above assumes that the loss is linear and not exponential. I believe the loss is non-linear, but it's being offset by raising the FSB speed. Intel's been raising the bus speed already, and will continue to do so. More GHz on the bus means more heat and more power used. We've seen that result already: eventually you hit a wall, or you have to go more and more exotic with your cooling. Worse, intel has already shown that it doesn't care about the entire system's power use, just the CPU's. It can keep power down at the chip level yet still raise power use across the socket. It's an easy marketing message to sell.

So intel continues to raise the bus speed, but another problem crops up. DDR2-800 memory is hard to make. A recent article says only 5% of chips are DDR2-800 capable. Therefore, even fewer will be DDR2-1066 ready, and fewer still will hit DDR2-1333. The cost becomes less and less acceptable.

So something has to change. Dual-channel DDR gave a nice bump in speed, so quad channel will have to come along. Yes, you'll have to buy memory in matched sets of 4.
But in the end, that's not going to be enough. They're going to need to do something smarter, and that means a more radical change to the FSB, likely its replacement with something more dynamic.
And still there's more....

Intel reverted to the Pentium 3 / Pentium M to make Core 2. So the new architecture here is really old architecture re-released with modern production values. It worked great. Had intel never made the P4 detour, AMD and intel would both be much farther along than they are today.
But intel's every-2-years cadence isn't revolutionary at all. Every 2 years we get a die shrink, and building all the cores together rather than slapping them together is great. However, at 20nm you start getting into some odd laws of physics. Portions of the core may no longer be shrinkable; other portions can go lower.
Someone just showed a 1nm transistor. But at that point you're hard against the laws of physics. You reach the end of current tech and are forced to go to quantum computing. There's nowhere left to go. That's fine; we're talking 10 years out there.

But back to the main issue: the data bus. HT or FSB, it's what you live and die on.

Look at nVidia. Their data bus just went wide: 384-bit cards. ATI and nVidia have parallel down cold. So the GPU lives and dies by the rate at which data can be fed to it. Feed it too slowly and the GPU waits.

intel is already hitting this in the current generation, and they haven't fundamentally changed the FSB in a long time. The last big change was dual channel, and THAT was a long time ago.
On intel, you have the chip talking to the memory controller across a dedicated outboard bus. As I recall it's a one-way-at-a-time bus, which means you have to divide reads and writes across (soon) four cores. If your bus is running at 1GHz and your cores are all reading and writing on each instruction, then each core is effectively running at 128MHz (128 x 2 (for read and write) x 4 (for cores) = 1024, i.e. roughly 1GHz). This isn't a perfect example, but it helps illustrate the point.
THAT sounds pretty crappy, and it is, but it's not a real-world case. Lots of stuff happens in the cache, some instructions take more than one cycle to execute, and some data is written back to the cache to be used again. Not everything has to go out to the outside world. But sooner or later stuff does, and then you run into the bottleneck issue.

So even if you move the FSB rate to 2GHz, you can see it only gives you the effect of 256MHz per core. So you need to get very wide and run very fast as you add cores, and each core you add causes more and more pain.
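The divide-the-bus arithmetic above can be sketched like this. As in the example, it assumes reads and writes split the bus evenly, every core contends on every cycle, and 1GHz is treated as 1024MHz so the numbers come out even:

```python
# Worst-case effective per-core bus rate on a shared FSB.
# Assumes the bus splits evenly between read/write traffic
# (2 directions) and between all contending cores.
def per_core_mhz(bus_mhz, cores, directions=2):
    return bus_mhz / (directions * cores)

print(per_core_mhz(1024, 4))  # → 128.0 (MHz per core, 1GHz bus)
print(per_core_mhz(2048, 4))  # → 256.0 (MHz per core, 2GHz bus)
print(per_core_mhz(1024, 8))  # → 64.0 (same 1GHz bus, 8 cores)
```

Note how doubling the cores halves each core's share; the bus has to double just to stand still.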

You can get as small and as fast as you want, but until they solve this pain point you're still running into a Pentium 4-class design flaw.

Another case is AMD's 16-way systems (8 dual-core CPUs). These can be made to run with all cores at near 100%, but only in benchmark-type apps; in the real world, these start to starve pretty quickly. AMD without question has the better bus interface, and even there the cores starve.

Something has to be done.

I've seen statements on pricing saying that dual AMD FX dual-cores will run in the $800 to $900 range for a pair of matched CPUs.

That's pretty interesting, because it puts heavy price pressure on intel. If you look at the latest CPU performance graph over at Tom's, and if the Core 2 Quad is a 70% increase over the Core 2 Duo, then you're looking at potentially equal performance at equal price. The real difference is that you'll be able to upgrade to dual quads on the AMD side, but you'll likely be limited to only a faster single quad on intel's side.

It's WAY too early to say what that will do in the market and if any of what is being said by anyone is actually true, but it's one more interesting part of the puzzle.

A pair of dual-core FX chips is just as "quad" as the upcoming Core 2 pseudo-quad.

The HT system lets the processors talk to each other directly, the same way intel's two Core 2 Duos in one socket do. The only difference is the distance.

In addition, because each AMD CPU has its own memory controller (I'm not sure if it's a controller per core or per chip; can't remember and don't want to say it wrong), each has direct access to its own memory plus the other's memory.

This is why on an AMD multi-CPU board you'll see multiple banks of RAM: some banks are for one processor, some are for the other. It greatly reduces memory contention across the board. So even though the cores have less per-core performance than the Core 2 cores do, the lack of bandwidth restrictions may still mean greater performance overall.
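A toy way to picture the local-versus-remote tradeoff in that layout. The cost numbers here are made up purely for illustration, not measured latencies; the point is only that a remote access pays an extra HyperTransport hop on top of the local cost:

```python
# Toy NUMA cost model for a two-socket AMD board where each
# CPU owns its own memory banks. Costs are illustrative units,
# not real latencies: local access = 1.0, plus an assumed
# penalty of 0.5 per HyperTransport hop to the other socket.
LOCAL_COST = 1.0
HOP_COST = 0.5  # assumed, illustrative only

def access_cost(cpu, bank_owner):
    # Accessing your own banks is cheap; the other CPU's banks
    # cost one extra hop across HyperTransport.
    return LOCAL_COST + (HOP_COST if cpu != bank_owner else 0.0)

print(access_cost(cpu=0, bank_owner=0))  # → 1.0 (own banks)
print(access_cost(cpu=0, bank_owner=1))  # → 1.5 (other CPU's banks)
```

If the OS and apps keep most accesses local, each socket mostly queues behind its own cores instead of everyone fighting over one shared bus.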

Of course, that comes down to what is being run. The slowest intel or AMD chip can beat the pants off the fastest AMD or intel chip if you run the right benchmark. A single benchmark is meaningless. It all comes down to real-world application, and there's a big difference between a gamer's use of a chip, a database's, a web server's, and a statistician's.


Now, back to AMD/ATI and intel:

AMD wants onboard graphics. They have chips that do onboard graphics, and they're better (without question) than intel's onboard graphics. AND in theory, if they work it right, they can become co-processors to the AMD CPU.

ALSO, AMD has HyperTransport (soon 3.0). This means connecting FPU 'cores' (aka the graphics card) to CPU cores becomes a walk in the park.

Intel is already thinking this way; they showed off their prototype 80-"core" CPU, which was a bunch of FPU units hooked up to a few full cores. AMD, in theory, could have the same thing in regular production within the coming year through early 2008.

Think about the concept here. An onboard GPU raises the price of a board by $5 or so. No big deal, but it raises the performance of the board well beyond what $200 extra in CPU power would. Now think about a more powerful GPU being dropped onto the board. Suddenly things start to look different.

Now, what if you expand what the GPU does? Borrow some GPU ideas for the AMD chip and some CPU ideas for the GPU, and you have a really interesting mess that could provide some very parallel performance.

But wait, there's more.

AMD has a new slot: basically an HT3.0 slot that lets you drop in anything, with full access to the system just like the CPU has. Take one of these GPU/CPU hybrid chips, put it on a card, and you have a massive co-processor.

Use the slot to bridge two computers and you have a double scale computer that shares all tasks and resources.


The whole point of these last few posts is that THINGS are changing fast, and the direction AMD is taking may not be what we've all been thinking. There may be a different plan here. A different idea on how computing could be done. It might be something that's been evolving over time or it might be something that they stumbled on.

In the end, what a "cpu" is and how processing is done may look very different in the near term. It's worth paying attention to. It's worth looking past the marketing to see what's really going on. Intel and AMD have plans far beyond the add-a-core-and-make-it-faster-and-smaller routine.

Intel is being intel: sticking more and more into each chip. Huge caches of RAM to overcome a bad bus design, and all the processing power stuck into one chip.

AMD is building bigger chips too, but they don't have to use as much cache RAM. This means smaller dies, more chips per wafer, and higher yields, because there are fewer gates to fail on a per-chip basis. BUT they're building systems with the idea that processing can happen outside the chip (e.g. the HT3.0 slot).

It's going to be very interesting indeed.