Crytek boss says PS3 and 360 hold back PC

This topic is locked from further discussion.


#251 ronvalencia
Member since 2008 • 29612 Posts

There is definitely no "large shift" toward GPUs in high performance computing. The CPU still reigns and has lots of trumps left to play. The GPU, on the other hand, has to trade raw performance for better generic processing support. So things are not likely to turn in the GPU's favor. (SNIP) c0d1f1ed
http://en.wikipedia.org/wiki/TOP500

November 2010 list

The fastest HPC supercomputer today (China) is mostly powered by NVIDIA (USA) Tesla (Fermi) GPUs.

The 3rd fastest HPC supercomputer (China) is mostly powered by NVIDIA (USA) Tesla GPUs.

The 4th fastest HPC supercomputer (Japan) is mostly powered by NVIDIA (USA) Tesla (Fermi) GPUs.

Notice that the American HPC sites are avoiding GPU-based supercomputers.

Australia's own CSIRO is building its own HPC supercomputer with NVIDIA's Tesla. When targeting TOP500 rankings, GPGPU reduces the number of CPU nodes needed.

-------------------

http://www.green500.org/lists/2010/11/top/list.php

Green500's Top 10 most energy-efficient supercomputers in the world

1 PPC4x0 entry.

1 SPARC64 entry.

5 GpGPU (4 NV, 1 ATI) entries.

3 CELL entries.


#252 theuncharted34
Member since 2010 • 14529 Posts

Uh huh. Crytek is the only thing holding back the PC. You didn't have to make Crysis 2 more of a corridor shooter now, did you?


#253 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="yodogpollywog"]swiftshader makers dont know the cryengine as good as crytek....crytek could easily get better framerates in their engine if they created the software renderer.c0d1f1ed

Knowing how the engine works doesn't help drawing a triangle any faster.

They can only take the performance characteristics of a software renderer into account. For instance they can write CPU-friendly shaders, disable redundant operations, and take advantage of extensions. But note that these are application side changes, which would benefit any software renderer equally. If you know of any non-trivial renderer side changes that would significancly improve performance, I'd love to hear about it.

And while SwiftShader build 3383 is much faster than version 2.01 at rendering Crysis, all the improvements are unrelated to CryENGINE. So again with the utmost respect I doubt that the Crytek developers would be able to contribute anything engine-specific to the renderer side that would have an even greater effect than generic optimizations.

swiftshader seems pretty unoptimized dude, i was only hitting 97% cpu usage, but my framerate was 2?yodogpollywog

On my Core i7 920 @ 3.2 GHz, I'm getting an average of 17 FPS for the 64-bit Crysis benchmark, using SwiftShader build 3383. It'sreallyquite playable. And that's for a game that's still considered a graphical marvel.

Given that my CPU can do 102.4 SP GFLOPS, and your Athlon 64 X2 2.1 GHz can only do 16.8 GFLOPS and supports fewer vector instructions, a framerate of 2 sounds about right for your system. There's only so much you can do with that computing budget. CPUs are rapidly catching up on throughput performance, but older generations like that had a very low computing density.
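For reference, peak figures like these follow from cores x clock x FLOPs per cycle per core. Below is a minimal sketch of that arithmetic; the per-cycle figures are assumptions for illustration (8 SP FLOPs/cycle for the i7 920, i.e. a 4-wide SSE add plus a 4-wide SSE mul issued each cycle, and roughly 4 SP FLOPs/cycle for the Athlon 64 X2).

[code]
#include <cstdio>

// Peak single-precision GFLOPS = cores * clock (GHz) * SP FLOPs issued per cycle per core.
static double peak_sp_gflops(int cores, double ghz, int flops_per_cycle) {
    return cores * ghz * flops_per_cycle;
}

int main() {
    // Core i7 920 @ 3.2 GHz: 4 cores, 4-wide SSE add + 4-wide SSE mul per cycle = 8 SP FLOPs/cycle.
    std::printf("Core i7 920  : %.1f GFLOPS\n", peak_sp_gflops(4, 3.2, 8));  // 102.4
    // Athlon 64 X2 @ 2.1 GHz: 2 cores, roughly 4 SP FLOPs/cycle.
    std::printf("Athlon 64 X2 : %.1f GFLOPS\n", peak_sp_gflops(2, 2.1, 4));  // 16.8
    return 0;
}
[/code]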

i7 XE (QC) @ 3.3 GHz: ~101 GFLOPS for SGEMM. The 2.6 GHz version will be lower than the 3.3 GHz version.


#254 edo-tensei
Member since 2007 • 4581 Posts

Please do tell us something we don't already know.


#255 topgunmv
Member since 2003 • 10880 Posts

[QUOTE="topgunmv"]

I think some developers are being shortsighted by only looking at the sales of a game in the first few months.

I personally just purchased stalker a few days ago, and that's from 2007.

They need to look at pc games as an investment, not a quick cash in.

SakusEnvoy

But they make much less revenue from games once they fall to $19.99 or $9.99. The first year of sales is always the most important, even for PC games.

They don't have to fork over a percentage of that like they do on consoles, though. It's cheaper to develop for PC, they keep more profit per sale, and PC games sell much more steadily over time compared to console games.


#256 HuusAsking
Member since 2006 • 15270 Posts
[QUOTE="c0d1f1ed"]

[QUOTE="yodogpollywog"]swiftshader makers dont know the cryengine as good as crytek....crytek could easily get better framerates in their engine if they created the software renderer.ronvalencia

Knowing how the engine works doesn't help drawing a triangle any faster.

They can only take the performance characteristics of a software renderer into account. For instance they can write CPU-friendly shaders, disable redundant operations, and take advantage of extensions. But note that these are application side changes, which would benefit any software renderer equally. If you know of any non-trivial renderer side changes that would significancly improve performance, I'd love to hear about it.

And while SwiftShader build 3383 is much faster than version 2.01 at rendering Crysis, all the improvements are unrelated to CryENGINE. So again with the utmost respect I doubt that the Crytek developers would be able to contribute anything engine-specific to the renderer side that would have an even greater effect than generic optimizations.

swiftshader seems pretty unoptimized dude, i was only hitting 97% cpu usage, but my framerate was 2?yodogpollywog

On my Core i7 920 @ 3.2 GHz, I'm getting an average of 17 FPS for the 64-bit Crysis benchmark, using SwiftShader build 3383. It'sreallyquite playable. And that's for a game that's still considered a graphical marvel.

Given that my CPU can do 102.4 SP GFLOPS, and your Athlon 64 X2 2.1 GHz can only do 16.8 GFLOPS and supports fewer vector instructions, a framerate of 2 sounds about right for your system. There's only so much you can do with that computing budget. CPUs are rapidly catching up on throughput performance, but older generations like that had a very low computing density.

i7 XE(QC) @ 3.3Ghz for SGEMM 101GFLOPs. 2.6Ghz version be lower than 3.3Ghz version.

And current-generation GPUs are already into the teraFLOPs range (single-precision; double-precision expected in the next iteration).

#257 c0d1f1ed
Member since 2009 • 32 Posts

AMD Radeon HD 4870 has

- 2.5 megabytes of register storage space.

- 16,000 x 128-bit registers per SM.

- 16,384 threads per chip.

- greater than 1,000 threads per SM. ronvalencia

What's your point? These are negative features. You don't really want to be spending any transistors on registers, and you don't want that many strands in flight. It's just a necessity to have that much register space for GPUs because they're so terribly slow at processing threads. All arithmetic instructions have equal latency, so they need more registers to store intermediate results for longer, and the registers are also needed to bridge the astronomical RAM latency. Because GPU caches are too tiny to have any long-term reuse, this happens quite often.

It's not so bad when looking purely at legacy graphics. The data access patterns are very regular and there are few if any feedback loops. But that's changing now that triangles are getting really tiny, shaders contain ever more branches and unfiltered data accesses, and people try to use the GPU for a lot more than graphics. And it doesn't matter if you can execute 16k strands if you need the results from a single strand really fast. Also, with that many strands you have merely a handful of registers per strand. So either your code needs to be really simple, or you need to lower the number of strands and possibly end up with bubbles.

Even during full utilization, the number of strands really doesn't mean anything. On a CPU you can execute any number of strands per thread. Register renaming, SMT and the fast caches mean the low number of architectural registers is not much of an issue. So register space and strand counts are not something you can use to indicate any kind of superiority. The CPU can use the same execution model as the GPU, but it can also execute a single strand really fast. This will become critically important when moving beyond legacy graphics. GPUs that only rely on data parallelism will be slaughtered by GPUs that adopt CPU features, or by the CPU itself.

Each SM has 16 VLIW5 SPs. The 16,000 registers are spread across 16 VLIW5 SPs, i.e. 1,000 x 128-bit registers per VLIW5 SP. Your CELL SPE only has 128 x 128-bit registers. 16 x 256-bit AVX is nowhere near RV770's VLIW5 register data storage.

Radeon HD 4870 has 10 SM blocks. Each VLIW5 SP includes 5 scalar stream processors. Math: 16 x 10 x 5 = 800 stream processors. At a given transistor count, a normal CPU doesn't have this execution-unit density.

According to AMD's roadmap, your "mark my words" means nothing. VMX/SSE/SPE floating-point transistor usage is NOT efficient compared to RV770's VLIW5 units.

The direct competitor to desktop Intel Sandy Bridge (CGPU) is AMD Llano (CGPU), which includes about 480 stream processors and 500+ GFLOPS. ronvalencia

Again, register space doesn't say a darn thing about performance. A quad-core CPU with AVX can sustain 200 SP GFLOPS for kernels with high register pressure, with far fewer physical registers than a GPU would need to sustain the same performance.

And a high execution unit count density doesn't guarantee high performance in practice either. AMD's GPUs are regularly outperformed by NVIDIA GPUs with half the peak GFLOPS. It clearly indicates that AMD relies on fast burst execution and not high utilization, which won't work in the long run due to Amdahl's Law and increasing task diversity.

Now imagine that Llano didn't have a CPU+GPU on the same die, but a CPU with twice the number of modules. The floating-point performance would be in the range of 400 SP GFLOPS, but due to higher utilization it could in practice match or exceed the performance of having lots of stream processors. But most importantly, it wouldn't just be useful for legacy graphics alone.

Anyway, that only covers arithmetic operations. GPUs with the same GFLOPS performance still have a significant lead over CPUs at graphics due to their texture samplers, ROPs, and SFUs. But the addition of gather and scatter instructions would make a massive difference, and again would also be useful for much more than legacy graphics. A lot of game developers are already sold on the idea of being able to write their own renderer without any API restrictions. This includes Tim Sweeney from Epic, who claims the days of GPUs and APIs as we know them today are numbered (http://www.ubergizmo.com/15/archives/2009/08/tim_sweeney_the_end_of_the_gpu_roadmap.html).


#258 drinogamer
Member since 2010 • 94 Posts
But if PC were as popular as people in here seem to make it out to be, why wouldn't they just develop high-end games exclusive to PC?

#259 AnnoyedDragon
Member since 2006 • 9948 Posts

But if PC were as popular as people in here seem to make it out to be, why wouldn't they just develop high-end games exclusive to PC? drinogamer

For the same reason they are a rarity on consoles as well.


#260 drinogamer
Member since 2010 • 94 Posts

[QUOTE="drinogamer"]but if PC was as popular as people in here seem to make it out to be why wouldnt they just develop high end games exclusive to PC? AnnoyedDragon

For the same reason they are a rarity on consoles as well.

Why does Crytek refer to console gaming if PC gaming is #1, like I have seen claimed in these threads? Do businesses usually worry about what the competition in 2nd place is doing, and complain about 2nd and 3rd place as hindering them?

#261 AnnoyedDragon
Member since 2006 • 9948 Posts

Why does Crytek refer to console gaming if PC gaming is #1, like I have seen claimed in these threads? Do businesses usually worry about what the competition in 2nd place is doing, and complain about 2nd and 3rd place as hindering them?drinogamer

The impression from Crytek's complaints is they would rather be developing PC exclusives.

However, today's development costs are too high for targeting one platform to be practical.


#262 drinogamer
Member since 2010 • 94 Posts

[QUOTE="drinogamer"]why does the crytek refer to console gaming if PC gaming is # 1 like I have seen in these threads? Do businesses usually worry about what the competition is doing in 2nd place and complain about 2nd and 3rd place as hindering them?AnnoyedDragon

The impression from Crytek's complaints is they would rather be developing PC exclusives.

However, today's development costs are too high for targeting one platform to be practical.

OK, fair enough. I wish they would make cutting-edge games like Crysis for PC; that's the best kind of game for the PC, and yet they're few and far between.

#263 NanoMan88
Member since 2006 • 1220 Posts

[QUOTE="drinogamer"]why does the crytek refer to console gaming if PC gaming is # 1 like I have seen in these threads? Do businesses usually worry about what the competition is doing in 2nd place and complain about 2nd and 3rd place as hindering them?AnnoyedDragon

The impression from Crytek's complaints is they would rather be developing PC exclusives.

However, today's development costs are too high for targeting one platform to be practical.

This, really. With these comments they are not making friends on either side. I hope when Crysis 2 comes out even console gamers can see it for what it is: an extremely pretty game that is super boring and has lackluster gameplay. Extra bonus points if it doesn't sell well and Crytek has no one to blame but themselves.


#264 c0d1f1ed
Member since 2009 • 32 Posts

http://en.wikipedia.org/wiki/TOP500

November 2010 list

The fastest HPC supercomputer today (China) is mostly powered by NVIDIA (USA) Tesla (Fermi) GPUs.

The 3rd fastest HPC supercomputer (China) is mostly powered by NVIDIA (USA) Tesla GPUs.

The 4th fastest HPC supercomputer (Japan) is mostly powered by NVIDIA (USA) Tesla (Fermi) GPUs. ronvalencia

Clearly GPUs are great for boosting the statistics, but they haven't proven anything yet. For starters, the majority of code simply doesn't run on them. Also note that the no. 1 supercomputer has two six-core 3 GHz CPUs for each Tesla GPU. You'd expect something a little more modest if they could really rely on the GPU to deliver the performance that got these computers onto this list in the first place. Heck, why not one CPU and two Teslas, doubling the lead?

Notice that the American HPC sites are avoiding GPU-based supercomputers. ronvalencia

Yeah, they don't fall for the hype. I also got the feeling that the Chinese wanted to take the no. 1 spot for bragging rights, but will see very little actual benefit from the GPUs.

http://www.green500.org/lists/2010/11/top/list.php

Green500's Top 10 most energy-efficient supercomputers in the world

1 PPC4x0 entry.

1 SPARC64 entry.

5 GpGPU (4 NV, 1 ATI) entries.

3 CELL entries.ronvalencia

Actually a lot of mobile chips have better FLOPS/Watt ratings than that. Obviously they're not going to use those for HPC purposes, but my point is that you have to take this ranking with a grain of salt.

Things like programmability, flexibility, extendability, debugging features, etc. which can't be expressed in numbers, have a far greater influence on the long term success of an architecture.


#265 c0d1f1ed
Member since 2009 • 32 Posts

As for GPUs moving towards CPU function, I beg to differ. We're discovering two distinct sets of computing that are hard to generalize. On the one hand, you have things that (for reasons of coherency and so on) need to be limited to a few robust threads. On the other hand, you have very simple tasks that are easy to parallelize. Their specialties are just about mutually exclusive. So we're seeing a generalized move towards asymmetric multicore CPUs (the Cell is an early attempt at a mainstream asymmetric multicore CPU). In the future, we'll see CPUs with at least two different types of cores: the robust general-purpose cores we see in CPUs today and the parallel-friendly cores we see in GPUs. They won't converge because parallel-friendly tasks aren't necessarily robust, and there's also the matter of having to allocate chip real estate to the task. We're still trying to figure out the right ratio.HuusAsking

There are indeed differences in computing models, but I disagree that they can't converge on the same hardware architecture.

The comparison of CPUs versus GPUs is clouded by the observation that there's a massive difference in performance for graphics. On a CPU, implementing trilinearly filtered texture sampling without gather/scatter support takes roughly a hundred clock cycles; on a GPU it's one clock cycle. So it's no wonder that a GeForce GTX 460 achieves 170 FPS for the Crysis benchmark at low settings while only 17 FPS was achieved using SwiftShader on a Core i7 965. At higher resolutions the gap widens.
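To make the "roughly a hundred clock cycles" point concrete, here is a hedged sketch of what even a single bilinear fetch costs in scalar code without gather support: four separate address calculations and four separate loads per sample, before any filtering math (trilinear does all of it twice and adds one more lerp). The Texture struct, single-channel handling and missing border clamping are simplifying assumptions, not SwiftShader's actual code.

[code]
#include <cstdint>

struct Texture {                  // hypothetical linear RGBA8 texture, no mipmaps
    const uint32_t* texels;
    int width, height;
};

// Each of these is an independent address calculation plus a cache access;
// a hardware sampler (or a gather instruction) issues them as one operation.
static uint32_t fetch(const Texture& t, int x, int y) {
    return t.texels[y * t.width + x];
}

// One bilinear sample of the red channel, done the scalar way (no wrapping/clamping).
static float bilinear_r(const Texture& t, float u, float v) {
    float fx = u * t.width - 0.5f, fy = v * t.height - 0.5f;
    int x0 = (int)fx, y0 = (int)fy;
    float wx = fx - x0, wy = fy - y0;
    float c00 = (float)(fetch(t, x0,     y0    ) & 0xFF);
    float c10 = (float)(fetch(t, x0 + 1, y0    ) & 0xFF);
    float c01 = (float)(fetch(t, x0,     y0 + 1) & 0xFF);
    float c11 = (float)(fetch(t, x0 + 1, y0 + 1) & 0xFF);
    float top = c00 + wx * (c10 - c00);
    float bot = c01 + wx * (c11 - c01);
    return top + wy * (bot - top);  // trilinear repeats all of this for a second mip level
}
[/code]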

But when we look at GPGPU results for things that only use arithmetic operations, and compare them to a multi-threaded vectorized implementation on the CPU, the gap is far smaller, if there's even a gap at all.

With AVX and gather/scatter, the CPU would be all-round more attractive to develop high performance applications for than the GPU. And at a later stage it can even be the architecture of choice for innovative graphics.


#266 HuusAsking
Member since 2006 • 15270 Posts

Anyway, that only covers arithmetic operations. GPUs with the same GFLOPS performance still have a significant lead over CPUs at graphics due to their texture samplers, ROPs, and SFUs. But the addition of gather and scatter instructions would make a massive difference, and again would also be useful for much more than legacy graphics. A lot of game developers are already sold on the idea of being able to write their own renderer without any API restrictions. This includes Tim Sweeney from Epic, who claims the days of GPUs and APIs as we know them today are numbered (http://www.ubergizmo.com/15/archives/2009/08/tim_sweeney_the_end_of_the_gpu_roadmap.html).

c0d1f1ed

Interesting article. It was updated after the fact. Basically, the update points out (thanks to a few lengthy e-mail conversations) that any path to convergence is going to be so full of road bumps that it's better to come at it from the other direction: IOW, just as CPUs can become more like GPUs, so too can GPUs become more like CPUs. Sure, the top-end GPUs aren't built as efficiently, but with GPUs creeping more into mobile devices, efficiency is being addressed in evolutionary steps. If there is going to be a convergence of CPU and GPU ultimately, it'll probably be something between the two current implementations and will likely not happen for a while yet (probably two generations or so--it's taken GPUs ten years or so to get to where they are now, and CPUs have moved away from a pure clock-speed philosophy in the interim; evolution can be slow, even for a computer).

As for why the GPU count in the supercomputer list isn't so high, it's more a matter of coding efficiency, which is in fact a human issue in that the coding can be improved further with a better grasp of GPU-oriented programming (it's actually a pretty new thing, but the return on it is so great that we can afford early inefficiencies in the name of performance). CPUs had the same problem back when they went mainstream multicore, and they're only now ironing out most of the kinks. Anyway, CPUs are still necessary for their logic and control units (someone's got to direct the traffic, IOW).

PS. Looking further into the scatter/gather issue, from what I've read, it's limited in mainstream CPUs due to bandwidth issues. That's why the SIMD instructions in Intel/AMD CPUs don't go very far with their "horizontal" operations. Within the architecture, features such as write-combining tend to iron out the rest of the issues before you run out of data streams (the CPU's MMU can only handle so many parallel memory accesses at a time). GPUs have different hookups to their internal memory, which is why they use graphics-oriented GDDR3/4/5 SDRAM (high-bandwidth and low-latency, but it has to sit near the GPU on the card) rather than the more generalized DDR2/3 SDRAM on the motherboard.


#267 HuusAsking
Member since 2006 • 15270 Posts

[QUOTE="HuusAsking"]As for GPUs moving towards CPU function, I beg to differ. We're discovering two distinct sets of computing that are hard to generalize. On the one hand, you have things that (for reasons of coherency and so on) need to be limited to a few robust threads. On the other hand, you have very simple tasks that are easy to parallelize. Their specialties are just about mutually exclusive. So we're seeing a generalized move towards asymmetric multicore CPUs (the Cell is an early attempt at a mainstream AMCCCPU). In future, we'll see CPUs with at least two different types of cores: the robust general purpose cores we see in CPUs today and the parallel-friendly cores we see in GPUs. They won't converge because parallel-friendly tasks aren't necessarily robust and there's also the matter of having to allocate chip real estate to the task. We're still trying to figure out the right ratio.c0d1f1ed

There are indeed differences in computing models, but I disagree that they can't converge on the same hardware architecture.

The comparison of CPUs versus GPUs is clouded by the observation that there's a massive difference in performance for graphics. On a CPU, implementing trilinearly filtered texure sampling without gather/scatter support takes roughly a hundred clock cycles, on a GPU it's one clock cycle. So it's no wonder that a GeForce GTX 460 achieves 170 FPS for the Crysis benchmark at low settings while only17 FPS was achieved using SwiftShader on a Core i7 965. At higher resolutions the gap wides.

But when we look at GPGPU results for things that only use arithmethic operations, and compare it to a multi-threaded vectorized implementation on the CPU, the gap is far smaller, if there's even a gap at all.

With AVX and gather/scatter, the CPU would be all-round more attractive to develop high performance applications for than the GPU. And in a later stage is can even be the architecture of choice for innovativegraphics.

As I've said before, you keep saying Core i's are into the hundreds of GFLOPS, but I say GPUs are already into the TFLOPs range, and they're almost there at double-precision.

#268 c0d1f1ed
Member since 2009 • 32 Posts

[QUOTE="c0d1f1ed"]

Anyway, that only covers arithmetic operations. GPUs with the same GFLOPS performance still have a significant lead over CPUs at graphics due to their texture samplers, ROPs, and SFUs. But the addition of gather and scatter instructions would make a massive difference, and again would also be useful for much more than legacy graphics. A lot of game developers are already sold on the idea of being able to write their own renderer without any API restrictions. This includes Tim Sweeney from Epic, who claims the days of GPUs and APIs as we know them today are numbered (http://www.ubergizmo.com/15/archives/2009/08/tim_sweeney_the_end_of_the_gpu_roadmap.html).

HuusAsking

Interesting article. It was updated after the fact. Basically, the update points out (thanks to a few lengthy e-mail conversations) that any path to convergence is going to be so full of road bumps that it's better to come at it from the other direction: IOW, just as CPUs can become more like GPUs, so too can GPUs become more like CPUs. Sure, the top-end GPUs aren't built as efficiently, but with GPUs creeping more into mobile devices, efficiency is being addressed in evolutionary steps. If there is going to be a convergence of CPU and GPU ultimately, it'll probably be something between the two current implementations and will likely not happen for a while yet (probably two generations or so--it's taken GPUs ten years or so to get to where they are now, and CPUs have moved away from a pure clock-speed philosophy in the interim; evolution can be slow, even for a computer).

The convergence is happening from both ends. But you'll see a CPU without a GPU way sooner than a GPU without a CPU. The thing is, as soon as CPUs can deliver adequate graphics performance, the IGP will completely vanish. Note that IGPs were never really about efficiency, they're mainly about price. People who buy a PC with an IGP don't really care about 3D graphics. They want to browse the web, stay in touch with their friends, do some office work, and maybe play some casual games. As long as it lets you do these things, the cheaper the IGP the better. But still, 95% of the time the IGP is idle and still a relatively expensive piece of silicon to be of so little use. So even if a CPU with gather/scatter won't be the summum of efficiency at graphics, if it can do what the IGP can do then the IGP has to go. I don't hear anyone with integrated audio codecs complain that the CPU is less efficient than a dedicated sound chip either...

Also note that first we only had discrete graphics cards. Then we also had IGPs on the motherboard that share system RAM. Then we had IGPs in the same package as the CPU. Very soon we'll have IGPs on the same die as the CPU. So we've already taken massive steps toward truly melding them together. The CPU already has plenty of GFLOPS compared to IGPs; it just lacks a few instructions to deal efficiently with non-consecutively stored data elements.

PS. Looking further into the scatter/gather issue, from what I've read, it's limited in mainstream CPUs due to bandwidth issues. That's why the SIMD instructions in Intel/AMD CPUs don't go very far with their "horizontal" operations. Within the architecture, features such as write-combining tend to iron out the rest of the issues before you run out of data streams (the CPU's MMU can only handle so many parallel memory accesses at a time). GPUs have different hookups to their internal memory, which is why they use graphics-oriented GDDR3/4/5 SDRAM (high-bandwidth and low-latency, but it has to sit near the GPU on the card) rather than the more generalized DDR2/3 SDRAM on the motherboard.HuusAsking

It's not a bandwidth issue. Just look at how gather/scatter was implemented on Larrabee. It merely needs to access the cache lines involved, once. So if all the data elements are from a single cache line, gather even becomes a simple load and shuffle operation. By using the texture swizzling technique, sampling will typically only access one or two cache lines, which is way faster than serially loading each texel. And gather/scatter are also useful to improve the efficiency of vertex stream accesses, parallel primitive setup, early z culling, raster operations, transcendental functions, etc. And that's just for legacy graphics.
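For comparison with the Larrabee-style gather described above, this is what a gather degenerates to on a CPU without a dedicated instruction: one scalar load per lane. A minimal sketch assuming 8-lane single-precision AVX vectors; the function name is made up for illustration.

[code]
#include <immintrin.h>

// Emulated 8-wide gather: base[idx[i]] for each lane, done with scalar loads.
// A hardware gather (as on Larrabee, or the later AVX2 _mm256_i32gather_ps)
// can instead touch each cache line involved only once.
static __m256 gather8(const float* base, const int idx[8]) {
    alignas(32) float tmp[8];
    for (int i = 0; i < 8; ++i)
        tmp[i] = base[idx[i]];      // eight separate address calculations and loads
    return _mm256_load_ps(tmp);
}
[/code]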

So anyway we're much closer to CPUs capable of fulfilling the role of IGP than you might think. The really interesting question is what will happen afterwards. AMD doesn't care because it has both Fusion and dedicated graphics cards. But NVIDIA is feeling the heat and needs a CPU architecture fast. They're desperately trying to obtain a license to manufacture x86 CPUs, and for obvious reasons they are much more aggressive at making their GPUs succeed in the HPC market...


#269 Brownesque
Member since 2005 • 5660 Posts
[QUOTE="doom1marine"]Crytek boss Cervat Yerli has claimed that developers' focus on PS3 and 360 is holding back game quality on PC - a format he believes is already "a generation ahead" of modern day consoles. http://www.computerandvideogames.com/article.php?id=277729

Correct course of action: stop developing games for consoles then, Mr. Yerli. You're supporting them, so stop complaining about them.

#270 c0d1f1ed
Member since 2009 • 32 Posts

As I've said before, you keep saying Core i's are into the hundreds of GFLOPS, but I say GPUs are already into the TFLOPs range, and they're almost there at double-precision.HuusAsking

A pair of 3 GHz 8-core Sandy Bridge-EP based Xeon processors can deliver 768 GFLOPS. That's nothing to sneeze at, and should be available as early as Q3 2011. And because of the CPUs flexibility and programmability this could already be more attractive than a 1.5 TFLOPS Tesla. But that's not all...

The AVX specification already defines FMA instructions, which can double the floating-point performance. And it also reserves the bits to extend it to 512-bit and even 1024-bit vector operations. So clearly they have long term high performance plans for it.
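As an illustration of why FMA doubles the peak: a fused multiply-add retires a multiplication and an addition as one instruction, so a dot-product style loop does 16 SP FLOPs per 256-bit issue instead of 8. A minimal sketch using the FMA intrinsic that was later standardized; it assumes a compiler and CPU with FMA support, which Sandy Bridge itself does not ship with.

[code]
#include <immintrin.h>

// Dot product of two arrays, 8 floats per iteration (tail elements omitted for brevity).
// _mm256_fmadd_ps(a, b, acc) computes a*b + acc as a single instruction,
// i.e. 8 multiplies and 8 adds (16 SP FLOPs) per issue instead of 8.
static float dot(const float* a, const float* b, int n) {
    __m256 acc = _mm256_setzero_ps();
    for (int i = 0; i + 8 <= n; i += 8)
        acc = _mm256_fmadd_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i), acc);
    alignas(32) float t[8];
    _mm256_store_ps(t, acc);
    return t[0] + t[1] + t[2] + t[3] + t[4] + t[5] + t[6] + t[7];
}
[/code]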


#271 04dcarraher
Member since 2004 • 23858 Posts

[QUOTE="HuusAsking"]As I've said before, you keep saying Core i's are into the hundreds of GFPLOS, but I say GPUs are already into the TFLOPs range, and they're almost there at double-precision.c0d1f1ed

A pair of 3 GHz 8-core Sandy Bridge-EP based Xeon processors can deliver 768 GFLOPS. That's nothing to sneeze at, and should be availableas early asQ3 2011. And because of the CPUs flexibility and programmability this could already be more attractive than a 1.5 TFLOPS Tesla.But that's not all...

The AVX specification already defines FMA instructions, which candouble the floating-point performance. And it also reserves the bits toextend itto 512-bit and even 1024-bit vector operations. So clearly they have long term high performance plans for it.

So we would need $2000+ worth of cpu's just to get even into what current gpu's can do.... Software based rendering with cpu's for games will be a ball and chain until they start moving away from the central processing unit methods and into parallel processing.


#272 c0d1f1ed
Member since 2009 • 32 Posts

[QUOTE="c0d1f1ed"]

[QUOTE="HuusAsking"]As I've said before, you keep saying Core i's are into the hundreds of GFPLOS, but I say GPUs are already into the TFLOPs range, and they're almost there at double-precision.04dcarraher

A pair of 3 GHz 8-core Sandy Bridge-EP based Xeon processors can deliver 768 GFLOPS. That's nothing to sneeze at, and should be availableas early asQ3 2011. And because of the CPUs flexibility and programmability this could already be more attractive than a 1.5 TFLOPS Tesla.But that's not all...

The AVX specification already defines FMA instructions, which candouble the floating-point performance. And it also reserves the bits toextend itto 512-bit and even 1024-bit vector operations. So clearly they have long term high performance plans for it.

So we would need $2000+ worth of cpu's just to get even into what current gpu's can do....

Yes, and a Tesla C/M2050 costs 2500 dollar. So what's your point?

Software based rendering with cpu's for games will be a ball and chain until they start moving away from the central processing unit methods and into parallel processing.04dcarraher

Exactly how is using multi-core and vector operations not parallel processing?
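For what it's worth, "multi-core plus vector operations" in practice looks something like the sketch below: threads split the data across cores and each thread processes 8 floats per instruction with AVX. The function names and the workload (scaling an array) are made up for illustration.

[code]
#include <immintrin.h>
#include <thread>
#include <vector>

// Scale a chunk of a float array by a constant: AVX within one thread.
static void scale_chunk(float* data, size_t begin, size_t end, float k) {
    __m256 kv = _mm256_set1_ps(k);
    size_t i = begin;
    for (; i + 8 <= end; i += 8)
        _mm256_storeu_ps(data + i, _mm256_mul_ps(_mm256_loadu_ps(data + i), kv));
    for (; i < end; ++i) data[i] *= k;       // scalar tail
}

// One thread per hardware core: multi-core and SIMD parallelism combined.
static void scale_parallel(float* data, size_t n, float k) {
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 4;
    std::vector<std::thread> pool;
    size_t chunk = (n + cores - 1) / cores;
    for (unsigned c = 0; c < cores; ++c) {
        size_t b = c * chunk, e = (b + chunk < n) ? b + chunk : n;
        if (b < e) pool.emplace_back(scale_chunk, data, b, e, k);
    }
    for (auto& t : pool) t.join();
}
[/code]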


#273 Hakkai007
Member since 2005 • 4905 Posts

The impression from Crytek's complaints is they would rather be developing PC exclusives.

However, today's development costs are too high for targeting one platform to be practical.

AnnoyedDragon

CD Projekt developed The Witcher 2's engine and is making the game with a budget of only 8 million.

Maybe developers should learn to keep budgets lower.


#274 AnnoyedDragon
Member since 2006 • 9948 Posts

ProjecktCD developed The Witcher 2 engine and is making the game with a budget of only 8 million.

Maybe developers should learn to keep budgets lower.

Hakkai007

You are suggesting that 8 million is not a lot.

Granted, waste plays a role; god knows how a 5-hour linear title such as MW2 uses $40-$50 million, excluding marketing and distribution. But keeping costs below 10 million on a high-quality title is an exception to the rule; it does not disprove that costs have increased dramatically this gen, and they will most likely do so again next gen.


#275 Teuf_
Member since 2004 • 30805 Posts

And yet quite a few people are very busy writing ray tracers using CUDA 3 or doing other innovative things with their GPUs. So keeping in mind that all old things are not replaced with the latest new things overnight, that timeline isn't off by much.

Also, I soundly recall people saying things like "the GeForce 3 will render anything you can throw at it", and "software renderers will never be able to run something like Oblivion". I'm just saying, things do change very rapidly. In several years computer graphics will again look very different. Whether we'll all be ray tracing or software rendering, or a bit of both, doesn't really change how impressive these advances are. At least to me...

c0d1f1ed



The Optix stuff is cool and people have done some neat things with it (my coworker wrote a pretty badass PRT engine using Optix for our baked lighting pipeline). But the reason I mention ray-tracing specifically is because it's something a lot of people envision as "the future", with no regard to the aspects in which it's just fundamentally less efficient than rasterization. I mean, yeah, it's super-convenient and makes it really simple and elegant to implement a lot of complex interactions, but it's never going to be faster than rasterization for primary visibility determination. Because of that I think hybrids will win out rather than rasterization dying just because it doesn't scale for all possible use cases.

Anyway, the point is that I'm not sure if there's going to be a similar situation for GPUs vs. CPUs. Convergence a la Fusion makes sense to me, but I haven't totally bought the idea of doing everything on a sea of general-purpose cores. But then again I'm probably not even 1/10th as knowledgeable as you are regarding CPU performance + future developments, so that probably says something. :P

Either way, I'm sure software platforms will play a large part in shaping things. Once traditional GPUs and the D3D/GL pipeline go away there will be tons of room for flexibility (micropolygons!), but it's going to be harder to get a decently-performing 3D renderer off the ground.


#277 Hakkai007
Member since 2005 • 4905 Posts

You are suggesting that 8 million is not a lot.

Granted, waste plays a role; god knows how a 5-hour linear title such as MW2 uses $40-$50 million, excluding marketing and distribution. But keeping costs below 10 million on a high-quality title is an exception to the rule; it does not disprove that costs have increased dramatically this gen, and they will most likely do so again next gen.

AnnoyedDragon

8 million is not much compared to the average budget a lot of games have.

And they are making a new game engine, unlike the CoD series, which just recycles.

But the costs are up to the developers or company funding them.

If they want to over spend for certain things then they can.


A game can sell millions and become popular without a high budget.

Movies have high budgets too, but ones like Star Wars (not the new ones) were made on a very low budget.

Heck, the lightsabers were old pieces from pipes or cameras.

And yet the films are probably among the best-known series in the world.


#278 ronvalencia
Member since 2008 • 29612 Posts

@c0d1f1ed

I recall my gaming PCs still include multi-core CPUs.

Actually a lot of mobile chips have better FLOPS/Watt ratings than that. Obviously they're not going to use those for HPC purposes, but my point is that you have to take this ranking with a grain of salt.

ronvalencia

Actually, GPUs also come in mobile parts, e.g. the AMD Mobility Radeon HD 5730 consumes around 26 watts, while my Intel Core i7-740 Mobile quad core consumes around 45 watts.

The AMD Mobility Radeon HD 5830 consumes around 25 watts, i.e. lower than my laptop's Mobility Radeon HD 5730.

Things like programmability, flexibility, extendability, debugging features, etc. which can't be expressed in numbers, have a far greater influence on the long term success of an architecture.

c0d1f1ed

On the subject of debugging, CUDA-GDB says hi.

Clearly GPUs are great for boosting the statistics, but they haven't proven anything yet. For starters the majority of code simply doesn't run on them. Also note that the no. 1 supercomputer has two six-core 3 GHz CPUs for each Tesla GPU. You'd expect something a little more modest if they could really rely on the GPU to deliver the performance that got these computers in this list in the first place. Heck, why not one CPU and two Tesla's, doubling the lead?

c0d1f1ed

The context in this topic is games. The "majority of code" point is just a red herring.



#279 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="04dcarraher"]

[QUOTE="c0d1f1ed"]

A pair of 3 GHz 8-core Sandy Bridge-EP based Xeon processors can deliver 768 GFLOPS. That's nothing to sneeze at, and should be availableas early asQ3 2011. And because of the CPUs flexibility and programmability this could already be more attractive than a 1.5 TFLOPS Tesla.But that's not all...

The AVX specification already defines FMA instructions, which candouble the floating-point performance. And it also reserves the bits toextend itto 512-bit and even 1024-bit vector operations. So clearly they have long term high performance plans for it.

c0d1f1ed

So we would need $2000+ worth of cpu's just to get even into what current gpu's can do....

Yes, and a Tesla C/M2050 costs 2500 dollar. So what's your point?

Software based rendering with cpu's for games will be a ball and chain until they start moving away from the central processing unit methods and into parallel processing.04dcarraher

Exactly how is using multi-core and vector operations not parallel processing?

The Tesla C/M2050 may cost $2,500, but the GeForce Fermi equivalent doesn't cost that much. The real issues are performance per dollar, performance per watt, and performance per transistor count.

GPGPUs follow an explicitly parallel instruction computing model.


#280 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="HuusAsking"]As I've said before, you keep saying Core i's are into the hundreds of GFPLOS, but I say GPUs are already into the TFLOPs range, and they're almost there at double-precision.c0d1f1ed

A pair of 3 GHz 8-core Sandy Bridge-EP based Xeon processors can deliver 768 GFLOPS. That's nothing to sneeze at, and should be available as early as Q3 2011. And because of the CPUs flexibility and programmability this could already be more attractive than a 1.5 TFLOPS Tesla. But that's not all...

The AVX specification already defines FMA instructions, which can double the floating-point performance. And it also reserves the bits to extend it to 512-bit and even 1024-bit vector operations. So clearly they have long term high performance plans for it.

Intel Sandy Bridge has yet to implement hardware FMA. Theoretical peak performance (with AVX): 32 DP GFLOPS/core or 64 SP GFLOPS/core.

Q3 2011??? That's a joke. This year, in December 2010, AMD's Cayman (Radeon HD 6900 series) GPGPU has about 3 SP TFLOPS, and the Radeon HD 6990 has 6 SP TFLOPS.

Wide SIMD instruction sets such as packed 256-bit AVX and Larrabee's 512-bit vectors have similar issues to GPGPUs, i.e. populating the data payload with enough parallel data. Pray tell whether they can reach a 26-watt part.


#281 ronvalencia
Member since 2008 • 29612 Posts

Yeah, they don't fall for the hype. I also got the feeling that the Chinese wanted to take the no. 1 spot for bragging rights, but will see very little actual benefit from the GPUs.

c0d1f1ed

"but will see very little actual benefit from the GPUs" is BS. Refer to http://www.youtube.com/watch?v=BV5cSswg9uE

AU CSIRO's CPU-GPU supercomputer and its usage.

--------------------

IBM to provide NVIDIA Tesla in HPC http://www.youtube.com/watch?v=T18j1dg9Bno&feature=related


#282 ronvalencia
Member since 2008 • 29612 Posts

The convergence is happening from both ends. But you'll see a CPU without a GPU way sooner than a GPU without a CPU. The thing is, as soon as CPUs can deliver adequate graphics performance, the IGP will completely vanish. Note that IGPs were never really about efficiency, they're mainly about price.

c0d1f1ed

To reach a certain price point, that piece of silicon has to deliver a certain performance at minimal bill-of-materials cost and minimal energy consumption.

People who buy a PC with an IGP don't really care about 3D graphics. They want to browse the web, stay in touch with their friends, do some office work, and maybe play some casual games. As long as it lets you do these things, the cheaper the IGP the better. But still, 95% of the time the IGP is idle and still a relatively expensive piece of silicon to be of so little use. So even if a CPU with gather/scatter won't be the summum of efficiency at graphics, if it can do what the IGP can do then the IGP has to go. I don't hear anyone with integrated audio codecs complain that the CPU is less efficient than a dedicated sound chip either...

c0d1f1ed

DX9.0b+ IGPs are used for Windows Aero Glass/Aero Peek, i.e. the "95% of the time the IGP is idle" claim is wrong.

So anyway we're much closer to CPUs capable of fulfilling the role of IGP than you might think.

c0d1f1ed

On SwiftShader 3.0 at default settings, my Core i7-740M quad core (1.73 GHz / 2.9 GHz Turbo) gets about 2-7 FPS in Crysis at the lowest resolution setting (800x480) on my laptop. It slows down to around 2 FPS during heavy foliage scenes.

Your Core i7-920 quad core is clocked at 2.66 GHz / 2.9 GHz Turbo.

Crysis on the Apple MacBook Air's GeForce 320M IGP: http://www.youtube.com/watch?v=m-bqbFRsgcI


#283 dakan45
Member since 2009 • 18819 Posts
Cevat is Captain Obvious!! We already knew that. But the fact that he is making a game that focuses on multiplatform systems, and that so far it hardly looks better than the console versions, does not support what he is saying. Let's wait and see.

#284 c0d1f1ed
Member since 2009 • 32 Posts

[QUOTE="c0d1f1ed"]

And yet quite a few people are very busy writing ray tracers using CUDA 3 or doing other innovative things with their GPUs. So keeping in mind that all old things are not replaced with the latest new things overnight, that timeline isn't off by much.

Also, I soundly recall people saying things like "the GeForce 3 will render anything you can throw at it", and "software renderers will never be able to run something like Oblivion". I'm just saying, things do change very rapidly. In several years computer graphics will again look very different. Whether we'll all be ray tracing or software rendering, or a bit of both, doesn't really change how impressive these advances are. At least to me...

Teufelhuhn



The Optix stuff is cool and people have done some neat things with it (my coworker wrote a pretty badass PRT engine using Optix for our baked lighting pipeline). But the reason I mention ray-tracing specifically is because it's something a lot of people envision as "the future", with no regard to the aspects in which it's just fundamentally less efficient than rasterization. I mean, yeah, it's super-convenient and makes it really simple and elegant to implement a lot of complex interactions, but it's never going to be faster than rasterization for primary visibility determination. Because of that I think hybrids will win out rather than rasterization dying just because it doesn't scale for all possible use cases.

I fully agree. Here's an interesting article that presents the same sobering view: http://www.beyond3d.com/content/articles/94/. But that doesn't take away from the fact that things already have drifted, and will continue to drift, away from pure rasterization. The fact that ray tracing hasn't conquered the world yet and probably never will doesn't mean CPUs and GPUs won't fully converge in the long run.

Anyway, the point is that I'm not sure if there's going to be a similar situation for GPUs vs. CPUs. Convergence a la Fusion makes sense to me, but I haven't totally bought the idea of doing everything on a sea of general-purpose cores. But then again I'm probably not even 1/10th as knowledgeable as you are regarding CPU performance + future developments, so that probably says something. :PTeufelhuhn

Just ask yourself, what features does a GPU have that a CPU can't possibly incorporate or have an equivalent for, ever? You'll quickly see that computing density is not a stumbling block. AVX can give each CPU core up to 16 times higher floating-point performance than today's cores. Texture samplers and ROPs will become programmable on the GPU sooner or later, so generic gather/scatter support is all that is needed for those. I don't know of anything else that would really allow the GPU to keep a significant lead at anything.

So really, just because it's called a CPU doesn't mean it inherently has semiconductor constraints that make it impossible to get anywhere near what a GPU can do. Another aspect is how things are perceived by software developers. Would they rather stick to single-threaded development and let a GPU crunch through large parallel workloads, or will multi-threading one day be as natural as object-oriented programming? It's probably a matter of opinion, but again I think the CPU will prevail. We already have libraries that implement Direct3D, OpenGL, OpenCL, etc. which abstract things for the developers who don't want to deal with the multi-threading complications. Once again the CPU can offer anything the GPU offers, and even more. So it really doesn't matter if the main application only uses one or maybe a few threads; there's still an incentive to buy many-core CPUs, because in the next years we'll see plenty of libraries and drivers make good use of those extra cores.

This may sound like wishful thinking, but history shows that software developers have never left an opportunity unused to gain an advantage over the competition.

Either way, I'm sure software platforms will play a large part in shaping things. Once traditional GPUs and the D3D/GL pipeline go away there will be tons of room for flexibility (micropolygons!), but it's going to be harder to get a decently-performing 3D renderer off the ground.Teufelhuhn

I don't see why it would be harder. APIs won't suddenly vanish just because the hardware can do a whole lot more. You'll still have the majority of game developers buying off-the-shelf complete engines, some will develop their own engine using the legacy APIs, and several will do totally revolutionary things by using the hardware directly. The important thing is that there's a choice, and a lot more diversity. It's interesting for consumers and creates opportunities for developers to really push the silicon to new heights.


#285 HuusAsking
Member since 2006 • 15270 Posts

(SNIP) Texture samplers and ROPs will become programmable on the GPU sooner or later, so generic gather/scatter support is all that is needed for those. I don't know of anything else that would really allow the GPU to keep a significant lead at anything. (SNIP) c0d1f1ed

I would like to point out that only very, very recently has multicore programming become the norm. It takes a very different way of thinking to make a program that can reliably use multiple cores, and there are still some things you just can't break up easily (like cycle-exact machine emulators--the timing on them is just too tight). Similarly, programming for a GPU takes another change of thinking that is only slowly sinking in. That's part of the reason GPUs aren't so efficient in HPC just yet--it's more a human thing than anything else.

On to another matter, you keep saying that all a CPU needs to reach GPU levels is a scatter/gather unit in its SIMD set. Which begs the question, "If that's all it takes, why isn't it already there?" Maybe that's why Larrabee and its supposed S/G unit were abandoned: because the answer isn't really that simple. You mentioned that it would just use the cache line. Wouldn't there be a problem, though, on a cache miss?


#286 c0d1f1ed
Member since 2009 • 32 Posts

On the subject of debugging, CUDA-GDB says hi. ronvalencia

I prefer Parallel Nsight. But anyway, it's not like the GPU isn't debuggable. The point is that it's quite complex to really understand what's going on. CUDA has blocks, warps, threads, constant memory, local memory, shared memory, etc. And there are already six different compute capabilities, which doesn't facilitate the matter when a client reports a bug. It would be much simpler for the developers to be offered the same debugging model as on the CPU.

[quote="c0d1f1ed"]Clearly GPUs are great for boosting the statistics, but they haven't proven anything yet. For starters the majority of code simply doesn't run on them. Also note that the no. 1 supercomputer has two six-core 3 GHz CPUs for each Tesla GPU. You'd expect something a little more modest if they could really rely on the GPU to deliver the performance that got these computers in this list in the first place. Heck, why not one CPU and two Tesla's, doubling the lead?ronvalencia

The context in this topic is games. The "majority of code" is just red-herring.

The context of this particular discussion is high performance computing. Truly, the majority of code doesn't take advantage of the GPU. That may change, but until then it hasn't proven anything yet and only helps the statistics. Rewriting applications to make use of CUDA is far from easy, and looking at how many GPGPU applications barely outperform a quad-core CPU, outperforming a pair of 6-core CPUs would really be no small feat.

And you could probably slap a million floating-point adders and multipliers on a chip, and rank no. 1 in the TOP500 list with just a handful of them. But LINPACK really just measures peak sustainable performance and not real world performance for a range of practical HPC applications.

Tesla C/M2050 may cost $2500, but Geforce Fermi equivalent doesn't cost that much.ronvalencia

You really can't compare an HPC chip with a consumer GPU. There's a huge difference in reliability. With a failure rate of a few percent, using thousands of consumer GPUs in a supercomputer would be totally unacceptable.

"but will see very little actual benefit from the GPUs" is BS. Refer to http://www.youtube.com/watch?v=BV5cSswg9uE

AU CSIRO's CPU-GPU supercomputer and its usage. ronvalencia

First off, that video is sponsored by NVIDIA. So please use your marketing-speak earplugs. They're claiming things to be "hundreds of times faster than they were used to". That doesn't say a darn thing without mentioning what the previous setup was. Also, their main application is visualization, which for the time being the GPU excels at but doesn't really give an indication of how it would perform for a wider variety of workloads.

DX9.0b+ IGPs are used for Windows Aero Glass/Aero Peek, i.e. the "95% of the time the IGP is idle" claim is wrong.ronvalencia

Windows throttles down the GPU for desktop rendering. So it's still making very minimal use of it.

Crysis on the Apple MacBook Air's GeForce 320M IGP: http://www.youtube.com/watch?v=m-bqbFRsgcI ronvalencia

Crysis on a GMA 4500M HD: http://www.youtube.com/watch?v=-LJBipxNlBY


#287 rock_solid
Member since 2003 • 5122 Posts
This Crytek guy speaks the truth. I like him.
Avatar image for pimpog
pimpog

659

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#288 pimpog
Member since 2010 • 659 Posts

Crytek knows the truth: if you build a game to take advantage of current PC hardware, it will run poorly or not at all on console. So most devs use the console as the starting point, then port the game to PC. Sure, the game looks great on PC, but it's not really pushing the hardware. To make the most money, having your game for sale on both PC and console is the way to go.

Avatar image for The_Gaming_Baby
The_Gaming_Baby

6425

Forum Posts

0

Wiki Points

0

Followers

Reviews: 117

User Lists: 52

#289 The_Gaming_Baby
Member since 2010 • 6425 Posts

Well of course they do, because most people use consoles and all devs care about is money, which is fair enough considering their goal is to make as much money as possible.

Avatar image for ronvalencia
ronvalencia

29612

Forum Posts

0

Wiki Points

0

Followers

Reviews: 1

User Lists: 0

#290 ronvalencia
Member since 2008 • 29612 Posts

I prefer Parallel Nsight. But anyway, it's not like the GPU isn't debuggable. The point is that it's quite complex to really understand what's going on. CUDA has blocks, warps, threads, constant memory, local memory, shared memory, etc. And there are already six different compute capabilities, which doesn't facilitate the matter when a client reports a bug. It would be much simpler for the developers to be offered the same debugging model as on the CPU.

http://www.youtube.com/watch?v=zGnaRM-Si8g Nsight + Visual Studio 2008

The context of this particular discussion is high performance computing. Truly, the majority of code doesn't take advantage of the GPU. That may change, but until then it hasn't proven anything yet and only helps the statistics. Rewriting applications to make use of CUDA is far from easy, and looking at how many GPGPU applications barely outperform a quad-core CPU, outperforming a pair of 6-core CPUs would really be no small feat.

Depends on the workload type. To maximize performance, you also need to modify the applications to use AVX.

From Beyond3D (http://forum.beyond3d.com/archive/index.php/t-57839.html), there's a discussion of AVX's instruction-set issues.

From rpg.314

a) AVX's lack of predication and scatter/gather gives it, ironically, a much more restricted programming model than modern GPUs. Frankly they should just ditch AVX for LRBni.

b) Over the last 4 years we have seen cores/socket grow more slowly than predicted by Moore's law. I am not sure why that would change in the next 2 years.

From CarstenS

The point is: what they have done is rewrite the algorithms in much the same way people would have to when going to GPU space. The main problem being: if you need to rewrite your stuff, manually optimizing for parallelism, the main advantage of CPUs (drop-in replacements) is gone, and now you're rewriting everything and still not at GPU performance levels by quite a margin.

From rpg.314

If you are relying on a vectorizing compiler, then without predication and scatter/gather, the performance falloff will be pretty rapid for a large number of applications.

Either way, IMHO, AVX is much less useful as you increase the vector width.

And you could probably slap a million floating-point adders and multipliers on a chip, and rank no. 1 in the TOP500 list with just a handful of them. But LINPACK really just measures peak sustainable performance and not real world performance for a range of practical HPC applications.

What does CSIRO (a government department) say?

You really can't compare an HPC chip with a consumer GPU. There's a huge difference in reliability. With a failure rate of a few percent, using thousands of consumer GPUs in a supercomputer would be totally unacceptable.

My point was performance. The context in this topic is consumer.

First off, that video is sponsored by NVIDIA. So please use your marketing speak earplugs. They're claiming things to be "hundreds of times faster than they were used to". That doesn't say a darn thing without mentioning what the previous setup was. Also, their main application is visualization, which for the time being the GPU excels at but doesn't really give an indication of how it would perform for a wider variety of workloads.

The context in this topic is games e.g. rendering graphics.

Windows throttles down the GPU for desktop rendering. So it's still making very minimal use of it.

Crysis on a GMA 4500M HD: http://www.youtube.com/watch?v=-LJBipxNlBY

Crysis on Intel HD graphics http://www.youtube.com/watch?v=-7Q77bh-SM0

Avatar image for Rahnyc4
Rahnyc4

6660

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#291 Rahnyc4
Member since 2005 • 6660 Posts
i tell you one thing. even with my pc specs it's still better than both ps3 and xbox 360. i realized how good my pc is graphically when i went and purchased a ps3 and hooked it up to a hd monitor. AA is very very important in HD gaming, real important. without it hd gaming is fugly.
Avatar image for c0d1f1ed
c0d1f1ed

32

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#292 c0d1f1ed
Member since 2009 • 32 Posts

I would like to point out that only very, very recently has multicore programming become the norm. HuusAsking

Among the average application developers, yes. But that doesn't have to stop a CPU manufacturer from ditching the IGP and offering a multi-threaded software graphics driver, and won't stop high performance library developers from taking advantage of the parallel computing improvements either (e.g. sound and video decoding/encoding). Of course it's still going to take time for any of this to become mainstream, but the technology is already within reach today.
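To make that concrete, here is a toy sketch of the kind of multi-threaded software rendering split I mean: each core shades its own band of the framebuffer. This is only an illustration under my own simplifying assumptions; the function names are made up and it is not SwiftShader's actual design.

```cpp
// Toy sketch only: split the framebuffer into horizontal bands, one per
// hardware thread. The "pixel shader" here is just a stand-in expression.
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

static void shade_rows(std::vector<uint32_t>& fb, int width, int y0, int y1) {
    for (int y = y0; y < y1; ++y)
        for (int x = 0; x < width; ++x)
            fb[y * width + x] = 0xFF000000u | uint32_t(x ^ y);  // placeholder shading
}

void render_frame(std::vector<uint32_t>& fb, int width, int height) {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 4;  // fall back if the core count is unknown
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i) {
        int y0 = int(height * i / n);
        int y1 = int(height * (i + 1) / n);
        workers.emplace_back(shade_rows, std::ref(fb), width, y0, y1);
    }
    for (auto& t : workers) t.join();  // the frame is done once every band is done
}
```

Real renderers obviously do far more per pixel, but the scaling principle is the same one that lets a software rasterizer keep adding cores: independent bands, no shared mutable state, join at the end of the frame.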

It takes a very different way of thinking to make a program that can reliably use multiple cores, and there are still some things you just can't break up easily (like cycle-exact machine emulators--the timing on them is just too tight). Similarly, programming for a GPU takes another change of thinking that is only slowly sinking in. That's part of the reason GPUs aren't so efficient in HPC just yet--it's more a human thing than anything else. HuusAsking

Yes and no. Eventually threads and locks will be abstracted into tasks and dependencies, which will be a much more natural way of thinking about concurrency. Again I have to refer to object-oriented programming. Originally it was a really bizarre idea to think of code as objects. And early implementations were so inefficient that the concept itself was ridiculed. But nowadays it's hard to imagine developing and maintaining large codebases without object-oriented designs and implementations. It will again take time but the current research makes me hopeful that CPUs with dozens of cores are useful for a wide range of applications. Some of the advances will have to come from the hardware side (e.g. transactional memory), but for the most part it's a software problem that will require new tools and languages (think of a native implementation of SystemC). Anyway, for lots of libraries they don't have to wait that long. Working directly with threads and locks is hard but not impossible, and every gradual step is enough to keep up with the pace of increasing core numbers.
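As a small illustration of the task/dependency style I'm describing (my own sketch, using C++ std::async merely as a stand-in for whatever the future tools end up looking like):

```cpp
// Sketch of tasks and dependencies instead of explicit threads and locks.
// Two independent partial sums run as tasks; the final sum simply depends
// on their futures, so no lock is ever taken by hand.
#include <functional>
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

static int sum_range(const std::vector<int>& v, std::size_t begin, std::size_t end) {
    return std::accumulate(v.begin() + begin, v.begin() + end, 0);
}

int main() {
    std::vector<int> data(1000000, 1);

    // Independent tasks: the runtime is free to run them on separate cores.
    auto lo = std::async(std::launch::async, sum_range, std::cref(data),
                         std::size_t(0), data.size() / 2);
    auto hi = std::async(std::launch::async, sum_range, std::cref(data),
                         data.size() / 2, data.size());

    // Dependent step: it can only execute once both results are ready.
    std::cout << lo.get() + hi.get() << "\n";
}
```

The point is not this particular API but the shape of the program: work is expressed as tasks whose ordering falls out of their data dependencies.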

That said, the CPU supports these elementary building blocks in the most generic way. Therefore it allows different approaches and abstractions, depending on the situation. In contrast, the GPU forces you to use a single programming model, with limited parameter ranges. If your algorithm happens to map badly to this model, you'll get very disappointing performance. Broadening the possibilities will require introducing more CPU features, and will cost a bit of raw performance (relatively).

On to another matter, you keep saying that all a CPU needs to reach GPU levels is a scatter/gather unit in its SIMD set. Which raises the question: if that's all it takes, why isn't it already there? Maybe that's why Larrabee and its supposed scatter/gather unit were abandoned: because the answer isn't really that simple. HuusAsking

It's not already there because previously the vectors weren't wide enough for it to make a difference. A 128-bit SSE register can only hold two double-precision floating-point numbers. So if they aren't stored consecutively in memory it's no big deal to load/store them sequentially.

It's also a chicken-and-egg problem. CPU manufacturers are hesitant to add support for complex operations if there's no hard proof that it significantly helps a wide range of applications, and software developers can't use them before they get their hands on the hardware. I recall though that several people asked for it on the Intel Software Network forum a couple of years ago, and the answer from the engineers was that it had already been considered, but it takes a great deal of research (determining exactly which instructions are needed, their impact on the rest of the architecture, the cache coherency and exception handling model, etc.) and the feature set for Haswell had already been locked down.

The situation for Haswell would be pretty ridiculous though. It can perform an FMA operation involving 32 single-precision elements every clock cycle, but it would take 64 clock cycles to sequentially extract the address offset of individual elements and load/store them. Although this is the worst case scenario (typically only one vector will require non-consecutively stored data), it's clear that they can't ignore the issue any longer. And given their experience with Larrabee, I'm confident that the next CPU architecture will have gather/scatter support. AMD might even beat them to it.
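To show what that worst case looks like in practice, here is a sketch (my own, not Intel code) of the scalar fallback a programmer or compiler has to emit today when four floats live at arbitrary indices:

```cpp
// Sketch: without hardware gather, loading four floats from arbitrary indices
// means four scalar memory accesses plus packing, issued one after another.
#include <immintrin.h>

__m128 gather4_scalar(const float* base, const int idx[4]) {
    // _mm_set_ps packs from the highest element down, so lane i = base[idx[i]].
    return _mm_set_ps(base[idx[3]], base[idx[2]], base[idx[1]], base[idx[0]]);
}
```

A hardware gather collapses that whole sequence into one vector load that the load/store unit can satisfy cache line by cache line (essentially what later shipped, well after this discussion, as AVX2's _mm256_i32gather_ps).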

On a side note, I believe that AVX was born out of desperation when Intel saw that GPUs were starting to achieve impressive LINPACK results, which could have far-reaching consequences (not just losing the top spot in the HPC market, but causing consumers to spend more of their budget on the GPU). The fastest way to increase the benchmark results is to widen the vectors and add FMA support. They probably decided to worry about getting good performance out of it for other applications later...

You mentioned that it would just use the cache line. Wouldn't there be a problem, though, on a cache miss? HuusAsking

No. An L1 cache miss triggers an L2 cache access, an L2 cache miss triggers an L3 cache access, and an L3 cache miss triggers a RAM access; a page fault triggers an exception handler that brings the page in from the swap disk. But no matter how many misses occur, in the end you always end up with a full cache line in the load/store unit. And if it's a gather operation it can gather all the elements from this cache line, and if necessary fetch more cache lines.

Note that load/store units already support a form of gather/scatter, for unaligned accesses. When you read a 16-bit variable, but the second byte happens to be on the next cache line, the CPU will access both cache lines sequentially. One could be in L1 cache while the other is swapped out on disk. Note that this means it actually wouldn't require a whole lot of extra logic to support fully generic vector gather/scatter.

Avatar image for c0d1f1ed
c0d1f1ed

32

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#293 c0d1f1ed
Member since 2009 • 32 Posts

Depends on the workload type. To maximize performance, you also need to modify the applications to use AVX. ronvalencia

Applications that already use SSE can easily be extended to use AVX. For instance LLVM, which is used by one of SwiftShader's back ends, already adds support for AVX so developers can prepare their code for AVX even before the hardware is sold.
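As a rough illustration of what that migration looks like at the source level (my own sketch, not SwiftShader code; it assumes the array length is a multiple of the vector width):

```cpp
// Illustrative only: the same y = a*x + y loop with 128-bit SSE and 256-bit AVX
// intrinsics. Widening mostly means swapping __m128/_mm_* for __m256/_mm256_*
// and halving the number of iterations.
#include <immintrin.h>

void saxpy_sse(float* y, const float* x, float a, int n) {  // assumes n % 4 == 0
    __m128 va = _mm_set1_ps(a);
    for (int i = 0; i < n; i += 4) {
        __m128 vy = _mm_add_ps(_mm_mul_ps(va, _mm_loadu_ps(x + i)), _mm_loadu_ps(y + i));
        _mm_storeu_ps(y + i, vy);
    }
}

void saxpy_avx(float* y, const float* x, float a, int n) {  // assumes n % 8 == 0
    __m256 va = _mm256_set1_ps(a);
    for (int i = 0; i < n; i += 8) {
        __m256 vy = _mm256_add_ps(_mm256_mul_ps(va, _mm256_loadu_ps(x + i)), _mm256_loadu_ps(y + i));
        _mm256_storeu_ps(y + i, vy);
    }
}
```

A JIT framework like LLVM can make the same switch behind a single code path, which is why a runtime code generator can target AVX before most hand-written applications do.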

From Beyond3D (http://forum.beyond3d.com/archive/index.php/t-57839.html), there's a discussion of AVX's instruction-set issues. ronvalencia

Thanks but I already read that back in June. Note that what started the whole discussion is a paper by Intel that concluded the following:

"In the past few years there have been many studies claiming GPUs deliver substantial speedups (between 10X and 1000X) over multi-core CPUs on these kernels. To understand where such large performance difference comes from, we perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs the performance gap between an Nvidia GTX280 processor and the Intel Core i7-960 processor narrows to only 2.5x on average."

Although you have to take that with a grain of salt as well, it is definitely correct that GPUs are not orders of magnitude faster than CPUs. A lot of GPGPU benchmarks shamelessly compare GPU code that has been worked on for months with C code they haven't even bothered to enable SSE2 optimizations in the compiler for, let alone make use of intrinsics.

I fully agree with the Beyond3D forum members that gather/scatter is still lacking. But there really isn't anything stopping them from adding it to AVX in due time. And when that happens the raw performance advantage of the GPU won't suffice to achieve superior performance in a wide range of applications.

My point was performance. The context in this topic is consumer.

The context in this topic is games e.g. rendering graphics. ronvalencia

No. The topic for this particular side discussion was HPC.

But if you really want to talk about the consumer market; a Phenom II X6 1055T costs as little as 179 USD. This will also buy you a GeForce GTX 460. DP performance is 67.2 versus 75.6 GFLOPS respectively, and for SP it's 134.4 versus 907.2 GFLOPS. A nice lead for the GPU in SP performance, but in practical applications with DP calculations the CPU will always outperform the GPU.

As early as January 9th, you'll be able to buy a quad-core Sandy Bridge CPU that delivers 198.4 SP and 99.2 DP GFLOPS for only 184 USD. And let's not forget, this CPU has a 35% lower TDP than the GTX 460. So even for the theoretical numbers the GPU is not the clear winner. It will take FMA and gather/scatter support for the CPU to really catch up, but it's pretty clear that the GPU is not unrivaled.
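For the curious, the peak numbers above follow from simple arithmetic (peak GFLOPS = cores × clock × FLOPs per core per cycle). The clocks and widths below are the ones implied by those figures, so treat this as a back-of-the-envelope sketch rather than a benchmark:

```cpp
// Back-of-the-envelope peak throughput: cores * clock (GHz) * FLOPs per cycle.
#include <cstdio>

static double peak_gflops(double cores, double clock_ghz, double flops_per_cycle) {
    return cores * clock_ghz * flops_per_cycle;
}

int main() {
    // Phenom II X6 1055T: 6 cores at 2.8 GHz, SSE gives 8 SP / 4 DP FLOPs per cycle.
    std::printf("X6 1055T      SP %.1f  DP %.1f\n", peak_gflops(6, 2.8, 8), peak_gflops(6, 2.8, 4));
    // GeForce GTX 460: 336 CUDA cores at 1.35 GHz, 2 FLOPs per cycle (multiply-add).
    std::printf("GTX 460       SP %.1f\n", peak_gflops(336, 1.35, 2));
    // Quad-core Sandy Bridge: 4 cores at 3.1 GHz, AVX gives 16 SP / 8 DP FLOPs per cycle.
    std::printf("Sandy Bridge  SP %.1f  DP %.1f\n", peak_gflops(4, 3.1, 16), peak_gflops(4, 3.1, 8));
}
```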

Also note that it's not actually fair to compare the CPU against the GPU on price alone, because you always need a CPU anyway. Heck, you need a CPU to make your GPU worth anything at all! So we should probably compare a system with a $300 CPU against a system with a $100 CPU and a $200 GPU. I'll leave that exercise up to you.

In conclusion the GPU will always be faster at specific applications, but the results drop rapidly when people start running a wider range of applications that the GPU wasn't designed to run. To achieve better performance at more generic tasks, they have no other choice but to implement CPU features, which costs computing density. Silicon is silicon and they play by the same rules. So in the long run it's inevitable that they converge.

Crysis on Intel HD Graphics: http://www.youtube.com/watch?v=-7Q77bh-SM0 ronvalencia

Please try SwiftShader on a fast Core i7 with Crysis settings on 'high' (I don't know what those 'custom' settings are, I used 'high' for them as well). I'm getting an average of 3.5 FPS for the benchmark (on the second run - you have to let the shader caches warm up). That's not a lot but the HD Graphics doesn't appear to do much better.

So AVX with FMA and gather/scatter could totally render the IGP useless.

Avatar image for HuusAsking
HuusAsking

15270

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#294 HuusAsking
Member since 2006 • 15270 Posts

"In the past few years there have been many studies claiming GPUs deliver substantial speedups (between 10X and 1000X) over multi-core CPUs on these kernels. To understand where such large performance difference comes from, we perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs the performance gap between an Nvidia GTX280 processor and the Intel Core i7-960 processor narrows to only 2.5x on average."

c0d1f1ed

The thing is, CPU tech may be jumping forward, but so is GPU tech. We already have a GPU capable of nearly 1 TFLOP at double precision (the AMD HD 5970), and that card is a GPU generation old. And unlike the nVidia card you cited, AMD chips support double precision better (because of greater awareness of and demand for DP computation), and thus the double/single performance ratio is only about 1/4 (your example had about a 1/6 ratio); expect the ratio to creep closer to the near-ideal 1/2 that CPUs manage. And GPU generations move fast: usually no more than a year between generations (thanks to the constant tug-of-war between AMD and nVidia on the GPU front).

Avatar image for c0d1f1ed
c0d1f1ed

32

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#295 c0d1f1ed
Member since 2009 • 32 Posts

The thing is, CPU tech may be jumping forward, but so is GPU tech. HuusAsking

Not exactly. GPUs are now at the mercy of semiconductor technology improvements. But CPUs benefit from that as well so we can factor that out.

We already have a GPU capable of nearly 1 TFLOP at double precision (the AMD HD 5970), and that card is a GPU generation old. HuusAsking

Using twice the silicon at twice the price doesn't count as a jump forward. CPUs too can play that game.

If we factor out the semiconductor technology and the die size, GPUs have only made a giant leap forward thanks to spending an ever larger percentage of die space on compute cores. In the early GPUs the die size was dominated by memory controllers, texture units, and ROPs. Combining colors together barely took a few operations per pixel, so not a lot of ALUs were needed. But then the shaders became much longer and the ALU:TEX ratio went from 1:2 to about 8:1. In only a few years' time the ALU density increased dramatically.

But this increase has come to a halt, and may even reverse. To increase the ALU density further they would have to cut down on other things, like memory controllers, registers, cache size, etc. But these components are vital to ensure that the ALUs stay fed with data. And as the complexity of the code continues to increase, the compute cores need to pack more features, and less data coherency means larger caches are needed. This means there's less room for ALUs.

CPUs on the other hand already have loads of features and gigantic caches. They have the opportunity to dramatically increase the compute density by using longer vectors and FMA, and to improve efficiency by implementing gather/scatter. Intel claims that Sandy Bridge supports 256-bit vector operations at minimal die size expense, and even found ways to reduce the die size of several components while at the same time improving their efficiency. Adding FMA and gather/scatter support also shouldn't have a big impact on die size. And because the cache stores shared data, it doesn't have to increase much when adding more cores in the future (it actually dropped from 2 x 6 MB for Yorkfield to 8 MB for Nehalem).

So in terms of GFLOPS the GPU is falling behind or at best clinging to Moore's Law, while the CPU will practically double the pace for the next several years. It's also quite interesting what AMD will do with Bulldozer. They'll double the core count with only a minor increase in die size. They achieve this by observing that the CPU's front-end is actually overdimensioned to avoid it becoming a bottleneck, but is underutilized most of the time. So by sharing a front-end between two slightly smaller cores, they can achieve higher effective throughput. It won't increase the GFLOPS rating because the floating-point units are also shared, but nevertheless it increases integer computation density, which for some HPC applications is just as important.

And unlike the nVidia card you cited, AMD chips support double precision better (because of greater awareness of and demand for DP computation), and thus the double/single performance ratio is only about 1/4 (your example had about a 1/6 ratio); expect the ratio to creep closer to the near-ideal 1/2 that CPUs manage. HuusAsking

The ratio for GF104 is actually 1/12, because only one in six cores has a DP ALU and it runs at half throughput. But anyway, the GPU can definitely still increase the compute density for double-precision calculations. It's going to cost transistors though, which is why GF104 has such minimal support for it, and it has also been blamed as one of the reasons why GF100 needed a refresh in the form of GF110 to reach acceptable thermal and power characteristics. For affordable consumer GPUs it may take a while for the ratio to go up, as sacrificing a bit of SP density in favor of DP density makes it harder to compete at graphics.

Anyway, you're probably right that it will happen sooner or later. But note that it's just one out of many things that sacrifice a little bit of raw SP performance in exchange for improved generic computing capabilities. It moves the GPU closer to the CPU. In particular, changes like these make the GPU less graphics-specific, so gradually it will lose its advantages and the CPU becomes just as good at complex graphics as a low-end GPU. At that point the CPU can start trading some single-threaded performance for higher compute density, and larger die sizes become marketable and will rival mid-end GPUs. It's doubtful that any company can survive on selling high-end GPUs alone, so in the long run it would just make more sense for enthusiasts to buy a dual-CPU system...

And GPU generations move fast: usually no more than a year between generations (thanks to the constant tug-of-war between AMD and nVidia on the GPU front). HuusAsking

I wouldn't call a refresh a generation. Also, the staggered release of high-end and mid-end products makes it appear as if they move forward faster. But in reality the GeForce GTX 460 I recently bought barely has twice the GFLOPS of my previous two-year-old GeForce 8800 GTS 512, and was more expensive too. In the same period of time and for the same money you can now buy an efficient quad-core instead of a lower-clocked dual-core.

So I'm not sure exactly where to put the tipping point but the breakneck performance increase of GPUs has definitely started stagnating, while the CPU still has loads of potential for rapid throughput improvements.

Avatar image for ronvalencia
ronvalencia

29612

Forum Posts

0

Wiki Points

0

Followers

Reviews: 1

User Lists: 0

#296 ronvalencia
Member since 2008 • 29612 Posts

Applications that already use SSE can easily be extended to use AVX. For instance LLVM, which is used by one of SwiftShader's back ends, already adds support for AVX so developers can prepare their code for AVX even before the hardware is sold.

LLVM does not magically convert serial code into parallel code.

Thanks but I already read that back in June. Note that what started the whole discussion is a paper by Intel that concluded the following:

"In the past few years there have been many studies claiming GPUs deliver substantial speedups (between 10X and 1000X) over multi-core CPUs on these kernels. To understand where such large performance difference comes from, we perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs the performance gap between an Nvidia GTX280 processor and the Intel Core i7-960 processor narrows to only 2.5x on average."

Same issues with NVIDIA marketing. Anyway, the GTX 280 is already EOL. Such a deduction is irrelevant to this topic, i.e. running current games.

Although you have to take that with a grain of salt as well, it is definitely correct that GPUs are not orders of magnitude faster than CPUs. A lot of GPGPU benchmarks shamelessly compare GPU code that has been worked on for months with C code they haven't even bothered to enable SSE2 optimizations in the compiler for, let alone make use of intrinsics.

I already know this, e.g. I have posted about the PhysX x87 mess in this forum.

I fully agree with the Beyond3D forum members that gather/scatter is still lacking. But there really isn't anything stopping them from adding it to AVX in due time. And when that happens the raw performance advantage of the GPU won't suffice to achieve superior performance in a wide range of applications.

Unlike AMD, neither Intel nor NVIDIA has the balanced view. Only AMD Bulldozer has AVX on FMA hardware.

No. The topic for this particular side discussion was HPC.

No. Before this topic was sidetracked, the main issue was about running current games.

But if you really want to talk about the consumer market; a Phenom II X6 1055T costs as little as 179 USD. This will also buy you a GeForce GTX 460. DP performance is 67.2 versus 75.6 GFLOPS respectively, and for SP it's 134.4 versus 907.2 GFLOPS. A nice lead for the GPU in SP performance, but in practical applications with DP calculations the CPU will always outperform the GPU.

At this time, DP is almost useless for running most of the heavy workload types in current games. If the "CPU is so good", how come Ghostbusters' ray-tracing pass is done on the GPGPU? I would like to see it run on my 45-watt Intel Core i7 quad-core mobile, btw.

The AMD Phenom II X6 1055T's 134.4 GFLOPS comes at 125 or 95 watts, while the AMD Mobility Radeon HD 5730's 520 GFLOPS comes at 26 watts. An AMD Phenom II X4 mobile consumes around 35 to 45 watts.
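In rough performance-per-watt terms (a quick sketch dividing the quoted GFLOPS by the quoted TDPs, which are ratings rather than measured draw):

```cpp
// Quick ratio of the figures quoted above: SP GFLOPS divided by rated watts.
#include <cstdio>

int main() {
    std::printf("Phenom II X6 1055T: %.1f GFLOPS/W (134.4 / 125)\n", 134.4 / 125.0);
    std::printf("Mobility HD 5730:   %.1f GFLOPS/W (520 / 26)\n", 520.0 / 26.0);
}
```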

As early as January 9th, you'll be able to buy a quad-core Sandy Bridge CPU that delivers 198.4 SP and 99.2 DP GFLOPS for only 184 USD. And let's not forget, this CPU has a 35% lower TDP than the GTX 460. So even for the theoretical numbers the GPU is not the clear winner. It will take FMA and gather/scatter support for the CPU to really catch up, but it's pretty clear that the GPU is not unrivaled.

In a pure GPU GFLOPS race (e.g. SGEMM, DGEMM), refer to AMD Radeon HD GPUs instead.

Also note that it's not actually fair to compare the CPU against the GPU on price alone, because you always need a CPU anyway. Heck, you need a CPU to make your GPU worth anything at all! So we should probably compare a system with a $300 CPU against a system with a $100 CPU and a $200 GPU. I'll leave that exercise up to you.

Funny, I bought a laptop with both an Intel Core i7 quad-core CPU and an AMD Mobility Radeon HD 5730 GPGPU. I didn't skimp on the CPU side by settling for a dual-core mobile part.

In conclusion the GPU will always be faster at specific applications, but the results drop rapidly when people start running a wider range of applications that the GPU wasn't designed to run. To achieve better performance at more generic tasks, they have no other choice but to implement CPU features, which costs computing density. Silicon is silicon and they play by the same rules. So in the long run it's inevitable that they converge.

A hardware GPU doesn't have to worry about a legacy ISA, i.e. Radeon HD "Cayman" includes new VLIW4-based cores.

Please try SwiftShader on a fast Core i7 with Crysis settings on 'high' (I don't know what those 'custom' settings are, I used 'high' for them as well). I'm getting an average of 3.5 FPS for the benchmark (on the second run - you have to let the shader caches warm up). That's not a lot but the HD Graphics doesn't appear to do much better.

It's better on performance per watt, i.e. it doesn't consume 130 watts to do it.

Intel Core i7-740 Mobile QC (45 watts) is about 68 percent of Intel Core i7-920 QC (130 watts).

So AVX with FMA and gather/scatter could totally render the IGP useless.

It depends on

1. performance per watt,

2. who made the IGP; the AMD Fusion APU includes a GPGPU, i.e. a 480-stream-processor version.

3. bill of materials.

For pure Intel solutions, it doesn't matter whether Intel removes its IGP or uses AVX, since it's all in one chip anyway.

For my laptop, the CPU+GPU has a combined power consumption of about 71 watts.

Your pure Intel Core i7-920 @ 130 watts idea is inferior to this current setup. Factor in the dual-core Core i7 6x0 Mobile with Intel IGP and its 35-watt power consumption.

The future battles will be AMD Fusion (CGPU) vs Intel Sandy Bridge (CGPU).

Avatar image for ronvalencia
ronvalencia

29612

Forum Posts

0

Wiki Points

0

Followers

Reviews: 1

User Lists: 0

#297 ronvalencia
Member since 2008 • 29612 Posts

Not exactly. GPUs are now at the mercy of semiconductor technology improvements. But CPUs benefit from that as well so we can factor that out.

Both are at the mercy of semiconductor technology improvements. AMD Radeon HDs have access to AMD/GlobalFoundries' high-clock-rate process tech.

Using twice the silicon at twice the price doesn't count as a jump forward. CPUs too can play that game.

The AMD Radeon HD 5970 is still under the 300-watt limit.

But this increase has come to a halt, and may even reverse. To increase the ALU density further they would have to cut down on other things, like memory controllers, registers, cache size, etc. But these components are vital to ensure that the ALUs stay fed with data. And as the complexity of the code continues to increase, the compute cores need to pack more features, and less data coherency means larger caches are needed. This means there's less room for ALUs.

Register space for the AMD Radeon HD 4870 is 2.5 megabytes.

CPUs on the other hand already have loads of features and gigantic caches. They have the opportunity to dramatically increase the compute density by using longer vectors and FMA, and to improve efficiency by implementing gather/scatter. Intel claims that Sandy Bridge supports 256-bit vector operations at minimal die size expense, and even found ways to reduce the die size of several components while at the same time improving their efficiency. Adding FMA and gather/scatter support also shouldn't have a big impact on die size. And because the cache stores shared data, it doesn't have to increase much when adding more cores in the future (it actually dropped from 2 x 6 MB for Yorkfield to 8 MB for Nehalem).

AMD Fusion's GPU has fast access to the CPU's cache and still has the benefit of a mid-range Radeon HD's 480 stream processors.

So in terms of GFLOPS the GPU is falling behind or at best clinging to Moore's Law, while the CPU will practically double the pace for the next several years. It's also quite interesting what AMD will do with Bulldozer. They'll double the core count with only a minor increase in die size. They achieve this by observing that the CPU's front-end is actually overdimensioned to avoid it becoming a bottleneck, but is underutilized most of the time. So by sharing a front-end between two slightly smaller cores, they can achieve higher effective throughput. It won't increase the GFLOPS rating because the floating-point units are also shared, but nevertheless it increases integer computation density, which for some HPC applications is just as important.

As for GFLOPS increases, it depends on the GPU maker, e.g. Radeon HD 3870 to 4870 (1 TFLOPS) to 5870 (2 TFLOPS) to 6970 (3 TFLOPS).

AMD Bulldozer expands on Intel Core i7's two SSE ADD unit (one for each thread) idea.

Anyway, you're probably right that it will happen sooner or later. But note that it's just one out of many things that sacrifice a little bit of raw SP performance in exchange for improved generic computing capabilities. It moves the GPU closer to the CPU. In particular, changes like these make the GPU less graphics-specific, so gradually it will lose its advantages and the CPU becomes just as good at complex graphics as a low-end GPU. At that point the CPU can start trading some single-threaded performance for higher compute density, and larger die sizes become marketable and will rival mid-end GPUs. It's doubtful that any company can survive on selling high-end GPUs alone, so in the long run it would just make more sense for enthusiasts to buy a dual-CPU system...

The GPU doesn't have to worry about x86's instruction-set mess.

Avatar image for Hakkai007
Hakkai007

4905

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#298 Hakkai007
Member since 2005 • 4905 Posts

I don't understand why you go by GFLOPS...

They are wildly inaccurate and usually only theoretical measurements.

There are many other important things that add to the performance of a GPU or CPU.

Avatar image for subrosian
subrosian

14232

Forum Posts

0

Wiki Points

0

Followers

Reviews: 7

User Lists: 0

#299 subrosian
Member since 2005 • 14232 Posts
[QUOTE="doom1marine"]Crytek boss Cevat Yerli has claimed that developers' focus on PS3 and 360 is holding back game quality on PC - a format he believes is already "a generation ahead" of modern day consoles. http://www.computerandvideogames.com/article.php?id=277729 FreshPrinceUk
why is this guy stating the obvious?... why? why would he say something like this?

Because if you genuinely care about gaming, it gets old dealing with all of the people who don't share your passion. Trust me, it gets old. At a certain point you just don't care anymore. What's the point of maintaining PR and basically sugar-coating everything and lying to people just to make them like you more? What are you supposed to say? "Oh, you play CoD on a console, you're hardcore as hell!"? "No way man, the 360 came out in 2005, that hardware is NOT outdated, Microsoft made the Slim, that's cutting edge!"

Is he supposed to just sit there quietly while the average person prattles on about how they'll be perfectly happy with their 360 until 2015, just so long as they don't have to pay out a couple hundred bucks for new hardware? Watch the same garbage sell year after year and applaud it? It gets old, especially if you're the company that's supposed to push the envelope software-wise, to give the hardware guys the carrot to shoot for, and you're suddenly having to deal with that audience.

There's a wonderful thing about the move to commodity, fixed-hardware devices like the iPad. There's also a dark reality when you deal with hardware limitations that are locked in place for huge periods of time and eating up a huge share of your potential market. A reality that sucks if you dream at night, if you want something more than today, if you're an inventor, a creator, a designer, an architect of the future.

Because everyone's just living for today and you want something more. And now you're the bad guy for it.
Avatar image for ronvalencia
ronvalencia

29612

Forum Posts

0

Wiki Points

0

Followers

Reviews: 1

User Lists: 0

#300 ronvalencia
Member since 2008 • 29612 Posts

I don't understand why you go by GFLOPS...

They are wildly inaccurate and usually only theoretical measurements.

There are many other important things that add to the performance of a GPU or CPU.

Hakkai007

From Beyond3D's forum: a developer was able to achieve a practical ~1 TFLOPS (1000 GFLOPS) using a Radeon HD 4870 for the SGEMM benchmark.

http://forum.beyond3d.com/showthread.php?t=54842

RV770 with 800 ALUs @ 750 mhz: 980 Gflop/s (81% utilization)
RV870 with 1600 ALUs @ 850 mhz: 2220 Gflop/s (81% utilization)
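Those utilization figures follow from peak = ALUs × clock × 2 FLOPs (one multiply-add per ALU per cycle); a quick check of the arithmetic:

```cpp
// Check of the quoted utilization: peak = ALUs * clock (GHz) * 2 FLOPs (MAD),
// utilization = achieved / peak. The quoted 81% rounds 81.6-81.7% down.
#include <cstdio>

int main() {
    double rv770_peak = 800 * 0.750 * 2;   // 1200 GFLOPS
    double rv870_peak = 1600 * 0.850 * 2;  // 2720 GFLOPS
    std::printf("RV770: %.1f%% of %.0f GFLOPS\n", 100.0 * 980.0 / rv770_peak, rv770_peak);
    std::printf("RV870: %.1f%% of %.0f GFLOPS\n", 100.0 * 2220.0 / rv870_peak, rv870_peak);
}
```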

One should be able to see why AMD Barts (aka Radeon HD 6850/6870) has a slightly cut-down SPU count while delivering performance similar to the 5850/5870.

Intel used the SGEMM benchmark to showcase Larrabee. Larrabee was dead on arrival with the AMD Radeon HD 5870's release.