Crytek boss says PS3 and 360 hold back PC

This topic is locked from further discussion.


#301 Hakkai007
Member since 2005 • 4905 Posts

[QUOTE="Hakkai007"]

I don't understand why you go by GFLOPs.....

They are wildly inaccurate and usually only theoretical measurements.

There are many other important things that add to the performance of a GPU or CPU.

ronvalencia

From Beyond3D's forum, a developer was able to achieve a practical ~1 TFLOPS (1000 GFLOPS) using a Radeon HD 4870 in the SGEMM benchmark.

http://forum.beyond3d.com/showthread.php?t=54842

RV770 with 800 ALUs @ 750 MHz: 980 GFLOP/s (81% utilization)
RV870 with 1600 ALUs @ 850 MHz: 2220 GFLOP/s (81% utilization)
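
(For context, those ~81% figures follow from the usual theoretical-peak arithmetic of ALU count × 2 flops per MAD per clock × shader clock; a minimal Python sketch of that check, with the 2-flops-per-ALU-per-cycle MAD rate as the only assumption:)

```python
# Rough check of the quoted SGEMM utilization figures.
# Assumes each ALU retires one single-precision MAD (2 flops) per cycle.

def peak_gflops(alus, clock_ghz, flops_per_alu_per_clock=2):
    return alus * flops_per_alu_per_clock * clock_ghz

for name, alus, clock_ghz, achieved in [
    ("RV770 (HD 4870)", 800, 0.750, 980),
    ("RV870 (HD 5870)", 1600, 0.850, 2220),
]:
    peak = peak_gflops(alus, clock_ghz)
    print(f"{name}: peak {peak:.0f} GFLOP/s, achieved {achieved} "
          f"-> {achieved / peak:.0%} utilization")
# RV770: peak 1200 GFLOP/s -> ~82% utilization
# RV870: peak 2720 GFLOP/s -> ~82% utilization
```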

One should be able to see why AMD Bart (aka Radeon HD 6850/6870) has a slightly cut-down SPU count while still delivering performance similar to the 5850/5870.

Intel used the SGEMM benchmark to showcase their Larrabee. Intel Larrabee was dead on arrival with the AMD Radeon HD 5870's release.

Rather than copy and paste another's view, why don't you type your own and explain what relevance it has to my post?


#302 ronvalencia
Member since 2008 • 29612 Posts

Rather than copy and paste another's view, why don't you type your own and explain what relevance it has to my post?

Hakkai007

I just countered your "theoretical" POV with practical SGEMM scores.

the·o·ret·i·cal/Adjective

1. Concerned with or involving the theory of a subject or area of study rather than its practical application: "a theoretical physicist".

2. Based on or calculated through theory rather than experience or practice: "a theoretical reformer of opinions".


#303 Hakkai007
Member since 2005 • 4905 Posts

the·o·ret·i·cal/Adjective

1. Concerned with or involving the theory of a subject or area of study rather than its practical application: "a theoretical physicist".

2. Based on or calculated through theory rather than experience or practice: "a theoretical reformer of opinions".

ronvalencia

Try comparing the GFLOPS of GPUs from different generations or different companies.

The ATI 4730 has a GFLOP count of 896 while the 3870 has a GFLOP count of 496.

Does that mean the 4730 is 1.8 times faster than the 3870?

Nope, the 3870 will trade blows with it.

The 8800 GT has a GFLOP count of only 504, but it beats the 4730.

.

To make this more apparent, the GTX 470 has a GFLOP count of 1088 while the 4870's is 1200.

And it is quite obvious the GTX 470 is much better.

The 6870 has a GFLOP count of 2016; does that mean it is almost twice as powerful as a GTX 470?

Nope, it is actually weaker.
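
(For reference, those rated numbers fall straight out of shader count × shader clock × flops-per-ALU-per-clock, which is exactly why they say nothing about relative game performance. A quick sketch; the shader counts, clocks, and per-clock flop factors below are the commonly published specs and per-architecture conventions, quoted here as assumptions rather than measurements:)

```python
# Rated (theoretical) GFLOPS = shader ALUs * shader clock (GHz) * flops per ALU per clock.

cards = [
    # name, ALUs, shader clock (GHz), flops/ALU/clock
    ("Radeon HD 3870",  320, 0.775, 2),   # MAD
    ("Radeon HD 4730",  640, 0.700, 2),
    ("Radeon HD 4870",  800, 0.750, 2),
    ("GeForce 8800 GT", 112, 1.500, 3),   # MAD + MUL counted
    ("GeForce GTX 470", 448, 1.215, 2),
    ("Radeon HD 6870", 1120, 0.900, 2),
]

for name, alus, clk, flops_per_clock in cards:
    print(f"{name}: {alus * clk * flops_per_clock:.0f} GFLOPS (rated)")
# Reproduces the rated figures above (496, 896, 1200, 504, ~1089, 2016),
# none of which track the cards' relative game performance.
```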


#304 Dynafrom
Member since 2003 • 1027 Posts

[QUOTE="Hakkai007"]

Rather than copy and paste another's view, why don't you type your own and explain what relevance it has to my post?

ronvalencia

I just countered your "theoretical" POV with practical SGEMM scores.

the·o·ret·i·cal/Adjective

1. Concerned with or involving the theory of a subject or area of study rather than its practical application: "a theoretical physicist".

2. Based on or calculated through theory rather than experience or practice: "a theoretical reformer of opinions".

Poor choice to use rated performance as any indicator of performance. I'd rather look at sustained performance. If it's anything like measuring CPU performance with garbage like having the entire program already in the cache and registers, then FLOPS is a pointless number to look at.

#305 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="ronvalencia"]

the·o·ret·i·cal/Adjective

1. Concerned with or involving the theory of a subject or area of study rather than its practical application: "a theoretical physicist".

2. Based on or calculated through theory rather than experience or practice: "a theoretical reformer of opinions".

Hakkai007

Try comparing the GFLOPS of GPUs from different generations or different companies.

The ATI 4730 has a GFLOP count of 896 while the 3870 has a GFLOP count of 496.

Does that mean the 4730 is 1.8 times faster than the 3870?

Nope, the 3870 will trade blows with it.

The 8800 GT has a GFLOP count of only 504, but it beats the 4730.

.

To make this more apparent, the GTX 470 has a GFLOP count of 1088 while the 4870's is 1200.

And it is quite obvious the GTX 470 is much better.

The 6870 has a GFLOP count of 2016; does that mean it is almost twice as powerful as a GTX 470?

Nope, it is actually weaker.

Depends on the workload type.

As an example, SGEMM doesn't touch ROP-related bottlenecks, i.e. it's a pure compute matrix workload. The GeForce GTX 470 has 40 ROP units, while the Radeon HD 6870 has 32. The Radeon HD 4730 has 8 ROP units while the Radeon HD 3870 has 16.

In the past, Intel used SGEMM benchmarks for their Larrabee show demos.

Different workloads yield different performance results.


#306 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="ronvalencia"]

[QUOTE="Hakkai007"]

Rather than copy and paste another's view, why don't you type your own and explain what relevance it has to my post?

Dynafrom

I just countered your "theoretical" POV with practical SGEMM scores.

the·o·ret·i·cal/Adjective

1. Concerned with or involving the theory of a subject or area of study rather than its practical application: "a theoretical physicist".

2. Based on or calculated through theory rather than experience or practice: "a theoretical reformer of opinions".

Poor choice to use rated performance as any indicator of performance. I'd rather look at sustained performance. If it's anything like measuring CPU performance with garbage like having the entire program already in the cache and registers, then FLOPS is a pointless number to look at.

My post was addressing the "theoretical" comment's context.


#307 imprezawrx500
Member since 2004 • 19187 Posts
He also said that developers won't start putting more effort into PC until it starts pulling sales numbers close to what the consoles give them.6matt6
Well, devs need to stop trying to sell the same game with new maps every year before I'll pay full price for new games. Any single console hardly outsells the PC, but both combined do; it's not really surprising that two systems outsell one.

#308 Hakkai007
Member since 2005 • 4905 Posts

Depends on the workload type.

As an example, SGEMM doesn't touch ROP-related bottlenecks, i.e. it's a pure compute matrix workload. The GeForce GTX 470 has 40 ROP units, while the Radeon HD 6870 has 32. The Radeon HD 4730 has 8 ROP units while the Radeon HD 3870 has 16.

In the past, Intel used SGEMM benchmarks for their Larrabee show demos.

Different workloads yield different performance results.

ronvalencia

But now you shifted the argument into something I wasn't even talking about.

My point was that solely relying on GFLOP count is not accurate.

And you just proved my point when you had to bring up ROPs.


#309 Leejjohno
Member since 2005 • 13897 Posts

That's his way of being pre-emptive with the damage control. It just gives elitists something to scream about. I could say with some accuracy that piracy has held back and will continue to hold back the PC, however.

Either way they should play to the strengths of each platform.


#310 AnnoyedDragon
Member since 2006 • 9948 Posts

That's his way of being pre-emptive with the damage control. It just gives elitists something to scream about. I could say with some accuracy that piracy has held back and will continue to hold back the PC, however.

Either way they should play to the strengths of each platform.

Leejjohno

Piracy is driving more console developers onto PC than PC developers to consoles? Because if you look at the GameSpot spreadsheet, consoles have a higher cross-platform-to-exclusive ratio.

Also, does saying 2 GB is more than 256 MB, and hence can hold more information, make me an elitist?


#311 c0d1f1ed
Member since 2009 • 32 Posts

Both are at the mercy of semiconductor technology improvements. AMD Radeon HDs have access to AMD/GloFo's high-clock-rate process tech.ronvalencia

Both CPUs and GPUs benefit equally from semiconductor technology improvements, but on top of that the CPU can still significantly increase its compute density while the GPU cannot. That is why it is said that the GPU is at the mercy of semiconductor technology improvements, while the CPU has more possibilities to increase performance.

Register space for AMD Radeon HD 4870 is 2.5 Megabytes.ronvalencia

Exactly. And because register files are less dense than caches this takes up a considerable amount of die space. And it's getting worse. Complex HPC applications need a true stack for storing function arguments.

For instance the GF100 only has 1024 32-bit registers per scalar core. This may seem like a lot at first but note that they are shared by many strands to hide latency. And with RAM access latencies of around 600 clock cycles it's clear that this register file barely suffices and caches are needed to keep the number of RAM accesses down. It works out alright for graphics, but most HPC applications are a lot more demanding.
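
(To put rough numbers on that: 1024 32-bit registers per core is 4 kB, and it has to be split across all the strands resident on that core. A back-of-the-envelope sketch; the 32-registers-per-strand figure is an illustrative assumption for a non-trivial HPC kernel, not a GF100 spec:)

```python
# How many strands a 1024-register file keeps resident, and how little of the
# ~600-cycle RAM latency they can cover on their own (hence the need for caches).

regs_per_core = 1024        # 32-bit registers per scalar core (from the post)
regs_per_strand = 32        # assumed for a moderately complex kernel
ram_latency_cycles = 600    # ballpark figure from the post

resident_strands = regs_per_core // regs_per_strand
print(resident_strands)                        # 32 strands in flight
print(ram_latency_cycles / resident_strands)   # ~19 cycles of work per strand
# Each strand must have ~19 independent cycles of work queued up just to hide
# one RAM access -- easy for streaming graphics, hard for branchy HPC code.
```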

CPUs used in HPC systems typically have around 12 MB of cache. The performance would plummet with smaller caches. So GPUs need similar amounts to become capable of running anywhere near the same range of applications. In conclusion it's really difficult to prevent today's GPUs from stalling a lot for HPC applications, and they'll need to sacrifice compute density to increase efficiency.

AMD Fusion's GPU has fast access to CPU's cache and still has the benefits of a mid-range Radeon HD level 480 stream processors.ronvalencia

Which illustrates the point that the compute density drops fast when you add things to make the GPU capable of more generic computing, on an affordably sized chip.

AMD says Llano will offer 500 GFLOPS. That's nowhere near the 3 TFLOPS number you've been touting earlier, and much closer to the several hundred GFLOPS CPUs could offer in the near future. In fact it would only take a 6-core 3 GHz CPU with two 256-bit wide FMA units to achieve 576 GFLOPS. CPUs like that will only appear about a year after Llano, but as you can see the gap is closing pretty rapidly and it makes a lot of sense to consider fully unifying them.
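
(The 576 GFLOPS figure follows directly from the stated configuration; a one-line check, counting an FMA as 2 flops and a 256-bit unit as 8 single-precision lanes:)

```python
# 6 cores * 3 GHz * 2 FMA units * 8 fp32 lanes (256-bit) * 2 flops per FMA
cores, ghz, fma_units, fp32_lanes, flops_per_fma = 6, 3.0, 2, 8, 2
print(cores * ghz * fma_units * fp32_lanes * flops_per_fma)   # 576.0 GFLOPS
```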

GPUs don't have to worry about x86's instruction set mess.ronvalencia

It's not that big of a mess. Sure, there are some oddball legacy instructions, but the majority has a nicely structured encoding format. The die area spent on x86 decoding is only a few percent, and it actually shrinks every generation. It also hasn't stopped Intel from achieving TFLOPS performance on early Larrabee prototypes.

GPUs have a bigger problem. Their instructions are not compacted so they need more storage. And there's still a hard limit on the kernel size. Also, to improve efficiency they need to be capable of running more kernels in parallel. This again requires additional instruction storage. So once again the only solution is to evolve toward a CPU architecture.

From Beyond3D's forum, a developer was able to achieve a practical ~1 TFLOPS (1000 GFLOPS) using a Radeon HD 4870 in the SGEMM benchmark.

...

Intel used the SGEMM benchmark to showcase their Larrabee. Intel Larrabee was dead on arrival with the AMD Radeon HD 5870's release.ronvalencia

SGEMM is merely a synthetic benchmark. It's not much of an indication of the practical performance of the entire system. There are plenty of things the CPU truly excels at, but those don't show you the big picture either.
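
(For readers unfamiliar with it: SGEMM is just a dense single-precision matrix multiply-accumulate, C = alpha·A·B + beta·C, and the quoted GFLOPS figures divide its fixed 2·M·N·K flop count by the run time. A minimal NumPy sketch of how such a score is computed, purely illustrative:)

```python
import time
import numpy as np

# SGEMM: C = alpha * A @ B + beta * C, single precision.
M = N = K = 2048
alpha, beta = 1.0, 1.0
A = np.random.rand(M, K).astype(np.float32)
B = np.random.rand(K, N).astype(np.float32)
C = np.random.rand(M, N).astype(np.float32)

start = time.perf_counter()
C = alpha * (A @ B) + beta * C
elapsed = time.perf_counter() - start

flops = 2.0 * M * N * K   # the conventional flop count behind SGEMM GFLOPS scores
print(f"{flops / elapsed / 1e9:.1f} GFLOPS (dense, cache-friendly, no ROPs involved)")
```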

Let me remind you again that AMD's GPUs have almost twice the GFLOPS ratings as NVIDIA's GPUs, but their results in games are comparable. It shows that running out of work is a very real problem, even for legacy graphics workloads! Whether it happens due to high register pressure or cache misses, GPUs need to sacrifice compute density to achieve greater efficiency at a larger range of applications.


#312 ronvalencia
Member since 2008 • 29612 Posts

Both CPUs and GPUs benefit equally from semiconductor technology improvements, but on top of that the CPU can still significantly increase its compute density while the GPU cannot. That is why it is said that the GPU is at the mercy of semiconductor technology improvements, while the CPU has more possibilities to increase performance.

The GPGPU already has a very high instruction count per cycle, and that throughput can be increased further via clock speed. For that to happen, AMD would need to shift their GPU designs to GlobalFoundries fabs, or TSMC would need to improve their fab tech.

Exactly. And because register files are less dense than caches this takes up a considerable amount of die space. And it's getting worse. Complex HPC applications need a true stack for storing function arguments.

On die space usage, it depends on the skills of the engineers.

Intel Larrabee is over 600 mm^2 at 45 nm and ~1.7 billion transistors.

AMD Radeon HD 4870: 263 mm^2 at 55 nm and 956 million transistors.

AMD Radeon HD 5870: 334 mm^2 at 40 nm and 2.15 billion transistors.

AMD Radeon HD 6870: 225 mm^2 at 40 nm and 1.7 billion transistors.

Even if we normalise the AMD Radeon HD parts to 45 nm, the AMD Radeon HD 5870 is still smaller.

Also, the AMD Radeon HD 5870's die size is smaller than the NV GeForce GTX 460's at the same 40 nm process.

For instance the GF100 only has 1024 32-bit registers per scalar core. This may seem like a lot at first but note that they are shared by many strands to hide latency. And with RAM access latencies of around 600 clock cycles it's clear that this register file barely suffices and caches are needed to keep the number of RAM accesses down. It works out alright for graphics, but most HPC applications are a lot more demanding.

GF100 is connected to faster GDDR5 memory. Unlike DDR2 SDRAM's bi-directional signal strobe, GDDR3 uses uni-directional read and write strobes, i.e. better write/read turnaround. DDR3 SDRAM follows DDR2 SDRAM when it comes to signal strobes.

CPUs used in HPC systems typically have around 12 MB of cache. The performance would plummet with smaller caches. So GPUs need similar amounts to become capable of running anywhere near the same range of applications. In conclusion it's really difficult to prevent today's GPUs from stalling a lot for HPC applications, and they'll need to sacrifice compute density to increase efficiency.

GPUs will just increase their register and thread counts, e.g. AMD Bart. AMD Cayman is the last 40nm TSMC-fabbed part before switching to 28nm.

Which illustrates the point that the compute density drops fast when you add things to make the GPU capable of more generic computing, on an affordably sized chip.

AMD says Llano will offer 500 GFLOPS. That's nowhere near the 3 TFLOPS number you've been touting earlier, and much closer to the several hundred GFLOPS CPUs could offer in the near future. In fact it would only take a 6-core 3 GHz CPU with two 256-bit wide FMA units to achieve 576 GFLOPS. CPUs like that will only appear about a year after Llano, but as you can see the gap is closing pretty rapidly and it makes a lot of sense to consider fully unifying them.

The mobile Llano CGPU part consumes around 20 to 59 watts. That's lower than my Core i7-740M QC + AMD 5730M combo at 71 watts.

Llano could change, btw, i.e. switch from VLIW5 to Cayman's VLIW4 uarch.

It's not that big of a mess. Sure, there are some oddball legacy instructions, but the majority has a nicely structured encoding format. The die area spent on x86 decoding is only a few percent, and it actually shrinks every generation. It also hasn't stopped Intel from achieving TFLOPS performance on early Larrabee prototypes.

On transistor count vs performance vs die size, Larrabee is a joke.

GPUs have a bigger problem. Their instructions are not compacted so they need more storage. And there's still a hard limit on the kernel size. Also, to improve efficiency they need to be capable of running more kernels in parallel. This again requires additional instruction storage. So once again the only solution is to evolve toward a CPU architecture.

...

No problem with AMD's Radeon HD logic-packing engineering skills. Both NVIDIA and Intel just make large, bloated cores.

SGEMM is merely a synthetic benchmark. It's not much of an indication of the practical performance of the entire system. There are plenty of things the CPU truly excels at, but those don't show you the big picture either.

As stated above, AMD presents a balanced view on this issue.

Let me remind you again that AMD's GPUs have almost twice the GFLOPS ratings as NVIDIA's GPUs, but their results in games are comparable.

As I stated earlier, there are other bottlenecks in the design, e.g. the raster rendering issue.

Unlike AMD Cypress and RV770, notice that AMD's Bart includes two Ultra-Threaded Dispatch Processor front-end blocks.


#313 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="ronvalencia"]

Depends on the workload type.

As an example, SGEMM doesn't touch ROP-related bottlenecks, i.e. it's a pure compute matrix workload. The GeForce GTX 470 has 40 ROP units, while the Radeon HD 6870 has 32. The Radeon HD 4730 has 8 ROP units while the Radeon HD 3870 has 16.

In the past, Intel used SGEMM benchmarks for their Larrabee show demos.

...

Different workloads yield different performance results.

Hakkai007

But now you shifted the argument into something I wasn't even talking about.

My point was that solely relying on GFLOP count is not accurate.

And you just proved my point when you had to bring up ROPs.

In this thread, there are two issues involved i.e. raster rendering vs general processing.


#314 Hakkai007
Member since 2005 • 4905 Posts

In this thread, there are two issues involved i.e. raster rendering vs general processing.

ronvalencia

Actually it was about the thread title...

Also why did you try arguing against my point then?


#315 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="ronvalencia"]

In this thread, there are two issues involved i.e. raster rendering vs general processing.

Hakkai007

Actually it was about the thread title...

Also why did you try arguing against my point then?

It started with Unreal99's CPU software-render comment and ended with the non-raster render workloads sub-thread.


#316 HuusAsking
Member since 2006 • 15270 Posts

[QUOTE="Hakkai007"]

[QUOTE="ronvalencia"]

In this thread, there are two issues involved i.e. raster rendering vs general processing.

ronvalencia

Actually it was about the thread title...

Also why did you try arguing against my point then?

It started with Unreal99's CPU software-render comment and ended with the non-raster render workloads sub-thread.

The whole argument began over whether or not a CPU could realistically do the same thing a GPU does. c0d1f1ed says yes, it can; the CPU simply needs a few vector-related additions (like a scatter/gather unit) for it to be a race, and with its general-purpose architecture, it'd be more accommodating of more-novel techniques. Others, like myself, see the gap as still too wide. CPUs and GPUs are limited not just by themselves but also by the boards on which they sit: their memory, bandwidth, and so on. They're also limited by programming constraints, and it's arguable whether or not those imposed by the GPU can be overcome to produce more-efficient results (GPUs currently suffer from a lack of efficiency but have enough raw horsepower to make up the difference at present). The debate is ongoing.

#317 savagetwinkie
Member since 2008 • 7981 Posts

[QUOTE="lazerface216"]

i love how people don't use this same reasoning for used games on this board...

obviously there's no guarantee that a person would buy a game if it weren't easily available for cheap used. yet ignorant posters on this site try to compare piracy with the rental and used game industry.

sorry for the rant...

lundy86_4

:P No problem.

Whilst obviously the used game industry is an issue, I personally couldn't compare it to piracy due to one being legal and one not. From a finance perspective, it obviously impacts the dev, but it certainly is no guarantee that, if a game wasn't available used, an individual would buy the game new. Unless the prices were comparable for new and used (such as with Gamestop/EBGames).

Piracy impacts the dev FAR more, though; you don't need more than one copy to circulate through millions of people, and they can each play it at the same time without stopping anyone else.

On the used-sales end, one copy can only be played by one person at a time. It can circulate, but if someone has that copy, the next person has to go buy a new copy, eventually adding another to the used pool. The only thing this really cuts off is long-term sales; eventually there will be enough used copies circulating that new copies won't be needed anymore.

It's actually still unfair for the dev, though. If you look at used sales of other products, like a car, games don't have the same wear and tear. With games, the used copies eventually saturate the market and new ones aren't necessary; with most other products there is a wear-and-tear factor, where eventually the used product breaks and new ones still need to be bought.


#318 savagetwinkie
Member since 2008 • 7981 Posts
[QUOTE="HuusAsking"][QUOTE="ronvalencia"]

[QUOTE="Hakkai007"]

Actually it was about the thread title...

Also why did you try arguing against my point then?

It started with Unreal99's CPU software-render comment and ended with the non-raster render workloads sub-thread.

The whole argument began over whether or not a CPU could realistically do the same thing a GPU does. c0d1f1ed says yes, it can; the CPU simply needs a few vector-related additions (like a scatter/gather unit) for it to be a race, and with its general-purpose architecture, it'd be more accommodating of more-novel techniques. Others, like myself, see the gap as still too wide. CPUs and GPUs are limited not just by themselves but also by the boards on which they sit: their memory, bandwidth, and so on. They're also limited by programming constraints, and it's arguable whether or not those imposed by the GPU can be overcome to produce more-efficient results (GPUs currently suffer from a lack of efficiency but have enough raw horsepower to make up the difference at present). The debate is ongoing.

Aren't the 470s C-capable, with the x86 instruction set? It's not a matter of making a CPU into a GPU; GPUs are far more complex. But why not make a CPU into a GPU and call it a day? Regardless of how it happens, CPUs/GPUs will merge; look at the 360 Slim, they are both on the same die already. Secondly, you won't get efficiency on GPUs because of the HAL on PCs: basically DirectX will always limit the GPUs so they can be generalized, and you won't need to know the hardware specifics of what you're developing on.

#319 seabiscuit8686
Member since 2005 • 2862 Posts
[QUOTE="locopatho"]How? Let everyone make PC games if they are so awesome in every way?doom1marine
Developers are stupid and are buying into the hype that console games sell better, LOL.

Yes, because companies often look at "hype" in determining sales figures...

#320 HuusAsking
Member since 2006 • 15270 Posts

[QUOTE="HuusAsking"][QUOTE="ronvalencia"] It was started with Unreal99 CPU software render comment and ended with non-raster render workloads sub-thread.

savagetwinkie

The whole argument began over whether or not a CPU could realistically do the same thing a GPU does. c0d1f1ed says yes, it can; the CPU simply needs a few vector-related additions (like a scatter/gather unit) for it to be a race, and with its general-purpose architecture, it'd be more accommodating of more-novel techniques. Others, like myself, see the gap as still too wide. CPUs and GPUs are limited not just by themselves but also by the boards on which they sit: their memory, bandwidth, and so on. They're also limited by programming constraints, and it's arguable whether or not those imposed by the GPU can be overcome to produce more-efficient results (GPUs currently suffer from a lack of efficiency but have enough raw horsepower to make up the difference at present). The debate is ongoing.

Aren't the 470s C-capable, with the x86 instruction set? It's not a matter of making a CPU into a GPU; GPUs are far more complex. But why not make a CPU into a GPU and call it a day? Regardless of how it happens, CPUs/GPUs will merge; look at the 360 Slim, they are both on the same die already. Secondly, you won't get efficiency on GPUs because of the HAL on PCs: basically DirectX will always limit the GPUs so they can be generalized, and you won't need to know the hardware specifics of what you're developing on.

CUDA is written in C as well, but you still have to take limitations into consideration or your program chugs on GPUs. That's why there hasn't been an efficient AVC encoder that can run on GPUs as of late: the motion estimation that is the core of the codec is a memory-divergent operation, not friendly to GPU constraints. As for DirectX, you have to consider who contributes to the features being added to DirectX: the graphics chip makers themselves. Since they provide input on what should go in there, they're aware of what's coming and can design their chips and drivers around them to make them very efficient. DirectX 10 and 11 additions were mostly at the suggestion of NVIDIA and AMD. As for the console game makers, they always have the option of going closer to the metal since they know the exact specs of the console.


#321 WilliamRLBaker
Member since 2006 • 28915 Posts

:roll: Or he and the other ""PC"" developers could give up the money they get from consoles, which is FAR more than they get from PC sales, and start making PC games for PC.... It's funny in that it's not the fault of the consoles; it's the fault of the PC developers who are sick of getting no money from PC sales.


#322 04dcarraher
Member since 2004 • 23858 Posts
[QUOTE="WilliamRLBaker"]

:roll: Or he and the other ""PC"" developers could give up the money they get from consoles, which is FAR more than they get from PC sales, and start making PC games for PC.... It's funny in that it's not the fault of the consoles; it's the fault of the PC developers who are sick of getting no money from PC sales.

That's bologna....... PC gaming is larger than console gaming worldwide. Then you have facts like Activision saying last year that over 60% of their profits didn't come from console-based sales but from PC. Then you have NPD numbers showing US PC game sales at retail being over 40 million, and then you find out that through digital distribution over 60% of games are bought that way (not including Steam). Haven't you noticed that almost all devs are going multiplatform? Even console-only game makers are producing PC versions and vice versa. They are trying to maximize their profits; it's not because of PC piracy or the console renting/pre-owned market either....

#323 c0d1f1ed
Member since 2009 • 32 Posts

The GPGPU already has a very high instruction count per cycle, and that throughput can be increased further via clock speed. For that to happen, AMD would need to shift their GPU designs to GlobalFoundries fabs, or TSMC would need to improve their fab tech.ronvalencia

Clock speed has close to nothing to do with the foundry. TSMC and Global Foundries have very competitive process technology. The real reason AMD's GPUs don't clock higher is the balance between switching activity and power consumption. It's a design choice, and you can't beat the laws of physics by changing foundry.

NVIDIA decided to have a separate clock domain for the compute cores to clock them higher, but the consequence is that they could only fit a significantly lower number of ALUs and had to considerably crank up the efficiency with other techniques that take die space. And even within the compute cores themselves, you need extra pipeline stages to achieve higher clock rates, which costs transistors too.

You simply can't significantly increase the clock speed without design consequences which lower compute density.

On die space usage, it depends on the skills of the engineers.

(...)

Also, the AMD Radeon HD 5870's die size is smaller than the NV GeForce GTX 460's at the same 40 nm process.ronvalencia

Its die size is slightly smaller, its transistor count slightly higher. Also, a Radeon HD 5870 costs a lot more than a GeForce GTX 460 due to yields (the latter has 1 of its 8 Streaming Multiprocessors disabled). So you should probably compare it to an HD 5850 instead. The GTX 460 outperforms that one at demanding games: http://www.bit-tech.net/hardware/graphics/2010/07/12/nvidia-geforce-gtx-460-graphics-card-review/7

That's 907 GFLOPS holding its own against 2088 GFLOPS. It perfectly proves that GFLOPS are not the be-all and end-all, even for graphics performance. I also expect the GTX 460 to be the more future-proof choice, as shaders continue to get more complex. AMD has some pretty great GPUs for today's games, but going forward their strategy of throwing ever more GFLOPS at the problem isn't going to hold much longer. And NVIDIA is already years ahead in the HPC market.
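
(Those two ratings again come straight from shader count × shader clock × 2; the 336 enabled CUDA cores at 1.35 GHz for the GTX 460 and the 1440 ALUs at 725 MHz for the HD 5850 are the published specs, quoted here as assumptions:)

```python
# Rated single-precision peaks behind the 907 vs 2088 GFLOPS comparison.
gtx_460 = 336 * 1.350 * 2    # enabled CUDA cores * shader clock (GHz) * 2 flops
hd_5850 = 1440 * 0.725 * 2   # stream processors * core clock (GHz) * 2 flops
print(gtx_460, hd_5850, hd_5850 / gtx_460)   # ~907, 2088, ~2.3x on paper
```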

And sooner or later CPUs will also join the party. They continue to be indispensable in the HPC market, and through relatively small changes that dramatically increase their compute density they can easily prevent the GPU from becoming the architecture of choice for throughput computing, and even invade the GPU's space at graphics tasks, starting with making IGPs useless...

[QUOTE="c0d1f1ed"]For instance the GF100 only has 1024 32-bit registers per scalar core. This may seem like a lot at first but note that they are shared by many strands to hide latency. And with RAM access latencies of around 600 clock cycles it's clear that this register file barely suffices and caches are needed to keep the number of RAM accesses down. It works out alright for graphics, but most HPC applications are a lot more demanding.ronvalencia

GF100 is connected to faster GDDR5 memory. Unlike DDR2 SDRAM's bi-directional signal strobe, GDDR3 uses uni-directional read and write strobes, i.e. better write/read turnaround. DDR3 SDRAM follows DDR2 SDRAM when it comes to signal strobes.

Your point being?

GDDR5 is way slower than DDR3 when it comes to latency. For applications with large working sets and many branches, this means GPUs need massive register files and caches to sustain high utilization. It's bordering on the ridiculous, really. At 4 kB of register space per core, GF104 has 12.5 million bits of register space. That's got to be well over 100 million transistors. And that's even without counting the operand collectors and results queue. Increase register space by any significant amount, and it will affect compute density. So it's no wonder that in every GPU architecture review, register space has become a key parameter in evaluating how it might perform under various workloads. Some benchmarks even show that the same chips can be faster using GDDR3 versus GDDR5.
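
(The "12.5 million bits" figure checks out from the stated 4 kB of registers per core once you count all the cores on a full GF104 die; the 384-core count is the published spec, assumed here:)

```python
# Total GF104 register storage implied by 4 kB of registers per core.
cores = 384                  # full GF104 die (the GTX 460 ships with 336 enabled)
bytes_per_core = 4 * 1024    # 1024 x 32-bit registers = 4 kB, from the post
total_bits = cores * bytes_per_core * 8
print(total_bits / 1e6)      # ~12.6 million bits of register space
```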

So the only solution is for GPUs to stop aggressively trading latency for bandwidth, and slowly but surely implement techniques borrowed from CPU architectures. Speculative execution, advanced prefetching, out-of-order execution, register forwarding, etc.: it all takes a bit of die size of its own, but it can prevent the other components from growing out of proportion and help keep efficiency high. This convergence will inevitably result in unification.

GPUs will just increase their register and thread counts, e.g. AMD Bart. AMD Cayman is the last 40nm TSMC-fabbed part before switching to 28nm.ronvalencia

Again, strand count is a negative feature. It's a bad thing for applications to have to submit massive amounts of independent work to achieve high utilization. "Batch batch batch" has long reached its peak, and applications want to diversify more, not less. Also, higher strand counts require more registers, which don't come for free. And they may actually be better off increasing cache size. Either way, compute density will have to go down to increase the efficiency.

The mobile Llano CGPU part consumes around 20 to 59 watts. That's lower than my Core i7-740M QC + AMD 5730M combo at 71 watts. Llano could change, btw, i.e. switch from VLIW5 to Cayman's VLIW4 uarch.ronvalencia

A dedicated sound processing chip would also consume less than using an audio driver running on the CPU. Yet they were already close to extinction years ago. Power consumption isn't that much of an obstacle, because like I said before dedicated chips are only sporadically used at full capacity.

When GPU manufacturers hinted at unifying the vertex and pixel pipelines, some analysts also claimed it would increase power consumption because the cores would be less specialized. They were probably right, but removing the bottleneck and extending the programmability more than compensated for the loss of dedicated efficiency. Also note that for a long time IGPs actually used the CPU for vertex processing. The CPUs back then had far worse performance/Watt, yet that didn't seem to bother them much. They just wanted the cheapest adequate graphics solution.

Last but not least, a growing number of people are content with the performance of software rendering. And that's with older CPUs. Multi-core, AVX, FMA and gather/scatter all considerably improve the experience beyond what is possible today. So I don't see any reason why anyone with a mainstream 500 GFLOPS CPU who doesn't consider gaming a top priority would shell out for a GPU.