OK, so the PS4 is stronger than the XBOX ONE...

This topic is locked from further discussion.


#401 Tessellation
Member since 2009 • 9297 Posts

[QUOTE="AMD655"][QUOTE="Tessellation"]damn Ron stop with the ownage,no wonder why you get so easily under the basement dwellers skin..you prove them wrong with knowledge :cool:tormentos

Actually, he is wrong.

 

Careful or Tessellation may claim that he got under your skin..:lol:

old fart obsessing detected :cool: #STAYBUTTHURT

#402 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="ronvalencia"]

Actually, the 7770 has less usable power than the Xbox One, since lesser GCNs with less internal SRAM storage have a greater chance of compute spill. The 7770 also has a lower triangle rate than the 7790 or 7850.

My 8870M takes a greater performance hit when I lower the memory speed to DDR3 levels i.e. AMD's GDDR5 bar graphs are applicable for these cases.

-----

7790 doesn't sport X1's memory bandwidth and the prototype 7850 (with 12 CUs) is a closer fit to the X1.

X1's GCN has Pitcairn's crossbar, 256bit memory controllers and related L2 cache. L2 cache bandwidth is important since this is used as a shared cache across 4 CUs.

The 7790/7770 has less L2 cache I/O bandwidth than Pitcairn's version.

http://www.behardware.com/articles/848-4/amd-radeon-hd-7970-crossfirex-review-28nm-and-gcn.html

GCN Cache Hierarchy

With 128bit memory controllers, 7790/7770 has two L2 blocks. That's 64 bytes per cycle per L2 block x 2 = 128 bytes per cycle.

With 256bit memory controllers, 7850/7870 has four L2 blocks. That's 64 bytes per cycle per L2 block x 4 = 256 bytes per cycle.

tormentos

The broken song again, stop. The damn 660Ti has 100GB/s less bandwidth than the 7950 and still outperforms it in some tests...:lol:

You have no argument, I demonstrated how a GPU with 100GB/s less bandwidth than another can still beat it, so the Xbox One's miserable 37GB/s advantage over the 7790 should be even less of a problem.

The 7790 is very close to the 7850 even though it has 57GB/s less bandwidth, so yeah, bandwidth is not the issue, power is, so yeah the 7790 >> Xbox One..

No matter how you slice it, what you try to imply is idiotic. If you put 250GB/s of bandwidth on the Xbox One, would it beat the 7950? Just because it has more bandwidth?

7770/7790 @ 800MHz, L2 cache I/O bandwidth: 128 bytes x 800MHz = 95.367 GB/s. An external memory setup faster than 95.367 GB/s would be useless.

W5000/7850-768 @ 853MHz, L2 cache I/O bandwidth: 256 bytes x 853MHz = 203.37 GB/s. ESRAM was designed for Pitcairn-level I/O.
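For anyone following the arithmetic, here is a minimal sketch of how those two L2 I/O figures fall out: L2 blocks x 64 bytes/cycle x clock, converted to binary GB, which is the convention that reproduces the 95.367 and 203.37 numbers above (the per-block width and block counts are taken from the behardware article linked earlier).

```python
# Back-of-the-envelope L2 cache I/O ceiling for GCN parts, per the figures quoted above.
# Assumes 64 bytes per cycle per L2 block (one block per 64-bit memory channel).

def l2_io_bandwidth(l2_blocks: int, clock_mhz: float) -> float:
    """Return peak L2 cache I/O bandwidth in binary GB/s, as used in the post."""
    bytes_per_cycle = 64 * l2_blocks
    return bytes_per_cycle * clock_mhz * 1e6 / 2**30

# 7770/7790: 128-bit bus -> 2 L2 blocks @ 800 MHz
print(f"7770/7790 @ 800MHz : {l2_io_bandwidth(2, 800):.3f} GB/s")   # ~95.367
# W5000 / prototype 7850-768 (X1-like): 256-bit bus -> 4 L2 blocks @ 853 MHz
print(f"7850-768 @ 853MHz  : {l2_io_bandwidth(4, 853):.2f} GB/s")   # ~203.37
```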


#403 tormentos
Member since 2003 • 33793 Posts

 

7770/7790 @ 800MHz, L2 cache I/O bandwidth: 128 bytes x 800MHz = 95.367 GB/s. An external memory setup faster than 95.367 GB/s would be useless.

W5000/7850-768 @ 853MHz, L2 cache I/O bandwidth: 256 bytes x 853MHz = 203.37 GB/s. ESRAM was designed for Pitcairn-level I/O.

ronvalencia

 

660Ti = 144GB/s

7950 = 240GB/s..

Bandwidth means nothing when you have no power, and the Xbox One doesn't have it..

Trying to imply that the Xbox One would need 203GB/s is downright stupid; it will not.


#404 MisterMeek
Member since 2013 • 25 Posts

[QUOTE="ronvalencia"]

 

7770/7790 @ 800MHz, L2 cache I/O bandwidth: 128 bytes x 800MHz = 95.367 GB/s. An external memory setup faster than 95.367 GB/s would be useless.

W5000/7850-768 @ 853MHz, L2 cache I/O bandwidth: 256 bytes x 853MHz = 203.37 GB/s. ESRAM was designed for Pitcairn-level I/O.

tormentos

660Ti = 144GB/s

7950 = 240GB/s..

Bandwidth means nothing when you have no power, and the Xbox One doesn't have it..

Trying to imply that the Xbox One would need 203GB/s is downright stupid; it will not.

Xbox gpu is like Hawaii gpu. Not is but is like. Aaaaaaaannddddd go!

#405 04dcarraher
Member since 2004 • 23857 Posts

[QUOTE="ronvalencia"]

 

7770/7790 @ 800MHz, L2 cache I/O bandwidth: 128 bytes x 800MHz = 95.367 GB/s. An external memory setup faster than 95.367 GB/s would be useless.

W5000/7850-768 @ 853MHz, L2 cache I/O bandwidth: 256 bytes x 853MHz = 203.37 GB/s. ESRAM was designed for Pitcairn-level I/O.

tormentos

660Ti = 144GB/s

7950 = 240GB/s..

Bandwidth means nothing when you have no power, and the Xbox One doesn't have it..

Trying to imply that the Xbox One would need 203GB/s is downright stupid; it will not.

Memory bandwidth means squat after a certain point, and the difference between the X1 with DDR3+ESRAM and the PS4's GDDR5 will be negligible. The difference seen will come from the GPUs' processing power.

#406 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="ronvalencia"]

7770/7790 @ 800MHz, L2 cache I/O bandwidth: 128 bytes x 800MHz = 95.367 GB/s. An external memory setup faster than 95.367 GB/s would be useless.

W5000/7850-768 @ 853MHz, L2 cache I/O bandwidth: 256 bytes x 853MHz = 203.37 GB/s. ESRAM was designed for Pitcairn-level I/O.

tormentos

660Ti = 144GB/s

7950 = 240GB/s..

Bandwidth means nothing when you have no power, and the Xbox One doesn't have it..

Trying to imply that the Xbox One would need 203GB/s is downright stupid; it will not.

I didn't imply anything. My post includes information on the CUs' maximum compute memory bandwidth from the L1 caches and the L2 cache I/O.

For X1's GCN, L1 bandwidth is 64 bytes per cycle per CU @ 853MHz = 50.8425 GB/s per CU x 12 CUs = 610.11 GB/s.

One should look for any bottlenecks and seek the reasons why a GPU doesn't use extra memory bandwidth.

I have shown you why the 7770/7790 @ 800MHz would gimp any memory faster than ~96 GB/s. AMD would need to redesign the 7770/7790's crossbar/memory controller/L2 cache stack for faster memory such as GDDR6. For the X1, it's easier to "copy-and-paste" from Pitcairn's crossbar/L2 cache/memory controller designs.

The 7770/7790 has a massive crossbar/L2 cache I/O bottleneck that gimps higher memory bandwidth, e.g. it is not ready for GDDR6.
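As a rough sketch of that bottleneck argument: usable bandwidth is the smaller of the external memory feed and the L2 I/O ceiling. The L2 widths below are from the post; the external-memory figures are illustrative assumptions (a hypothetical 150 GB/s GDDR5 card, and the ~133 GB/s ESRAM alpha-blend number quoted later in this thread for the X1).

```python
# Sketch of the "extra external bandwidth is useless past the L2 I/O ceiling" argument.
GIB = 2**30

def l2_ceiling_gbs(l2_blocks: int, clock_mhz: float) -> float:
    # 64 bytes/cycle per L2 block, converted to binary GB/s as in the post.
    return 64 * l2_blocks * clock_mhz * 1e6 / GIB

cases = [
    ("7770/7790 + hypothetical 150 GB/s GDDR5", l2_ceiling_gbs(2, 800), 150.0),
    ("X1-like 256-bit design + 133 GB/s ESRAM", l2_ceiling_gbs(4, 853), 133.0),
]

for name, l2, ext in cases:
    # Whichever side is narrower limits throughput; bandwidth above that is wasted.
    print(f"{name}: L2 I/O ~{l2:.0f} GB/s, external {ext:.0f} GB/s -> usable ~{min(l2, ext):.0f} GB/s")
```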

Reference http://www.behardware.com/articles/848-4/amd-radeon-hd-7970-crossfirex-review-28nm-and-gcn.html


------------------------

NVIDIA Kepler is a different fish when it comes to handling its compute workloads.

The entry-level GK104 (e.g. 660 Ti) is a class of GPU above AMD Pitcairn, i.e. it takes AMD Tahiti LE to battle the NV 660 Ti.


K20 vs 7970 vs GTX680 vs M2050 vs GTX580 from http://wili.cc/blog/gpgpu-faceoff.html

[Image: performance.png]

K20 = GK110

There's a big difference between NVIDIA's OpenCL and CUDA performance.

[Image: cudaopencl.png]

App 1. Digital Hydraulics code is all about basic floating point arithmetics, both algebraic and transcendental. No dynamic branching, very little memory traffic.

App 2. Ambient Occlusion code is a very mixed load of floating point and integer arithmetics, dynamic branching, texture sampling and memory access. Despite the memory traffic, this is a very compute intensive kernel.

App 3. Running Sum code, in contrast to the above, is memory intensive. It shuffles data through at a high rate, not doing much calculations on it. It relies heavily on the on-chip L1 cache, though, so it's not a raw memory bandwidth test.

App 4. Geometry Sampling code is texture sampling intensive. It sweeps through geometry data in "waves", and stresses samplers, texture caches, and memory equally. It also has a high register usage and thus low occupancy.

Minimise App 2 type workloads and Kepler would perform fine.


#407 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="ronvalencia"]

Actually, the 7770 has less usable power than the Xbox One, since lesser GCNs with less internal SRAM storage have a greater chance of compute spill. The 7770 also has a lower triangle rate than the 7790 or 7850.

My 8870M takes a greater performance hit when I lower the memory speed to DDR3 levels i.e. AMD's GDDR5 bar graphs are applicable for these cases.

-----

7790 doesn't sport X1's memory bandwidth and the prototype 7850 (with 12 CUs) is a closer fit to the X1.

X1's GCN has Pitcairn's crossbar, 256bit memory controllers and related L2 cache. L2 cache bandwidth is important since this is used as a shared cache across 4 CUs.

The 7790/7770 has less L2 cache I/O bandwidth than Pitcairn's version.

http://www.behardware.com/articles/848-4/amd-radeon-hd-7970-crossfirex-review-28nm-and-gcn.html

GCN Cache Hierarchy

With 128bit memory controllers, 7790/7770 has two L2 blocks. That's 64 bytes per cycle per L2 block x 2 = 128 bytes per cycle.

With 256bit memory controllers, 7850/7870 has four L2 blocks. That's 64 bytes per cycle per L2 block x 4 = 256 bytes per cycle.

tormentos

The broken song again, stop. The damn 660Ti has 100GB/s less bandwidth than the 7950 and still outperforms it in some tests...:lol:

You have no argument, I demonstrated how a GPU with 100GB/s less bandwidth than another can still beat it, so the Xbox One's miserable 37GB/s advantage over the 7790 should be even less of a problem.

The 7790 is very close to the 7850 even though it has 57GB/s less bandwidth, so yeah, bandwidth is not the issue, power is, so yeah the 7790 >> Xbox One..

No matter how you slice it, what you try to imply is idiotic. If you put 250GB/s of bandwidth on the Xbox One, would it beat the 7950? Just because it has more bandwidth?

You demonstrated a different GPU design.

For the 7950, L1 bandwidth is 64 bytes per cycle per CU @ 800MHz = ~47 GB/s per CU x 28 CUs = ~1.3 TB/s.

Your "250GB/s bandwidth on the Xbox One, would it beat the 7950" is just plain stupidity. Stop adding words to my posts.

Again, you can't read a simple bar graph that shows a prototype 7850 with 768 stream processors.


#409 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="tormentos"]

[QUOTE="ronvalencia"]

7770/7790 @ 800MHz, L2 cache I/O bandwidth: 128 bytes x 800MHz = 95.367 GB/s. An external memory setup faster than 95.367 GB/s would be useless.

W5000/7850-768 @ 853MHz, L2 cache I/O bandwidth: 256 bytes x 853MHz = 203.37 GB/s. ESRAM was designed for Pitcairn-level I/O.

04dcarraher

660Ti = 144GB/s

7950 = 240GB/s..

Bandwidth means nothing when you have no power, and the Xbox One doesn't have it..

Trying to imply that the Xbox One would need 203GB/s is downright stupid; it will not.

Memory bandwidth means squat after a certain point, and the difference between the X1 with DDR3+ESRAM and the PS4's GDDR5 will be negligible. The difference seen will come from the GPUs' processing power.

A memory bandwidth increase goes unused when there is a bottleneck elsewhere in the system.


#410 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="ronvalencia"]

1. PC is not limited to 1080p.

2. It's minor.

When it's possible, I play games at 5760x1080p on my 7950 (with 900mhz firmware) @ 1000Mhz which is higher than the reference 7950 @ 800Mhz.

[Image: bf3_5760_1080.gif]

tormentos

Stop it, you blind biased fanboy, that game is unplayable on both the 7950 and 660Ti at that resolution, so 2560x1600 is actually more playable and the difference is not even 2 frames, but nice to miss the point and change the argument..

The 660Ti has 144GB/s of bandwidth, even lower than the 7850, yet it performs as well or better in some tests vs the 7950 which has 240GB/s of bandwidth, so yeah, my point was proven, the 7790 >>> Xbox One..

Nice to miss the point and change the argument by using a different GPU design. You're not even factoring in GK104's cache design.

My 7950 is NOT running at 800MHz; it's running at 1GHz (from the 900MHz "out-of-the-box" 7950).

http://www.hardocp.com/article/2013/02/01/xfx_radeon_hd_7950_black_edition_video_card_review/8#.UhHM3dB--po

the stock XFX Radeon HD 7950 with the factory overclock stood its ground, and provided nearly identical performance to the Radeon HD 7970. The overclock we achieved pushed performance beyond the Radeon HD 7970.

The XFX Radeon HD 7950 Black Edition is the "out-of-the-box" 900MHz version.


I have already shown you what a flagship PC GPU design (i.e. a 7950 @ 950MHz with GDDR5-2400) does in Tomb Raider at 1080p.

With the 660 Ti, you're missing the fact that the entry-level GK104 is a class above Pitcairn Pro and is clocked higher, with higher GFLOPS, than the 7850.


#411 btk2k2
Member since 2003 • 440 Posts
[QUOTE="22Toothpicks"][QUOTE="AMD655"]

Do Tormentos and Ron not realise they have gone in circles 17 times throughout this thread?

ronvalencia
This literally happens in every PS4 v. X1 spec thread they post in. On a side note I am convinced that Ron is a robot of some form.

Tormentos didn't do computer science 101.

Neither did you.

#412 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="ronvalencia"][QUOTE="22Toothpicks"] This literally happens in every PS4 v. X1 spec thread they post in. On a side note I am convinced that Ron is a robot of some form.btk2k2
Tormentos didn't do computer science 101.

Neither did you.

Neither did you.

Your formulas didn't work for 8 ROPs, and your multi-layer counter statement was a joke.


#413 deactivated-58e448fd89d82
Member since 2010 • 4494 Posts

Formula all you want, you are pig-ignorant of basic GPU understanding, Ron, let alone anything more advanced.


#414 btk2k2
Member since 2003 • 440 Posts
You demonstrated a different GPU design.

For the 7950, L1 bandwidth is 64 bytes per cycle per CU @ 800MHz = ~47 GB/s per CU x 28 CUs = ~1.3 TB/s.

Your "250GB/s bandwidth on the Xbox One, would it beat the 7950" is just plain stupidity. Stop adding words to my posts.

Again, you can't read a simple bar graph that shows a prototype 7850 with 768 stream processors.

ronvalencia
The problem with the '7830' prototype is that it has more ROPs than the X1 and it has considerably more usable bandwidth than the X1. Besides, the 7790 is pretty close to the '7830' prototype, so it seems like a pretty good card to use as a rough guide.

It is obvious that the PS4 will outperform the X1. The DF article that only took into account shader performance showed a 20-30% performance gap. Include other factors and that can easily be up to a 45% performance advantage for the PS4.

I know you keep spouting 133 GB/s for alpha blend on the ESRAM, but there is no way that it will be in constant use, because the data can be used faster than it can be refilled. The GPU will at times have to access the DDR3 for data, which will incur a huge performance penalty. This is why using the 7790 is not that bad: even though its bandwidth is lower than the X1 in optimal conditions, it is also higher than the X1 bandwidth in worst-case conditions, meaning that on average it is going to be pretty close to the mark.

In the end we will see it in the games. Where the games push the X1, the PS4 will have smoother and/or prettier games. There will be some games that are more artistic that look and play the same on both consoles, as well as the indie games that are not graphically intensive, but those that push the consoles will be better on the PS4.
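For reference on the shader-only gap mentioned here, a minimal sketch (64 shaders per CU is standard GCN; the 18/12 CU counts appear elsewhere in the thread, while the ~800MHz PS4 / ~853MHz X1 clocks are my assumption, not something stated in this post):

```python
# Rough shader-only throughput comparison behind the "20-30% gap" numbers above.
# Assumes 64 shaders per CU and 2 FLOPs per shader per clock (standard GCN),
# plus assumed clocks: PS4 ~800 MHz, X1 ~853 MHz after its upclock.

def tflops(cus: int, clock_mhz: float) -> float:
    shaders = cus * 64
    return shaders * 2 * clock_mhz * 1e6 / 1e12

ps4 = tflops(18, 800)   # ~1.84 TFLOPS
x1  = tflops(12, 853)   # ~1.31 TFLOPS
print(f"PS4 ~{ps4:.2f} TFLOPS, X1 ~{x1:.2f} TFLOPS, "
      f"PS4 advantage ~{(ps4 / x1 - 1) * 100:.0f}% (shader throughput only)")
```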

#415 btk2k2
Member since 2003 • 440 Posts

[QUOTE="btk2k2"][QUOTE="ronvalencia"] Tormentos didn't do computer science 101.ronvalencia

Neither did you.

Neither did you.

Your formulas didn't work for 8 ROPS and your multi-layer counter statement was a joke.

Newtonian physics does not work at the quantum scale; does that mean we should never use it to describe the world around us at a scale we know it works at?

If you can give me the Vantage pixel fill rate for an 8 ROP GCN card, we can test the scaling. By my scaling, the 8790M @ 850MHz with 72GB/s of memory bandwidth should be able to hit 3.14 GPixels/s in the Vantage pixel fill test. Since I cannot find a result for this card it will have to remain unconfirmed for the time being, but if a result does surface we can compare it and see how close I am.

My formulas worked to within a very small margin of error for 16 ROPs and 32 ROPs with clock speeds that varied from 800MHz up to 1050MHz. That covers the range of values in the X1 and the PS4, so even if the 8 ROP scaling is a little bit off it does not matter, because nothing I am comparing has a GCN GPU with 8 ROPs.

I think you need to learn about a thing called 'scope'. It is this thing where a certain method works within a set range of values, so as long as you are within the range of values that the method applies to, you can use it with reasonable accuracy. Velocity is a great example of this: on Earth we can use w = v1 + v2 to get the total velocity of two objects approaching each other. However, at relativistic speeds we need to use w = (v1 + v2) / (1 + v1*v2/c^2). Does that mean we always need to use the more complicated but more accurate method? No, of course not, because in everyday life the first equation fits within the scope where its accuracy is very good.
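The velocity example works out like this (a trivial sketch of the two formulas quoted above; the input speeds are made up just to show how small the correction is at everyday scales):

```python
# Classical vs relativistic velocity addition, illustrating the 'scope' point above.
C = 299_792_458.0  # speed of light, m/s

def classical(v1: float, v2: float) -> float:
    return v1 + v2

def relativistic(v1: float, v2: float) -> float:
    return (v1 + v2) / (1 + v1 * v2 / C**2)

# Everyday speeds: the two formulas agree to many decimal places.
print(classical(30.0, 25.0), relativistic(30.0, 25.0))
# Relativistic speeds: the simple sum exceeds c, the correct formula never does.
print(classical(0.8 * C, 0.7 * C) / C, relativistic(0.8 * C, 0.7 * C) / C)
```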

#416 tormentos
Member since 2003 • 33793 Posts

Memory bandwidth means squat after a certain point, and the difference between the X1 with DDR3+ESRAM and the PS4's GDDR5 will be negligible. The difference seen will come from the GPUs' processing power. 04dcarraher

 

At least we agree for once. Now tell that to Ron, he refuses to admit it, even though I showed him how the 660Ti performed side by side with the 7950, hell, even surpassed it, while having almost 100GB/s less bandwidth.


#417 tormentos
Member since 2003 • 33793 Posts

 

I didn't imply anything. My post includes information on CU's max compute processing memory bandwidth from L1 cache and L2 cache I/O.

 

For X1's GCN, L1 bandwidth has 64 byte per cycle per CU = 50.8425 GB/s x 12 CUs = 610.11 GB/s.

 

 

One should look for any bottlenecks and seek the reasons on why it doesn't use extra memory bandwidth.

 

I have shown you why 7770/7790 @ 800 Mhz would gimp any memory faster than ~96 GB/s. AMD would need to redesign 7770/7790's crossbar/memory controller/L2 cache stack for faster memory based on GDDR6. For the X1, it's easier to "copy-and-paste" from the Pitcairn's crossbar/L2 cache/memory controller designs.

 

7770/7790 has a massive crossbar/L2 cache I/O bottleneck that gimps higher memory bandwidth e.g. not ready for GDDR6.

 

Reference http://www.behardware.com/articles/848-4/amd-radeon-hd-7970-crossfirex-review-28nm-and-gcn.html


------------------------

 

NVIDIA Kelper is different fish when it comes to handling it's compute workloads.

Entry level GK104 (e.g 660 TI) is another class of GPU above AMD Pitcairn i.e. it needs AMD Tahiti LE to battle NV 660 Ti.


K20 vs 7970 vs GTX680 vs M2050 vs GTX580 from http://wili.cc/blog/gpgpu-faceoff.html

 

K20 = GK110

 

There's very difference between NVIDIA's OpenCL and CUDA.

 

App 1. Digital Hydraulics code is all about basic floating point arithmetics, both algebraic and transcendental. No dynamic branching, very little memory traffic.

App 2. Ambient Occlusion code is a very mixed load of floating point and integer arithmetics, dynamic branching, texture sampling and memory access. Despite the memory traffic, this is a very compute intensive kernel.

App 3. Running Sum code, in contrast to the above, is memory intensive. It shuffles data through at a high rate, not doing much calculations on it. It relies heavily on the on-chip L1 cache, though, so it's not a raw memory bandwidth test.

App 4. Geometry Sampling code is texture sampling intensive. It sweeps through geometry data in "waves", and stresses samplers, texture caches, and memory equally. It also has a high register usage and thus low occupancy.

 

 

Minimise App 2 type workloads, Kepler's would perform fine.

ronvalencia

 

W5000/7850-768 @ 853MHz, L2 cache I/O bandwidth: 256 bytes x 853MHz = 203.37 GB/s. ESRAM was designed for Pitcairn-level I/O.

 

Yeah you did..

In fact a W5000-768 @ 853MHz = an overclocked GPU.

The speed of the W5000 is 825MHz, so 853MHz would mean it is slightly overclocked. So how many overclocked GPUs have you seen in consoles?

ESRAM wasn't designed for Pitcairn. ESRAM is a damn band-aid used by MS to help those bandwidth-starved components because MS chose DDR3. If the Xbox One had used GDDR5, even at 102GB/s, it would not even have ESRAM.


So for you a gimped 7790 will perform better than a 7950 if you give it more bandwidth? You are a joke. I proved my point: 100GB/s less, yet the performance is the same.

 


#418 tormentos
Member since 2003 • 33793 Posts

 

Nice to miss the point and changed the argument by using a different GPU design. Your not even factoring the GK104's cache designs.

 

 

My 7950 is NOT running at 800Mhz; it's running at 1Ghz (from 900Mhz "out-of-the box" 7950).

http://www.hardocp.com/article/2013/02/01/xfx_radeon_hd_7950_black_edition_video_card_review/8#.UhHM3dB--po

the stock XFX Radeon HD 7950 with the factory overclock stood its ground, and provided nearly identical performance to the Radeon HD 7970. The overclock we achieved pushed performance beyond the Radeon HD 7970.

 

XFX Radeon HD 7950 Black Edition is "out-of-the-box" 900Mhz version.

 

I have already shown you what a flagship PC GPU design (i.e. 7950 @ 950Mhz) with GDDR5-2400 + 1080p Tomb Raider.

 

With 660 Ti, you're missing the fact that the entry level GK104 is a class above Pitcairn Pro and it's clocked higher/higher GFLOPS than 7850.

 

ronvalencia

You are a joke of a poster. Where the fu** did I say that your 7950 was stock? I compared the stock 7950 vs the stock 660Ti for a fair comparison, and the excuse you use against the 660Ti is a joke, because the 7950 is not a Pitcairn Pro, it's Tahiti. So yeah, even with 100GB/s less bandwidth the 660Ti beat the 7950. I couldn't care less about your particular 7950 since I used the first model that came out and not the one that suited you best.


My point was to prove that bandwidth means sh**, not that the 7950 is sh**.

:lol:


How hypocritical you are. You accuse me of using different hardware for a comparison, when up until today, without a single link to back you up, you claim the Xbox One has a W5000. No wait, it has an overclocked W5000, since we all know the GPU clock on the W5000 is not 1GHz, it's 825MHz. So yeah, you want to imply that the Xbox One has an overclocked Pitcairn..:lol:

You lost the argument. Be a fu**ing man and grow up, admit that you lost.

 


#419 marklarmer
Member since 2004 • 3883 Posts

[Image: pancakes500.jpg]


#420 deactivated-58e448fd89d82
Member since 2010 • 4494 Posts

[Image: pancakes500.jpg]

marklarmer

 

 

[Image: Dolphins-Sunset-Photos.jpg]


#421 Mystery_Writer
Member since 2004 • 8351 Posts

If only Microsoft made their GPU at least 16 CUs instead of 12.

How much would've been the cost to Microsoft if they added 4 more CUs to their design?

also, Ronvalencia,

That last video you posted about physics contributed to convincing me to take the plunge and order an nVidia Titan from Amazon (couldn't wait for the 8xxx series, as it's taking AMD forever to release it).

Although my current HD 6990 runs pretty much everything maxed out with superb framerates, I wanted a single-GPU solution, as my HD 6990 is having issues with the Oculus Dev Kit (inability to clone monitors in full-screen mode due to the dual-GPU nature of the card), which forces audio to go through only the active screen (i.e. I won't have audio if I play a game through the Oculus).


#422 ManatuBeard
Member since 2012 • 1121 Posts

If only Microsoft made their GPU at least 16 CUs instead of 12.

How much would've been the cost to Microsoft if they added 4 more CUs to their design?

also, Ronvalencia,

That last video you posted about physix contributed to convincing myself of taking the plunge and ordering nVidia Titan from amazon (couldn't wait for the 8xxx series as it's taking AMD forever to release it).

Athough my current HD 6990 runs pretty much everything maxed out with superb framerates. But I just wanted a signle GPU solution as my HD 6990 is having issues with the Oculus Dev Kit (innability to clone monitors in full screen mode due to the dual GPU nature of the card) which is forcing audio to go through only the active screen (i.e. I won't have audio if I play a game through the Oculus).

Mystery_Writer

I think the GPU on the X1 has fewer CUs mostly because of die space. The eSRAM takes 1/4 of the chip size, so they had to cut the number of CUs down to 12.


#423 xhawk27
Member since 2010 • 12194 Posts

If only Microsoft made their GPU at least 16 CUs instead of 12.

How much would've been the cost to Microsoft if they added 4 more CUs to their design?

also, Ronvalencia,

That last video you posted about physix contributed to convincing myself of taking the plunge and ordering nVidia Titan from amazon (couldn't wait for the 8xxx series as it's taking AMD forever to release it).

Athough my current HD 6990 runs pretty much everything maxed out with superb framerates. But I just wanted a signle GPU solution as my HD 6990 is having issues with the Oculus Dev Kit (innability to clone monitors in full screen mode due to the dual GPU nature of the card) which is forcing audio to go through only the active screen (i.e. I won't have audio if I play a game through the Oculus).

Mystery_Writer

If you care about CU counts so much, you would just buy a PC.


#425 04dcarraher
Member since 2004 • 23857 Posts

If only Microsoft made their GPU at least 16 CUs instead of 12.

How much would've been the cost to Microsoft if they added 4 more CUs to their design?

also, Ronvalencia,

That last video you posted about physix contributed to convincing myself of taking the plunge and ordering nVidia Titan from amazon (couldn't wait for the 8xxx series as it's taking AMD forever to release it).

Athough my current HD 6990 runs pretty much everything maxed out with superb framerates. But I just wanted a signle GPU solution as my HD 6990 is having issues with the Oculus Dev Kit (innability to clone monitors in full screen mode due to the dual GPU nature of the card) which is forcing audio to go through only the active screen (i.e. I won't have audio if I play a game through the Oculus).

Mystery_Writer
It's not really the cost of more stream processors, it's the size of the die along with the constraints on the thermal output of the APU. Part of the problem is that MS wants to avoid another RRoD scandal, and the decision to go with DDR3 instead of GDDR5 required MS to include a large buffer to offset the bandwidth of the system bus. You should have gotten a GTX 780, a much better price-to-performance ratio.

#426 Mystery_Writer
Member since 2004 • 8351 Posts

Its not really the cost for more stream processors its the size of the die along with the constrains in the thermal output of the APU. Part of the problem is that MS wants to avoid another RRoD scandal, and the the decision to go with DDR3 instead of GDDR5 required MS to include a large buffer to offset the bandwidth for the system bus. You should have gotten a GTX 780 much better price to performance ratio.

04dcarraher

Actually I wanted to get the GTX 780, but didn't know if it would be a noticeable upgrade from my 2.5 year old HD 6990.

Somehow Titan, apparently based on older GeForce 600 technology, is outperforming the new 700 series single-GPU cards (according to AnandTech).

I'm puzzled as to why nVidia didn't release a 700 series with the same CUDA cores count of Titan.

Do you think Titan was a good choice? (aside from the price drawback)

Edit: Also, that's a really interesting take regarding why MS didn't increase the CU count. I thought it was purely due to trying to keep the design within certain cost bracket and Kinect upset that bracket so they skimped on CUs to compensate.

Edit 2: Is Titan considered a Geforce 600 or 700 series GPU?


#427 04dcarraher
Member since 2004 • 23857 Posts

[QUOTE="04dcarraher"]

Its not really the cost for more stream processors its the size of the die along with the constrains in the thermal output of the APU. Part of the problem is that MS wants to avoid another RRoD scandal, and the the decision to go with DDR3 instead of GDDR5 required MS to include a large buffer to offset the bandwidth for the system bus. You should have gotten a GTX 780 much better price to performance ratio.

Mystery_Writer

Actually I wanted to get the GTX 780, but didn't know if it'll be a noticable upgrade from my 2.5 year old HD 6990.

Somehow Titan, apparently based on older Geforce 600 technology, is outperforming the new 700 series single GPU cards (according to anandtech).

I'm puzzled as to why nVidia didn't release a 700 series with the same CUDA cores count of Titan.

Do you think Titan was a good choice? (aside from the price drawback)

Edit: Also, that's a really interesting take regarding why MS didn't increase the CU count. I thought it was purely due to trying to keep the design within certain cost bracket and Kinect upset that bracket so they skimped on CUs to compensate.

Here is performance summary for GTX 780

[Image: perfrel_2560.gif]


#428 Mystery_Writer
Member since 2004 • 8351 Posts

Here is performance summary for GTX 780

[Image: perfrel_2560.gif]

04dcarraher

You're 100% right. Here is what Anand had to say about the GTX 780

The end result is that with the GTX 780 delivering an average of 90% of Titans gaming performance for 65% of the price, this is by all rights the Titan Mini, the cheaper video card Titan customers have been asking for.

From that perspective the GTX 780 is nothing short of an amazing deal for the level of performance offered, especially since it maintains the high build quality and impressive acoustics that helped to define Titan.

anandtech

Although I don't know how much difference this will make, the Titan I got is supposedly an overclocked edition (link).

btw, do you know by any chance the overall performance difference between HD 6990 and Titan?


#429 04dcarraher
Member since 2004 • 23857 Posts

[QUOTE="04dcarraher"]

Here is performance summary for GTX 780

[Image: perfrel_2560.gif]

Mystery_Writer

You're 100% right. Here is what Anand had to say about the GTX 780

The end result is that with the GTX 780 delivering an average of 90% of Titans gaming performance for 65% of the price, this is by all rights the Titan Mini, the cheaper video card Titan customers have been asking for.

From that perspective the GTX 780 is nothing short of an amazing deal for the level of performance offered, especially since it maintains the high build quality and impressive acoustics that helped to define Titan.

anandtech

Although I don't know how much difference this will make, but the Titan I got is supposedly an overclocked edition link

btw, do you know by any chance the overall performance difference between HD 6990 and Titan?

From what I can tell, if a game takes full advantage of the 6990, it's about 35% slower than Titan.

#430 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="04dcarraher"]

Its not really the cost for more stream processors its the size of the die along with the constrains in the thermal output of the APU. Part of the problem is that MS wants to avoid another RRoD scandal, and the the decision to go with DDR3 instead of GDDR5 required MS to include a large buffer to offset the bandwidth for the system bus. You should have gotten a GTX 780 much better price to performance ratio.

Mystery_Writer

Actually I wanted to get the GTX 780, but didn't know if it'll be a noticable upgrade from my 2.5 year old HD 6990.

Somehow Titan, apparently based on older Geforce 600 technology, is outperforming the new 700 series single GPU cards (according to anandtech).

I'm puzzled as to why nVidia didn't release a 700 series with the same CUDA cores count of Titan.

Do you think Titan was a good choice? (aside from the price drawback)

Edit: Also, that's a really interesting take regarding why MS didn't increase the CU count. I thought it was purely due to trying to keep the design within certain cost bracket and Kinect upset that bracket so they skimped on CUs to compensate.

Edit 2: Is Titan considered a Geforce 600 or 700 series GPU?

Both Titan and the 780 are based on NVIDIA's existing GK110 design. The 780 has gimped double-precision floating point performance and slightly fewer CUDA cores. I'm waiting for the Radeon HD 9950 or 9970, due late Sep 2013 to Nov 2013.

#431 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="Mystery_Writer"]

If only Microsoft made their GPU at least 16 CUs instead of 12.

How much would've been the cost to Microsoft if they added 4 more CUs to their design?

also, Ronvalencia,

That last video you posted about physix contributed to convincing myself of taking the plunge and ordering nVidia Titan from amazon (couldn't wait for the 8xxx series as it's taking AMD forever to release it).

Athough my current HD 6990 runs pretty much everything maxed out with superb framerates. But I just wanted a signle GPU solution as my HD 6990 is having issues with the Oculus Dev Kit (innability to clone monitors in full screen mode due to the dual GPU nature of the card) which is forcing audio to go through only the active screen (i.e. I won't have audio if I play a game through the Oculus).

ManatuBeard

I think the GPU on the X1 has less CUs mostly because of die space. The eSRAM takes 1/4 of the chip size, so they had to cut the number of CUs down to 12.

ESRAM consumed ~32 percent of the ~5 billion transistor budget, i.e. ~1.6 billion transistors for the 6T SRAM.
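For what it's worth, the ~1.6 billion figure falls straight out of the cell count. A quick sketch, assuming the standard 6 transistors per SRAM bit and ignoring tags, redundancy and peripheral logic:

```python
# 32 MB of 6T SRAM counted in raw cell transistors.
ESRAM_BYTES = 32 * 1024 * 1024
TRANSISTORS_PER_BIT = 6          # standard 6T SRAM cell

cell_transistors = ESRAM_BYTES * 8 * TRANSISTORS_PER_BIT
print(f"{cell_transistors / 1e9:.2f} billion transistors")             # ~1.61 billion
print(f"~{cell_transistors / 5e9 * 100:.0f}% of a ~5 billion budget")  # ~32%
```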

#432 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="ronvalencia"]

I didn't imply anything. My post includes information on CU's max compute processing memory bandwidth from L1 cache and L2 cache I/O.

For X1's GCN, L1 bandwidth has 64 byte per cycle per CU = 50.8425 GB/s x 12 CUs = 610.11 GB/s.

One should look for any bottlenecks and seek the reasons on why it doesn't use extra memory bandwidth.

I have shown you why 7770/7790 @ 800 Mhz would gimp any memory faster than ~96 GB/s. AMD would need to redesign 7770/7790's crossbar/memory controller/L2 cache stack for faster memory based on GDDR6. For the X1, it's easier to "copy-and-paste" from the Pitcairn's crossbar/L2 cache/memory controller designs.

7770/7790 has a massive crossbar/L2 cache I/O bottleneck that gimps higher memory bandwidth e.g. not ready for GDDR6.

Reference http://www.behardware.com/articles/848-4/amd-radeon-hd-7970-crossfirex-review-28nm-and-gcn.html

------------------------

NVIDIA Kelper is different fish when it comes to handling it's compute workloads.

Entry level GK104 (e.g 660 TI) is another class of GPU above AMD Pitcairn i.e. it needs AMD Tahiti LE to battle NV 660 Ti.


K20 vs 7970 vs GTX680 vs M2050 vs GTX580 from http://wili.cc/blog/gpgpu-faceoff.html

K20">http://wili.cc/blog/gpgpu-faceoff.html">http://wili.cc/blog/gpgpu-faceoff.html

K20 = GK110

There's very difference between NVIDIA's OpenCL and CUDA.

App 1. Digital Hydraulics code is all about basic floating point arithmetics, both algebraic and transcendental. No dynamic branching, very little memory traffic.

App 2. Ambient Occlusion code is a very mixed load of floating point and integer arithmetics, dynamic branching, texture sampling and memory access. Despite the memory traffic, this is a very compute intensive kernel.

App 3. Running Sum code, in contrast to the above, is memory intensive. It shuffles data through at a high rate, not doing much calculations on it. It relies heavily on the on-chip L1 cache, though, so it's not a raw memory bandwidth test.

App 4. Geometry Sampling code is texture sampling intensive. It sweeps through geometry data in "waves", and stresses samplers, texture caches, and memory equally. It also has a high register usage and thus low occupancy.

Minimise App 2 type workloads, Kepler's would perform fine.

tormentos

W5000/7850-768 @ 853MHz, L2 cache I/O bandwidth: 256 bytes x 853MHz = 203.37 GB/s. ESRAM was designed for Pitcairn-level I/O.

Yeah you did..

In fact a W5000-768 @ 853MHz = an overclocked GPU.

The speed of the W5000 is 825MHz, so 853MHz would mean it is slightly overclocked. So how many overclocked GPUs have you seen in consoles?

ESRAM wasn't designed for Pitcairn. ESRAM is a damn band-aid used by MS to help those bandwidth-starved components because MS chose DDR3. If the Xbox One had used GDDR5, even at 102GB/s, it would not even have ESRAM.

So for you a gimped 7790 will perform better than a 7950 if you give it more bandwidth? You are a joke. I proved my point: 100GB/s less, yet the performance is the same.

I didn't imply anything.

7950 @ 800MHz, L2 cache I/O bandwidth: 384 bytes per cycle x 800MHz = ~300 GB/s.

7950 @ 950MHz (fastest out-of-the-box edition), L2 cache I/O bandwidth: 384 bytes per cycle x 950MHz = ~337.7 GB/s.

X1's L2 cache I/O bandwidth is not even close to a 7950 @ 800MHz. You're wrong again.

AMD.com's W5000 @ 1.3 TFLOPS would need about an 853MHz clock speed. There are W5000s with slightly different clock speeds. AMD doesn't ultimately dictate the final GPU clock speed, i.e. that's the AIB's (Add-In Board partner's) or OEM's decision.

A 10MHz overclock is minor, i.e. I overclocked my 7950-900MHz to 950MHz without any voltage increase. MS overclocked the X1's GPU by 53MHz. Your issue on this subject is just noise, i.e. it's a "who cares" episode.

-------

The X1 has GCN's 256bit memory controllers and the related L2 cache I/O bandwidth, i.e. 203.37 GB/s, which can cover the 133 GB/s (via ESRAM) alpha-blend figure that was quoted as the effective bandwidth for the prototype X1.

A 7770's or 7790's L2 cache I/O bandwidth would not cover 133 GB/s (via ESRAM) alpha blend. This is why the prototype 7850 with 768 stream processors @ 860MHz is a closer fit to the X1's GCN.

Giving the 7790 more bandwidth is nearly pointless since its 128 bytes per cycle L2 cache I/O would gimp it. Each memory controller has 64 bytes per cycle of L2 cache bandwidth. The joke is on you since you haven't identified the bottleneck in the 7790.

Also, 7950 has substantially larger internal SRAM storage than 7790.
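Putting those numbers side by side, a small sketch of whether each design's L2 I/O ceiling covers the 133 GB/s alpha-blend figure (same 64 bytes/cycle-per-block, binary-GB convention as earlier in the thread; 6 blocks for Tahiti is my assumption based on its 384-bit bus):

```python
GIB = 2**30
ALPHA_BLEND = 133.0  # GB/s, figure quoted for the prototype X1's ESRAM

def l2_ceiling(blocks: int, mhz: float) -> float:
    # 64 bytes per cycle per L2 block, converted to binary GB/s.
    return 64 * blocks * mhz * 1e6 / GIB

for name, blocks, mhz in [("7770/7790", 2, 800), ("X1-like 256-bit", 4, 853), ("7950 @ 800MHz", 6, 800)]:
    c = l2_ceiling(blocks, mhz)
    print(f"{name}: ~{c:.0f} GB/s -> covers {ALPHA_BLEND:.0f} GB/s alpha blend: {c >= ALPHA_BLEND}")
```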


#433 ronvalencia
Member since 2008 • 29612 Posts

If only Microsoft made their GPU at least 16 CUs instead of 12.

How much would've been the cost to Microsoft if they added 4 more CUs to their design?

also, Ronvalencia,

That last video you posted about physix contributed to convincing myself of taking the plunge and ordering nVidia Titan from amazon (couldn't wait for the 8xxx series as it's taking AMD forever to release it).

Athough my current HD 6990 runs pretty much everything maxed out with superb framerates, I wanted a signle GPU solution as my HD 6990 is having issues with the Oculus Dev Kit (inability to clone monitors in full screen mode due to the dual GPU nature of the card) which is forcing audio to go through only the active screen (i.e. I won't have audio if I play a game through the Oculus).

Mystery_Writer
The posted PS4 physics demo was running via Havok GPGPU physics.

#434 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="04dcarraher"] Memory bandwidth means squat after a certain point and the difference between the X1 with DDR3+esram and PS4's GDDR5 will be negligible. The difference seen will be the gpu's processing powertormentos

At least we agree for once now tell that to Ron he refuse to admit it,even that i show him how the 660Ti performed side by side with the 7950 hell even surpass it while having almost 100GB/s less bandwidth.

Not for Eyefinity-level resolutions, where 240 GB/s of (theoretical) memory bandwidth actually counts for something. You haven't applied the principle of why server CPUs have larger caches.


#435 ronvalencia
Member since 2008 • 29612 Posts
[QUOTE="ronvalencia"]You demonstrated a different GPU design.

For 7950, L1 bandwidth has 64 byte per cycle per CU @ 800Mhz = 47 GB/s x 28 CUs = 1.3 TB/s.

Your "250Gb/s bandwidth on the xbox one would it beat the 7950" is just plain stupidity. Stop adding words to my posts.

Again, you can't read a simple bar graph that shows a prototype 7850 with 768 stream processors.

btk2k2
The problem with the '7830' prototype is that it has more ROPs than the X1 and it has considerably more usable bandwidth than the X1. Besides the 7790 is pretty close to the '7830' prototype so it seems like a pretty good card to use as a rough guide. It is obvious that the PS4 will outperform the X1. The DF article that only took into account shader performance showed a 20-30% performance gap. Include other factors and that can easily be up to a 45% performance advantage for the PS4. I know you keep spouting 133 GB/s for alpha blend on the ESRAM but there is no way that it will be in constant use because the data can be used faster than it can be refilled. The GPU will at times have to access the DDR3 for data which will incur a huge performance penalty. This is why using the 7790 is not that bad because even though its bandwidth is lower than the X1 in optimal conditions it is also higher than the X1 bandwidth in worst case conditions meaning that on average it is going to be pretty close to the mark. In the end we will see it in the games, where the games push the X1 the PS4 will have smoother and / or prettier games. There will be some games that are more artistic that look and play the same on both consoles, as well as the indie games that are not graphically intensive but those that push the consoles will be better on the PS4.

For 128bit color, both the 7850 and the X1 have about 8 GPixel/s, and most games are not color-ROPS limited. As for the 32MB storage size issue, Intel has stated that a 32MB cache has about a 95 percent hit rate for current PC gaming workloads.

#436 btk2k2
Member since 2003 • 440 Posts
[QUOTE="btk2k2"][QUOTE="ronvalencia"]You demonstrated a different GPU design.

For 7950, L1 bandwidth has 64 byte per cycle per CU @ 800Mhz = 47 GB/s x 28 CUs = 1.3 TB/s.

Your "250Gb/s bandwidth on the xbox one would it beat the 7950" is just plain stupidity. Stop adding words to my posts.

Again, you can't read a simple bar graph that shows a prototype 7850 with 768 stream processors.

ronvalencia
The problem with the '7830' prototype is that it has more ROPs than the X1 and it has considerably more usable bandwidth than the X1. Besides the 7790 is pretty close to the '7830' prototype so it seems like a pretty good card to use as a rough guide. It is obvious that the PS4 will outperform the X1. The DF article that only took into account shader performance showed a 20-30% performance gap. Include other factors and that can easily be up to a 45% performance advantage for the PS4. I know you keep spouting 133 GB/s for alpha blend on the ESRAM but there is no way that it will be in constant use because the data can be used faster than it can be refilled. The GPU will at times have to access the DDR3 for data which will incur a huge performance penalty. This is why using the 7790 is not that bad because even though its bandwidth is lower than the X1 in optimal conditions it is also higher than the X1 bandwidth in worst case conditions meaning that on average it is going to be pretty close to the mark. In the end we will see it in the games, where the games push the X1 the PS4 will have smoother and / or prettier games. There will be some games that are more artistic that look and play the same on both consoles, as well as the indie games that are not graphically intensive but those that push the consoles will be better on the PS4.

For 128bit color, both 7850 and X1 has about 8 GPixel/s and most games are not color ROPS limited. As for 32MB storage size issue, Intel has stated that the 32 MB size cache has about 95 precent hit rate for current PC gaming workloads.

1) The 7790 and the '7830' prototype are pretty similar in performance, and we know that the 7790 is faster than the X1 in general, so as I stated, X1 performance falls between the 7770 and the 7790. PS4 performance will fall between the 7850 and the 7870 GHz Edition. The gap will be between 35% and 45%.

2) I was not concerned with the size; at 32MB it is fine for a 1080p scene. The issue is that the bus that feeds the ESRAM is slower than the bus between the ESRAM and the GPU. It means there will be instances where the GPU requires data that is not in the ESRAM because the system did not have time to transfer it from the main pool into the ESRAM.

3) Is the ESRAM a cache or a managed memory pool?
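For a sense of scale on point 2, a rough sketch of how much of the 32MB a plain 1080p render target set occupies (uncompressed 32-bit colour plus 32-bit depth is my assumption; real engines vary a lot):

```python
# How much of the 32 MB ESRAM a simple 1080p render target set occupies.
width, height = 1920, 1080
bytes_per_pixel_colour = 4   # assumed 32-bit colour target
bytes_per_pixel_depth = 4    # assumed 32-bit depth/stencil target

colour = width * height * bytes_per_pixel_colour / 2**20   # MiB
depth  = width * height * bytes_per_pixel_depth / 2**20
print(f"colour {colour:.1f} MB + depth {depth:.1f} MB = {colour + depth:.1f} MB of 32 MB")
# ~15.8 MB, leaving room for only a couple of extra targets before spilling to DDR3.
```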

#437 tormentos
Member since 2003 • 33793 Posts

 

Not for Eyefinity level resolutions when 240 GB/s (theoretical) memory bandwidth is actually count for something. You haven't applied the principle on why server CPUs has larger caches.

 

ronvalencia

 

Not for a resolution that basically killed both cards. In the test you showed, the performance was under 30 FPS; on PC that is horrible, since PC frame rates are more variable than on consoles.

Don't come with the whole "theoretical" crap; the 660Ti's 144GB/s is also theoretical, every peak figure is theoretical, so save your lame excuses.


#438 Mystery_Writer
Member since 2004 • 8351 Posts

[QUOTE="Mystery_Writer"]

If only Microsoft made their GPU at least 16 CUs instead of 12.

How much would've been the cost to Microsoft if they added 4 more CUs to their design?

also, Ronvalencia,

That last video you posted about physix contributed to convincing myself of taking the plunge and ordering nVidia Titan from amazon (couldn't wait for the 8xxx series as it's taking AMD forever to release it).

Athough my current HD 6990 runs pretty much everything maxed out with superb framerates. But I just wanted a signle GPU solution as my HD 6990 is having issues with the Oculus Dev Kit (innability to clone monitors in full screen mode due to the dual GPU nature of the card) which is forcing audio to go through only the active screen (i.e. I won't have audio if I play a game through the Oculus).

xhawk27

If you care about CUs units so much you would just buy a PC. 

tbh, I wouldn't buy consoles if it weren't for the exclusives


#439 Mystery_Writer
Member since 2004 • 8351 Posts

[QUOTE="ManatuBeard"]

[QUOTE="Mystery_Writer"]

If only Microsoft made their GPU at least 16 CUs instead of 12.

How much would've been the cost to Microsoft had they added 4 more CUs to their design?

ronvalencia

I think the GPU on the X1 has less CUs mostly because of die space. The eSRAM takes 1/4 of the chip size, so they had to cut the number of CUs down to 12.

ESRAM consumed ~32 percent of the ~5 billion transistor budget, i.e. ~1.6 billion transistors for the 6T SRAM.

Ron, if you were to design the X1 APU, which solution would you pick as the more elegant architecture:

a) 12 CUs + ESRAM + DDR3

b) 18 CUs + GDDR5


#440 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="ronvalencia"]

Not for Eyefinity level resolutions when 240 GB/s (theoretical) memory bandwidth is actually count for something. You haven't applied the principle on why server CPUs has larger caches.

tormentos

Not for a resolution that basically killed both card because the test you showed killed both cards the performance was under 30 FPS on PC that is horrible since PC has higher variable frame rate than console.

Don't come with the whole theorical crap the 660Ti 144Gb/s is also theorical all peak on everything is theorical,so save your lame excuses.

The 660 Ti doesn't win in AMD-biased games. You're not factoring in ALU behaviour differences.

----

The 660Ti has a larger 64KB L1 cache per SMX unit that covers 192 stream processors. There are 7 SMX units = 448 KB of L1 cache @ 915MHz.

The 7850 has 16 CUs x 16 KB L1 cache = 256 KB of L1 cache @ 860MHz.

7950 @ 800MHz: 28 CUs x 16 KB L1 = 448 KB @ 800MHz.

When games are not bound by GK104's register file limits, GK104 can match Tahiti.


#441 tormentos
Member since 2003 • 33793 Posts

 

660 Ti doesn't win on AMD bias games. You not factoring ALU behaviour differences.

----

660Ti has a larger 64KB L1 cache per SMX unit that covers 192 stream processors. There are 7 SMX units = 448 KB L1 cache @ 915Mhz.

7850 has 16 CUs x 16 KB L1 cache = 256 KB L1 cache @ 860Mhz.

 

7950@ 800 Mhz: 28 CU x 16 KB L1 = 448 KB @ 800 Mhz.

 

When games are not bound by GK014's register file limits, GK104 can match Tahiti.ronvalencia

 

But in other games it does. With 100GB/s less there is not a single excuse you can use to justify it; bandwidth means little when your GPU is weak, I proved that.


#442 tormentos
Member since 2003 • 33793 Posts

 ESRAM consumed ~32 percent of the ~5 billion transistor budget, i.e. ~1.6 billion transistors for the 6T SRAM. Mystery_Writer

Ron, if you were to design the X1 APU, which solution would you pick as the more elegant architecture;

a) 12 CUs + ESRAM + DDR3

b) 18 CUs + GDDR5

 

Oh, he saw your post and ignored it, because everything he knows about PC hardware tells him this:

18CU +GDDR5 >>> 12CU +ESRAM+ DDR3.


#443 drakekratos
Member since 2011 • 2311 Posts
Shadow of the Beast is PS4 exclusive too

#444 04dcarraher
Member since 2004 • 23857 Posts
Shadow of the Beast is PS4 exclusive toodrakekratos
should fix your sig its not 50% only 30% more powerful.

#445 mitu123
Member since 2006 • 155290 Posts

[QUOTE="drakekratos"]Shadow of the Beast is PS4 exclusive too04dcarraher
should fix your sig its not 50% only 30% more powerful.

Even that's quite a bit of a difference.:P


#446 GravityX
Member since 2013 • 865 Posts

[QUOTE="Mystery_Writer"]

 ESRAM consumed ~32 percent of the ~5 billion transistor budget, i.e. ~1.6 billion transistors for the 6T SRAM. tormentos

Ron, if you were to design the X1 APU, which solution would you pick as the more elegant architecture;

a) 12 CUs + ESRAM + DDR3

b) 18 CUs + GDDR5

 Oh he saw your post and ignores it because everything he knows about PC tell him that.

18CU +GDDR5 >>> 12CU +ESRAM+ DDR3.

So I wonder what MS is going to talk about at Hot Chips?

http://www.hotchips.org/

An off-the-shelf, gimped AMD 7790 GPU? They may get laughed out of there, huh.


#447 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="ronvalencia"]

660 Ti doesn't win on AMD bias games. You not factoring ALU behaviour differences.

----

660Ti has a larger 64KB L1 cache per SMX unit that covers 192 stream processors. There are 7 SMX units = 448 KB L1 cache @ 915Mhz.

7850 has 16 CUs x 16 KB L1 cache = 256 KB L1 cache @ 860Mhz.

7950@ 800 Mhz: 28 CU x 16 KB L1 = 448 KB @ 800 Mhz.

When games are not bound by GK014's register file limits, GK104 can match Tahiti.tormentos

But on other games it does,with 100GB/s less there is not a single excuse you can use to justify it,bandwidth mean little when your GPU is weak i proved that.

You're not comparing apples to apples. Neither the X1 nor the PS4 has an NVIDIA Kepler GK104.

For 5760x1080 resolutions, the 660 Ti doesn't scale as well as the 7950. Battlefield 3's 26.5 fps result can be made to reach 30 fps by slightly reducing the detail settings (a PC user feature), e.g. shadows from ultra to high, MSAA 4X to 2X. Also, TechPowerUp did not use the latest 7950 Boost Edition (aka 8950) variant.

-----

NV SMX's L1 cache/shared memory can do 256 byte per cycle.

660 Ti's 7 SMX x 256 byte per cycle x 915Mhz = 1.6 TB/s. L1/shared memory size is 448 KB.

AMD CU's L1 cache can do 64 byte per cycle.

7950's 28 CU x 64 byte per cycle x 800 Mhz = 1.4 TB/s, L1/shared memory size is 448 KB.

Again, you are not factoring in GK104's SRAM storage capability, and it's clocked faster than a 7950 @ 800MHz.

There's a reason why AMD updated the 7950 into the 7950 Boost Edition. My 7950 doesn't need to be updated into a 7950 BE since it's already clocked at 900MHz out of the box.

Larger SRAM storage = fewer trips to external memory.

AMD has somewhat duplicated 660 Ti's setup with 7870 XT (925Mhz with 975Mhz boost, 192 GB/s). Again, AMD updated 7950 to 7950 Boost Edition so it wouldn't be in conflict with 7870 XT.

The prototype 7850 with 768 stream processors would be closer to X1's GCN.
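The aggregate L1 figures a few lines up work out as follows (a sketch using only the per-unit widths quoted in the post, reported in decimal TB/s as in the post):

```python
# Aggregate L1/shared-memory bandwidth from the per-unit widths quoted above.
def aggregate_l1(units: int, bytes_per_cycle: int, clock_mhz: float) -> float:
    return units * bytes_per_cycle * clock_mhz * 1e6 / 1e12   # decimal TB/s

gtx660ti = aggregate_l1(7, 256, 915)    # 7 SMX x 256 B/cycle @ 915 MHz -> ~1.64 TB/s
hd7950   = aggregate_l1(28, 64, 800)    # 28 CU x 64 B/cycle  @ 800 MHz -> ~1.43 TB/s
print(f"660 Ti ~{gtx660ti:.2f} TB/s vs 7950 ~{hd7950:.2f} TB/s aggregate L1 bandwidth")
```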


#448 ronvalencia
Member since 2008 • 29612 Posts

[QUOTE="ronvalencia"][QUOTE="ManatuBeard"]

I think the GPU on the X1 has less CUs mostly because of die space. The eSRAM takes 1/4 of the chip size, so they had to cut the number of CUs down to 12.

Mystery_Writer

ESRAM consumed ~32 percent of the ~5 billion transistor budget, i.e. ~1.6 billion transistors for the 6T SRAM.

Ron, if you were to design the X1 APU, which solution would you pick as the more elegant architecture;

a) 12 CUs + ESRAM + DDR3

b) 18 CUs + GDDR5

Selection B. It's closer to my PC specs.

Since Q2 2013, Intel has already been using PS4-style high-density GDDR5 with their Xeon Phi products, i.e. 16 GB of GDDR5 on a 512bit bus, hence 8 GB of GDDR5 at 256bit is feasible.

http://ark.intel.com/products/75799/Intel-Xeon-Phi-Coprocessor-7120P-16GB-1_238-GHz-61-core


#449 ronvalencia
Member since 2008 • 29612 Posts

Oh he saw your post and ignores it because everything he knows about PC tell him that.

18CU +GDDR5 >>> 12CU +ESRAM+ DDR3.

tormentos

Fictional post.

I haven't bought any GCN products with non-GDDR5.


#450 ronvalencia
Member since 2008 • 29612 Posts

Formula all you want, you are pig ignorant to basic GPU understanding Ron, let alone anything more advanced. 

AMD655
LOL, what a joke post.