Cross-Platform is a GAMECHANGER... Xbox-PC Brotherhood!


This topic is locked from further discussion.


#1 FPSGOD
Member since 2015 • 200 Posts

This really is the ace in the hole for Microsoft. They are closing the gap between console and master race. Scorpio is going to kick ass, and be the next evolution in the console game. Expect the Halo series to experience a rebirth with Halo 6. Cross-platform will introduce the game to a whole new demographic of master race players. All this is possible, because the Scorpio is the real deal. Large and in charge. Size matters, and Scorpio is packing heat.

I'm excited.


#2 FPSGOD
Member since 2015 • 200 Posts

I'm really excited.


#3 GarGx1
Member since 2011 • 10934 Posts

Not really wanting to burst your bubble but....

Halo 1 and 2 were on PC, it's not a new demographic.

As for cross play, console gamers and their controllers are not going to be wanting to play against PC gamers with m/kb set ups for very long. There's only so many times a person can categorically lose before they stop playing.

Scorpio is only a rumour right now but even if those rumours are true my PC built in 2014/15 is already more powerful.


#4 Basinboy
Member since 2003 • 14559 Posts

How many drain-clogging threads should we expect from you each day?


#5  Edited By ronvalencia
Member since 2008 • 29612 Posts

@GarGx1 said:

Not really wanting to burst your bubble but....

Halo 1 and 2 were on PC, it's not a new demographic.

As for cross play, console gamers and their controllers are not going to be wanting to play against PC gamers with m/kb set ups for very long. There's only so many times a person can categorically lose before they stop playing.

Scorpio is only a rumour right now but even if those rumours are true my PC built in 2014/15 is already more powerful.

Your PC built in 2014/2015 is missing the double-rate float16 feature.

Vega 11 with 6 TFLOPS float32 yields 12 TFLOPS float16.


#6  Edited By Shewgenja
Member since 2009 • 21456 Posts

Isn't this what happens any time a new console comes out?

The gap won't stay in place. By 2018, the new graphics cards will be a generation apart all over again. This is some basic bitch posting.


#7 GarGx1
Member since 2011 • 10934 Posts

@ronvalencia: Yeah cause 16 bit floating point is going to make all the difference.


#8  Edited By cainetao11
Member since 2006 • 38065 Posts

Well, I can't comment on Scorpio as I don't really know much that's concrete about it. But for MS, cross support between PC and Xbox is a no-brainer. The great majority of PCs (not like 55%, we're talking Monopoly levels) run MS' Windows OS. Obviously they want to support that product, and their Xbox along with it. Seems obvious.


#9  Edited By ronvalencia
Member since 2008 • 29612 Posts

@GarGx1 said:

@ronvalencia: Yeah cause 16 bit floating point is going to make all the difference.

GeForce FX's float16 shader path existed before DX10's float32 hardware changes.

Tegra X1 and GP100 have double-rate float16 shaders. GP102 and GP104 have broken float16 shaders, i.e. 1/64 the float32 rate, which is exposed by CUDA 7.5. Float32 units can emulate float16 without any performance benefit.

VooFoo Dev Was Initially Doubtful About PS4 PRO GPU’s Capabilities But Was Pleasantly Surprised Later

“I was actually very pleasantly surprised. Not initially – the specs on paper don’t sound great, as you are trying to fill four times as many pixels on screen with a GPU that is only just over twice as powerful, and without a particularly big increase in memory bandwidth,” he explained, echoing the sentiment that a lot of us seem to have, before adding, “But when you drill down into the detail, the PS4 Pro GPU has a lot of new features packed into it too, which means you can do far more per cycle than you can with the original GPU (twice as much in fact, in some cases). You’ve still got to work very hard to utilise the extra potential power, but we were very keen to make this happen in Mantis Burn Racing.

“In Mantis Burn Racing, much of the graphical complexity is in the pixel detail, which means most of our cycles are spent doing pixel shader work. Much of that is work that can be done at 16-bit rather than 32-bit precision, without any perceivable difference in the end result – and PS4 Pro can do 16 bit-floating point operations twice as fast as the 32-bit equivalent.”

Mantis Burn Racing PS4 Pro version reached 4K/60 fps which is 4X effectiveness over the original PS4 version's 1080p/60 fps.

FP16 optimisation is similar to GeForce FX's era The Way Meant to be Played optimisation paths.

On the original PS4, developers were already using FP16 for raster hardware, e.g. Killzone Shadow Fall.

Sebbbi, a dev on Beyond3D, had this to say about FP16:

Originally Posted by Sebbbi on Beyond3D 2 years ago

Sometimes it requires more work to get lower precision calculations to work (with zero image quality degradation), but so far I haven't encountered big problems in fitting my pixel shader code to FP16 (including lighting code). Console developers have a lot of FP16 pixel shader experience because of PS3. Basically all PS3 pixel shader code was running on FP16.

It is still very important to pack the data in memory as tightly as possible as there is never enough bandwidth to lose. For example 16 bit (model space) vertex coordinates are still commonly used, the material textures are still dxt compressed (barely 8 bit quality) and the new HDR texture formats (BC6H) commonly used in cube maps have significantly less precision than a 16 bit float. All of these can be processed by 16 bit ALUs in pixel shader with no major issues. The end result will still be eventually stored to 8 bit per channel back buffer and displayed.

Could you give us some examples of operations done in pixel shaders that require higher than 16 bit float processing?

EDIT: One example where 16 bit float processing is not enough: Exponential variance shadow mapping (EVSM) needs both 32 bit storage (32 bit float textures + 32 bit float filtering) and 32 bit float ALU processing.

However EVSM is not yet universally possible on mobile platforms right now, as there's no standard support for 32 bit float filtering in mobile devices (OpenGL ES 3.0 just recently added support for 16 bit float filtering, 32 bit float filtering is not yet present). Obviously GPU manufacturers can have OpenGL ES extensions to add FP32 filtering support if their GPU supports it (as most GPUs should as this has been a required feature in DirectX since 10.0).

#33 sebbbi, Oct 18, 2014. Last edited by a moderator: Oct 18, 2014

http://wccftech.com/flying-wild-hog-ps4-pro-gpu-beast/

What do you think of Sony’s PlayStation 4 Pro in terms of performance? Is it powerful enough to deliver 4K gaming?

PS4 Pro is a great upgrade over the base PS4. The CPU didn't get a big upgrade, but the GPU is a beast. It also has some interesting hardware features, which help with achieving 4K resolution without resorting to brute force.

PS4 Pro’s architect Mark Cerny said that the console introduces the ability to perform two 16-bit operations at the same time, instead of one 32-bit operation. He suggested that this has the potential to “radically increase the performance of games” – do you agree with this assessment?

Yes. Half precision (16 bit) instructions are a great feature. They were used some time ago in Geforce FX, but didn’t manage to gain popularity and were dropped. It’s a pity, because most operations don’t need full float (32 bit) precision and it’s a waste to use full float precision for them. With half precision instructions we could gain much better performance without sacrificing image quality.

MS, AMD, Sony & Intel think it will work for graphics.

GCN 3/4 has single-rate native float16, which reduces memory bandwidth usage: in memory-bandwidth-bound situations, native float16 shader support can double the effective floating point operations within the same memory bandwidth. Double-rate float16 goes further, doubling throughput at the GPU ALU level.
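The bandwidth-saving half of that claim is easy to sanity-check outside a GPU. A minimal sketch using Python's `struct` module (the `'e'` format is IEEE 754 half precision); this is illustrative only, not shader code:

```python
import struct

# The same number of values stored at 16-bit instead of 32-bit precision
# moves half the bytes, so a memory-bandwidth-bound kernel can touch
# twice as many values for the same traffic. That is the single-rate
# fp16 benefit; double-rate (packed) fp16 additionally doubles ALU rate.
n = 1000
fp32_bytes = struct.calcsize(f'{n}f')  # 'f' = float32 -> 4000 bytes
fp16_bytes = struct.calcsize(f'{n}e')  # 'e' = float16 -> 2000 bytes

assert fp32_bytes == 2 * fp16_bytes
```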

With SM6's native float16, AMD is going to do another async-compute-style gimping of existing NVIDIA GPUs.

AMD is using their relationship with MS to drive SM6's direction.


#10 GarGx1
Member since 2011 • 10934 Posts

@ronvalencia said:
@GarGx1 said:

@ronvalencia: Yeah cause 16 bit floating point is going to make all the difference.

GeForce FX's float16 shader path existed before DX10's float32 hardware changes.

Tegra X1 and GP100 have double-rate float16 shaders. GP102 and GP104 have broken float16 shaders, i.e. 1/64 the float32 rate.

I heard it's known to cause memory conflicts on AMD cards as well


#11 oflow
Member since 2003 • 5185 Posts

Seems to work pretty well so far, so that's a plus.

As far as console players getting tired of playing against PC players, it's a non-issue. In its current state they don't allow ranked competitive modes to crossplay. I imagine they will when Scorpio gets keyboard and mouse support, and then again it's a non-issue.


#12 tormentos
Member since 2003 • 33793 Posts

@ronvalencia said:

Your PC built in 2014/2015 is missing the double-rate float16 feature.

Vega 11 with 6 TFLOPS float32 yields 12 TFLOPS float16.

That is total bullshit. The PS4 Pro has the same feature and is not pumping 8.4 TF of power; not every process can be float16.

And even so, it's not like the GPU will double its performance. Stop making shitty-ass assumptions, man, you are like the new misterXmedia..

The same shit was said about DX12 and nothing happened.


#13 oflow
Member since 2003 • 5185 Posts

@tormentos: you don't even play games so it's a non issue with you too.


#14 N64DD
Member since 2015 • 13167 Posts

@Basinboy said:

How many drain-clogging threads should we expect from you each day?

3 in the same day. SW deserves a better tier of troll.


#15 kvally
Member since 2014 • 8445 Posts

@oflow said:

@tormentos: you don't even play games so it's a non issue with you too.


#16 blueinheaven
Member since 2008 • 5566 Posts

Whose alt are we dealing with this time? Is that you again, Nyad? Okay, I'll play along with this nonsense. What is the advantage exactly, OP? Have you just found a new buzzword (cross-platform) that you don't know the meaning of, but it sounds good so you're going to bleat about it?

As has been explained, console gamers v PC gamers will get completely ruined and won't come back for a second humiliation.

It's like this 'play anywhere' crap: as if Xbox players will buy a PC so they can play the same game on it, or vice versa, lol. Just buzzwords; in the real world they mean nothing. And Scorpio will have nothing, and I mean 'nothing', that you can't already play on PC or Xbox One and, with very few exceptions, PS4.

/thread


#17 the_master_race
Member since 2015 • 5226 Posts
@GarGx1 said:

Not really wanting to burst your bubble but....

Halo 1 and 2 were on PC, it's not a new demographic.

As for cross play, console gamers and their controllers are not going to be wanting to play against PC gamers with m/kb set ups for very long. There's only so many times a person can categorically lose before they stop playing.

Scorpio is only a rumour right now but even if those rumours are true my PC built in 2014/15 is already more powerful.

Well, FPS is not the only genre. I played Rocket League and Gwent with Xbox players; gotta say it was really a cool experience.


#18 GarGx1
Member since 2011 • 10934 Posts

@the_master_race said:
@GarGx1 said:

Not really wanting to burst your bubble but....

Halo 1 and 2 were on PC, it's not a new demographic.

As for cross play, console gamers and their controllers are not going to be wanting to play against PC gamers with m/kb set ups for very long. There's only so many times a person can categorically lose before they stop playing.

Scorpio is only a rumour right now but even if those rumours are true my PC built in 2014/15 is already more powerful.

Well, FPS is not the only genre. I played Rocket League and Gwent with Xbox players; gotta say it was really a cool experience.

It's a Halo 6 speculation thread, which is why I mentioned the controller/mouse disparity in FPS games.


#19 MonsieurX
Member since 2008 • 39858 Posts

cool blog


#20 Pedro
Member since 2002 • 73912 Posts

I play Overwatch on the PC with the Xbox controller. So no complaints here.


#21 ShadowDeathX
Member since 2006 • 11699 Posts

For games that won't have many balance issues, I welcome cross-platform multiplayer. It's always nice to have a larger player base to play with.

Once Xbox One supports keyboard and mouse for gaming, they might offer the option of a unified or split community system.


#22  Edited By Alucard_Prime
Member since 2008 • 10107 Posts

Been playing PC gamers all week on Gears 4 social versus, it's pretty cool. Haven't noticed much of a difference to be honest... still kicking ass. But it's definitely a plus: the game has a larger population, so more people get to play it. Glad they are doing it.


#23 Gatygun
Member since 2010 • 2709 Posts

@ronvalencia said:
@GarGx1 said:

@ronvalencia: Yeah cause 16 bit floating point is going to make all the difference.

[rest of quote snipped: ronvalencia's full FP16 post from #9]

Even the dude in your own post mentions you need 32-bit for some tasks, so double performance ain't going to happen.

Now showcase me a game that uses fp16 that we can compare, because so far what I see is that fp16 on the PS4 Pro is not really helping them out very much. It does exactly what you would expect the console to push in the fp32 department.

6 TFLOPS on top of it, if that's AMD flops, isn't going to yield more than GTX 980 performance at its absolute best. Let alone Titan XP performance.

I wonder where all that magic double-performance 16-bit floating point is going, then. Should be an easy performance gain.

fp16 sounds more and more like our DX12 update: loads of hot air that results in zero performance gain, or almost nothing.


#25 hrt_rulz01
Member since 2006 • 22681 Posts

@oflow said:

@tormentos: you don't even play games so it's a non issue with you too.

Lol, spot on.


#26 xhawk27
Member since 2010 • 12194 Posts

Need a name for these guys? I am thinking Tigers!


#27  Edited By ronvalencia
Member since 2008 • 29612 Posts

@GarGx1 said:
@ronvalencia said:
@GarGx1 said:

@ronvalencia: Yeah cause 16 bit floating point is going to make all the difference.

GeForce FX's float16 shader path existed before DX10's float32 hardware changes.

Tegra X1 and GP100 have double-rate float16 shaders. GP102 and GP104 have broken float16 shaders, i.e. 1/64 the float32 rate.

I heard it's known to cause memory conflicts on AMD cards as well

Please share the links for FP16 vs FP32 memory conflicts.

Double-packed FP16 in a SIMD format shouldn't cause any memory conflicts with FP32.

Double-packed FP16 is still 32-bit data, with a virtual divider between the low FP16 and high FP16 halves for two FP16 operations.
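That packed layout can be sketched with Python's `struct` module (`'e'` is IEEE 754 half precision). This only illustrates the bit layout, not actual GPU packed-math instructions:

```python
import struct

# Two fp16 values packed into one 32-bit word: the "virtual divider"
# described above. Double-rate fp16 hardware operates on both halves in
# a single instruction slot; here we just show the data layout.
a, b = 1.5, -2.25

# '<e' = little-endian half precision; '<H' = 16-bit unsigned view.
lo = struct.unpack('<H', struct.pack('<e', a))[0]  # low 16 bits
hi = struct.unpack('<H', struct.pack('<e', b))[0]  # high 16 bits
packed = (hi << 16) | lo                           # one 32-bit datum

# Unpacking recovers both values exactly (1.5 and -2.25 are exactly
# representable in fp16, so no rounding occurs here).
a2 = struct.unpack('<e', struct.pack('<H', packed & 0xFFFF))[0]
b2 = struct.unpack('<e', struct.pack('<H', packed >> 16))[0]
assert (a2, b2) == (a, b)
```

Because the pair is a single 32-bit datum, it moves through registers and memory exactly like a float32, which is why it doesn't create new memory conflicts.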


#28  Edited By ronvalencia
Member since 2008 • 29612 Posts

@Gatygun said:
@ronvalencia said:

[quote snipped: the full FP16 post from #9]

Even the dude in your own post mentions you need 32-bit for some tasks, so double performance ain't going to happen.

Now showcase me a game that uses fp16 that we can compare, because so far what I see is that fp16 on the PS4 Pro is not really helping them out very much. It does exactly what you would expect the console to push in the fp32 department.

6 TFLOPS on top of it, if that's AMD flops, isn't going to yield more than GTX 980 performance at its absolute best. Let alone Titan XP performance.

I wonder where all that magic double-performance 16-bit floating point is going, then. Should be an easy performance gain.

fp16 sounds more and more like our DX12 update: loads of hot air that results in zero performance gain, or almost nothing.

Again, AMD's TFLOPS is not the problem!!!

YOU: Now showcase me a game that uses fp16 that we can compare?

Again,

Mantis Burn Racing PS4 Pro version reached 4K/60 fps which is 4X effectiveness over the original PS4 version's 1080p/60 fps.

"Console developers have a lot of FP16 pixel shader experience because of PS3.Basically all PS3 pixel shader code was running on FP16."

You can compare FLOPS only under specific conditions.

AMD's shader FLOPS are not the problem!!!!!!!!!

From https://developer.nvidia.com/dx12-dos-and-donts

On DX11 the driver does farm off asynchronous tasks to driver worker threads where possible – this doesn’t happen anymore under DX12

The Nvidia DX11 driver already uses key DX12-style speed-up methods, i.e.

1. asynchronous tasks

2. multiple worker threads

The Nvidia DX11 driver has at least 1.72X the draw call headroom.

Under Vulkan and DX12, AMD GPUs nullify NVIDIA's DX11 driver advantage.

https://www.computerbase.de/2016-07/doom-vulkan-benchmarks-amd-nvidia/

The R9 Fury X's raw 8.6 TFLOPS is exposed with Vulkan + async compute + AMD intrinsics + none of Nvidia's tessellation overdraw politics (which wouldn't exist on consoles).

Async Compute workloads are usually out of phase with Sync Graphics Command's memory bandwidth access.

The gap between the 980 Ti's 6.4 TFLOPS and Fury X's 8.6 TFLOPS is 1.34X.

Doom Vulkan's frame rate gap between the 980 Ti and Fury X is 1.30X.

AMD GCN hardware is fine, but the AMD DX11/OGL driver is not on par with NVidia's DX11/OGL driver.

When there are sufficient shader (FLOPS) resources, the major bottleneck is effective memory bandwidth, i.e. the ALUs need to read and write results to memory.

This is why AMD is pushing for HBM2 and why NAVI will get a new memory design.

Async compute's memory bandwidth consumption is out of phase with sync compute's memory bandwidth consumption.

---------------

As for the RX-480, any overclocked editions will be bounded by effective memory bandwidth.

For the reference RX-480:

(((256 bit × 8000 MHz) / 8) / 1024) × Polaris's 77.6% memory bandwidth efficiency × Polaris's 1.36X compression boost = 263.84 GB/s

--

Scorpio's "more than 320 GB/s memory bandwidth" claim.

(((384 bit × GDDR5-6900 MHz) / 8) / 1024) × Polaris's 77.6% memory bandwidth efficiency × Polaris's 1.36X compression boost = 341.34 GB/s

PS;

(((384 bit x GDDR5-6900 MHz) / 8) / 1024) = 323 GB/s physical memory bandwidth.

(((384 bit x GDDR5-7000 MHz) / 8) / 1024) = 328 GB/s physical memory bandwidth.
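The arithmetic in these estimates can be checked with a short Python sketch. Note the 77.6 percent efficiency and 1.36X compression factors are the Polaris figures used above, i.e. assumptions, not guarantees for other chips:

```python
def effective_bandwidth_gbs(bus_bits, mem_mhz, efficiency=0.776, compression=1.36):
    """Estimate effective memory bandwidth in GB/s.

    Physical bandwidth = (bus width in bits * effective memory clock in MHz)
    / 8 bits-per-byte / 1024 MB-per-GB, then scaled by the measured bandwidth
    efficiency and the average delta-colour-compression gain.
    """
    physical = (bus_bits * mem_mhz) / 8 / 1024
    return physical * efficiency * compression

print(round(effective_bandwidth_gbs(256, 8000), 2))  # reference RX-480: 263.84
print(round(effective_bandwidth_gbs(384, 6900), 2))  # Scorpio estimate: 341.34
```

Swap in other bus widths and clocks to see why overclocked RX-480 cores still hit the same effective-bandwidth ceiling.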

Comparison.

The memory bandwidth gap between Fury X and R9-290X = 1.266X (random textures).

The Fury X's memory compression is inferior to NVIDIA's Maxwell.

The FLOPS gap between Fury X and R9-290X = 1.48X.

The frame rate gap between R9-290X and Fury X is 1.19X.

The random-texture memory bandwidth gap's 1.266X factor is closer to the frame rate gap's 1.19X. The FLOPS gap between R9-290X (5.8 TFLOPS) and Fury X (8.6 TFLOPS) plays very little part in the frame rate gap.

The 980 Ti (5.63 TFLOPS) has superior memory compression, which enables it to match the Fury X's results.

Conclusion:

1. When there's enough FLOPS for a particular workload, effective memory bandwidth is a better prediction method for higher-grade GPUs.

2. The effective memory bandwidths of the Fury X and 980 Ti are similar, hence similar results.

-------------------

Examples of near-brain-dead Xbox One ports running on PC GPUs:

The frame rate difference between the 980 Ti and R9-290X is 1.31X in Forza 6 Apex.

The effective memory bandwidth difference between the 980 Ti and R9-290X is 1.38X.

Again, the Fury X and 980 Ti have similar effective memory bandwidth, hence similar results in most cases.

Forza 6 Apex is another example for effective memory bandwidth influencing the frame rate result.

For the RE7 PC build, the R9-390X's 5.9 TFLOPS puts it near the 980 Ti's 5.63 TFLOPS and the 1070's 6.4 TFLOPS.

RE7's geometry load would be optimised for AMD GPUs.

Fury X's 4 GB VRAM gimps the GPU.

For AMD Vega http://www.anandtech.com/show/11002/the-amd-vega-gpu-architecture-teaser/3

ROPs & Rasterizers: Binning for the Win(ning)

We’ll suitably round-out our overview of AMD’s Vega teaser with a look at the front and back-ends of the GPU architecture. While AMD has clearly put quite a bit of effort into the shader core, shader engines, and memory, they have not ignored the rasterizers at the front-end or the ROPs at the back-end. In fact this could be one of the most important changes to the architecture from an efficiency standpoint.

Back in August, our pal David Kanter discovered one of the important ingredients of the secret sauce that is NVIDIA’s efficiency optimizations. As it turns out, NVIDIA has been doing tile based rasterization and binning since Maxwell, and that this was likely one of the big reasons Maxwell’s efficiency increased by so much. Though NVIDIA still refuses to comment on the matter, from what we can ascertain, breaking up a scene into tiles has allowed NVIDIA to keep a lot more traffic on-chip, which saves memory bandwidth, but also cuts down on very expensive accesses to VRAM.

For Vega, AMD will be doing something similar. The architecture will add support for what AMD calls the Draw Stream Binning Rasterizer, which true to its name, will give Vega the ability to bin polygons by tile. By doing so, AMD will cut down on the amount of memory accesses by working with smaller tiles that can stay-on chip. This will also allow AMD to do a better job of culling hidden pixels, keeping them from making it to the pixel shaders and consuming resources there.

As we have almost no detail on how AMD or NVIDIA are doing tiling and binning, it’s impossible to say with any degree of certainty just how close their implementations are, so I’ll refrain from any speculation on which might be better. But I’m not going to be too surprised if in the future we find out both implementations are quite similar. The important thing to take away from this right now is that AMD is following a very similar path to where we think NVIDIA captured some of their greatest efficiency gains on Maxwell, and that in turn bodes well for Vega.

Meanwhile, on the ROP side of matters, besides baking in the necessary support for the aforementioned binning technology, AMD is also making one other change to cut down on the amount of data that has to go off-chip to VRAM. AMD has significantly reworked how the ROPs (or as they like to call them, the Render Back-Ends) interact with their L2 cache. Starting with Vega, the ROPs are now clients of the L2 cache rather than the memory controller, allowing them to better and more directly use the relatively spacious L2 cache.
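The binning idea in that article can be sketched in a few lines of Python. This is a toy CPU illustration of the concept only (bucket triangles by the screen tiles their bounding boxes touch, so each tile's colour/depth traffic can stay on-chip), not how the actual Maxwell or Vega hardware does it:

```python
from collections import defaultdict

def bin_triangles(triangles, tile_size=32):
    """Bucket triangles by the screen-space tiles their bounding boxes overlap.

    `triangles` is a list of three (x, y) screen-space vertices per triangle;
    the result maps (tile_x, tile_y) -> list of triangle indices in that tile.
    """
    bins = defaultdict(list)
    for tri_id, verts in enumerate(triangles):
        xs = [x for x, _ in verts]
        ys = [y for _, y in verts]
        x0, x1 = int(min(xs)) // tile_size, int(max(xs)) // tile_size
        y0, y1 = int(min(ys)) // tile_size, int(max(ys)) // tile_size
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(tri_id)
    return bins

tris = [[(0, 0), (10, 0), (0, 10)],      # falls entirely in tile (0, 0)
        [(40, 40), (50, 40), (40, 50)]]  # falls entirely in tile (1, 1)
print(dict(bin_triangles(tris)))  # {(0, 0): [0], (1, 1): [1]}
```

A real binning rasterizer would then shade each tile's bin to completion before moving on, keeping that tile's render-target traffic in on-chip cache instead of VRAM.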

http://gamingbolt.com/ps4-pro-bandwidth-is-potential-bottleneck-for-4k-but-a-thought-through-tradeoff-little-nightmares-dev

PS4 Pro's 4.2 TFLOPS GPU bottlenecked by memory bandwidth.

GeForce GTX 980 has superior delta memory compression and tile render which reduces external memory bandwidth.

The main purpose of native FP16 is to reduce memory bandwidth use; it is an alternative route to the savings Maxwell gets from its superior delta memory compression and tiled rendering, and Vega adds a Maxwell-style tiling rasterizer anyway.
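The bandwidth half of the FP16 argument is easy to see with NumPy (assuming NumPy is available; this only illustrates the storage traffic, not the ALU double-rate):

```python
import numpy as np

# A 1920x1080 RGBA intermediate buffer at FP32 vs FP16 precision:
fp32 = np.zeros((1080, 1920, 4), dtype=np.float32)
fp16 = np.zeros((1080, 1920, 4), dtype=np.float16)

print(fp32.nbytes // 2**20, "MiB")  # 31 MiB
print(fp16.nbytes // 2**20, "MiB")  # 15 MiB: half the bytes per read/write
```

Every pass that reads or writes such a buffer at FP16 moves half the bytes, which is exactly where the memory-bandwidth-bound cases gain.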

You blame "AMD FLOPS != NVIDIA FLOPS" when the real problem is AMD's non-FLOPS hardware gimping AMD's shader FLOPS.

If FP16 shaders are useless for games, how come NVIDIA's Volta is rumoured to have a double-rate FP16 feature? Would your view change when NVIDIA gains its own double-rate FP16 feature for a GTX 2080, GTX 2080 Ti or Titan X Volta?

http://www.mobipicker.com/nvidia-geforce-gtx-1080-ti-last-card-pascal-architecture/

GTX 1080 Ti is the last card on the Pascal/Maxwell architecture.

Avatar image for Shewgenja
Shewgenja

21456

Forum Posts

0

Wiki Points

0

Followers

Reviews: 1

User Lists: 0

#29 Shewgenja
Member since 2009 • 21456 Posts

@ronvalencia said:
@Gatygun said:
@ronvalencia said:
@GarGx1 said:

@ronvalencia: Yeah cause 16 bit floating point is going to make all the difference.

GeForce FX's float16 shader path existed before DX10's float32 hardware changes.

Tegra X1 and GP100 have double-rate float16 shaders. GP102 and GP104 have crippled float16 shaders, i.e. 1/64 of the float32 rate, which is exposed by CUDA 7.5. Float32 units can emulate float16 without any performance benefit.

VooFoo Dev Was Initially Doubtful About PS4 PRO GPU’s Capabilities But Was Pleasantly Surprised Later

“I was actually very pleasantly surprised. Not initially – the specs on paper don’t sound great, as you are trying to fill four times as many pixels on screen with a GPU that is only just over twice as powerful, and without a particularly big increase in memory bandwidth,” he explained, echoing the sentiment that a lot of us seem to have, before adding, “But when you drill down into the detail, the PS4 Pro GPU has a lot of new features packed into it too, which means you can do far more per cycle than you can with the original GPU (twice as much in fact, in some cases). You’ve still got to work very hard to utilise the extra potential power, but we were very keen to make this happen in Mantis Burn Racing.

“In Mantis Burn Racing, much of the graphical complexity is in the pixel detail, which means most of our cycles are spent doing pixel shader work. Much of that is work that can be done at 16-bit rather than 32-bit precision, without any perceivable difference in the end result – and PS4 Pro can do 16 bit-floating point operations twice as fast as the 32-bit equivalent.”

The Mantis Burn Racing PS4 Pro version reached 4K/60 fps, which is 4X the effective throughput of the original PS4 version's 1080p/60 fps.

FP16 optimisation is similar to the GeForce FX era's "The Way It's Meant to be Played" optimisation paths.

On the original PS4, developers were already using FP16 with the raster hardware, e.g. Killzone Shadow Fall.

Sebbbi, a dev on Beyond3D, had this to say about FP16:

Originally posted by sebbbi on Beyond3D, 2 years ago:

Sometimes it requires more work to get lower precision calculations to work (with zero image quality degradation), but so far I haven't encountered big problems in fitting my pixel shader code to FP16 (including lighting code). Console developers have a lot of FP16 pixel shader experience because of PS3. Basically all PS3 pixel shader code was running on FP16.

It is still very important to pack the data in memory as tightly as possible as there is never enough bandwidth to lose. For example 16 bit (model space) vertex coordinates are still commonly used, the material textures are still DXT compressed (barely 8 bit quality) and the new HDR texture formats (BC6H) commonly used in cube maps have significantly less precision than a 16 bit float. All of these can be processed by 16 bit ALUs in the pixel shader with no major issues. The end result will still eventually be stored to an 8 bit per channel back buffer and displayed.

Could you give us some examples of operations done in pixel shaders that require higher than 16 bit float processing?

EDIT: One example where 16 bit float processing is not enough: Exponential variance shadow mapping (EVSM) needs both 32 bit storage (32 bit float textures + 32 bit float filtering) and 32 bit float ALU processing.

However EVSM is not yet universally possible on mobile platforms right now, as there's no standard support for 32 bit float filtering in mobile devices (OpenGL ES 3.0 just recently added support for 16 bit float filtering, 32 bit float filtering is not yet present). Obviously GPU manufacturers can have OpenGL ES extensions to add FP32 filtering support if their GPU supports it (as most GPUs should as this has been a required feature in DirectX since 10.0).

#33 sebbbi, Oct 18, 2014. Last edited by a moderator: Oct 18, 2014

http://wccftech.com/flying-wild-hog-ps4-pro-gpu-beast/

What do you think of Sony’s PlayStation 4 Pro in terms of performance? Is it powerful enough to deliver 4K gaming?

PS4Pro is a great upgrade over base PS4. the CPU didn’t get a big upgrade, but GPU is a beast. It also has some interesting hardware features, which help with achieving 4K resolution without resorting to brute force.

PS4 Pro’s architect Mark Cerny said that the console introduces the ability to perform two 16-bit operations at the same time, instead of one 32-bit operation. He suggested that this has the potential to “radically increase the performance of games” – do you agree with this assessment?

Yes. Half precision (16 bit) instructions are a great feature. They were used some time ago in Geforce FX, but didn’t manage to gain popularity and were dropped. It’s a pity, because most operations don’t need full float (32 bit) precision and it’s a waste to use full float precision for them. With half precision instructions we could gain much better performance without sacrificing image quality.

MS, AMD, Sony & Intel think it will work for graphics.

GCN 3/4 has single-rate native float16, which reduces memory bandwidth usage, i.e. in memory-bandwidth-bound situations native float16 shader support can double the effective floating-point operations within the same memory bandwidth. Double-rate float16 occurs at the GPU level.

With SM6's native float16, AMD is going to do another async-compute-style gimping of existing NVIDIA GPUs.

AMD is using their relationship with MS to drive SM6's direction.

Even the dude in your own post mentions that you need 32-bit for some tasks, so double performance ain't going to happen.

Now show me a game that uses FP16 that we can compare? Because so far what I see is that FP16 on the PS4 Pro isn't really helping them out very much. It does exactly what you would expect the console to push in the FP32 department.

6 TFLOPS on top of it, if that's AMD FLOPS, isn't going to yield more than GTX 980 performance at its absolute best. Let alone Titan XP performance.

I wonder where all that magic double-performance 16-bit floating point is going, then. Should be an easy performance gain.

FP16 sounds more and more like our DX12 update: loads of hot air that results in zero performance gain, or almost nothing.

AMD's TFLOPS is not the problem!

*snip: same effective memory bandwidth argument as earlier in the thread (NVIDIA's DX11 driver headroom, the Doom Vulkan ratios, the RX-480/Scorpio bandwidth estimates, and the Fury X vs 980 Ti / Forza 6 Apex comparisons)*

Interesting observations. So, you twist the flux capacitor counter clockwise then insert the soloflange.

Avatar image for ronvalencia
ronvalencia

29612

Forum Posts

0

Wiki Points

0

Followers

Reviews: 1

User Lists: 0

#30  Edited By ronvalencia
Member since 2008 • 29612 Posts

@Shewgenja said:

Interesting observations. So, you twist the flux capacitor counter clockwise then insert the soloflange.

Note that RE7 on PS4 Pro has a native 4K/60 fps result... perhaps another double-rate FP16 boosted game title...

http://www.eurogamer.net/articles/digitalfoundry-2017-resident-evil-7-face-off

Away from display-related issues the PC version is otherwise excellent, although it only provides a modest visual jump over the base PS4 and Xbox One outside of resolution. For the most part, the core graphical make-up at max settings is very close to the PS4 Pro version, with only the occasional additional refinement on show. If there are improvements in shadows quality and reflections, the game's heavy use of post-processing means that these don't stand out during gameplay, or in like-for-like footage, where the visuals appear very closely matched. Even with high-end effects enabled (such as HBAO+ and very high shadows) it certainly feels like the console versions are holding their own here. That said, the PC version manages to avoid the frequent texture switching issues that are present on PS4 and the Pro, which provides a more consistent presentation as these artefacts rarely manifest at all.

RE7 on the non-Pro PS4 has a 1080p/mostly-60 fps result. The PS4 Pro version is effectively 4X the PS4 version!

PS4 Pro's double-rate FP16 feature can easily contain HDR10 (10 bits).

Mantis Burn Racing's PS4 Pro version reached 4K/60 fps, which is 4X the effective throughput of the original PS4 version's 1080p/60 fps.
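One way to read the HDR10 remark: an FP16 value carries an 11-bit significand (10 stored bits plus the implicit leading one), so every 10-bit HDR10 code value survives a round trip through float16 unchanged. A quick NumPy check (assuming NumPy is available):

```python
import numpy as np

# All 10-bit code values 0..1023 are exactly representable in float16:
codes = np.arange(1024, dtype=np.float32)
roundtrip = codes.astype(np.float16).astype(np.float32)
print(np.array_equal(roundtrip, codes))  # True
```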

Avatar image for DragonfireXZ95
DragonfireXZ95

26712

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#31  Edited By DragonfireXZ95
Member since 2005 • 26712 Posts

They aren't closing the gap. They are just bringing Xbox console exclusives to PC. Lol.

But still, cross-play is a nice feature that lets you buy a game only once (and even has some cross-platform abilities).

Avatar image for ribhu672
ribhu672

173

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#32 ribhu672
Member since 2014 • 173 Posts

@ronvalencia: Numbers don't mean shit. Take the RX 480 and GTX 1060: the RX has more TFLOPS than the GTX 1060, yet it still doesn't beat it in most games (except for a few DX12 titles, which right now are so few in number that they don't even matter).

Avatar image for dynamitecop
dynamitecop

6395

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#33  Edited By dynamitecop
Member since 2004 • 6395 Posts

@ribhu672 said:

@ronvalencia: Numbers don't mean shit. Take the RX 480 and GTX 1060: the RX has more TFLOPS than the GTX 1060, yet it still doesn't beat it in most games (except for a few DX12 titles, which right now are so few in number that they don't even matter).

Numbers do mean shit when you're speaking of relative architecture and floating point performance within a specific manufacturer's hardware of the same ilk; the fact that you're trying to compare AMD and NVIDIA cards in terms of teraflops goes to show that you shouldn't be in this conversation to begin with.
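For reference, the peak-TFLOPS figure being argued over is just shader count x clock x 2 FLOPs per clock (one fused multiply-add); a quick Python check using the cards' reference boost clocks shows why the raw number alone doesn't settle an RX 480 vs GTX 1060 comparison:

```python
def tflops(shaders, clock_mhz, flops_per_cycle=2):
    """Peak single-precision TFLOPS: ALU count * clock * FLOPs per clock (FMA = 2)."""
    return shaders * clock_mhz * 1e6 * flops_per_cycle / 1e12

print(round(tflops(2304, 1266), 2))  # RX 480 at 1266 MHz boost: 5.83
print(round(tflops(1280, 1708), 2))  # GTX 1060 at 1708 MHz boost: 4.37
```

The RX 480's higher peak says nothing about how much of it the rest of the pipeline (bandwidth, drivers, front-end) lets through, which is the whole point of the argument above.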

Avatar image for thedork_knight
thedork_knight

2664

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#34 thedork_knight
Member since 2011 • 2664 Posts

I feel that with cross-play, coupled with Scorpio and Xbox One getting KB/M support, usually-PC-only developers may start releasing what would have been PC exclusives on Xbox to test the water.

Avatar image for ribhu672
ribhu672

173

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#35 ribhu672
Member since 2014 • 173 Posts

@dynamitecop: Just making the point that everyone is fixated on teraflops and big numbers. They are not the sole reason for a performance boost.

Avatar image for dynamitecop
dynamitecop

6395

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#36  Edited By dynamitecop
Member since 2004 • 6395 Posts

@ribhu672 said:

@dynamitecop: Just making the point that everyone is fixated on teraflops and big numbers. They are not the sole reason for a performance boost.

They're not, but when you know the performance output of a specific manufacturer's cards relative to teraflops, dating back multiple hardware generations and years, it's pretty easy to get an idea of where things fall.

Avatar image for ronvalencia
ronvalencia

29612

Forum Posts

0

Wiki Points

0

Followers

Reviews: 1

User Lists: 0

#37  Edited By ronvalencia
Member since 2008 • 29612 Posts

@ribhu672 said:

@ronvalencia: Numbers don't mean shit. Take the RX 480 and GTX 1060: the RX has more TFLOPS than the GTX 1060, yet it still doesn't beat it in most games (except for a few DX12 titles, which right now are so few in number that they don't even matter).

Your post doesn't mean shit.

Any high-TFLOPS claim will be bound by effective memory bandwidth!

Read http://gamingbolt.com/ps4-pro-bandwidth-is-potential-bottleneck-for-4k-but-a-thought-through-tradeoff-little-nightmares-dev

The PS4 Pro's 4.2 TFLOPS GPU is memory bandwidth bound!

You can compare FLOPS only under specific conditions.

Again, AMD's TFLOPS is not the problem!

*snip: same effective memory bandwidth argument as earlier in the thread (NVIDIA's DX11 driver headroom, the Doom Vulkan ratios, the Vega Draw Stream Binning Rasterizer article, the RX-480/Scorpio bandwidth estimates, and the Fury X vs 980 Ti / Forza 6 Apex comparisons)*

More examples of near-brain-dead Xbox One ports running on PC GPUs:

The R9-390X has 5.9 TFLOPS, the GTX 980 Ti has 5.63 TFLOPS and the GTX 1070 has 6.4 TFLOPS; the R9-390X lands near NVIDIA GPUs in a similar TFLOPS range.

Likewise, the R9-290X (5.6 TFLOPS) lands near the 980 Ti (5.63 TFLOPS).

Avatar image for ribhu672
ribhu672

173

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#38  Edited By ribhu672
Member since 2014 • 173 Posts

@ronvalencia: Alright, I get what you're saying, but this shows that AMD only performs better in DX12 titles. All DX12 titles may favour AMD, but that's the problem: there are very few DX12 games out there.

Avatar image for Gatygun
Gatygun

2709

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#40  Edited By Gatygun
Member since 2010 • 2709 Posts

@ronvalencia said:
@Shewgenja said:

Interesting observations. So, you twist the flux capacitor counter clockwise then insert the soloflange.

Note that RE7 on PS4 Pro has a native 4K/60 fps result... perhaps another double-rate FP16 boosted game title...

http://www.eurogamer.net/articles/digitalfoundry-2017-resident-evil-7-face-off

Away from display-related issues the PC version is otherwise excellent, although it only provides a modest visual jump over the base PS4 and Xbox One outside of resolution. For the most part, the core graphical make-up at max settings is very close to the PS4 Pro version, with only the occasional additional refinement on show. If there are improvements in shadows quality and reflections, the game's heavy use of post-processing means that these don't stand out during gameplay, or in like-for-like footage, where the visuals appear very closely matched. Even with high-end effects enabled (such as HBAO+ and very high shadows) it certainly feels like the console versions are holding their own here. That said, the PC version manages to avoid the frequent texture switching issues that are present on PS4 and the Pro, which provides a more consistent presentation as these artefacts rarely manifest at all.

RE7 on the non-Pro PS4 has a 1080p/mostly-60 fps result. The PS4 Pro version is effectively 4X the PS4 version!

PS4 Pro's double-rate FP16 feature can easily contain HDR10 (10 bits).

Mantis Burn Racing's PS4 Pro version reached 4K/60 fps, which is 4X the effective throughput of the original PS4 version's 1080p/60 fps.

Resident Evil 7 on the Pro only runs at 1260p and gets upscaled to 4K; it's a resolution boost, but nowhere near 4K resolution.

The game runs at a constant ~80 fps at 1440p on my 970 setup. It's hardly taxing for any modern GPU.

Mantis Burn Racing, have you seen the game? The thing would run on a potato at 4K resolution. They probably locked it at 1080p because people would expect it to be at that resolution. I can't see how that game couldn't already come close to 4K on a base PS4, unless it didn't support the output for it.

Also, Vulkan is an AMD-focused API; it's bound to run better on that. It doesn't mean a whole lot to this discussion.

Like I said before, FP16 is going to be a lot more expensive to develop for, as it will take a lot more time to get things going. And whether it's going to work for complex games further down the line is the question.

Your statement about double the TFLOPS is just false, as it's not going to push anywhere near double the performance in any new game.

If FP16 was that great, we would already have had it on PCs decades ago. There is a reason why we didn't (or it didn't succeed).

That NVIDIA and AMD are bolting FP16 onto their hardware again is probably nothing more than boasting about theoretical performance gains to sell their products, which is nothing new for them, rather than a useful advantage.

Avatar image for SecretPolice
SecretPolice

45609

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#41 SecretPolice
Member since 2007 • 45609 Posts

For gamers, by gamers, jazz. :P

Avatar image for LegatoSkyheart
LegatoSkyheart

29733

Forum Posts

0

Wiki Points

0

Followers

Reviews: 16

User Lists: 1

#42 LegatoSkyheart
Member since 2009 • 29733 Posts

@GarGx1 said:

Not really wanting to burst your bubble but....

Halo 1 and 2 were on PC, it's not a new demographic.

As for cross play, console gamers and their controllers are not going to be wanting to play against PC gamers with m/kb set ups for very long. There's only so many times a person can categorically lose before they stop playing.

Scorpio is only a rumour right now but even if those rumours are true my PC built in 2014/15 is already more powerful.

The only genre I see this being an issue with is first-person shooters.

But MMOs and fighters seem to do pretty well with cross-platform gaming. Heck, even co-op games like Rocket League benefit from having cross-platform multiplayer.

First-person shooters are a given, since controllers in general don't have the same precision a mouse does, but I don't see why other games have to be separated.

Avatar image for NathanDrakeSwag
NathanDrakeSwag

17392

Forum Posts

0

Wiki Points

0

Followers

Reviews: 0

User Lists: 0

#43 NathanDrakeSwag
Member since 2013 • 17392 Posts

@makemefamous07 said:

Microsoft is loosing everything from exclusives being cancelled like Scalebound, Fable Legends etc. To games selling poorly like Gears 4. MS new IPs selling poorly cuz Phil Spencer Half Ass the games like Recore & never to see a sequel cuz Phil Spencer is a cheap ass. The bad media is killing the Xbox with 500,000 plus of people listening to how bad MS is doing with bad articles & Youtube videos

The XB1 is WAY behind the PS4 by 25 million consoles. Imagine how worse it will be by the time Scorpio comes out...sweet jesus. The PS4 has exclusives with mega hype along with the Nintendo Switch which is getting huge hype & SuperBowl Ads. The consoles will be cheap during the holidays along with the PS4 Pro bundled with games which will HURT Scorpio which WILL be $500 or $600. The Xbox S 2TB model was $400 so Scorpio will be much more expensive. Scorpio can't compete with the cheaper consoles, along with PS4/ Switch Exclusives.

The Xbox launched in 2001 after the PS2 came out & did not make a dent in Sony. Now MS is in worse shape than ever thx to Phil "Axeman" Spencer, as I've stated. Xbox is in big trouble unless they do what they have to & get rid of Phil Spencer, or else the future looks like shit for Xbox cuz of PHIL!!

It's actually behind by 31 million now, but the rest of your post is spot on.

#44 GarGx1
Member since 2011 • 10934 Posts

@LegatoSkyheart said:
@GarGx1 said:

Not really wanting to burst your bubble but....

Halo 1 and 2 were on PC, it's not a new demographic.

As for cross play, console gamers with their controllers aren't going to want to play against PC gamers with m/kb setups for very long. There's only so many times a person can categorically lose before they stop playing.

Scorpio is only a rumour right now, but even if those rumours are true, my PC built in 2014/15 is already more powerful.

The only genre I see this being an issue with are First Person Shooters.

But MMOs and fighters seem to do pretty well with cross-platform gaming. Heck, even co-op games like Rocket League benefit from having cross-platform multiplayer.

First-person shooters are a given, since controllers in general don't have the same precision that a mouse has, but I don't see why other games have to be separated.

As I said in a previous reply, this is a Halo 6 thread with some extra dressing, hence the comment about KB/M vs controllers.

#45 blue_hazy_basic  Moderator
Member since 2002 • 30854 Posts

Brotherhood with lemmings?


#46 xxyetixx
Member since 2004 • 3041 Posts

@GarGx1: that's an easy fix, though. You can either A) let Xbox users use KB/M, or B) have a separate game mode for cross-play on PC that forces the PC players to use a controller.

Either one is fine with me.

#47  Edited By ronvalencia
Member since 2008 • 29612 Posts

@Gatygun said:
@ronvalencia said:
@Shewgenja said:

Interesting observations. So, you twist the flux capacitor counter clockwise then insert the soloflange.

Note that RE7 on PS4 Pro has a native 4K/60 fps result... perhaps another double-rate FP16 boost title...

http://www.eurogamer.net/articles/digitalfoundry-2017-resident-evil-7-face-off

Away from display-related issues the PC version is otherwise excellent, although it only provides a modest visual jump over the base PS4 and Xbox One outside of resolution. For the most part, the core graphical make-up at max settings is very close to the PS4 Pro version, with only the occasional additional refinement on show. If there are improvements in shadows quality and reflections, the game's heavy use of post-processing means that these don't stand out during gameplay, or in like-for-like footage, where the visuals appear very closely matched. Even with high-end effects enabled (such as HBAO+ and very high shadows) it certainly feels like the console versions are holding their own here. That said, the PC version manages to avoid the frequent texture switching issues that are present on PS4 and the Pro, which provides a more consistent presentation as these artefacts rarely manifest at all.

RE7 on the non-Pro PS4 has a 1080p/mostly-60 fps result. The PS4 Pro version is effectively 4X over the PS4 version!

PS4 Pro's double rate FP16 feature can easily contain HDR10 (10 bits).

Mantis Burn Racing's PS4 Pro version reached 4K/60 fps, which is 4X the effective pixel throughput of the original PS4 version's 1080p/60 fps.

Resident Evil 7 on the Pro only runs at 1260p and gets upscaled to 4K; it's a resolution boost, but nowhere near 4K resolution.

The game runs at around ~80 fps constantly on a 970 at 1440p on my setup. It's hardly taxing for any modern GPU.

Mantis Burn Racing, have you seen the game? The thing would run at 4K on a potato. They probably locked it at 1080p because people would expect it to be at that resolution. I can't see how that game couldn't come close to 4K already on a base PS4, unless it didn't support the output for it.

Also, Vulkan is an AMD-focused API; it's bound to run better on that. Doesn't mean a whole lot to this discussion.

Like I said before, FP16 is going to be a lot more expensive to develop for, as it will take a lot more time to get things going. And whether it's going to work for complex games further down the line is the question.

Your statement about double the tflops is just false, as it's not going to push anywhere near double the performance in any new game.

If FP16 was that great, we would already have had it in PCs decades ago. There is a reason why we didn't (or it didn't succeed).

That Nvidia and AMD bolt FP16 onto their hardware again is probably nothing more than boasting about theoretical performance gains to sell their crap, which is nothing new for them, rather than a useful advantage.

Resolution is 2240x1260, which is 2,822,400 pixels.

This is why I prefixed it with the word "perhaps".
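For anyone who wants to sanity-check the pixel counts being argued over above, here's a quick Python sketch (resolutions taken from the posts in this thread; note the "4X" figure only holds for true 4K, not the 2240x1260 that RE7 renders at on the Pro):

```python
# Pixel counts and how each resolution compares to the base PS4's 1080p.
def pixels(width, height):
    return width * height

resolutions = {
    "1080p (base PS4)": pixels(1920, 1080),       # 2,073,600 pixels
    "2240x1260 (RE7 on Pro)": pixels(2240, 1260), # 2,822,400 pixels
    "4K (Mantis Burn on Pro)": pixels(3840, 2160) # 8,294,400 pixels
}

base = resolutions["1080p (base PS4)"]
for name, count in resolutions.items():
    print(f"{name}: {count:,} pixels ({count / base:.2f}x of 1080p)")
```

So native 4K really is exactly 4x the pixels of 1080p, while RE7's 1260p upscale is only about 1.36x, which is why the two claims in this exchange don't describe the same jump.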

#48 PutASpongeOn
Member since 2014 • 4897 Posts

All it does is give all the Xbox One games to PC. Congrats on making the Xbox One irrelevant.

#49 FLOPPAGE_50
Member since 2004 • 4500 Posts

The Future of Xbox is going to be great.


We're going to see some PC-only exclusives head towards Xbox... this is what MS is doing long term.

#50 uninspiredcup
Member since 2013 • 62689 Posts

Microsoft have been trying to turn PC gaming into an Xbox for at least a decade.

Nobody cares about Xbox ports or their shitty Windows Steam.