@04dcarraher said:
Far Cry 5 is one of those games where memory bandwidth does play a role. But there are multiple things that allow the X1X to perform as it does with Far Cry 5. First is its customized settings and assets: while some settings look like low, high or ultra on PC, they may not be exact matches, e.g. "slightly blurrier detail". So they are not equal, and less detail means less demanding... The X1X GPU has around 300 GB/s of memory bandwidth, while the 1060 and 580 have only 192 GB/s and 256 GB/s respectively.
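For anyone who wants to check the bandwidth figures, here is a rough sketch of the usual peak-bandwidth arithmetic (bus width in bytes times per-pin data rate). The bus widths and data rates below are spec-sheet values I am assuming, not numbers from the post, and `bandwidth_gb_s` is just a made-up helper name.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps

# Assumed spec-sheet values (bus width in bits, data rate in Gbps per pin);
# these are not stated in the post itself.
cards = {
    "GTX 1060 6GB": (192, 8.0),
    "RX 580": (256, 8.0),
    "Xbox One X": (384, 6.8),
}

for name, (bus, rate) in cards.items():
    print(f"{name}: {bandwidth_gb_s(bus, rate):.1f} GB/s")
```

Under those assumptions the 1060 and 580 come out at the 192 GB/s and 256 GB/s quoted, and the X1X lands a bit above the "around 300 GB/s" mentioned.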
For the hell of it I tested FC5 on my GTX 1080 at multiple clock rates and settings at 4K.
At 1.125 GHz (5.7 TFLOPS) it got 25/30/37 at the DF-suggested X1X settings.
At 1.4 GHz (7.1 TFLOPS) it got 31/36/45 at the DF-suggested X1X settings and 30/35/44 at max settings. Then I knocked the GDDR5X down to 9000 MHz (285 GB/s) and it got 29/34/42.
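The TFLOPS figures for those clock rates can be reproduced with the standard peak-FP32 formula. This is only a sketch: the 2560 shader-core count for the GTX 1080 is a spec-sheet value I am assuming, and `peak_tflops` is a hypothetical helper, not anything from the post.

```python
GTX_1080_CUDA_CORES = 2560  # assumed spec-sheet value, not stated in the post

def peak_tflops(shader_cores: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS, assuming one fused multiply-add (2 FLOPs) per core per clock."""
    return 2 * shader_cores * clock_ghz / 1000

for clock in (1.125, 1.4):
    print(f"{clock} GHz -> {peak_tflops(GTX_1080_CUDA_CORES, clock):.2f} TFLOPS")
```

With that core count the two clocks work out to roughly 5.76 and 7.17 TFLOPS, which lines up with the 5.7 and 7.1 figures quoted once truncated. Peak TFLOPS says nothing about how well the card is fed, which is the point of the bandwidth test.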
The X1X's extra memory bandwidth helps in a few cases, not all the time. The X1X's GPU performance can also vary widely because of its CPU.
There is no way we will see TR at 60 FPS at 4K without compromises.
At 4K resolution, it's mostly GPU-bound rather than CPU-bound.
The X1X's 32-ROP bottleneck wasn't a large factor, since FC5 is an AMD-optimised title (i.e. TMUs handle a significant share of the read/write operations) and the X1X's GPU has extra memory bandwidth compared to the RX 580.
https://blogs.msdn.microsoft.com/directx/2018/03/19/announcing-microsoft-directx-raytracing/
You may have noticed that DXR does not introduce a new GPU engine to go alongside DX12’s existing Graphics and Compute engines. This is intentional – DXR workloads can be run on either of DX12’s existing engines. The primary reason for this is that, fundamentally, DXR is a compute-like workload. It does not require complex state such as output merger blend modes or input assembler vertex layouts. A secondary reason, however, is that representing DXR as a compute-like workload is aligned to what we see as the future of graphics, namely that hardware will be increasingly general-purpose, and eventually most fixed-function units will be replaced by HLSL code. The design of the raytracing pipeline state exemplifies this shift through its name and design in the API. With DX12, the traditional approach would have been to create a new CreateRaytracingPipelineState method. Instead, we decided to go with a much more generic and flexible CreateStateObject method. It is designed to be adaptable so that in addition to Raytracing, it can eventually be used to create Graphics and Compute pipeline states, as well as any future pipeline designs.
AMD optimisations such as Forward+ rendering via compute shaders reduce use of the output-merger blend path (this is the ROPS path).
Depending on how heavily the output-merger blend path (ROPS path) is used, the RX 580 can land close to the GTX 1070 or close to the GTX 1060. The X1X's GPU behaves like the RX 580, with more bandwidth and a 2 MB render cache to reduce the 32-ROP bottleneck, but that is not as complete a solution as the GTX 1070's 64 ROPS with 2 MB L2 cache.
In crypto-currency mining, the ROPS path is not used for read/write operations, i.e. TMUs are used for read/write operations.
TFLOPS is nothing without the associated read/write units, i.e. TMUs or ROPS.
This applies to the Vega 64 LC reaching close to the Titan X Pascal.
The X1X's result is about half that of the GTX 1080 Ti or Titan XP.