@Wasdie said:
@spitfire-six said:
@Wasdie: Correct me if I'm wrong; as I've said, I'm still learning this stuff. It sounds like, theoretically, you can write to both sets of RAM, and they aren't independent of each other. Does that mean you could split the frame buffer, putting the things that need to stream faster into ESRAM and the things that can be read more slowly into DDR3?
Well yeah, that's exactly why they have that 32 MB of fast RAM. The problem is that 32 MB isn't enough for a 1080p image, especially when you need 60 frames per second. It's a straight-up hardware bottleneck. They should have gone with GDDR5 system RAM like Sony did if they wanted one pool of unified memory. Or they could have done a more PC-like architecture with something like 2 GB of GDDR5 VRAM (more than enough for 1080p) and 6 GB of DDR3 system RAM.
OK, I was under the impression that you don't have to write everything to just the 32 MB. In the frame-buffer example I saw, specific things were targeted at specific RAM addresses. I figured you should be able to write certain things, such as background textures, skies, etc., to the DDR3 RAM and save the 32 MB for the things in the foreground, as well as for updating the camera position. I'll also link the post where a developer from Trials gave an example of a frame-buffer layout that fits in ESRAM.
edit: posted by Sebbi
"MJP's g-buffer layout is actually only two RTs in the g-buffer rendering stage and one RT in the lighting stage. And a depth buffer of course. Quite normal stuff.
On GCN you want to pack your data into 64 bpp (4 x 16 bit integer) render targets because that doubles your fill rate compared to using more traditional 32 bpp RTs (GCN can do 64 bit filling at the same ROP rate as 32 bit filling).
I assume that the packing is like this:
Gbuffer1 = normals + tangents (64 bit)
Gbuffer2 = diffuse + brdf + specular + roughness (64 bits)
Depth buffer (32 bits)
Without any modifications this takes 40 megabytes of memory (1080p).
The lighting step doesn't need an extra 8 MB for the 4x16f RT, because a compute shader can simultaneously read and write to the same resource, allowing you to do lighting "in-place", writing the output over the existing g-buffer. This is also very cache friendly, since the read pulls the cache lines into L1 and the write thus never misses L1 (GCN has fully featured read & write caches).
It's also trivial to get this layout down to 32 MB from the 40 MB. Replace gbuffer1 with a 32 bit RT (32 MB target reached at 1080p). Store normal as 11+11 bit using lambert azimuth equal area projection. You can't see any quality difference. 5+5 bits for tangents is enough (4 bits for exponent = mip level + 1 bit mantissa). 11+11+5+5=32. Also if you only use the tangents for shadow mapping / other planar projections, you don't need them at all, since you can analytically calculate the derivatives from the stored normal vector.
This layout is highly efficient for both g-buffer rendering and lighting. And of course also for post processing, since all your heavy data fits in the fast memory. Shadow maps obviously need to be sampled from main memory during the lighting, but this is actually a great idea, since the lighting pass wouldn't otherwise use any main memory BW at all (it would be completely unused = wasted)."
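The memory numbers in the quote are easy to verify yourself; here's a quick sketch of the arithmetic at 1080p, using the bytes-per-pixel implied by the bit depths above:

```python
# Verify the g-buffer memory math from the quoted post (1080p).
WIDTH, HEIGHT = 1920, 1080
pixels = WIDTH * HEIGHT  # 2,073,600 pixels

bytes_per_pixel = {
    "gbuffer1 (normals + tangents, 64 bpp)": 8,
    "gbuffer2 (diffuse + brdf + spec + rough, 64 bpp)": 8,
    "depth (32 bpp)": 4,
}

total = sum(bpp * pixels for bpp in bytes_per_pixel.values())
print(total / (1024 * 1024))   # ~39.6 MiB -> the "40 megabytes" figure

# Dropping gbuffer1 from 64 bpp to 32 bpp saves 8 bytes - 4 bytes per pixel:
reduced = total - 4 * pixels
print(reduced / (1024 * 1024))  # ~31.6 MiB -> under the 32 MB ESRAM budget
```

So the "trivial to get down to 32 MB" claim is literally just halving one render target.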
Posted in this thread: http://forum.beyond3d.com/showthread.php?p=1830129#post1830129
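The 11+11-bit normal packing Sebbi mentions can be illustrated like this. This is a rough CPU-side sketch, not actual GCN shader code: it uses the commonly published spheremap form of the Lambert azimuthal equal-area encode/decode, and the `BITS` quantization is just there to show how small the 11-bit round-trip error is.

```python
import math

BITS = 11                  # bits per normal component, per the quote
SCALE = (1 << BITS) - 1    # 2047, the largest 11-bit value

def encode_normal(x, y, z):
    """Project a unit normal to 2D via the Lambert azimuthal equal-area
    (spheremap) transform, then quantize each component to 11 bits."""
    f = math.sqrt(8.0 * z + 8.0)       # assumes z > -1 (e.g. view-space normals)
    ex, ey = x / f + 0.5, y / f + 0.5  # each lands in [0, 1]
    return round(ex * SCALE), round(ey * SCALE)

def decode_normal(qx, qy):
    """Invert the quantization and projection back to a unit normal."""
    fx = (qx / SCALE) * 4.0 - 2.0
    fy = (qy / SCALE) * 4.0 - 2.0
    f = fx * fx + fy * fy
    g = math.sqrt(1.0 - f / 4.0)
    return fx * g, fy * g, 1.0 - f / 2.0

# Round-trip error at 11 bits per component is on the order of 1e-3,
# which is why "you can't see any quality difference":
n = (0.0, 0.6, 0.8)
d = decode_normal(*encode_normal(*n))
print(max(abs(a - b) for a, b in zip(n, d)))
```

With the two tangent components squeezed into 5+5 bits alongside this, you get the 11+11+5+5 = 32-bit gbuffer1 from the quote.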