The Larrabee is Intel's new graphics chip and this appears to be very...promising...
Larrabee, faster graphics, and real-time ray tracing (RTRT)
Intel employee Daniel Pohl was on hand at last week's IDF to give demonstrations of a version of Quake 4 that uses real-time ray tracing running on Intel hardware. Charlie at the Inquirer managed to catch the demo, and he published an account of it this morning that attempts to get at where the GPU is eventually headed as a product.
Pohl is the German computer science student behind the ray-traced versions of Quake 3 and 4 that have been featured on Digg and Slashdot. For his master's thesis, he built a version of Quake 4 that uses real-time ray tracing to achieve some pretty remarkable effects: shadows are correctly cast and rendered in real-time, water has the proper reflections, indirect lighting looks like it's supposed to, etc. He was later hired by Intel, and now he's working within their graphics unit on real-time ray tracing for games.
We've covered Intel's ray tracing research in the past, and there's no doubt that Intel is serious about bringing this technology to real-time 3D games. The real questions, however, concern what kind of ray tracing will be used, to what extent, and to what effect. I'll do my best to untangle these questions briefly by relying on email feedback and exchanges from folks who know much more than I about this issue but who've asked not to be named.
When non-graphics people hear "ray tracing," they think of the kind that takes hours per frame to render. This kind of ray tracing is called "global illumination," and it involves computing the paths of rays that come directly from the light sources in a scene (direct rays) and rays that reach the viewer as a result of reflections (indirect rays). Doing ray tracing for the former type of ray can be hard, but it's not nearly as difficult as doing both types at once. Because of the difficulty of solving the global illumination problem for a scene, most ray tracing involves various tricks and approximations for simulating the effects of indirect rays.
The problem with the tricks and approximations is that the indirect rays are what make the scene look realistic in ways that standard raster graphics can't accomplish. If you're only doing direct rays, then ray tracing has no visual advantage over rasterization.
The limited types of non-diffuse "indirect rays" traced in the Quake 4 demo (multibounce specular reflections and glass refraction) have computational demands that are similar to calculating simple eye rays (i.e., they have a high degree of coherence, so they can be calculated using wide SIMD bundles). These effects do indeed look better than rasterized knockoffs, but the tradeoff is that they're still very computationally intense, and the overall look of the game engine isn't really that much more photorealistic than what you can do with shaders.
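To make the coherence point a bit more concrete, here is a minimal Python sketch. It is not taken from Pohl's demo; the helper names (reflect, random_hemisphere, spread) are made up for illustration, and neighboring eye rays are approximated as exactly parallel. It mirrors a small bundle of rays off a flat surface with the standard reflection formula r = d - 2(d.n)n, then compares how far the directions fan out against a toy diffuse bounce that picks random directions.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def reflect(d, n):
    """Mirror direction d about unit normal n: r = d - 2(d.n)n."""
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))

def random_hemisphere(n, rng):
    """Pick a random unit direction on the side of normal n (toy diffuse bounce)."""
    while True:
        v = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        length_sq = dot(v, v)
        if 1e-6 < length_sq <= 1.0:          # reject points outside the unit ball
            v = normalize(v)
            return v if dot(v, n) > 0 else tuple(-x for x in v)

def spread(directions):
    """Largest angle in radians between the first direction and any other."""
    first = directions[0]
    return max(math.acos(max(-1.0, min(1.0, dot(first, d)))) for d in directions)

normal = (0.0, 1.0, 0.0)                     # flat floor facing up
incoming = normalize((1.0, -1.0, 0.0))       # assumed: neighbors are parallel
bundle = [incoming] * 8

specular = [reflect(d, normal) for d in bundle]
rng = random.Random(0)
diffuse = [random_hemisphere(normal, rng) for _ in bundle]

print(spread(specular))  # mirrored rays stay together
print(spread(diffuse))   # diffuse bounces fan out
```

The mirrored bundle keeps essentially zero angular spread, which is why such rays can still be processed together in SIMD-sized groups, while the diffuse bundle scatters and loses that property.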
Therein lies the drawback to the Quake 4 demo: because of the demands of the ray-tracing engine, Pohl had to swap out many of the game's detailed textures in favor of reflective surfaces that exploit ray tracing, because there was no horsepower left over for texturing. It's also the case that the demo required four quad-core machines ganged together, and even then it didn't run at a playable framerate. So those reflective and refractive effects are nice, but they're not worth that kind of horsepower requirement, especially when you compare the overall look of the resulting game to what a single G80 can do with the right coding.
Comparison with the Cell Broadband Engine
Larrabee's philosophy of using many small, simple cores is similar to the ideas behind the Cell processor. There are some further commonalities, such as the use of a high-bandwidth ring bus to communicate between cores.[7] However, there are many significant differences in implementation which should make programming Larrabee simpler.
- The Cell processor includes one main processor which controls many smaller processors. Additionally, the main processor can run an operating system. In contrast, all of Larrabee's cores are the same, and the Larrabee is not expected to run an OS.
- Each compute core in the Cell (SPE) has a local store, and all accesses to DRAM go through explicit DMA operations. Ordinary reads/writes to DRAM are not allowed. In Larrabee, all on-chip and off-chip memories sit under an automatically managed, coherent cache hierarchy, so that its cores effectively share a uniform memory space through standard load/store instructions.[7]
- Because of the cache coherency noted above, each program running on Larrabee effectively sees a large linear memory, just as on a traditional general-purpose CPU. An application for Cell, by contrast, has to be written with the limited memory footprint of the local store attached to each SPE in mind, though the local store offers theoretically higher bandwidth.
- Cell uses DMA for data transfer to/from on-chip local memories, which has merit in flexibility and throughput. Larrabee instead uses special instructions for cache manipulation (notably cache eviction hints and pre-fetch instructions), which lets it maintain cache coherence (hence the standard memory hierarchy) while boosting performance for rendering pipelines and other stream-like computation.[7]
- Each compute core in the Cell runs only one thread at a time, in-order. A core in Larrabee runs up to four threads. Larrabee's hyperthreading helps hide latencies and compensates for lack of out-of-order execution.
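The difference between the two memory models in the list above can be sketched with a toy Python model. This is not real Cell or Larrabee code: LocalStore, dma_get, and the capacity of 4 elements are hypothetical stand-ins, and the hardware cache on the Larrabee side is not modeled at all. The only point is who schedules the data movement.

```python
class LocalStore:
    """Toy stand-in for an SPE local store: small, filled only by explicit copies."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []

    def dma_get(self, main_memory, start, count):
        """Explicitly copy a block from main memory into the local store."""
        if count > self.capacity:
            raise ValueError("block does not fit in the local store")
        self.data = main_memory[start:start + count]


def sum_cell_style(main_memory, store):
    """Sum an array by staging it through the local store one block at a time."""
    total = 0
    transfers = 0
    for start in range(0, len(main_memory), store.capacity):
        count = min(store.capacity, len(main_memory) - start)
        store.dma_get(main_memory, start, count)  # programmer schedules the transfer
        transfers += 1
        total += sum(store.data)                  # compute touches only local data
    return total, transfers


def sum_larrabee_style(main_memory):
    """Sum an array with ordinary loads; a hardware cache (not modeled) stages data."""
    total = 0
    for value in main_memory:
        total += value
    return total


memory = list(range(10))
store = LocalStore(capacity=4)
print(sum_cell_style(memory, store))  # (45, 3)
print(sum_larrabee_style(memory))     # 45
```

Both versions compute the same sum, but the Cell-style version has to split the work into blocks that fit the local store and issue each transfer itself, while the Larrabee-style version just reads memory and leaves the staging to the cache.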
Intel has purchased Project Offset and plans to use it as a showcase game for Larrabee.
(old screens of project offset)