Applications that already use SSE can easily be extended to use AVX. For instance, LLVM, which is used by one of SwiftShader's back ends, has already added support for AVX, so developers can prepare their code for AVX even before the hardware goes on sale.
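As a rough illustration (my own sketch, not code from SwiftShader or LLVM), widening an SSE loop to AVX is mostly a matter of swapping the 128-bit intrinsics for their 256-bit counterparts, assuming here that the array length is a multiple of the vector width:

```c
#include <immintrin.h>

/* SSE version: processes 4 floats per iteration. */
void add_sse(float *dst, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
}

/* AVX version: the same loop, widened to 8 floats per iteration. */
void add_avx(float *dst, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }
}
```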
LLVM does not magically convert serial code into parallel code.
Thanks, but I already read that back in June. Note that what started the whole discussion is a paper by Intel that concluded the following:
"In the past few years there have been many studies claiming GPUs deliver substantial speedups (between 10X and 1000X) over multi-core CPUs on these kernels. To understand where such large performance difference comes from, we perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs the performance gap between an Nvidia GTX280 processor and the Intel Core i7-960 processor narrows to only 2.5x on average."
The same issues apply to NVIDIA's marketing. Anyway, the GTX 280 is already EOL, so that conclusion is irrelevant to this topic, i.e. running current games.
Although you have to take that with a grain of salt as well, it is definitely correct that GPUs are not orders of magnitude faster than CPUs. A lot of GPGPU benchmarks shamelessly compare GPU code that has been worked on for months with C code for which they haven't even bothered to enable SSE2 optimizations in the compiler, let alone made use of intrinsics.
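For reference, getting that baseline right is often a one-flag change; a plain loop like the one below can be auto-vectorized by GCC with standard options (just an illustrative snippet, not code from any of those benchmarks):

```c
/* Builds with e.g. `gcc -O3 -msse2` so the compiler can auto-vectorize;
   on x86-64, SSE2 is even enabled by default. Publishing a CPU baseline
   compiled without optimization against hand-tuned GPU kernels is exactly
   the mismatch criticized above. */
void scale(float *x, float s, int n)
{
    for (int i = 0; i < n; ++i)
        x[i] *= s;
}
```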
I already know this; e.g. I have posted about the PhysX x87 mess in this forum.
I fully agree with the Beyond3D forum members that gather/scatter is still lacking. But there really isn't anything stopping them from adding it to AVX in due time. And when that happens the raw performance advantage of the GPU won't suffice to achieve superior performance in a wide range of applications.
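To be concrete about what gather/scatter means, here is a scalar sketch of the semantics (hypothetical helper functions, for illustration only): each lane loads or stores through its own index, which currently has to be emulated with per-lane scalar memory accesses.

```c
/* Scalar emulation of an 8-wide gather and scatter through an index vector.
   Without hardware support, vector code has to perform these as individual
   scalar loads/stores plus inserts/extracts, which is what hurts indirect
   addressing patterns on the CPU side. */
void gather8(float *dst, const float *base, const int *idx)
{
    for (int lane = 0; lane < 8; ++lane)
        dst[lane] = base[idx[lane]];
}

void scatter8(float *base, const int *idx, const float *src)
{
    for (int lane = 0; lane < 8; ++lane)
        base[idx[lane]] = src[lane];
}
```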
Unlike AMD, neither Intel nor NVIDIA has the balanced view. Only AMD's Bulldozer has AVX on FMA hardware.
No. The topic for this particular side discussion was HPC.
No. Before this topic was sidetracked, the main issue was about running current games.
But if you really want to talk about the consumer market; a Phenom II X6 1055T costs as little as 179 USD. This will also buy you a GeForce GTX 460. DP performance is 67.2 versus 75.6 GFLOPS respectively, and for SP it's 134.4 versus 907.2 GFLOPS. A nice lead for the GPU in SP performance, but in practical applications with DP calculations the CPU will always outperform the GPU.
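For what it's worth, those theoretical numbers follow from units × clock × FLOPs per clock; here's my back-of-the-envelope check (the per-clock throughputs are my assumptions, not vendor-published formulas):

```c
/* Peak GFLOPS = execution units * clock (GHz) * FLOPs per unit per clock.
   Assumptions: Phenom II (K10) does one 128-bit FADD + one 128-bit FMUL per
   core per clock; the GTX 460 counts each FMA as 2 FLOPs and runs DP at
   1/12 of the SP rate. */
double peak_gflops(double units, double ghz, double flops_per_clock)
{
    return units * ghz * flops_per_clock;
}

/* Phenom II X6 1055T, 6 cores @ 2.8 GHz:
     SP: peak_gflops(6, 2.8, 8)    = 134.4
     DP: peak_gflops(6, 2.8, 4)    =  67.2
   GeForce GTX 460, 336 shaders @ 1.35 GHz:
     SP: peak_gflops(336, 1.35, 2) = 907.2
     DP: 907.2 / 12                =  75.6 */
```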
At this time, DP is almost useless for running most of the heavy workload types in current games. If the CPU is "so good", how come Ghostbusters' ray-tracing pass is done on the GPGPU? I would like to see it run on my 45 watt Intel Core i7 quad-core mobile, btw.
The AMD Phenom II X6 1055T's 134.4 GFLOPS comes at 125 or 95 watts, while the AMD Mobility Radeon HD 5730's 520 GFLOPS comes at 26 watts. A mobile AMD Phenom II X4 consumes around 35 to 45 watts.
As early as January 9th, you'll be able to buy a quad-core Sandy Bridge CPU that delivers 198.4 SP and 99.2 DP GFLOPS for only 184 USD. And let's not forget, this CPU has a 35% lower TDP than the GTX 460. So even on the theoretical numbers the GPU is not the clear winner. It will take FMA and gather/scatter support for the CPU to really catch up, but it's pretty clear that the GPU is not unrivaled.
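The same arithmetic explains where those Sandy Bridge figures come from: AVX doubles the per-core SIMD width relative to SSE (again my assumed throughputs, for a 4-core part at roughly 3.1 GHz):

```c
#include <stdio.h>

/* Assumption: one 256-bit FP add + one 256-bit FP mul per core per clock,
   i.e. 16 SP or 8 DP FLOPs per clock per core. */
int main(void)
{
    double cores = 4.0, ghz = 3.1;
    printf("SP: %.1f GFLOPS\n", cores * ghz * 16.0); /* 198.4 */
    printf("DP: %.1f GFLOPS\n", cores * ghz * 8.0);  /*  99.2 */
    return 0;
}
```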
In a pure GPU GFLOPS race (e.g. SGEMM, DGEMM), refer to AMD Radeon HD GPUs instead.
Also note that it's not actually fair to compare the CPU against the GPU on price alone, because you always need a CPU anyway. Heck, you need a CPU to make your GPU worth anything at all! So we should probably compare a system with a $300 CPU against a system with a $100 CPU and a $200 GPU. I'll leave that exercise up to you.
Funny, I bought a laptop with both an Intel Core i7 quad-core CPU and an AMD Mobility Radeon HD 5730 GPGPU. I didn't skimp on the CPU side with a dual-core mobile part.
In conclusion the GPU will always be faster at specific applications, but the results drop rapidly when people start running a wider range of applications that the GPU wasn't designed to run. To achieve better performance at more generic tasks, they have no other choice but to implement CPU features, which costs computing density. Silicon is silicon and they play by the same rules. So in the long run it's inevitable that they converge.
GPU hardware doesn't have to worry about a legacy ISA; e.g. the Radeon HD "Cayman" includes new VLIW4-based cores.
Please try SwiftShader on a fast Core i7 with Crysis settings on 'high' (I don't know what those 'custom' settings are, I used 'high' for them as well). I'm getting an average of 3.5 FPS for the benchmark (on the second run - you have to let the shader caches warm up). That's not a lot but the HD Graphics doesn't appear to do much better.
It's better on performance per watt, i.e. it doesn't consume 130 watts to do it.
The Intel Core i7-740QM mobile quad-core (45 watts) offers about 68 percent of the performance of the Intel Core i7-920 quad-core (130 watts).
So AVX with FMA and gather/scatter could totally render the IGP useless.
It depends on:
1. performance per watt,
2. who made the IGP; the AMD Fusion APU includes a GPGPU, i.e. a version with 480 stream processors,
3. bill of materials.
For pure Intel solutions, it doesn't matter if Intel removes its IGP or uses AVX, since it's all in one chip anyway.
For my laptop, the CPU+GPU combined power consumption is about 71 watts.
Your pure Intel Core i7-920 @ 130 watts idea is inferior to this current setup. Factor in the dual-core Core i7-6x0 Mobile, whose CPU plus Intel IGP consume 35 watts.
The future battle would be AMD Fusion (CGPU) vs Intel Sandy Bridge (CGPU).