
usefulidiot316 Blog

What's with the r500s?

I've come down to two theories as to why Nvidia made the G71 a 24 pipeline solution instead of the 32 pipeline solution they're more than capable of building. Either the G71 was designed for the PS3 and the desktop version just happened to be cheap, or a 32 pipe solution wouldn't be as cost effective as the current 24 pipe G71. A 32 pipeline G71 would push a 7900gt to $500 instead of $300, and with that many extra transistors it probably wouldn't clock very high or offer a large performance increase anyway.

What I can't figure out is why the r520 and r580 are designed the way they are. A 16 pipeline r520 should have fewer transistors than a 24 pipeline G70, right? Apparently having 8 fewer pipelines actually adds another 18 million transistors to the die. The extra performance on the r520 actually comes from a high clock speed thanks to the smaller 90nm process, compared to 110nm on the G70, and from the high memory bandwidth on the x1800xt. That advantage goes away, though, when the 7800gtx 512MB shows up with slightly faster memory, a slightly slower core, and about a 30% increase in performance while still using fewer transistors. When you shrink the die on the G70 and give it that extra bandwidth, like with the G71, you end up with even fewer transistors and a 30% performance gain, but this time while using less electricity and a smaller heatsink.

The r580 carries the same problem, but this time it also has triple the number of pixel shaders. Now, you would think that tripling the number of pixel shaders would at least double the performance. Instead, all the extra shaders just use up more transistors, making the chip more expensive and hotter than its predecessor while only adding a 30% performance increase. Compared to its competition, it offers slightly more performance while using a lot more electricity and producing a lot more heat with its 100 million extra transistors. Someone once told me that ATI didn't want to increase the number of TMUs because they were already 1.5x as effective as the previous generation, so 16 TMUs on the r580 are just as good as 24 from the competition. If that were true, then why not increase the number of TMUs to maximize performance? And why do all those pixel shaders offer no performance increase over the competition when paired with 16 TMUs that supposedly act like the competition's 24?

The only advantages the r500 series cards have over their Nvidia counterparts are faster, higher quality AF, which doesn't really drop performance that much, and AA, where it's really hard to tell the difference between 2xAA and anything higher. They also have the ability to use AA together with HDR in all 3, count them, 3 games that support it, plus 3dMark06. Then there is Avivo. All the benchmarks I've seen say PureVideo is superior, and even if that changes in the future, I don't see spending $300 on a video card just to tell your friends how pretty your DVDs look. Then people keep saying how future proof these cards are, but what do they offer over the G80 and r600 generations? Future games will use more pixel shaders? Says who, other than ATI? DirectX 10 requires 1 TMU for every pixel shader, not 1 TMU for every 3 pixel shaders. Will the r580 offer better performance in DX10 games? What's the point of that?

I wish, for ATI's sake, that the problems were limited to the r520 and r580 high end cards, but all their cards have these problems. The rv530 shares the same problems as the r580, just in the midrange. 12 pixel shaders don't come close to beating 12 pixel pipelines, and really have a hard time beating the 8 pixel pipelines on the 6600gt. The x1800gto would be worth something if it were cheaper and could overclock as well as the 7600gt. The x1300 is really the only thing that beats the competition in everything, at least until the 6600gt drops further in price.

I'm just having trouble seeing why ATI thought the architecture for the r580, the r520, and everything below was a good idea.

7600 and 7900

Well, I was planning on the 7600s being 16 pipe cards clocked at 600mhz or so, and the 7900s being 32 pipe cards clocked at around 700mhz.  The high power of the 7600s and the huge jump to the 7900s would have made the 7800s obsolete.  Now I find out that the 7900s are 24 pipe cards at 650mhz and the 7600 is a 12 pipe card at 550mhz, and both cards will be priced pretty high, at least compared to the 7800s.  The 7600, in fact, should be less powerful but just as expensive as a 7800gt when it's released, both costing around $250.  Why would anyone want to buy a 7600 then, you ask?  That's a good question, and one I don't plan on finding an answer to, now that I plan on getting a 7800gt to last me until DirectX 10 is implemented in video cards.

Though I might go ahead and switch over to ATI, if they make a card that has a good price/performance ratio, like the 7800gt.

x1800xt compared to 7800gtx

I've been thinking a lot about how ATI claims their x1800xt can beat the 7800gtx in any benchmark. I've heard rumors that ATI has increased the effectiveness of their pipelines by 30% over the previous generation. I've also read on the Inquirer that the x1800 is to be clocked at 625/1500. Since the x800xl was clocked at 400/1000 to compete with the 6800gt clocked at 350/1000, and still didn't win a whole lot of benchmarks, the GeForce 6 series had over 14% more efficiency per pipeline than the x800 series. Putting those numbers together, the x1800 series should have about a 14% increase per pipeline compared to the GeForce 6 series. To compete with the x1800xt, a 6800gt or ultra would have to be clocked at 1112/1000, assuming that memory speed is as influential as core speed in this hypothetical benchmark.

Anyway, looking at benchmarks between the 7800gtx and 6800ultra using Doom3 at the highest settings and resolution I can find (Anandtech has one with Doom3, 4xAA, 2048x1536), I've estimated the 7800gtx to be only 181% as powerful as the 6800ultra. Since the 6800ultra has 16 pipelines and the 7800gtx has 24, the 7800gtx has about a 21% gain in effectiveness per pipeline over the previous generation. To compete with a 7800gtx, a 6800gt or ultra would have to be clocked at 778/1200, or 934/1000.
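
If you want to see where those numbers come from, here's a quick sketch of the arithmetic, assuming performance scales linearly with core clock and memory clock (which is obviously a simplification):

```python
# Rough sketch: what would a 16-pipe GeForce 6 have to be clocked at to match a
# reference 7800gtx, if performance scales linearly with core and memory clocks?

GTX_CORE, GTX_MEM = 430, 1200   # reference 7800gtx clocks
GAIN_OVER_6800U = 1.81          # my estimate above: 7800gtx is ~181% of a 6800ultra

# 6800-equivalent core clock while keeping the 7800gtx's 1200MHz memory:
core_at_1200 = GTX_CORE * GAIN_OVER_6800U        # ~778
# Fold the memory difference into the core clock (1200MHz vs the 6800's 1000MHz):
core_at_1000 = core_at_1200 * (GTX_MEM / 1000)   # ~934

print(round(core_at_1200), round(core_at_1000))  # 778 934
```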

When comparing the reference 7800gtx with the reference x1800xt, it's 934 to 1112. In reference designs, the x1800xt has the lead. I've heard rumors that ATI's press conference regarding the r520 featured a way for guests to benchmark the new cards against Nvidia's competing card. To compete with the x1800xt, the 7800gtx would have to be clocked at 511/1200, and since some of the EVGA and XFX cards are clocked at 490/1300, the core would only have to be clocked at 472/1300 to compete. Apparently, the 7800gtx cards at the ATI conference were reference designs, and the x1800xt can't compete with the higher clocked 7800gtx models, and ATI knows it. To compete with the overclocked 7800gtx cards, x1800xt cards would have to run at 675/1500, which they should be able to do, since they use a 90nm process and a huge dual slot cooler.

Then again, the 7800gtx can still overclock a little beyond 490/1300. In fact, without volt modding, you should be able to get 525/1400. That should compete with an x1800xt clocked at 780/1500. The memory on the r520 is supposed to be able to clock as high as 1600mhz, but no higher. If that's true, the r520 would only have to be clocked at 740/1600 to compete with the max clock on a 7800gtx. Both should be able to get the same benchmark results with a max overclock. Without the overclock, the x1800xt beats the reference designs of the 7800gtx, but not the higher clocked models. The x1800xt is to be priced at around $600, while the XFX overclocked 7800gtx is $500. Which would you rather have?
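
Running the same kind of conversion on the overclocked numbers (same linear-scaling assumption as the sketch above, with the 1.81 and 1.14 factors I estimated earlier):

```python
# Same back-of-envelope conversion, applied to the max-overclocked cards.
# Each score is a "6800-equivalent core clock at 1000MHz memory".

def score_7800gtx(core, mem):
    return core * 1.81 * (mem / 1000)   # 1.81 = my 7800gtx-vs-6800ultra estimate

def score_x1800xt(core, mem):
    return core * 1.14 * (mem / 1000)   # 1.14 = my x1800-vs-GeForce-6 per-pipeline estimate

print(round(score_7800gtx(525, 1400)))  # ~1330 for a maxed-out 7800gtx
print(round(score_x1800xt(780, 1500)))  # ~1334 for an x1800xt at 780/1500 -- about even
```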

Then again, how many programs do you know of that need as much memory bandwidth as they need pixel power? If you ignore the memory speed, the 7800gtx has a theoretical pixel fill rate of 10,320 million pixels per second with a 21% per pixel performance increase over the GeForce 6 series, while the x1800xt has 10,000 million pixels per second with a 14% per pixel performance increase over the GeForce 6 series. The 7800gtx clearly has a large advantage. Though, if memory bandwidth is a large factor, then the r520 has the clear advantage. Seriously, though, how many programs require that much memory bandwidth?
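
The fill rate figures are just pipelines times core clock; here's the quick math with my per-pipeline efficiency estimates layered on top (the weighting is my own assumption):

```python
# Theoretical pixel fill rate = pipelines x core clock (millions of pixels/sec),
# then weighted by my per-pipeline efficiency estimates relative to the GeForce 6 series.

cards = {
    # name: (pipelines, core clock MHz, per-pipeline efficiency vs GeForce 6)
    "7800gtx": (24, 430, 1.21),
    "x1800xt": (16, 625, 1.14),
}

for name, (pipes, core, eff) in cards.items():
    raw = pipes * core          # 10320 vs 10000 Mpixels/s
    weighted = raw * eff        # ~12487 vs ~11400 once efficiency is factored in
    print(f"{name}: {raw} Mpixels/s raw, ~{weighted:.0f} weighted")
```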

The PS3

So, I've been reading up on some specs from the PS3 and certain things like IBM's floating point benchmarks on their Cell server board and Nvidia's floating point benchmarks on the GeForce 6800 ultra. Turns out the Cell server board runs VERY hot even on a 90nm process and only gets about 200 or 300 GFlops when clocked at 3ghz. The 6800ultra gets about 40GFlops. I have a hard time believing that the Cell processor can even get 200 or 300 GFlops when it's supposedly 10x as powerful as a PC processor, considering a 3ghz P4 is about 6GFlops.
Let's just assume that somehow a single core PPC with 8 specialized cores really is 10x as powerful as a 3.2ghz Athlon 64. Athlon64s have about an 8:5 clockspeed/efficiency advantage over P4s clocked at the same speed. So, a 3.2ghz Athlon64 would equal roughly 31GFlops.
The RSX inside the PS3 is based on the same architecture as the G70, and supposedly they're both as powerful as an SLI system using 2 GeForce 6800 ultras. Since the 6800ultra has a floating point speed of 40GFlops, a G70 should have 80, maybe slightly more, but about 80.
When the power of the G70 and my generous estimate for the Cell are added together, the total is about 111GFlops. Sony, on the other hand, claims that their PS3 (possibly even just their Cell processor alone) is capable of doing almost 20x that.
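
Putting those rough numbers side by side (Sony's headline figure at the time was around 2 TFlops for the whole system, so I'm using that as the comparison point; that exact figure is my assumption):

```python
# Rough floating point tally for the PS3, using the estimates above.

g70_gflops  = 80    # ~2x a 6800ultra's ~40 GFlops
cell_gflops = 31    # my generous Athlon64-based estimate
my_total    = g70_gflops + cell_gflops           # ~111 GFlops

sony_claim  = 2000  # Sony's ~2 TFlops headline number (my assumption for the math)
print(my_total, round(sony_claim / my_total))    # 111, ~18x -- "almost 20x"
```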

Also, something else to think about. Gamespot and IGN, and possibly other sites, report that the RSX uses only a 700mhz memory bus. The G70 uses a 1.4ghz memory bus, twice as fast as the RSX. Now, I already have a hard time believing that a 24 pipeline card with a slight increase in clock speeds can pull off a 100% performance increase (even theoretical), but now the same card with half the memory speed is supposedly working just as fast? Come on, Sony. You might be able to fool dumb fanboys and hardware reviewers who just report what they're told, since they can't test the hardware, but you can't fool me.

Most powerful PC available in 2005

motherboard - http://www.tyan.com/products/html/thunderk8we.html - $550

Processor - http://www.monarchcomputer.com/Merchant2/merchant.mv?Screen=PROD&Store_Code=M&Product_Code=120191 $825 x2

RAM - http://www.monarchcomputer.com/Merchant2/merchant.mv?Screen=PROD&Store_Code=M&Product_Code=140258 $569 x4

GPU - http://www.monarchcomputer.com/Merchant2/merchant.mv?Screen=PROD&Store_Code=M&Product_Code=190457 $368 x2

Drives - http://www.monarchcomputer.com/Merchant2/merchant.mv?Screen=PROD&Store_Code=M&Product_Code=150034 $210 x4

http://www.monarchcomputer.com/Merchant2/merchant.mv?Screen=PROD&Store_Code=M&Product_Code=600312 $535

http://www.newegg.com/app/ViewProductDesc.asp?description=27-131-338&depa=1 $124 x2

http://www.newegg.com/app/ViewProductDesc.asp?description=27-106-233&depa=1 $67 x2

Monitor - http://www.newegg.com/app/viewproductdesc.asp?DEPA=0&description=24-116-234&CMP=OTC-pr1c3watch&ATT=Monitors $575

Sound Card - http://www.newegg.com/app/ViewProductDesc.asp?description=29-102-181&depa=0 $286

Total - about $7800
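
If you want to check the math, the line items and quantities above add up like this (I'm guessing from the prices that the three unlabeled links are the SCSI drive and the optical drives):

```python
# Quick tally of the parts list above: (price, quantity) per item.

parts = {
    "motherboard": (550, 1),
    "processors":  (825, 2),
    "RAM":         (569, 4),
    "GPUs":        (368, 2),
    "hard drives": (210, 4),
    "SCSI drive":  (535, 1),
    "DVD drives":  (124, 2),
    "CD drives":   ( 67, 2),
    "monitor":     (575, 1),
    "sound card":  (286, 1),
}

total = sum(price * qty for price, qty in parts.values())
print(total)   # 7830 -- roughly the $7800 quoted above
```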

The case will have to be modded, since it's probably impossible, or next to impossible, to find a case that holds a 12"x13" motherboard, a 6800gt video card, 6x 3.5" hard drives and 1x 3.5" floppy drive, and 4x 5.25" CD and DVD drives, and still has enough airflow to keep 2x high end video cards, 2x high end processors, a 2ghz HTT chip, 8 memory DIMMs, and all 11 drives from overheating.

The processors could be the new dual core Opterons coming out this year, if you really need the 4 cores, though I don't think they'll be 2.4ghz or faster, probably 2.2 or less, and they'll still cost about the same. With everything else in this system, I don't think 200 or 400mhz is going to make a huge difference, and the 4 cores might help with physics or AI more than a faster clock would. The Opteron 252 and 852 processors are the first Opterons to use the 2000mhz HyperTransport and dual channel memory. I'd guess that the dual cores after those will also use the 2000mhz HTT and the dual channel memory controller.

The SCSI drive is for the System Cache setting in Windows. 32-bit Windows XP only allows 4GB of RAM and cache total, so Windows XP x64 would have to go on the system. The drive adds another 73GB of cache for whatever you'd need the extra cache for. The drive has a 320MBps interface, and since the RAM is dual channel, the RAM would transfer at about 6400MBps while the drive transfers at 320MBps. Since the system has 8GB of memory anyway, the SCSI drive is optional.
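
A quick back-of-envelope on those transfer rates, assuming the RAM is PC3200 (DDR400) registered ECC and the SCSI drive sits on an Ultra320 bus (both assumptions on my part, based on the parts above):

```python
# Bandwidth comparison: dual channel DDR400 RAM vs an Ultra320 SCSI drive.

ddr_transfers_per_sec = 400e6   # DDR400: 400 million transfers per second
bus_width_bytes       = 8       # each channel is 64 bits wide
channels              = 2       # dual channel

ram_mb_per_sec  = ddr_transfers_per_sec * bus_width_bytes * channels / 1e6
scsi_mb_per_sec = 320           # Ultra320 bus limit

print(ram_mb_per_sec, scsi_mb_per_sec)   # 6400.0 MB/s vs 320 MB/s
```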

The monitor has a 2048x1536 max resolution, which is the max the video cards can output anyway.

Adding a PPU would definitely help, especially if it's going to be as good as it's hyped up to be. And if it's not, any argument against having one could just as easily be used against the sound card.

The DVD and CD drives were the fastest I could find. The DVD drives aren't as fast at reading and writing CDs as dedicated CD drives would be.

This PC's only downfall is that, even with 2 cores at 5.2ghz or 4 cores at almost 5ghz, GPUs get so much more powerful with each new generation that you'll have to upgrade the video cards in about 3 years or so. 3 years at least, but you could get away with 5 using SLI. Instead of using 2 6800GTs, you could use 2 of the new 6800 ultras with 512MB of RAM, but 512MB should be more than enough for a while with maxed settings and max AA and AF at 2048x1536, so why SLI 2 of those for a total of 1GB of video RAM with this generation of video cards? (And yes, it is 1GB total, because split frame rendering stores the lower half of the screen on one card and the top half on the other, and you would use split frame rendering for a game that has larger textures, like Doom3.) The total storage is 1.2TB, so you won't have to upgrade that anytime soon. The RAM is 8GB, or over 80GB of cache if you get the SCSI drive, so you shouldn't have to upgrade that for a long time either.

The only advantage this PC would have over a cheaper PC is the multitasking and freakishly high framerates. The ECC registered memory also provides more stability, and the large cache helps with whatever you'd need a large cache for.

Just realized something

SLI is supposed to get around a 100% performance increase, right? But looking at benchmarks, I've realized that video cards with faster RAM access have an easier time with games that rely more on textures for their graphics, as opposed to polygons and floating point work. With Alternate Frame Rendering, both cards hold the same texture information in their RAM, while with Split Frame Rendering, one card does half the screen and the other does the other half. Therefore, games like Doom3 that have no texture compression at max settings would do better with Split Frame Rendering, while games with smaller textures, or better texture compression, would get higher framerates with Alternate Frame Rendering.
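
To make the AFR/SFR difference concrete, here's a toy sketch of how the two modes hand out work (just an illustration of the idea, not how the driver actually schedules anything):

```python
# Toy illustration of the two SLI modes described above.
# AFR: whole frames alternate between the cards, so each card keeps the full
#      texture set in its own RAM.
# SFR: each frame is split into bands, so each card only needs the textures
#      visible in its own band.

def afr_schedule(num_frames, num_gpus=2):
    """Alternate Frame Rendering: frame i goes entirely to GPU i % num_gpus."""
    return [(frame, frame % num_gpus) for frame in range(num_frames)]

def sfr_schedule(num_frames, num_gpus=2):
    """Split Frame Rendering: every frame is divided into one band per GPU."""
    return [(frame, gpu) for frame in range(num_frames) for gpu in range(num_gpus)]

print(afr_schedule(4))   # [(0, 0), (1, 1), (2, 0), (3, 1)] -- one GPU per frame
print(sfr_schedule(2))   # [(0, 0), (0, 1), (1, 0), (1, 1)] -- both GPUs touch every frame
```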

I would also guess that video cards get higher frame rates when the memory clock is overclocked, and respond better to AA and AF, or to higher resolutions, when the core clock is overclocked. I haven't been able to test my theory, since my monitor's highest resolution is only 1280x1024, and I get a headache at 800x600.

Pic and details so far



Notice the SLI-like connector at the top. I wonder if you can connect a few of these together? Also, note the molex connector on the side. Would 25W be too much for the PCI-E 1x slot to handle? Maybe this is just for the regular PCI slot.

Also, they have a program on their site for you to download that shows how limited your CPU is when running their Novodex API. When I ran the demo called Big Bang, I was getting 5fps on my Athlon64 3500+ and 6600gt while viewing only 14,000 blocks moving on the screen, something the PhysX processor could supposedly handle easily. Also, I would think that instead of making just one physics engine for all games, they would make an API like DirectX or OpenGL that runs on their card, and let game designers build their own physics engines on top of that, which seems more likely to happen, if not now, then in the future.
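
What I mean by an API instead of one canned engine is something along these lines (a made-up sketch; none of these names are real Ageia/Novodex calls, it's just to show the split between the card and the game's own physics code):

```python
# Hypothetical sketch: the PPU exposes low-level primitives, the way DirectX or
# OpenGL expose the GPU, and each game builds its own physics engine on top.
# Every name here is invented for illustration.

class PhysicsDevice:
    """Stand-in for the PPU: accepts rigid bodies and steps the simulation."""

    def __init__(self):
        self.bodies = []

    def add_rigid_body(self, mass, position, velocity):
        self.bodies.append({"mass": mass, "pos": list(position), "vel": list(velocity)})
        return len(self.bodies) - 1   # handle to the body

    def simulate(self, dt):
        # The card would do this in hardware; here it's a trivial integrator.
        for body in self.bodies:
            body["pos"] = [p + v * dt for p, v in zip(body["pos"], body["vel"])]

# A game's own "engine" layered on top of the device, like a renderer on OpenGL.
device = PhysicsDevice()
crate = device.add_rigid_body(mass=10.0, position=(0.0, 5.0, 0.0), velocity=(0.0, -1.0, 0.0))
device.simulate(dt=0.016)
print(device.bodies[crate]["pos"])   # the crate has dropped a little after one 16ms step
```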

Also, they have Ubisoft, Valve, Havok, and Epic taking advantage of the card, so Unreal Tournament 2006, Half-Life 3, and Splinter Cell 4 should all take advantage of it. If those engines and games all use this card, then other game makers will too, and eventually you may have to buy a PPU alongside a GPU to run games at max settings.

Although the PPU may also end up the same way the APU did. Buying a PPU card would be about the same as buying a brand new sound card just for EAX 3.0 support and better sound quality, which in some people's opinions isn't worth the money.