BoostClock
NVIDIA RTX ray tracing benchmarks - Chasing GigaRays with DXR/OptiX - RTX 2080 | GTX 1080 Ti | GTX 1080 | GTX 980 Ti

When it launched Turing, NVIDIA claimed that its top-end RTX 2080 Ti can achieve 10+ GigaRays/sec in certain "workloads" thanks to hardware-accelerated ray tracing via dedicated RT Cores - compared to Pascal-based GPUs, Turing promises a 10x speedup. Can one really achieve 10 GigaRays/sec rendering complex scenes? What's the truth behind the numbers? Read on!

Rays per second is a common metric for evaluating the performance of different ray tracing and path tracing renderers, and it is widely used in research papers as well. On the other hand, a number of factors can alter the final "rays per second" figure - ray tracing efficiency depends on scene complexity, the distribution of triangles, ray coherence and camera viewpoint, just to name a few. It is really hard to come up with a single performance number for all use cases and workloads. The table below highlights that "rays per second" can potentially triple when changing scenes or render methods (primary, ambient occlusion, diffuse).
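As a rough illustration of how such a figure is typically derived (NVIDIA's exact methodology is not public), a primary-ray benchmark simply divides the number of rays dispatched by the GPU time it took to trace them. The helper below is our own sketch, not anyone's official code:

    // Sketch: one primary ray per pixel, result reported in GigaRays/sec.
    double GigaRaysPerSec(unsigned width, unsigned height, double gpuMilliseconds)
    {
        const double rays = static_cast<double>(width) * static_cast<double>(height);
        return rays / (gpuMilliseconds / 1000.0) / 1e9;
    }

    // Example: 3840x2160 primary rays traced in 1.0 ms -> ~8.3 GigaRays/sec.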

According to the Turing Architecture Whitepaper, NVIDIA measured ray tracing performance with primary rays on five different scenes, each containing a single mesh.

Microsoft DirectX Raytracing (DXR) Simple Lighting sample

It is not clear which API NVIDIA used to arrive at the GigaRays scores, but a safe bet is CUDA with OptiX, as DXR is still under development. Nonetheless, the D3D12 Raytracing Simple Lighting sample from the DirectX-Graphics-Samples repository has all the functionality needed to recreate a similar benchmark setup - one can load up a mesh and gather a rays/sec metric for casting primary rays.
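The measurement itself boils down to wrapping the DispatchRays call in GPU timestamp queries and converting the elapsed ticks to seconds. A minimal sketch of the idea - the query heap, readback buffer, dispatch descriptor and the startTicks/endTicks values are assumed to be set up and read back elsewhere:

    // Record GPU timestamps around the primary-ray dispatch (sketch).
    commandList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 0);
    commandList->DispatchRays(&dispatchDesc);              // one primary ray per pixel
    commandList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 1);
    commandList->ResolveQueryData(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 0, 2,
                                  readbackBuffer, 0);

    // Once the command list has finished on the GPU, map readbackBuffer to get the
    // two UINT64 tick values and convert them with the queue's timestamp frequency.
    UINT64 frequency = 0;
    commandQueue->GetTimestampFrequency(&frequency);        // ticks per second
    const double seconds    = double(endTicks - startTicks) / double(frequency);
    const double raysPerSec = double(dispatchDesc.Width) * dispatchDesc.Height / seconds;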

Some changes and workarounds were required, though. To facilitate loading huge meshes, the project was modified to support large index buffers. The ray-emitting process was also tweaked to shoot more rays from a single invocation (an 8x8 tile) in order to provide enough rays to saturate the GPU; a sketch of the idea follows below.
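One possible way to do this - the constant and the ray-count math below are our own illustration, not the sample's actual code - is to keep the per-pixel dispatch and let every raygen invocation trace an 8x8 block of rays, multiplying the ray count per dispatch by 64:

    // Each raygen invocation traces an 8x8 block of rays instead of a single one,
    // so one dispatch produces 64x more rays (illustrative sketch).
    const UINT kTileSize = 8;                    // assumed 8x8 rays per invocation
    D3D12_DISPATCH_RAYS_DESC dispatchDesc = {};
    dispatchDesc.Width  = renderWidth;
    dispatchDesc.Height = renderHeight;
    dispatchDesc.Depth  = 1;
    // ... shader table setup omitted ...

    // The raygen shader loops kTileSize * kTileSize times, calling TraceRay() per ray,
    // so the ray count used for the rays/sec metric becomes:
    const UINT64 raysPerDispatch =
        UINT64(renderWidth) * renderHeight * kTileSize * kTileSize;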

With the increased number of rays we were able to closely reproduce the officially advertised 8 GigaRays/sec on the Happy Buddha mesh and came close to it on the Dragon mesh. Two additional high-poly meshes were also benchmarked and, as expected, produced much lower scores. Oddly, the score for the Asian Dragon mesh doesn't converge as the resolution (and hence the number of rays) is increased - something is not quite right!

GPUs based on the Pascal architecture don't have native driver/hardware support for DXR, so they have to use the D3D12 Raytracing Fallback Layer, a library that emulates the DirectX Raytracing API with compute shaders. Unfortunately, running the benchmark on the GTX 1080 Ti resulted in system hangs, which prevented reliable measurement of the Fallback Layer's performance.

Intel ProtoRay with OptiX

In order to have some variety and to compare RTX vs GTX ray tracing performance, we set up Intel's ProtoRay ray tracing benchmark with its OptiX backend. We recompiled the project with VS2017, targeting CUDA SDK 10 and OptiX 5.1. Unfortunately, NVIDIA has not released an RTX-compatible OptiX just yet, so the test does not use the RT Cores found in Turing. Intel released reference results for the NVIDIA Tesla P100 using complex, high-poly-count scenes - we reproduced the same workloads with our RTX/GTX GPUs.
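Because the OptiX 5.x host API is plain C++ and a launch blocks until the GPU has finished, timing a primary-ray pass is straightforward. A minimal sketch of the approach, assuming a hypothetical raygen.ptx / primary_rays program and a fixed resolution (ProtoRay's actual setup is considerably more involved):

    #include <optixu/optixpp_namespace.h>
    #include <chrono>
    #include <cstdio>

    int main()
    {
        const unsigned width = 3840, height = 2160;

        // Minimal OptiX 5.x context with a single ray type and entry point.
        optix::Context context = optix::Context::create();
        context->setRayTypeCount(1);
        context->setEntryPointCount(1);
        context->setRayGenerationProgram(
            0, context->createProgramFromPTXFile("raygen.ptx", "primary_rays"));
        // ... geometry, acceleration structure and output buffer setup omitted ...

        context->launch(0, width, height);               // warm-up, builds/compiles kernels

        const auto start = std::chrono::high_resolution_clock::now();
        context->launch(0, width, height);               // blocks until the launch finishes
        const auto stop = std::chrono::high_resolution_clock::now();

        const double seconds = std::chrono::duration<double>(stop - start).count();
        std::printf("%.2f MRays/s\n", width * height / seconds / 1e6);
        return 0;
    }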

NVlab Fermat

For completeness, the Fermat benchmark scores have also been updated with the RTX 2080.

Conclusion:

Rays/sec doesn't represent a theoretical maximum like the number of operations a device can execute in a given time period or cycle; it varies with mesh/scene complexity and viewpoint. What will happen if a more efficient algorithm emerges that speeds up ray-triangle intersection or traversal? Moreover, the current scores represent the performance of the GPUs on five relatively simple meshes with the most coherent rays possible (primary rays) - this may indicate the ray tracing prowess of the current generation of the RTX line-up relative to each other, but it is highly likely that it won't differentiate future RTX GPUs performance-wise that well.

Hardware setup

PSU: Cooler Master 1000W VANGUARD

MOTHERBOARD: ASRock X470 Taichi

CPU: AMD Ryzen 7 2700X

GPU: MSI GTX 980Ti GAMING 6G

GPU: MSI GTX 1080 GAMING X+ 8G

GPU: MSI GTX 1080Ti GAMING X 11G

GPU: MSI RTX 2080 GAMING X TRIO

OS: Microsoft Windows 10 (10.0) Home 64-bit - Version 1809/RS5 (17763.1)

DRIVER: CUDA 10.0.132 - NV 416.16

RAM: G.Skill FlareX 16GB (2X8GB) DDR4 3200MHz

STORAGE: Samsung 960 EVO NVMe M.2 500GB

COOLER: Wraith Prism with RGB LED

Scene information/summary:

Stanford 3D Scanning Repository - Dragon (871k tris)

Stanford 3D Scanning Repository - Happy Buddha (1,087k tris)

Stanford 3D Scanning Repository - Asian Dragon (7,219k tris)

Austrian Imperial Crown by Martin Lubich (4.8M tris)

San Miguel by Guillermo M. Leal Llaguno (10.5M tris)

Power Plant by UNC Gamma (12.8M tris)