BoostClock
Blender v2.80 beta Cycles rendering performance and tile size analysis - RTX 2080 Ti | TITAN V | GTX 1080 Ti

Blender v2.80 beta is out. This major new version packs tons of features, a revamped user interface, a high-end viewport and much more. Blender v2.80 also brings many changes to Cycles, including rendering speed-ups. In this write-up our focus is on how tile size can affect GPU rendering times in v2.79 vs v2.80 beta.

Tile size vs rendering speed

In Blender Cycles, the tile size settings control how the camera view is subdivided into smaller chunks. These smaller "views" of the scene are then dispatched to the compute devices (CUDA GPU, OpenCL GPU, CPU). Each CPU thread processes one tile at a time, so a 4 core / 8 thread machine renders 8 tiles simultaneously. A GPU, on the other hand, works on a single tile at a time: upon completion the device gets a new tile, and so on. Modern GPUs need a lot of work to keep all their ALU resources utilized, so a large number of threads is required to saturate the GPU cores.
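To make the subdivision concrete, here is a minimal sketch that counts the tiles a given tile size produces and flags whether partial tiles appear at the image border. The `tile_grid` helper and the 1920x1080 resolution are illustrative assumptions, not taken from the benchmark scenes.

```python
import math

def tile_grid(width, height, tile_x, tile_y):
    """Return (columns, rows, total tiles, has_partial_tiles) for a width x height render."""
    cols = math.ceil(width / tile_x)
    rows = math.ceil(height / tile_y)
    # A partial (sliver) tile appears whenever a dimension is not an exact multiple of the tile size.
    has_partial = (width % tile_x != 0) or (height % tile_y != 0)
    return cols, rows, cols * rows, has_partial

# Hypothetical 1920x1080 frame; a single GPU would walk through these tiles one after another.
for size in (32, 128, 256, 512):
    cols, rows, total, partial = tile_grid(1920, 1080, size, size)
    print(f"{size}x{size}: {cols} x {rows} = {total} tiles, partial tiles: {partial}")
```

Smaller tiles mean many more dispatches of less work each, which is why a GPU that handles one tile at a time can end up underutilized.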

The suggested tile size for Blender v2.79 and earlier editions is 256x256 to get the most out of the GPUs, but the developers made strides with v2.80 in handling smaller tile sizes for CUDA renders, so large tiles are no longer required. We provide render times for the go-to 256x256 tile size in multiple scenes and show how render time varies with increased / decreased tile sizes in v2.79b and v2.80 beta. Additionally, you can compare the best render times of the different Blender versions and check out the render outputs as well.

Test methodology

The card under test was also used to drive the display - some applications gain performance when a card is a compute-only device. Blender is launched headless (no GUI) with a Python script that sets everything up and starts the rendering process. Render time is extracted so that it covers only pure path tracing time (pure dGPU performance) - no kernel compilation, scene loading, CPU-side BVH construction or final composition.
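A headless run of this kind could be driven roughly as follows. This is only a sketch and not the script used for these benchmarks: the helper names, file names and argument layout are assumptions. It relies on Blender's `-b` (background) and `-P` (run Python script) command line options and on the `scene.render.tile_x` / `scene.render.tile_y` properties available in the v2.79 and v2.80 Python API. It does not attempt to isolate pure path tracing time.

```python
from types import SimpleNamespace

def apply_tile_size(scene, tile_x, tile_y):
    """Set the tile dimensions on a Blender scene (or any scene-like object)."""
    scene.render.tile_x = tile_x
    scene.render.tile_y = tile_y
    return scene

def headless_command(blender_exe, blend_file, script, tile_x, tile_y):
    """Build the argument list for a background render with a given tile size."""
    # -b: run without GUI; -P: execute a Python script.
    # Arguments after "--" are ignored by Blender and left for the script to read from sys.argv.
    return [blender_exe, "-b", blend_file, "-P", script, "--", str(tile_x), str(tile_y)]

# Inside Blender, the script passed via -P would do something like:
#   import bpy
#   apply_tile_size(bpy.context.scene, 256, 256)
#   bpy.ops.render.render(write_still=True)

# Outside Blender, a stand-in object shows what the helper changes (hypothetical values).
fake_scene = SimpleNamespace(render=SimpleNamespace(tile_x=64, tile_y=64))
apply_tile_size(fake_scene, 256, 256)
print(fake_scene.render.tile_x, fake_scene.render.tile_y)
print(headless_command("blender", "scene.blend", "bench.py", 256, 256))
```

The command list can be handed to `subprocess.run` to launch one render per tile size.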

At BoostClock, we always repeat the render benchmarks at least 3 times so that any troubled run can be thrown out of the data set or repeated if needed. Unfortunately, the sheer number of benchmark runs meant that every tile size variation was only run once.
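Spotting a troubled run among repeated measurements can be sketched as below. The 5% tolerance, the `summarize_runs` helper and the sample timings are purely illustrative assumptions, not the criteria or data used for this article.

```python
from statistics import median

def summarize_runs(times_s, tolerance=0.05):
    """Return (median time, suspect runs) where a suspect deviates from the median by more than tolerance."""
    mid = median(times_s)
    suspects = [t for t in times_s if abs(t - mid) > tolerance * mid]
    return mid, suspects

# Hypothetical render times in seconds for three repeats of the same configuration.
mid, suspects = summarize_runs([101.2, 100.8, 109.5])
print(f"median: {mid} s, runs to repeat or discard: {suspects}")
```

With a single run per tile size there is no such cross-check, so an individual outlier in the tile size sweeps cannot be detected this way.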

The benchmark scenes weren't altered besides the tile size dimensions - this way readers can assess how much of an improvement a new GPU would bring compared to their own setup.

Unfortunately, most of the scenes come with render dimensions that are hard to divide into equally sized tiles - we tried to stick to an even tile distribution, as residual / sliver tiles can lead to higher render times.
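One way to look for tile dimensions that leave no sliver tiles is to pick, per axis, the divisor of the image dimension closest to a target size. The sketch below assumes a hypothetical 1920x1080 render and a 256 pixel target. The function names are made up for illustration and this is not necessarily how the tile sizes in this article were chosen.

```python
def divisors(n):
    """All positive integers that divide n exactly."""
    return [d for d in range(1, n + 1) if n % d == 0]

def nearest_even_tile(width, height, target=256):
    """Pick tile dimensions that divide the image exactly and are closest to the target size."""
    # Ties are broken in favor of the smaller divisor.
    tile_x = min(divisors(width), key=lambda d: (abs(d - target), d))
    tile_y = min(divisors(height), key=lambda d: (abs(d - target), d))
    return tile_x, tile_y

# For a 1920x1080 image the closest exact fit to 256x256 is 240x270 (8 x 4 full tiles).
print(nearest_even_tile(1920, 1080, 256))
```

The resulting tiles are not necessarily square, which is fine as Blender takes the tile width and height as separate settings.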

Conclusion

  • it is almost always worthwhile to increase the tile size beyond the default 256x256 with v2.79b on these powerful GPUs, so pick the nearest larger size that generates full tiles across your image (partitioning the image into 256x256 tiles results in smaller, sliver tiles along the side of the image for most of the scenes)
  • 256x256 is a safe choice for v2.79 (but you can shave some time off your renders by using even tiles across the whole image, as described above); performance starts to deteriorate quickly below that
  • v2.80 beta render times don't change that drastically with smaller tile sizes, the results are much closer to each other
  • the pavillion_barcelona scene shows slight regression
  • the GTX 1080 Ti produces the fastest render times on the barbershop_interior scene - either the Branched Path Tracer or the material setup has some performance issues with newer NVIDIA architectures

It's important to note that there are some image quality differences between v2.80 and v2.79b (different random seed generator - pavillion_barcelona, splash_277; missing lights - classroom; slightly different fur shading - fishy_cat; noisy specular highlight - barbershop_interior). Moreover, the koro and the splash_278 scenes don't generate a final render image, the output is blank, so the results for v2.80 are omitted (seemingly it is only a post-processing / final composition issue, as the tiles are rendered without any problem).

Future work

According to the release notes of v2.80, 32x32 tiles should be just as fast (or faster). Due to time constraints we couldn't explore the full range of tile sizes. It would be great to test more scenes with the Branched Path Tracer. The Agent 327 scene (v2.79b splash screen) would be a great addition - unfortunately, it failed to render when launched without the GUI.

Hardware setup

PSU: Cooler Master 1000W VANGUARD

MOTHERBOARD: ASRock X470 Taichi

CPU: AMD Ryzen 7 2700X

GPU: NVIDIA GTX 1080Ti FE

GPU: NVIDIA RTX 2080Ti FE

GPU: NVIDIA TITAN V

OS: Microsoft Windows 10 (10.0) Home 64-bit - Version 1809/RS5 (17763.195)

SW: blender-2.79b-windows64

SW: blender-2.80.0-git.7c438e5366b2-windows64

DRIVER: NV 417.35

RAM: G.Skill FlareX 16GB (2X8GB) DDR4 3200MHz

STORAGE: Samsung 960 EVO NVMe M.2 500GB

COOLER: Wraith Prism with RGB LED