Thursday, March 7, 2013

Performance work: High-Speed, Off-Screen Particles

The clouds in Thunder Moon are the nicest clouds I've ever made, but they do unfortunately take a pretty big toll on performance. There are many layers of textures, and drawing all of them takes a lot of the GPU's fill rate. Yesterday, I finally got around to implementing the concepts described in this article: High-Speed, Off-Screen Particles, which helped to increase the frame rate by a good amount.

The basic idea is that you render objects like the clouds into a smaller texture, and then later composite that texture into the full scene render, using linear filtering in the process to smooth out the slightly lower resolution clouds as they are drawn. This is a big performance saver because the cloud system draws far fewer pixels than it does at the full screen resolution. There is a catch though: a smaller texture rendered over the scene like this will have artifacts where the texture meets another part of the scene, even with the linear filtering. Unfortunately I didn't capture a screenshot of that for this post, and I shouldn't take the time to go set this up again, but if you check out the linked article they have good examples of the artifacts I'm talking about. They are most noticeable when the camera is moving around.
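To get a rough feel for the fill rate savings, here is a small back-of-the-envelope sketch. The layer count (8) and the half-resolution-per-axis buffer (scale of 2) are made-up numbers for illustration, not the values Thunder Moon uses, and the function name is hypothetical.

```python
def pixels_shaded(width, height, layers, scale=1):
    """Approximate pixels written when `layers` full-screen layers are drawn
    into a buffer shrunk by `scale` along each axis."""
    return (width // scale) * (height // scale) * layers


# Assumed numbers: a 1080p target, 8 cloud layers, half resolution per axis.
full = pixels_shaded(1920, 1080, layers=8)
low = pixels_shaded(1920, 1080, layers=8, scale=2)

print(full, low, full / low)
```

Halving the resolution on each axis cuts the pixels written to a quarter. The composite pass and the edge fix-up add some cost back, so the real gain is smaller than this estimate.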

So fixing the artifacts requires detecting the edges in the low res buffer and drawing the full resolution clouds wherever edges are detected.
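As a rough illustration of the edge detection step, here is a CPU-side sketch that flags any low res pixel whose value jumps sharply compared to a neighbor. This is an assumption on my part for the sake of the example: the post doesn't depend on a particular detector, the buffer here is just a grid of scalar values (it could stand in for depth or alpha), and `detect_edges` and `threshold` are hypothetical names. In the game this would be done in a shader.

```python
def detect_edges(buf, threshold):
    """Return a mask that is True wherever a value in `buf` differs from its
    right or lower neighbor by more than `threshold`."""
    h, w = len(buf), len(buf[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Compare with the right and lower neighbors; mark both sides.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and abs(buf[y][x] - buf[ny][nx]) > threshold:
                    mask[y][x] = True
                    mask[ny][nx] = True
    return mask


# A tiny low res buffer with a hard vertical boundary down the middle.
low_res = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
edges = detect_edges(low_res, threshold=0.5)
```

Only the two columns on either side of the boundary end up flagged, so those are the only places that would need the full resolution clouds.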

The thumbnails below all link to the 1080p versions.

So, here is the original scene:
The original scene with clouds rendered at full resolution.
In the first part of the algorithm, the clouds are rendered into an off-screen blend buffer, which is then processed to detect the edges. The resulting edge buffer is shown below.
The edge detection mask rendered over the scene.
Then the blend buffer is drawn over the game scene during another part of the rendering, using the edge buffer to limit where it is drawn. This avoids the edges and prevents the creation of artifacts. While it is drawing, it writes stencil values so the next stage can detect where the low res clouds were and weren't drawn.
The low res buffer without drawing the full resolution edges.
Finally, the full resolution clouds are drawn again, but this time reading the stencil values so they are only drawn in the detected edge areas. The end result is something that looks very similar to the original full resolution render, with a worthwhile performance boost.
The low res buffer composited with the full resolution clouds, which are drawn only where edges were detected.
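The two drawing passes can be sketched on the CPU like this. It is a simplified model under several assumptions: pixels are single grayscale values, the low res buffer is sampled with nearest-neighbor lookups rather than linear filtering, the full resolution clouds are passed in as a ready-made grid instead of being re-rendered, and all of the names (`blend`, `composite`, `scale`) are hypothetical. The real thing runs on the GPU with the hardware stencil test.

```python
def blend(dst, src, alpha):
    """Standard alpha blend of a cloud value over the scene value."""
    return src * alpha + dst * (1.0 - alpha)


def composite(scene, low_res_clouds, full_res_clouds, edge_mask, scale):
    """Draw low res clouds away from edges, then full res clouds on the edges.

    Cloud buffers hold (color, alpha) pairs; `edge_mask` is at low resolution.
    """
    h, w = len(scene), len(scene[0])
    out = [row[:] for row in scene]
    stencil = [[0] * w for _ in range(h)]

    # Pass 1: low res clouds, skipping edge pixels, writing stencil as we go.
    for y in range(h):
        for x in range(w):
            ly, lx = y // scale, x // scale
            if not edge_mask[ly][lx]:
                color, alpha = low_res_clouds[ly][lx]
                out[y][x] = blend(out[y][x], color, alpha)
                stencil[y][x] = 1

    # Pass 2: full res clouds, only where pass 1 left the stencil untouched.
    for y in range(h):
        for x in range(w):
            if stencil[y][x] == 0:
                color, alpha = full_res_clouds[y][x]
                out[y][x] = blend(out[y][x], color, alpha)

    return out, stencil


scene = [[0.0] * 4 for _ in range(2)]
low_res_clouds = [[(1.0, 0.5), (1.0, 1.0)]]            # 1x2 buffer, scale 2
full_res_clouds = [[(1.0, 0.25)] * 4 for _ in range(2)]
edge_mask = [[False, True]]                            # right half is an edge

result, stencil = composite(scene, low_res_clouds, full_res_clouds, edge_mask, 2)
```

Every pixel is covered by exactly one of the two passes, which is what the stencil values guarantee: the cheap low res result fills the interior and the expensive full resolution draw is limited to the edge pixels.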
Fun stuff, and the game is faster now as a result.