Working in real-time allows for great speed, flexibility and experimentation compared with offline rendering / video. But projects designed to run in real-time have one key consideration that video doesn’t: performance.

Performance of most real-time 3D scenes is limited by a number of factors: the size of the canvas; the complexity of lighting, materials and shading; post processing; visual effects (particles+fields etc) in the scene; and the 3D assets themselves. The scene is always limited by the slowest thing in it – it’s no good optimising the assets but not the particles, for example.

Preparing 3D Assets for Real-time

Firstly, and obviously, the fastest thing to render is the thing that isn’t rendered. Graphics hardware still processes polygons through the pipeline even if they aren’t visible on screen. Notch culls meshes that lie outside the planes of the rendering camera, but can only do this on a per-mesh basis: if an object crosses the edge of the camera view, it will be rendered in its entirety. Therefore in some cases, e.g. flying down a tunnel, it may be worth cutting the mesh up into chunks, if some chunks can be completely outside the camera view at some points and therefore not processed. The same applies to polygons that face away from the camera or are hidden behind other meshes: they are still processed. A mesh that exists in the scene but is entirely hidden behind other objects will still be rendered completely and still cost time to process: it should be removed manually by the artist.
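The per-mesh culling described above can be sketched roughly as follows. This is a minimal illustration, not Notch’s actual implementation: the Plane and Sphere types, the is_culled function and the use of bounding spheres are all assumptions made for the example, and only a single near plane is tested.

```python
from dataclasses import dataclass

@dataclass
class Plane:
    # Plane n . p + d = 0, with the normal pointing into the visible volume.
    nx: float
    ny: float
    nz: float
    d: float

@dataclass
class Sphere:
    # Hypothetical bounding volume enclosing one whole mesh.
    x: float
    y: float
    z: float
    radius: float

def is_culled(bounds: Sphere, frustum: list) -> bool:
    """True if the bounding sphere lies completely outside any one plane."""
    for p in frustum:
        dist = p.nx * bounds.x + p.ny * bounds.y + p.nz * bounds.z + p.d
        if dist < -bounds.radius:
            return True
    return False

# Camera at z = 50 looking along +z; only the near plane is modelled here.
near = Plane(0.0, 0.0, 1.0, -50.0)

# A 100 unit tunnel as a single mesh: it straddles the plane, so all of it is drawn.
whole_tunnel = Sphere(0.0, 0.0, 50.0, 50.0)
print(is_culled(whole_tunnel, [near]))  # False

# The same tunnel cut into ten chunks: those fully behind the camera are skipped.
chunks = [Sphere(0.0, 0.0, 5.0 + 10.0 * i, 6.0) for i in range(10)]
print(sum(is_culled(c, [near]) for c in chunks))  # 4
```

The test is all-or-nothing per mesh, which is why the unsplit tunnel is never skipped while four of the ten chunks are.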

A 3D object is often rendered in multiple passes by the engine – for example, each shadow map rendered requires re-rendering the object. Often large parts of the scene have no real effect on the shadow map or can even adversely affect it, and it pays to turn “Cast Shadows” off on those. The floor is a particularly common example: if you have a flat floor in your scene, always disable cast shadows for it. It’ll improve the quality of the shadow map too.

What makes a 3D asset perform badly? The major reasons are, in order:
- Number of batches
- Material and texture complexity
- Deformation, animation, skins
- Polygon and vertex counts

The number of batches is by far the biggest factor in many scenes. A “batch” is a piece of a mesh that can be rendered in one go by the graphics hardware. Each separate object node, each separate mesh inside each object node, and each block of polys using a different material inside a mesh, all result in a new batch. Graphics hardware likes to deal with a small number of large batches, and hates to deal with lots of small batches. In fact, with fewer than around 2000 vertices in a batch the graphics hardware doesn’t even properly spin up: a smaller batch simply wastes time. That’s right – if you have a batch of fewer than 2000 vertices, you could probably add more vertices and it would take the same amount of time to render!
Worse than that, each batch that has to be processed adds a lot of overhead on CPU and GPU. There’s a limit to the number of batches you can actually process in a frame before the frame rate suffers heavily: more than a few hundred in total is an issue. Considering that a scene is often rendered multiple times – each shadow map rendered requires a full rendering pass of the scene – batch counts quickly add up. A badly optimised 3D file can easily destroy the frame rate.
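As a rough illustration of how quickly batch counts add up, the arithmetic can be sketched as below. The batches_per_frame helper is hypothetical, and it assumes that every mesh casts shadows and is drawn in every pass with no culling, which overstates the cost in most real scenes.

```python
def batches_per_frame(material_groups_per_mesh, shadow_passes):
    """Estimate the number of batches submitted in one frame.

    material_groups_per_mesh: one entry per mesh, giving how many different
        materials are used inside that mesh (each one becomes a batch).
    shadow_passes: number of extra full-scene renders needed for shadow maps.
    """
    batches_per_pass = sum(material_groups_per_mesh)
    return batches_per_pass * (1 + shadow_passes)

# One shadowed spot light (1 pass) plus one shadowed point light (6 passes).
passes = 1 + 6

# 100 separate single-material cubes.
print(batches_per_frame([1] * 100, passes))  # 800

# The same cubes merged into one single-material mesh.
print(batches_per_frame([1], passes))  # 8
```

With these assumptions the unmerged scene is already well past the “few hundred” mark with only two shadowed lights.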

How can you minimise batch counts? Firstly, 3D packages often favour creation of lots of small separate meshes – so an innocent-looking scene made of 100 cubes may contain 100 separate meshes, leading to a very slow real-time render. Notch does not merge meshes on load because the user often intends to keep objects separate in order to modify & animate them separately in Notch. Therefore, the first step to be taken is to merge as much of the scene as possible into one single mesh (even one with lots of materials). So important is this step that it’s often worth sacrificing other “optimisations” in order to achieve it: for example it can often be more efficient to take a scene made of lots of rigidly, independently animating meshes, merge them all and bake all the animation to a huge vertex cache animation – or a skin + bone animation – than it is to run the original scene. This also goes against the previous statement about splitting meshes into chunks for culling, and neatly demonstrates that there are few hard rules in 3D scene optimisation.
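Merging is normally done in the 3D package rather than by hand, but the underlying operation is simple and can be sketched as follows. The merge_meshes function and the (vertices, indices) layout are assumptions for illustration, and the sketch presumes all vertices have already been transformed into a common space.

```python
def merge_meshes(meshes):
    """Concatenate several meshes into one vertex list and one index list.

    Each mesh is a tuple: (list of (x, y, z) vertices, list of triangle indices).
    Indices of later meshes are offset by the number of vertices merged so far.
    """
    merged_vertices = []
    merged_indices = []
    for vertices, indices in meshes:
        offset = len(merged_vertices)
        merged_vertices.extend(vertices)
        merged_indices.extend(i + offset for i in indices)
    return merged_vertices, merged_indices

tri_a = ([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [0, 1, 2])
tri_b = ([(2, 0, 0), (3, 0, 0), (2, 1, 0)], [0, 1, 2])

vertices, indices = merge_meshes([tri_a, tri_b])
print(len(vertices))  # 6
print(indices)        # [0, 1, 2, 3, 4, 5]
```

Two meshes that previously needed two batches now form one vertex buffer and one index buffer, provided they share a material.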

The next process is to merge materials. Take for example a cube where each face has a separate texture applied. This will be rendered as 6 batches of 2 quads each – incredibly inefficient and slow. It would probably be faster to render a single 10,000 poly mesh with one texture than to render the cube with six different ones. To reduce this cost textures can be merged into a single atlas per mesh using functionality in the 3D software. One material per mesh is ideal.
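When textures are merged into an atlas, each face’s UV coordinates have to be remapped into its tile. A minimal sketch, assuming a regular grid atlas and UVs in the 0 to 1 range; the atlas_uv function is hypothetical, and it ignores the padding between tiles that real atlas tools add to avoid bleeding:

```python
def atlas_uv(u, v, tile_index, columns, rows):
    """Map a UV in [0, 1] on one source texture into its tile of a grid atlas."""
    col = tile_index % columns
    row = tile_index // columns
    return ((col + u) / columns, (row + v) / rows)

# Six cube-face textures packed into a 3 x 2 atlas.
# The centre of face 4 lands in the middle tile of the second row.
print(atlas_uv(0.5, 0.5, tile_index=4, columns=3, rows=2))  # (0.5, 0.75)
```

After remapping, all six faces reference one texture and one material, so the cube can be drawn as a single batch.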

The next issue is material and texture complexity. Large textures are often considered to be a cause of poor performance but this is not entirely true: what matters is actually texture density. A large texture squeezed on to a small polygon (small in output render size) may be costly. Notch auto-creates mipmaps from textures, which mitigates this issue, but it’s still worthwhile, for quality of rendering and reduction of aliasing as well as performance, to ensure that your texture map density is in good relation to the size of the thing on screen: a 1:1 ratio of texels to screen pixels is the goal. A 5 pixel sized object on screen does not need a 512×512 texture. Historically, texture sizes had to be a power of 2 (512, 1024, 2048 etc). This is no longer necessary, but keeping to it aids mipmap generation.
Texture formats also matter: a 16 bit per channel texture is twice as slow to render as an 8 bit per channel texture. Optimising textures to DXT formats where possible not only reduces their memory footprint but also greatly reduces their cost to render. In a PBR workflow, albedo (colour) maps can typically be compressed to DXT1 (RGB) without much noticeable difference; roughness and specular maps can be compressed to greyscale BC4. Normal maps generally should not be compressed below 8 bits per channel.
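A quick way to sanity-check texture density is to estimate the largest size an object reaches on screen and round up to a power of two. The helpers below are a sketch under that assumption; suggested_texture_size, mip_count and the 4096 cap are illustrative choices, not Notch settings.

```python
import math

def suggested_texture_size(on_screen_pixels, max_size=4096):
    """Smallest power-of-two edge length that covers the on-screen size 1:1."""
    n = max(1, math.ceil(on_screen_pixels))
    size = 1 << (n - 1).bit_length()
    return min(size, max_size)

def mip_count(size):
    """Number of mip levels for a square power-of-two texture, down to 1x1."""
    return size.bit_length()

print(suggested_texture_size(5))    # 8
print(suggested_texture_size(300))  # 512
print(mip_count(512))               # 10
```

For the 5 pixel object mentioned above, an 8×8 texture already meets the 1:1 goal; anything larger is only ever sampled through its smaller mip levels.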
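The memory side of this can be estimated from the bits per pixel of each format. The figures below are the standard storage rates for these formats; the texture_bytes helper is hypothetical, and it models memory footprint only, not the render cost discussed above.

```python
BITS_PER_PIXEL = {
    "RGBA16": 64,  # 16 bits x 4 channels, uncompressed
    "RGBA8": 32,   # 8 bits x 4 channels, uncompressed
    "DXT1": 4,     # also known as BC1: 8 bytes per 4x4 pixel block
    "BC4": 4,      # single channel: 8 bytes per 4x4 pixel block
}

def texture_bytes(width, height, fmt, with_mips=False):
    """Approximate size of a texture in bytes."""
    base = width * height * BITS_PER_PIXEL[fmt] // 8
    # A full mip chain adds roughly one third on top of the base level.
    return base * 4 // 3 if with_mips else base

for fmt in ("RGBA16", "RGBA8", "DXT1", "BC4"):
    print(fmt, texture_bytes(2048, 2048, fmt) // (1024 * 1024), "MiB")
# RGBA16 32 MiB
# RGBA8 16 MiB
# DXT1 2 MiB
# BC4 2 MiB
```

A 2048×2048 albedo map therefore shrinks by a factor of eight when moved from 8 bit RGBA to DXT1.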

However, material complexity is a much greater enemy than textures alone. A material is capable of using a number of textures at once: e.g. normal maps, roughness maps, displacement maps etc. Each of those elements carries a cost: in simple terms, a material using 4 textures is 4 times as heavy to render as a material that uses one. But some of these stages – such as normal mapping – take additional processing beyond just reading the texture. Consider whether each texture channel really has an impact on the render. Simple materials and materials without textures are faster to process. The cost of material evaluation also heavily impacts lighting. Lighting is often the most demanding part of the overall rendering pipeline, and options in the material such as reflections are costly. Those will be discussed later.

This brings us to polygon and vertex counts. The biggest fallacy in 3D optimisation is the assumption that reducing the polygon count makes everything go faster; after the discussion of batches it should now be clear why this is not the case. GPUs are tremendously powerful nowadays and, if the mesh is properly set up and dispatched, millions of polygons can be rendered comfortably. However, ultimately and at extremes, polygon counts do matter – particularly when meshes are additionally processed with deformers, or if the object is cloned multiple times! As with everything else, polygon counts should be kept reasonable for the way the object will be seen on screen.

Lighting

Lighting is usually the most GPU-intensive part of 3D scene rendering. Notch has two lighting modes: forward and deferred, and it is deferred rendering that will usually be used for “high quality” 3D scene rendering with shadows etc. Deferred lighting works by first rendering the 3D scene to a number of separate images (called “GBuffers”) at once, containing information like normals, albedo, roughness, depth and so on. Once these buffers are generated the lighting is all calculated as a post-process on the images. This has some great benefits: each pixel may be rendered by overlapping polys multiple times, but it will only be lit once; and the geometry only has to be rendered once. This means that overdraw, hidden polygons, texture sizes, batch counts and polygon and vertex counts are not a concern when thinking about lighting calculations. What does affect lighting calculations is the size of the canvas / output render; doubling the number of (lit) pixels on screen doubles the time taken for lighting. Lighting complexity is likely to be the limiting factor when 3D scenes are rendered in high resolutions – the size of the required render must be a key consideration when determining which lighting techniques to use.
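The scaling behaviour described above can be put into rough numbers. This is a deliberately simplified cost model, not a measurement: the lighting_evaluations function is hypothetical, it treats every light as touching every pixel, and the forward case ignores optimisations such as early depth rejection.

```python
def lighting_evaluations(width, height, lights, overdraw=1.0, deferred=True):
    """Rough count of per-pixel light evaluations in one frame.

    Deferred: each screen pixel is lit once per light, whatever the overdraw.
    Forward (simplified): every rasterised fragment is lit, so overdraw
    multiplies the cost.
    """
    fragments = width * height * (1.0 if deferred else overdraw)
    return int(fragments * lights)

hd = lighting_evaluations(1920, 1080, lights=4)
uhd = lighting_evaluations(3840, 2160, lights=4)
print(uhd // hd)  # 4

# With 3x overdraw, the simplified forward model does three times the lighting work.
forward = lighting_evaluations(1920, 1080, lights=4, overdraw=3.0, deferred=False)
print(forward // hd)  # 3
```

Going from 1920×1080 to 3840×2160 quadruples the pixel count, so under this model the lighting pass takes around four times as long.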

Basic lights – spot, point, directional and area lights – are calculated analytically, making them fast to process even in large numbers unless shadows are enabled. Enabling shadows on a light massively increases its computational complexity. Shadows are generated as shadow maps, which require the entire scene to be rendered again to texture, once (for spot lights), six times (for point lights) or many times (for area lights); the resulting texture must then be sampled by each lit pixel on screen – typically many times per pixel – to calculate the shadow result. Softening shadows requires more samples, making them progressively slower to calculate. Notch optimises the shadow sampling process to try to avoid processing the light and reading the shadow map in areas of the screen which are outside the attenuation area of the light. There are two different modes for shadow map sampling: PCF shadow maps require lots of samples to make soft shadows smooth; Variance shadow maps optimise some of this process into a pre-calculation, making them faster for very soft shadows. Shadow softness may be constant or varying, based on the occlusion of the object; varying softness requires even more calculations to generate, so takes longer. To optimise basic lights: first disable shadows wherever possible; otherwise reduce the shadow softness; use constant rather than varying softness where possible; try variance shadow maps; and finally make your shadowing lights as small as possible.
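To show why softer PCF shadows cost more, here is a minimal sketch of percentage-closer filtering over a square kernel. It is a generic textbook form of the technique, not Notch’s shader: the pcf_shadow function, the list-of-lists shadow map and the bias value are all assumptions for the example.

```python
def pcf_shadow(shadow_map, u, v, receiver_depth, kernel_radius=1, bias=0.005):
    """Fraction of light reaching a point, using percentage-closer filtering.

    shadow_map: 2D list (rows of floats) of depths as seen from the light.
    u, v: integer texel coordinates of the receiver in the shadow map.
    receiver_depth: depth of the shaded point as seen from the light.
    """
    height, width = len(shadow_map), len(shadow_map[0])
    lit = 0
    taps = 0
    for dy in range(-kernel_radius, kernel_radius + 1):
        for dx in range(-kernel_radius, kernel_radius + 1):
            x = min(max(u + dx, 0), width - 1)   # clamp to the map edge
            y = min(max(v + dy, 0), height - 1)
            # The tap is lit if nothing nearer the light occludes the receiver.
            if receiver_depth - bias <= shadow_map[y][x]:
                lit += 1
            taps += 1
    return lit / taps

# Left half of the map holds an occluder at depth 0.2; the right half is empty.
shadow_map = [[0.2, 0.2, 1.0, 1.0] for _ in range(4)]

print(pcf_shadow(shadow_map, 3, 1, 0.5, kernel_radius=0))  # 1.0 (hard, fully lit)
print(pcf_shadow(shadow_map, 0, 1, 0.5, kernel_radius=0))  # 0.0 (hard, fully shadowed)
soft = pcf_shadow(shadow_map, 1, 1, 0.5, kernel_radius=1)  # partially lit near the edge
```

The number of shadow map reads per lit pixel is (2 × kernel_radius + 1) squared, so a radius of 1 costs 9 reads and a radius of 3 costs 49, which is why reducing softness is an effective optimisation.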

For reflections, environment maps are a cheap approximation, and reflections from them can be calculated quickly. However, the environment map itself can take time to process, so static environment maps should be used wherever possible: video or other animation applied to video nodes linked to the environment map source causes the environment map to be dynamic and will make it much slower. Advanced lighting options such as screen space reflections, sky lights and voxel cone shading are very heavy processes requiring a considerable amount of preprocessing and lots of samples. Screen space ambient occlusion may be a cheaper alternative to sky lights.

Volumetric lighting is available in two forms in Notch. The simple scattering available by setting Scattering Intensity on light nodes is fast but does not allow shadow maps, projected textures or more complex effects, as it is calculated analytically, based only on the light cone and the per-pixel depths. The Volumetric Lighting node allows access to these but – as could be expected – has a great impact on performance: the volumetric effect now has to be ray cast per pixel through multiple scattering sources, with shadows evaluated at each step along the ray. Where possible, use full volumetric lighting only on lights that really need it: e.g. where the shadow map or projected texture is visually key. The Volumetric Lighting node has “Affected Lights” and “Excluded Lights” inputs which may be used to restrict the set of lights in the scene which use full volumetric lighting.

In many scenes it may be possible to bake some or all of the lighting to textures, massively reducing the computational cost per frame. This is the most effective way to optimise lighting while maintaining a good visual result. Specular lighting and reflections will always change when either the camera, the lights or the objects in the scene move; they are therefore usually impossible to bake. Diffuse illumination, bounce lighting and ambient occlusion on the other hand only change when the lights or the objects move; bounces and ambient occlusion are also the slowest things to calculate. As such it is common to bake the bounce / ambient occlusion passes plus some of the light sources into textures (light maps), then use a small number of real-time key lights to provide specular highlights and shadow cues from a few moving objects. Baked lightmaps can be generated offline in other 3D tools and loaded into the material’s Diffuse Illumination channel.

Antialiasing

A beautifully modelled, lit and textured 3D scene can often be let down by aliasing artefacts. Real-time renderers do not have the massive oversampling options available in offline renderers so it becomes much more of a problem – but antialiasing techniques allow things to be improved. There are a number of methods available in Notch but first we must consider where aliasing actually comes from. There are three major types: polygon edge (geometric) aliasing, material / texture aliasing, and lighting / shading aliasing.

Geometric aliasing is often easily visible on straight edges of 3D objects at angles – they look jagged. Removing this requires use of one of the antialiasing methods in Notch: MSAA, FXAA or Temporal AA. MSAA specifically treats pixels that lie on the edges of polygons as multiple samples which are later averaged; this costs processing time because each pixel may be processed and lit multiple times rather than once. FXAA is effectively a post-blurring technique that looks for edges, and is cheap but tends to lose sharpness. Temporal AA uses multiple frames and generated motion vectors to smooth pixels over multiple frames, which is reasonably cheap and looks good but breaks down in some scenarios. It’s important to understand that polygon edges don’t always result from the silhouette of an object; cracks in polygons are a common cause. A crack occurs where the edges of two neighbouring polygons do not perfectly match up: e.g. small differences in unwelded vertex positions that can be solved by welding them; or one edge butting up against two. This must be fixed in the artwork: clean meshes are essential for a good render.
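Welding is best done in the 3D package, but the idea behind it can be sketched as follows. This is a simplified grid-snapping approach with a hypothetical weld_vertices function; 3D tools typically use more robust distance-based searches.

```python
def weld_vertices(vertices, indices, tolerance=1e-4):
    """Merge vertices that fall into the same tolerance-sized grid cell.

    Returns (welded_vertices, remapped_indices). Snapping to a grid is a
    simplification: two points closer than the tolerance can still land in
    neighbouring cells and stay separate.
    """
    cell_to_new = {}
    welded = []
    remap = []
    for x, y, z in vertices:
        key = (round(x / tolerance), round(y / tolerance), round(z / tolerance))
        if key not in cell_to_new:
            cell_to_new[key] = len(welded)
            welded.append((x, y, z))
        remap.append(cell_to_new[key])
    return welded, [remap[i] for i in indices]

# Two triangles that should share an edge, but whose shared vertices were
# duplicated with tiny positional differences, leaving a potential crack.
vertices = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
    (1.00001, 0.0, 0.0), (0.0, 1.00001, 0.0), (1.0, 1.0, 0.0),
]
indices = [0, 1, 2, 3, 5, 4]

welded, new_indices = weld_vertices(vertices, indices, tolerance=1e-3)
print(len(welded))   # 4
print(new_indices)   # [0, 1, 2, 1, 3, 2]
```

After welding, both triangles reference exactly the same two vertices along the shared edge, so no gap can open between them.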

Texture aliasing is visible on the interior of a textured 3D object. This occurs when textures – including specular, roughness and normal maps – have hard lines and/or are too large for the number of pixels on screen. The best solution to this is to fix the artwork: smooth it out, get the resolutions right vs the density of the texture on screen. The UV coordinates on the mesh can be a big contributor: uneven UVs or hard joins. Normal map and roughness map aliasing are particularly important because they directly cause the last type: shading aliasing.

Shading aliasing occurs when processing lighting and shading. A sharp change in normal direction may result in a huge change in e.g. specular light result from one pixel to the next – which results in aliasing. This is the most challenging type to remove and must be done by careful work by the artist. Notch has a number of views to help: you can view normals, texture coordinates etc. to try and diagnose the issue.

Shadow aliasing forms a part of shading aliasing but is worth considering separately. It shows as jagged edges along the lines of hard shadows, and is caused when shadow maps are too low resolution for the area they are asked to cover. In practice this is a difficult problem to resolve because it depends not just on the cone of the light, but also on the area on to which the shadow map is sampled: it may be fine with a zoomed out camera and a low screen resolution, but zoomed in on a small area of the scene and rendered at high resolution the problem may become more evident. The only options are to reduce the area covered by the light, increase shadow map resolution, or soften shadows to hide the problem.
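The relationship between light cone, shadow map resolution and camera framing can be estimated with some simple geometry. The sketch below is an approximation under stated assumptions: a spot light, a surface facing both the light and the camera head on, and a horizontal camera field of view. All function names are hypothetical.

```python
import math

def shadow_texel_size(cone_angle_deg, distance, resolution):
    """World-space width of one shadow map texel at a distance from a spot light."""
    covered = 2.0 * distance * math.tan(math.radians(cone_angle_deg) / 2.0)
    return covered / resolution

def screen_pixel_size(fov_deg, distance, screen_width):
    """World-space width of one screen pixel at a distance from the camera."""
    covered = 2.0 * distance * math.tan(math.radians(fov_deg) / 2.0)
    return covered / screen_width

def pixels_per_shadow_texel(cone_angle_deg, light_dist, shadow_res,
                            fov_deg, cam_dist, screen_width):
    """Screen pixels spanned by one shadow texel; well above 1 suggests jaggies."""
    texel = shadow_texel_size(cone_angle_deg, light_dist, shadow_res)
    pixel = screen_pixel_size(fov_deg, cam_dist, screen_width)
    return texel / pixel

# The same light and shadow map, seen from far away and then close up.
far = pixels_per_shadow_texel(90, 10.0, 1024, 60, 10.0, 1920)
near = pixels_per_shadow_texel(90, 10.0, 1024, 60, 2.0, 1920)
print(near > far)  # True

# The three remedies: narrower cone, higher resolution (or softening, not modelled).
narrower = pixels_per_shadow_texel(45, 10.0, 1024, 60, 2.0, 1920)
sharper = pixels_per_shadow_texel(90, 10.0, 2048, 60, 2.0, 1920)
print(narrower < near, sharper < near)  # True True
```

Moving the camera five times closer makes each shadow texel cover five times as many screen pixels, which is why a shadow that looks fine in a wide shot can break down in a close-up.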