Introduction
With all the low-level stuff in place it’s time to take a look at how we drive rendering in Stingray, i.e. how a final frame comes together. I’ve covered this in various presentations over the years but will try to go through everything again to give a more complete picture of how things fit together.
Stingray features what we call a data-driven rendering pipe; by that we basically mean that all shaders, GPU resource creation and manipulation, as well as the entire flow of a rendered frame, are defined in data. In our case the data is a set of different json files.
These json-files are hot-reloadable on all platforms, providing a nice workflow with fast iteration times when experimenting with various rendering techniques. It also makes it easy for a project to optimize the renderer for its specific needs (in terms of platforms, features, etc.) and/or to push it in other directions to better suit the art direction of the project.
There are four different types of json-files driving the Stingray renderer:
- .render_config - the heart of a rendering pipe.
- .render_config_extension - extensions to an existing .render_config file.
- .shader_source - shader source and meta data for compiling statically declared shaders.
- .shader_node - shader source and meta data used by the graph based shader system.
Today we will be looking at the render_config, both from a user’s perspective and in terms of how it works on the engine side.
Meet the render_config
The render_config is an sjson file describing everything from which render settings to expose to the user to the flow of an entire rendered frame. It can be broken down into four parts - render settings, resource sets, layer configurations and resource generators - all of which are fairly simple and minimalistic systems on the engine side.
Render Settings & Misc
Render settings is a simple key:value map exposed globally to the entire rendering pipe as well as an interface for the end user to peek and poke at. Here’s an example of how it might look in the render_config
file:
render_settings = {
    sun_shadows = true
    sun_shadow_map_size = [ 2048, 2048 ]
    sun_shadow_map_filter_quality = "high"
    local_lights_shadow_atlas_size = [ 2048, 2048 ]
    local_lights_shadow_map_filter_quality = "high"
    particles_local_lighting = true
    particles_receive_shadows = true
    debug_rendering = false
    gbuffer_albedo_visualization = false
    gbuffer_normal_visualization = false
    gbuffer_roughness_visualization = false
    gbuffer_specular_visualization = false
    gbuffer_metallic_visualization = false
    bloom_visualization = false
    ssr_visualization = false
}
As you will see we have branching logic for most systems in the render_config
which allows the renderer to take different paths depending on the state of properties in the render_settings
. There is also a block called render_caps
which is very similar to the render_settings
block except that it is read only and contains knowledge of the capabilities of the hardware (GPU) running the engine.
On the engine side there’s not that much to cover about the render_settings and render_caps: keys are always strings that get murmur hashed to 32 bits, and values can be a bool, a float, an array of floats or another hashed string.
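To make that a bit more concrete, here is a minimal sketch of what such a hashed-key/variant-value store could look like. This is illustrative only - the names and layout are assumptions, not the actual engine code:

#include <cstdint>
#include <unordered_map>
#include <vector>

// Illustrative sketch of a render_settings store: keys are 32-bit hashes of
// strings (murmur hashed in the engine), values are a small variant over the
// supported types.
typedef uint32_t IdString32;

struct SettingValue {
    enum Type { BOOL, FLOAT, FLOAT_ARRAY, HASHED_STRING } type;
    bool b;
    float f;
    std::vector<float> floats;
    IdString32 hashed_string;
};

class RenderSettings {
public:
    void set(IdString32 key, const SettingValue &value) { _settings[key] = value; }
    const SettingValue *find(IdString32 key) const {
        auto it = _settings.find(key);
        return it != _settings.end() ? &it->second : nullptr;
    }
private:
    std::unordered_map<IdString32, SettingValue> _settings;
};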
When booting the renderer we populate the render_settings by first reading them from the render_config file, then looking in the project specific settings.ini file for potential overrides or additions, and finally allowing certain properties to be overridden again from the user’s configuration file (if loaded).
The render_caps block usually gets populated when the RenderDevice is booted and we’re in a state where we can enumerate all device capabilities. This makes the keys and values of the render_caps block somewhat of a black box with different contents depending on platform; typically there aren’t that many of them though.
So that covers the render_settings and render_caps blocks; we will look at how they are actually used for branching in later sections of this post.
There are also a few other miscellaneous blocks in the render_config
, most important being:
- shader_pass_flags - Array of strings building up a bit flag that can be used to dynamically turn on/off various shader passes (see the sketch after this list).
- shader_libraries - Array of which shader_source files to load when booting the renderer. The shader_source files are libraries of pre-compiled shaders mainly used by the resource generators.
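Here is a rough sketch of how the shader_pass_flags array could be turned into a runtime bit flag. This is hypothetical code; the actual engine interface differs:

#include <cstdint>
#include <string>
#include <vector>

// Illustrative sketch: each name declared in shader_pass_flags maps to one
// bit, and the renderer flips those bits at runtime to enable/disable the
// corresponding shader passes.
struct ShaderPassFlags {
    std::vector<std::string> names; // in declaration order from the render_config
    uint32_t enabled = 0;

    uint32_t bit_for(const std::string &name) const {
        for (size_t i = 0; i != names.size(); ++i)
            if (names[i] == name)
                return 1u << i;
        return 0;
    }
    void set_enabled(const std::string &name, bool on) {
        uint32_t bit = bit_for(name);
        enabled = on ? (enabled | bit) : (enabled & ~bit);
    }
};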
Resource Sets
We have the concept of a RenderResourceSet on the engine side; it simply maps a hashed string to a GPU resource. RenderResourceSets can be locally allocated during rendering, creating a form of scoping mechanism. The resources are either allocated by the engine and inserted into a RenderResourceSet or allocated through the global_resources block in a render_config file.
The RenderInterface
owns a global RenderResourceSet
populated by the global_resources
array from the render_config
used to boot the renderer.
Here’s an example of a global_resources
array:
global_resources = [
    { type="static_branch" platforms=["ios", "android", "web", "linux"]
        pass = [
            { name="output_target" type="render_target" depends_on="back_buffer"
                format="R8G8B8A8" }
        ]
        fail = [
            { name="output_target" type="alias" aliased_resource="back_buffer" }
        ]
    }

    { name="depth_stencil_buffer" type="render_target" depends_on="output_target"
        w_scale=1 h_scale=1 format="DEPTH_STENCIL" }
    { name="gbuffer0" type="render_target" depends_on="output_target"
        w_scale=1 h_scale=1 format="R8G8B8A8" }
    { name="gbuffer1" type="render_target" depends_on="output_target"
        w_scale=1 h_scale=1 format="R8G8B8A8" }
    { name="gbuffer2" type="render_target" depends_on="output_target"
        w_scale=1 h_scale=1 format="R16G16B16A16F" }

    { type="static_branch" render_settings={ sun_shadows = true }
        pass = [
            { name="sun_shadow_map" type="render_target" size_from_render_setting="sun_shadow_map_size"
                format="DEPTH_STENCIL" }
        ]
    }

    { name="hdr0" type="render_target" depends_on="output_target" w_scale=1 h_scale=1
        format="R16G16B16A16F" }
]
So while the above example mainly shows how to create what we call DependentRenderTargets (i.e. render targets that inherit their properties from another render target and then allow overriding those properties locally), it can also create other buffers of various kinds.
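As a sketch of the idea (illustrative only, the real resolve step handles more properties), resolving a DependentRenderTarget is essentially copying the description of the target it depends_on, scaling the resolution and applying any local overrides:

#include <cstdint>

// Illustrative sketch of resolving a dependent render target description.
struct RenderTargetDesc {
    uint32_t width, height;
    uint32_t format; // engine specific format enum
};

RenderTargetDesc resolve_dependent_target(const RenderTargetDesc &depends_on,
    float w_scale, float h_scale, uint32_t format_override)
{
    RenderTargetDesc desc = depends_on;
    desc.width = (uint32_t)(depends_on.width * w_scale);
    desc.height = (uint32_t)(depends_on.height * h_scale);
    desc.format = format_override; // local properties override the inherited ones
    return desc;
}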
We’ve also introduced the concept of a static_branch. There are two types of branching in the render_config file: static_branch and dynamic_branch. In the global_resources block only static branching is allowed, as the block only runs once, during set up of the renderer. (Note: the branch syntax is far from nice; we have since come up with a much cleaner syntax that we use in the shader system, but unfortunately it hasn’t made its way back to the render_config yet.)
So basically what this example boils down to is the creation of a set of render targets. The output_target is a bit special though: on PC and consoles we simply set up an alias for an already created render target - the back buffer - while on GL-based platforms we create a new, separate render target. (This is because we render the scene upside down on GL platforms to get consistent UV coordinate systems between all platforms.)
The other special case from the example above is the sun_shadow_map
which grabs the resolution from a render_setting
called sun_shadow_map_size
. This is done because we want to expose the ability to tweak the shadow map resolution to the user.
When rendering a frame we typically pipe the global RenderResourceSet
owned by the RenderInterface
down to the various rendering systems. Any resource declared in the RenderResourceSet
is accessible from the shader system by name. Each rendering system can at any point decide to create its own local version of a RenderResourceSet
making it possible to scope shader resource access.
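A minimal sketch of what a RenderResourceSet could boil down to, assuming the scoping is provided by a parent pointer that lookups fall through to (the real implementation may differ):

#include <cstdint>
#include <unordered_map>

typedef uint32_t IdString32;
struct RenderResource; // opaque GPU resource owned by the engine

// Illustrative sketch: hashed name -> GPU resource, with lookups falling
// through to an outer set so locally allocated sets can scope/override
// resources during rendering.
class RenderResourceSet {
public:
    explicit RenderResourceSet(const RenderResourceSet *parent = nullptr) : _parent(parent) {}
    void insert(IdString32 name, RenderResource *resource) { _resources[name] = resource; }
    RenderResource *lookup(IdString32 name) const {
        auto it = _resources.find(name);
        if (it != _resources.end())
            return it->second;
        return _parent ? _parent->lookup(name) : nullptr;
    }
private:
    const RenderResourceSet *_parent;
    std::unordered_map<IdString32, RenderResource *> _resources;
};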
Worth pointing out is that the resources declared in the global_resource
block of the render_config
used when booting the engine are all allocated in the set up phase of the renderer and not released until the renderer is closed.
Layer Configurations
A render_config can have multiple layer_configurations. A layer configuration is essentially a description of the flow of a rendered frame; it is responsible for triggering rendering sub-systems and scheduling the GPU work for a frame. Here’s a simple example of a deferred rendering pipe:
layer_configs = {
    simple_deferred = [
        { name="gbuffer" render_targets=["gbuffer0", "gbuffer1", "gbuffer2"]
            depth_stencil_target="depth_stencil_buffer" sort="FRONT_BACK" profiling_scope="gbuffer" }
        { resource_generator="lighting" profiling_scope="lighting" }
        { name="emissive" render_targets=["hdr0"]
            depth_stencil_target="depth_stencil_buffer" sort="FRONT_BACK" profiling_scope="emissive" }
        { name="skydome" render_targets=["hdr0"]
            depth_stencil_target="depth_stencil_buffer" sort="BACK_FRONT" profiling_scope="skydome" }
        { name="hdr_transparent" render_targets=["hdr0"]
            depth_stencil_target="depth_stencil_buffer" sort="BACK_FRONT" profiling_scope="hdr_transparent" }
        { resource_generator="post_processing" profiling_scope="post_processing" }
        { name="ldr_transparent" render_targets=["output_target"]
            depth_stencil_target="depth_stencil_buffer" sort="BACK_FRONT" profiling_scope="transparent" }
    ]
}
Each line in the simple_deferred array specifies either a named layer that the shader system can reference to direct rendering into (i.e. a renderable object, like e.g. a mesh, has shaders assigned, and the shaders know into which layer they want to render - e.g. gbuffer), or it triggers a resource_generator.

The order of execution is top to bottom, and the GPU scheduling works by having each line increment a bit in the “Layer System” bit range covered in the post about sorting.
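Conceptually (the actual bit layout lives in the sorting code and is not reproduced here), assigning the layer bits could look something like this:

#include <cstdint>

// Illustrative sketch: each line in the layer configuration gets an index
// that is packed into the "Layer System" bit range of the 64-bit sort key.
// The shift and mask below are made-up values, not the engine's real layout.
static const uint64_t LAYER_SYSTEM_SHIFT = 56;
static const uint64_t LAYER_SYSTEM_MASK = 0xFFull << LAYER_SYSTEM_SHIFT;

inline uint64_t layer_sort_key(uint64_t layer_index)
{
    return (layer_index << LAYER_SYSTEM_SHIFT) & LAYER_SYSTEM_MASK;
}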
On the engine side the layer configurations are managed by a system called the LayerManager
, owned by the RenderInterface
. It is a tiny system that basically just maps the named layer_config
to an array of “Layers”:
struct Layer {
    uint64_t sort_key;
    IdString32 name;
    render_sorting::DepthSort depth_sort;
    IdString32 render_targets[MAX_RENDER_TARGETS];
    IdString32 depth_stencil_target;
    IdString32 resource_generator;
    uint32_t clear_flags;
#if defined(DEVELOPMENT)
    const char *profiling_scope;
#endif
};
- sort_key - As mentioned above and in the post about how we do sorting, each layer gets a sort_key assigned from the “Layer System” bit range. By looking up the layer’s sort_key and using that when recording Commands to RenderContexts we get a simple way to reason about the overall ordering of a rendered frame.
- name - The shader system can use this name to look up the layer’s sort_key to group draw calls into layers.
- depth_sort - Describes how to encode the depth range bits of the sort key when recording a RenderJobPackage to a RenderContext. depth_sort is an enum that indicates if sorting should be done front-to-back or back-to-front.
- render_targets - Array of named render target resources to bind for this layer.
- depth_stencil_target - Named render target resource to bind as depth stencil for this layer.
- resource_generator - Named resource generator to trigger at this point of the frame (covered in the next section).
- clear_flags - Bit flag hinting if color, depth or stencil should be cleared for this layer.
- profiling_scope - Used to record markers on the RenderContext that later can be queried for GPU timings and statistics.
When rendering a World (see: RenderInterface) the user passes a viewport to the render_world function; the viewport knows which layer_config to use. We look up the array of Layers from the LayerManager and record a RenderContext with state commands for binding and clearing render targets, using the sort_keys from the Layers. We do this dynamically each time the user calls render_world, but in theory we could cache the RenderContext between render_world calls.
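A rough sketch of that loop, building on the Layer struct above and using a made-up RenderContext interface (not the actual engine API):

// Illustrative sketch: walk the Layers of the active layer_config and record
// bind/clear state commands on a RenderContext using each layer's sort_key,
// so they end up scheduled at the start of their respective layers.
struct RenderContextSketch {
    void bind_targets(uint64_t sort_key, const IdString32 *targets, IdString32 depth_stencil);
    void clear(uint64_t sort_key, uint32_t clear_flags);
};

void record_layer_state(RenderContextSketch &rc, const Layer *layers, uint32_t n_layers)
{
    for (uint32_t i = 0; i != n_layers; ++i) {
        const Layer &l = layers[i];
        rc.bind_targets(l.sort_key, l.render_targets, l.depth_stencil_target);
        if (l.clear_flags)
            rc.clear(l.sort_key, l.clear_flags);
    }
}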
The name Layer is a bit misleading, as a layer can also be responsible for making sure that a ResourceGenerator runs; in practice a Layer is either a target for the shader system to render into or the execution point for a ResourceGenerator. It can in theory be both, but we never use it that way.
Resource Generators
Resource generators are a minimalistic framework for manipulating GPU resources and triggering various rendering sub-systems. Similar to a layer configuration, a resource generator is described as an array of “modifiers”. Modifiers get executed in the order they are declared. Here’s an example:
auto_exposure = {
    modifiers = [
        { type="dynamic_branch" render_settings={ auto_exposure_enabled=true } profiling_scope="auto_exposure"
            pass = [
                { type="fullscreen_pass" shader="quantize_luma" inputs=["hdr0"]
                    outputs=["quantized_luma"] profiling_scope="quantize_luma" }
                { type="compute_kernel" shader="compute_histogram" thread_count=[40 1 1] inputs=["quantized_luma"]
                    uavs=["histogram"] profiling_scope="compute_histogram" }
                { type="compute_kernel" shader="adapt_exposure" thread_count=[1 1 1] inputs=["quantized_luma"]
                    uavs=["current_exposure" "current_exposure_pos" "target_exposure_pos"] profiling_scope="adapt_exposure" }
            ]
        }
    ]
}
The first modifier in the above example is a dynamic_branch. In contrast to a static_branch, which gets evaluated during loading of the render_config, a dynamic_branch is evaluated each time the resource generator runs, making it possible to take different paths through the rendering pipeline based on settings and other game context that might change over time. Dynamic branching is also supported in the layer_config block.
If the branch is taken (i.e. if auto_exposure_enabled is true) the modifiers in the pass array will run.
The first modifier is of the type fullscreen_pass and is by far the most commonly used modifier type. It simply renders a single triangle covering the entire viewport using the named shader. Any resource listed in the inputs array is exposed to the shader, and any resources listed in the outputs array are bound as render targets.
The second and third modifiers are of the type compute_kernel and will dispatch a compute shader. The inputs array works the same way as for the fullscreen_pass, and uavs lists resources to bind as UAVs.
This is obviously a very basic example, but the idea is the same for more complex resource generators. By chaining a bunch of modifiers together you can create interesting rendering effects entirely in data.
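On the engine side you can think of it roughly like this: each modifier type implements a small execute interface and a resource generator is just an ordered list of modifiers. The interface below is hypothetical, not the actual engine code:

#include <cstdint>
#include <vector>

// Illustrative sketch of the modifier framework.
class RenderContext;
class RenderResourceSet;

struct ModifierEnvironment {
    RenderContext *render_context;   // context to record GPU work into
    RenderResourceSet *resources;    // resources visible at this scope
    uint64_t base_sort_key;          // sort key of the layer that triggered the generator
};

class Modifier {
public:
    virtual ~Modifier() {}
    virtual void execute(ModifierEnvironment &env) = 0;
};

struct ResourceGenerator {
    std::vector<Modifier *> modifiers;
    void execute(ModifierEnvironment &env) {
        for (Modifier *m : modifiers)   // modifiers run in declaration order
            m->execute(env);
    }
};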
Stingray ships with a toolbox of various modifiers, and the user can also extend it with their own modifiers if needed. Here’s a list of some of the other modifiers we ship with:
- cascaded_shadow_mapping - Renders a cascaded shadow map from a directional light.
- atlased_shadow_mapping - Renders a shadow map atlas from a set of spot and omni lights.
- generate_mips - Renders a mip chain for a resource by interleaving a resource generator that samples from sub-resource n-1 while rendering into sub-resource n.
- clustered_shading - Assigns a set of light sources to a clustered shading structure (on the CPU at the moment).
- deferred_shading - Renders proxy volumes for a set of light sources with specified shaders (i.e. traditional deferred shading).
- stream_capture - Reads back the specified resource to the CPU (usually multi-buffered to avoid stalls).
- fence - Synchronization of graphics and compute queues.
- copy_resource - Copies a resource from one GPU to another.
In Stingray we encourage building all lighting and post processing using resource generators. So far it has proved very successful for us as it gives great per project flexibility. To make sharing of various rendering effects easier we also have a system called render_config_extension
that we rolled out last year, which is essentially a plugin system to the render_config
files.
I won’t go into much detail about how the resource generator system works on the engine side; it’s fairly simple though. There’s a ResourceGeneratorManager that knows about all the generators; each time the user calls render_world we ask the manager to execute all generators referenced in the layer_config, using the layer’s sort key. We don’t restrain modifiers in any way - they can be implemented to do whatever and have full access to the engine, e.g. they are free to create their own ResourceContexts, spawn worker threads, etc. When the modifiers for all generators are done executing we are handed all the RenderContexts they’ve created and can dispatch them together with the contexts from the regular scene rendering. To get the scheduling between modifiers in a resource generator correct we use the 32-bit “user defined” range in the sort key.
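In sketch form (the bit positions here are assumed, for illustration only): the commands a modifier records inherit the layer’s sort key, with the modifier’s index packed into the user-defined range so modifiers keep their declared order within the generator.

#include <cstdint>

// Illustrative sketch: combine the layer's sort key with a per-modifier
// index in the 32-bit "user defined" range (assumed here to be the low bits).
inline uint64_t modifier_sort_key(uint64_t layer_sort_key, uint32_t modifier_index)
{
    return layer_sort_key | (uint64_t)modifier_index;
}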
Future improvements
Before we wrap up I’d like to cover some ideas for future improvements.
The Stingray engine has had a data-driven renderer from day one, so it has been around for quite some time by now. And while the render_config has served us well so far, there are a few things we’ve discovered that could use some attention moving forward.
Scalability
The complexity of the default rendering pipe continues to increase as the demand for new rendering features targeting different industries (games, design visualization, film, etc.) grows. While our data-driven approach addresses the feature-set scalability needs decently well, there is also an increasing demand for feature parity across lots of different hardware. This tends to result in lots of branching in the render_config, making it a bit hard to follow.

In addition to that, we are also starting to see the need to manage multiple paths through the rendering pipe on the same platform; this is especially true when dealing with stereo rendering. On PC we currently have five different paths through the default rendering pipe:
- Mono - Traditional mono rendering.
- Stereo - Old school stereo rendering, one render_world call per eye. Almost identical to the mono path, but there is still some stereo specific work for assembling the final image that needs to happen.
- Instanced Stereo - Uses “hardware instancing” to do stereo propagation to the left/right eye. Single scene traversal pass, culling using an uber-frustum. A bunch of shader patch-up work and some branching in the render_config.
- Nvidia Single Pass Stereo (SPS) - Somewhat similar to instanced stereo but using Nvidia specific hardware for doing multicasting to the left/right eye.
- Nvidia VRSLI - DX11 path for rendering the left/right eye on separate GPUs.
We expect the number of paths through the rendering pipe to continue to increase for mono rendering as well; we’ve already seen that when experimenting with explicit multi-GPU work under DX12. Things quickly become hairy when you aren’t running on a known platform. Also, depending on the hardware it’s likely that you want to schedule the rendered frame differently - i.e. it’s not as simple as saying: here are our four different paths and we select one based on whether the user has 1-4 GPUs in their system, as that breaks down as soon as the GPUs in the system aren’t identical.

In the future I think we might want to move to an even higher level of abstraction of the rendering pipe that makes it easier to reason about different paths through it. Something that decouples the strict flow through the rendering pipe and instead only reasons about the various “jobs” that need to be executed by the GPUs and what their dependencies are. The engine could then dynamically re-schedule the frame load depending on the hardware automatically... at least in theory; in practice I think it’s more likely that we would end up with a few different “frame scheduling configurations” and then select one of them based on benchmarking / hardware setup.
Memory
As mentioned earlier, our system for dealing with GPU resources is very static: resources declared in the global_resource set are allocated as the renderer boots up and not released until the renderer is closed. On last-gen consoles we had support for aliasing the memory of resources of different types, but we removed that when deprecating those platforms. With the rise of DX12/Vulkan and the move to 4K rendering, this static resource system is in need of an overhaul. While we can (and do) try to recycle temporary render targets and buffers throughout a frame, it is easy to break some code path without noticing.
We’ve been toying with ideas similar to the “Transient Resource System” described in Yuriy O’Donnell’s excellent GDC 2017 presentation, FrameGraph: Extensible Rendering Architecture in Frostbite, but have so far not gotten around to testing them out in practice.
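As a very rough sketch of the direction (purely hypothetical, nothing like this exists in the engine today), a transient resource interface could let the backend alias memory based on explicitly declared lifetimes:

#include <cstdint>

// Hypothetical transient resource interface: render targets are requested
// per frame with an explicit lifetime (expressed in frame "passes"), which
// would let the backend alias memory between resources with disjoint
// lifetimes.
typedef uint32_t TransientHandle;

struct TransientResourceSystem {
    TransientHandle request_render_target(uint32_t width, uint32_t height,
        uint32_t format, uint32_t first_pass, uint32_t last_pass);
    void *resolve(TransientHandle handle); // only valid between first_pass and last_pass
};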
DX12 improvements
Today our system deals implicitly with binding of input resources to shader stages. We expose pretty much everything to the shader system by name, and if a shader stage binds a resource for reading we don’t know about it until we create the RenderJobPackage. This puts us in a somewhat bad situation when it comes to dealing with resource transitions, as we end up having to do some rather complicated tracking to inject resource barriers at the right places during the dispatch stage of the RenderContexts (see: RenderDevice).
We could instead enforce declaration of all writable GPU resources when they get bound as input to a layer or resource generator. As we already have explicit knowledge of when a GPU resource gets written to by a layer or resource generator, adding the explicit knowledge of when we read from one would complete the circle and we would have all the needed information to setup barriers without complicated tracking.
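A minimal sketch of what that could enable (hypothetical code): with reads and writes declared up front, barriers fall out of a simple diff against the last known state of each resource instead of per-binding tracking during dispatch.

#include <cstdint>
#include <unordered_map>

typedef uint32_t IdString32;
enum ResourceState { STATE_RENDER_TARGET, STATE_UNORDERED_ACCESS, STATE_SHADER_RESOURCE };

// Illustrative sketch: record the last known state per resource and emit a
// barrier only when a declared read or write requires a different state.
struct BarrierTracker {
    std::unordered_map<IdString32, ResourceState> current;

    // Returns true if a transition barrier is needed, and records the new state.
    bool require(IdString32 resource, ResourceState wanted) {
        auto it = current.find(resource);
        bool needs_barrier = (it != current.end() && it->second != wanted);
        current[resource] = wanted;
        return needs_barrier;
    }
};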
Wrap up
Last week at GDC 2017 there were a few presentations (and a lot of discussions) around the concepts of having more high-level representations of a rendered frame and what benefits that brings. If you haven’t already I highly encourage you to check out both Yuriy O’Donnell’s presentation “FrameGraph: Extensible Rendering Architecture in Frostbite” and Aras Pranckevičius’s presentation: “Scriptable Render Pipeline”.
In the next post I will briefly cover the feature set of the two render_configs
that we ship as template rendering pipes with Stingray.