sobota, 28 kwietnia 2012

Managing Shaders Combinations

Recently, when working on the engine (BlossomEngine) for a game that I am co-creating (http://gct-game.net) I encountered a problem that ultimtely every engine programmer must face - managing shader combinations.

It is often the case that in the engine we have some super-duper shader(s) with many switches. Such shaders are called uber-shaders. The idea of an uber-shader is that we do not write a separate shader (file) for every single surface, but rather we have on big shader where we can turn some features on and off, like the use of specular lighting on the surface, soft shadows, or anything else that can describe the material.

One way to accomplish this is by using static branching within the shader, using constant values supplied to the shader. This solution has some flaws, however. The first is that we generate more if-then instructions in the shader, which simply makes the shader longer. The other one is that we waste shader constants on feeding the shader with appropriate steering data.

The second way to manage an uber-shader is to generate completely separate shaders by using #ifdefs in the shader and thus avoiding the need to supply the shader with any constants steering data. The downside is that we will now generate a bunch of shaders which we have to manage somehow in the application. Note that the number of generated shaders grows exponentially with the number of #ifdefs. For example, consider a situation where our shader contains two #ifdefs, A and B. The possible combinations are:

No #ifdef
A
B
AB

Now, if we decide to add a third #ifdef C, we will have to double all shaders that we have so far, with new #ifdef C added to each mix:

No #ifdef
A
B
AB
C
AC
BC
ABC

So the general formula for the number of generated shaders is $2^n$, where n is the number of #ifdefs.

In BlossomEngine the most complex shader I have is a forward-shaded mesh sunlight shader. I must say that at the beginning of development, when I had a very, very few #ifdefs (normal mapping, specular lighting) I didn't bother about having any sophisticated manager. So I simply declared each combination as a separate shader, like this:
CShader meshSunlightPixelShader;
CShader meshSunlightPixelShader_NormalMapping;
CShader meshSunlightPixelShader_Specular;
CShader meshSunlightPixelShader_NormalMapping_Specular;
Obviously, as the game engine evolves, you are developing more and more complex shaders. So this way I ended up having 16 different combinations, what was really troublesome to manage in the application code. I decided to write some automatic system that would generate an array of shaders from given parameters. So I did that. My new class CShadersPack was able to generate an array of CShader objects from a single shader file that contains #ifdefs. A call to init function of that class can look like this:
meshSunLightPixelShadersPack.init("./BlossomEngine/shaders/mesh_sun_light.ps",
 stPixelShader, "-DSPECULAR -DCUBE_MAP_REFLECTIONS -DSOFT_SHADOWS 
-DDIFFUSE_SHADING_NORMAL -DDIFFUSE_SHADING_NORMAL_MAPPED 
-DAMBIENT_SHADING_NORMAL_MAPPED -DRIM_LIGHTING");
The list of all #ifdefs is passed as one string, with all #ifdefs separated with a space. The function splits that string and generates all combinations.

Now the question is, when you already have those shaders generated, how to get the one you need from the CShadersPack object. Say we have A, B and C #ifdefs, and we want to get AC shader. One way to do this would be to search through the shaders array in CShadersPack and find the one with name that matches the AC combination. I think I don't have to point out how wasteful in CPU time that would be. There is a much better way. Let's again see how combinations are generated. First, we have #ifdef A. Combinations are:

No #ifdef
A

Now we add B:

No #ifdef
A
B
BA

Now C:

No #ifdef
A
B
BA
C
CA
CB
CBA

So as you can see, there is some regularity. For instance, if we are starting with shaderIndex = 0 and we want to pick a shader with feature A, we will have to add $1$ to shaderIndex. If we want also feature B, we must add $2$, if we want feature C, we must add $4$, and so on... So to get AC combination we get shaderIndex = 1 + 4 = 5. By looking at the list above we see that indeed, CA is 5th on the list (0-based indexing). Adding a couple of integers and indexing the shaders array will probably be much faster than looping through the array and comparing strings. In real code this looks like this:
int shaderIndex = 0;

if (useSpecular)
 shaderIndex += 1;
if (useReflectionsCubeMap)
 shaderIndex += 2;
if (useSoftShadows)
 shaderIndex += 4;
if (meshesInstances[i]->material.diffuseShadingType != CMeshMaterial::dstNone)
 shaderIndex += 8;
if (meshesInstances[i]->material.diffuseShadingType == CMeshMaterial::dstNormalMapped)
 shaderIndex += 16;
if (meshesInstances[i]->material.ambientShadingNormalMapped)
 shaderIndex += 32;
if (useRimLighting)
 shaderIndex += 64;

Okay, so at this point I had a nice system to manage shader combinations. Unfortunately, I quickly discovered an annoying problem - with nested #ifdefs. In my mesh sunlight shader I have #ifdef called DIFFUSE_SHADING_NORMAL. If it is turned on, the shader will perform classical NdotL lighting based on interpolated vertex normal. However, I want to have the ability to perform the shading using normal from a normal map. So inside DIFFUSE_SHADING_NORMAL I have DIFFUSE_SHADING_NORMAL_MAPPED which takes normal from the normal map into the computations. Does this cause a big problem? Yup, it does.

Let's now assume we have a shader with #ifdef A, B and C, but C is located inside the #ifdef B. Something like this:
#ifdef A
 A code
#endif

#ifdef B
 B code
 #ifdef C
  C code
 #endif
 B code
#endif
Combinations generated with my system would still be:

No #ifdef
A
B
BA
C
CA
CB
CBA

But hey, since C is dependent on B, the combination AC will generate exactly the same code as combination A. As we add more and more #ifdefs we will have more and more duplicated shaders. This is wasteful in terms of compilation/assembling time, as well as in terms of wasted GPU memory to store those duplicated shaders.

A way around this problem would be to, put simply, generate combinations in a more intelligent way. This raises some new problems, however, like computing a proper index to the shaders array in CShadersPack, or passing the #ifdefs list to CShadersPack::init. The dependencies would have to marked in some way. This solution can certainly be done, but fortunately to me I figured a much easier to implement and manage way to handle this problem.

First of all, the general idea of generating arguments combinations remains the same, so we always have $ 2^n$ combinations, where n is the number of arguments. However, I do not generate the shaders immediately. Instead, a particular shader combination is generated when a mesh with particular material is to be used. So let's say I want to use a shader that only has rim lighting (see "real" code above). In that case I look for an arguments combination under index $64$. I see that it has not been used, so I generate that shader, store the pointer in the shaders array, and hold index into that shaders array in the arguments combinations list. Here's the code for CShadersPack class:
class CShadersPack
{
 struct ShaderCombination
 {
  std::string args;
  int indexToShadersList;
 };

private:
 std::string fileName;
 ShaderType shaderType;

 int combinationsNum; // 2 raised to number of args
 int shadersNum; // number of actually generated shaders

 ShaderCombination *combinationsList;
 std::vector< CShader* > shaders;

public:
 void init(const std::string &fileName, ShaderType shaderType, const std::string &args = "", const std::string &argsToAffix = "");
 void free();

 CShader* getShader(int index);
};
As you can see, there is ShaderCombination structure. The list of that type (combinationsList) is generated a priori, and has the size of $2^n$, where n is the number of arguments. indexToShadersList is initialized with $-1$ value, indicating that this particular combination does not have a shader generated yet. Once the shader is generated, this value will point to shaders vector.

Let's now see how CShadersPack::getShader looks like:
CShader* CShadersPack::getShader(int index)
{
 if (combinationsList[index].indexToShadersList == -1)
 {
  combinationsList[index].indexToShadersList = shaders.size();

  CShader* shader = new CShader();
  shader->init(fileName, shaderType, combinationsList[index].args);
  shaders.push_back(shader);
 }

 return shaders[combinationsList[index].indexToShadersList];
}
So, if indexToShadersList is still $-1$, it means we have to compile/assemble the shader and store it in the shaders list. Finally, a proper pointer to the shader is returned.

This solution is simple, robust and easy to implement and manage. One flaw is that we cannot generate all shaders (actual shaders) when the game is starting because we don't know what materials are used, so we have to generate them during actual gameplay. This will however introduce some slight jerking once an object with a new material combination shows up. But hey, can't we just cache already used arguments combinations on the disk, and generate shaders with those arguments once the game is starting? Yes, we can! And that is what my engine does.