Indirect Instanced draw calls
Draw calls in OpenGL aren’t the most expensive thing, but Approaching Zero Driver Overhead (AZDO) has always been a cool optimization concept to me. Fewer draw calls means fewer state changes, and state changes are usually the real bottleneck when draw calls become a problem. I wanted to experiment with it but couldn’t find many examples online, so I had to come up with my own implementation. I kept a vector of indirect draw commands, which were created at initialization.
indirectElement baseElement;
baseElement.vertexCount = 6; //2 triangles = 6 vertices
baseElement.instanceCount = 1; //Draw 1 instance
baseElement.firstIndex = 0; //Draw from index 0 for this instance
baseElement.baseVertex = 0; //Value added to every index before the vertex is fetched
baseElement.baseInstance = 0; //Offset into the per-instance attributes (gl_InstanceID still starts at 0)
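For reference, here is a minimal sketch of what the indirectElement type could look like. The struct layout is an assumption (the original definition isn't shown); it mirrors OpenGL's DrawElementsIndirectCommand, which the driver reads as five tightly packed 32-bit values. makeQuadCommand and makeInitialCommands are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: same field order as OpenGL's DrawElementsIndirectCommand.
struct indirectElement {
    std::uint32_t vertexCount;   // indices drawn per instance ("count" in the GL spec)
    std::uint32_t instanceCount; // number of instances to draw
    std::uint32_t firstIndex;    // first index to read, counted in indices
    std::int32_t  baseVertex;    // value added to every index
    std::uint32_t baseInstance;  // offset into the per-instance attributes
};

// The GPU reads the buffer raw, so the struct must not contain padding.
static_assert(sizeof(indirectElement) == 20, "indirectElement must be 5 x 32 bits");
static_assert(offsetof(indirectElement, baseInstance) == 16, "unexpected padding");

// Hypothetical helper: one command describing a single quad (2 triangles).
inline indirectElement makeQuadCommand() {
    return indirectElement{6, 1, 0, 0, 0};
}

// Hypothetical helper: build the command vector once at initialization.
inline std::vector<indirectElement> makeInitialCommands(std::size_t batchCount) {
    return std::vector<indirectElement>(batchCount, makeQuadCommand());
}
```

The static_asserts are there because a mismatch between the CPU-side struct and what the driver expects tends to show up as missing or garbage geometry rather than an error.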
Before each call, the game has to know how many instances there are, and I set instanceCount to that number. The rest of the fields didn’t really need to be touched, although they can be manipulated, especially baseInstance, if you want to draw only a certain subset of objects. I split my draws into multiple indirectElements, with particles in one call and main objects in another, which gave me better control over when certain things are drawn. I won’t go into detail on that here, but one big issue I encountered was that I had many different textures, which meant that instancing wasn’t doing much for me, since every texture switch would still need its own bind and draw. To get around this, I used bindless textures instead.
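A minimal sketch of that per-frame update could look like the following. It is not the exact code from the project. It assumes all instances live in one shared per-instance buffer with the batches stored back to back, so baseInstance marks where each batch starts. updateCommands is a hypothetical helper, and the struct layout is the same assumption as before.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout, mirroring OpenGL's DrawElementsIndirectCommand.
struct indirectElement {
    std::uint32_t vertexCount;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};

// Hypothetical helper: commands[i] draws instanceCounts[i] instances, and each
// batch starts reading per-instance data where the previous batch stopped.
inline void updateCommands(std::vector<indirectElement>& commands,
                           const std::vector<std::uint32_t>& instanceCounts) {
    assert(commands.size() == instanceCounts.size());
    std::uint32_t firstInstance = 0;
    for (std::size_t i = 0; i < commands.size(); ++i) {
        commands[i].instanceCount = instanceCounts[i];
        commands[i].baseInstance = firstInstance;
        firstInstance += instanceCounts[i];
    }
}

// After this, the vector would be uploaded to the buffer bound to
// GL_DRAW_INDIRECT_BUFFER (for example with glBufferSubData) and drawn with
// glMultiDrawElementsIndirect, or with one glDrawElementsIndirect per batch.
```

With main objects as batch 0 and particles as batch 1, calling updateCommands(commands, {objectCount, particleCount}) leaves the particle batch reading its instances right after the last main object.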
Bindless textures are another way to pass textures to a shader. Instead of calling glBindTexture, we can pass a GLuint64 handle that represents the texture into the shader through a buffer. This meant that I could index into the buffer to pick the right sprite.
My vertex shader simply takes in a buffer that holds the texture IDs of all the objects to be drawn and passes each ID on to the fragment shader. This was set up so that textureID 0 was the first handle in the buffer.
#version 450 core
.... //other stuff here
layout (location = 2) in uint aAnimationStep; // this is the z value for spritesheets
layout (location = 3) in int aTextureID; //this is what we are looking at
layout (location = 4) in mat4 instanceMatrix;
// Declare VS_OUT as an output interface block
out VS_OUT
{
    vec2 TexCoord;
    flat uint animationStep;
    flat int textureID;
} vs_out;
Then the fragment shader takes in the array of bindless handles and indexes it with textureID.
#version 450 core
#extension GL_ARB_bindless_texture : require //this is required
uniform sampler2DArray bindlessHandles[200];
in VS_OUT
{
    vec2 TexCoord;
    flat uint animationStep;
    flat int textureID;
} fs_in;
out vec4 color;
void main(void)
{
    color = texture(bindlessHandles[fs_in.textureID], vec3(fs_in.TexCoord, fs_in.animationStep));
    ...//other stuff here
}
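On the CPU side, the handles have to end up in the same order as the texture IDs. Here is a rough sketch of one way to keep that mapping, under a few assumptions: HandleTable and kMaxBindlessHandles are hypothetical names, and the GL calls only appear in comments since they need a live context. The handles would come from glGetTextureHandleARB, be made resident with glMakeTextureHandleResidentARB, and be uploaded to the sampler array with something like glUniformHandleui64vARB.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Matches the size of bindlessHandles[200] in the fragment shader.
constexpr std::size_t kMaxBindlessHandles = 200;

// Hypothetical CPU-side table: slot i holds the 64-bit handle for textureID i.
class HandleTable {
public:
    // Stores a handle (as returned by glGetTextureHandleARB) and returns the
    // textureID that instances should use to select it in the shader.
    std::int32_t add(std::uint64_t handle) {
        if (handles_.size() >= kMaxBindlessHandles) {
            throw std::length_error("more textures than the shader array can hold");
        }
        handles_.push_back(handle);
        return static_cast<std::int32_t>(handles_.size() - 1);
    }

    // Contiguous handles in textureID order, ready to be uploaded in one call,
    // e.g. glUniformHandleui64vARB(location, count, handles().data()).
    const std::vector<std::uint64_t>& handles() const { return handles_; }

private:
    std::vector<std::uint64_t> handles_;
};
```

Returning the slot index from add keeps the ID stored per instance and the position in the shader array from drifting apart.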
This worked great! I only needed a few draw calls to get everything on the screen right.
I wanted to use this to explore multithreading for populating my buffers and issuing the indirect draws, but that was cut for time. With a game as small as ours, there wasn’t much in the way of performance gains, but it was a great learning experience.
There were a few limitations, though. One big one was that RenderDoc doesn’t work with bindless textures; it just straight up crashes. In the beginning I had a macro to disable all the bindless functions so that I could use RenderDoc. It was really derpy trying to debug just a blank screen, but it worked fine since I was mostly interested in the values being passed in.
Another big one was that many iGPUs don’t support bindless textures at all, so we had to create a fallback.
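Picking between the two paths at startup can be as simple as checking the extension list. The sketch below is an assumption about how that check might look, not the project's actual fallback. supportsBindlessTextures, chooseTexturePath and TexturePath are hypothetical names, and the extension strings are assumed to have been collected beforehand with glGetStringi(GL_EXTENSIONS, i).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: true if the driver advertises GL_ARB_bindless_texture.
// `extensions` is assumed to hold one entry per glGetStringi(GL_EXTENSIONS, i).
inline bool supportsBindlessTextures(const std::vector<std::string>& extensions) {
    return std::find(extensions.begin(), extensions.end(),
                     "GL_ARB_bindless_texture") != extensions.end();
}

// Which texture path the renderer should take.
enum class TexturePath { Bindless, Fallback };

// Hypothetical helper: choose the path once at startup.
inline TexturePath chooseTexturePath(const std::vector<std::string>& extensions) {
    return supportsBindlessTextures(extensions) ? TexturePath::Bindless
                                                : TexturePath::Fallback;
}
```

Deciding once at startup keeps the per-frame code free of extension checks.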