What is and why do we need explicit_multisample? (or how to do real antialiasing in deferred shading)
16.09.2009 12:08 in 3D graphics, OpenGL
Deferred shading has lately become extremely popular. I’m not a huge fan of it, but depending on the typical scene in a game (preferably indoor, with lots of lights) it can be a great advantage. However, antialiasing is a real pain with deferred shading (DS). Most games use an edge filter combined with a blur, but the result is visually horrible (especially at low resolutions, where AA is a must). But why can’t we use multisampling (MSAA/CSAA) with deferred shading?
Let’s see how multisampling is normally used. Up to now, we:
- render the scene
- downsample (resolve) the AA buffer to a texture
- render a full-screen quad with that texture (and probably some postprocessing)
This of course won’t work correctly with deferred shading. Why? Because it downsamples each G-buffer individually, before any lighting is computed. See the following picture.

We have 4 pixels, 4 samples each (I won’t go into multisample details, let’s keep it simple), and a normal vector is stored in each sample. We downsample the AA buffer and poof! The normals have gone wrong. Everything else follows the same routine, so at edges we get blurred normals, diffuse values and other data. Using AA this way will probably only add visual artifacts.
But OpenGL 3.0 (through the NV_explicit_multisample extension) and DirectX 10 have a new feature called explicit multisample (or custom resolve). It allows us to access each sample in a multisample buffer. In this scenario we don’t downsample the AA buffer. We use it like a texture, so in the lighting shader we have access to every normal/diffuse sample, and our computations look like the second picture.
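To make the difference between the two pictures concrete, here is a minimal CPU-side C++ sketch. It is not GPU code and not the renderer from this post: `Vec3`, `lambert`, `light_after_resolve` and `light_per_sample` are hypothetical names, and a clamped N·L term stands in for the real lighting function. It only shows that averaging the stored normals first and lighting afterwards is not the same as lighting every sample and averaging the results.

```cpp
#include <algorithm>
#include <array>

struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical stand-in for the lighting function: clamped N.L only.
static float lambert(Vec3 normal, Vec3 light_dir) {
    return std::max(dot(normal, light_dir), 0.0f);
}

// First picture: the normals are averaged (resolved) and lit once.
static float light_after_resolve(const std::array<Vec3, 4>& normals, Vec3 light_dir) {
    Vec3 sum = {0.0f, 0.0f, 0.0f};
    for (const Vec3& n : normals) sum = add(sum, n);
    return lambert(scale(sum, 0.25f), light_dir);
}

// Second picture: every sample is lit, then the results are averaged.
static float light_per_sample(const std::array<Vec3, 4>& normals, Vec3 light_dir) {
    float sum = 0.0f;
    for (const Vec3& n : normals) sum += lambert(n, light_dir);
    return sum * 0.25f;
}
```

Take an edge pixel where two samples have the normal (1, 0, 0), two have (-1, 0, 0), and the light direction is (1, 0, 0). The resolved normal is the zero vector, so `light_after_resolve` returns 0, while `light_per_sample` returns 0.5, which is the half-covered value we expect at an antialiased edge.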

And we still benefit from multisampling (instead of supersampling). Time for some C++.
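What do we need to do to upgrade our rendering? First, creating the buffers:
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_RENDERBUFFER_NV, tex);
glGenRenderbuffers(1, &buffer);
glBindRenderbuffer(GL_RENDERBUFFER, buffer);
// 8 samples; the lighting shader must loop over the same number of samples
glRenderbufferStorageMultisample(GL_RENDERBUFFER, 8, GL_RGBA32F, 1024, 768);
// expose the renderbuffer storage through the texture (NV_explicit_multisample entry point)
glTexRenderbufferNV(GL_TEXTURE_RENDERBUFFER_NV, buffer);
// attach the same renderbuffer as a color target of the G-buffer FBO
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, buffer);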
And then, binding the texture for the full-screen quad (FSQ):
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_RENDERBUFFER_NV, tex);
glTexRenderbufferNV(GL_TEXTURE_RENDERBUFFER_NV, buffer);
// point the sampler uniform at texture unit 0
glUniform1i(sampler, 0);
Finally, let’s fix the shader code. Assume we have the following code:
#version 150
uniform sampler2D sampler_diffuse, sampler_position, sampler_normal;
in vec2 texcoord; // [0,1]x[0,1]
out vec4 result;
vec4 compute_lighting(vec3 diffuse, vec3 position, vec3 normal)
{
    ...
}
void main()
{
    // texture() replaces the deprecated texture2D() in GLSL 1.50
    vec3 diffuse = texture(sampler_diffuse, texcoord).rgb;
    vec3 position = texture(sampler_position, texcoord).xyz;
    vec3 normal = texture(sampler_normal, texcoord).xyz;
    result = compute_lighting(diffuse, position, normal);
}
We upgrade it to:
#version 150
#extension GL_EXT_gpu_shader4 : enable
#extension GL_NV_explicit_multisample : enable
uniform samplerRenderbuffer sampler_diffuse, sampler_position, sampler_normal;
in vec2 texcoord; // [0,1]x[0,1]
out vec4 result;
vec4 compute_lighting(vec3 diffuse, vec3 position, vec3 normal)
{
    ...
}
void main()
{
    // must match the sample count of the multisample renderbuffers
    const int samples = 8;
    result = vec4(0);
    // AA renderbuffers are addressed with integer texel coordinates
    ivec2 texcoord2 = ivec2(vec2(textureSizeRenderbuffer(sampler_diffuse)) * texcoord);
    for (int i = 0; i < samples; i++)
    {
        // the third argument selects the sample within the pixel
        vec3 diffuse = texelFetchRenderbuffer(sampler_diffuse, texcoord2, i).rgb;
        vec3 position = texelFetchRenderbuffer(sampler_position, texcoord2, i).xyz;
        vec3 normal = texelFetchRenderbuffer(sampler_normal, texcoord2, i).xyz;
        result += compute_lighting(diffuse, position, normal);
    }
    // GLSL uses constructor syntax for conversions, not C-style casts
    result /= float(samples);
}
That’s it! There are various improvements we can make. For example, if we use shadow mapping, we can calculate the shadow term once per pixel and then apply it to all samples. And we must hope that ATI will implement OpenGL 3.2 (and explicit multisample) soon.
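The shadow-term shortcut can be sketched on the CPU as follows. This is only an illustration under assumptions: `Sample`, `ShadowMap`, `shade_per_sample` and `shade_shared_shadow` are hypothetical names, the shadow map is reduced to a single occluder depth, and taking the shadow term from sample 0 is an arbitrary choice (the pixel center would do as well). The lookup counter is there to show what is saved.

```cpp
#include <array>
#include <cstddef>

// One G-buffer sample, reduced to scalars for brevity (hypothetical layout).
struct Sample {
    float lit;          // unshadowed lighting result for this sample
    float light_depth;  // depth of the sample as seen from the light
};

// Toy shadow map with a single occluder depth; counts how often it is read.
struct ShadowMap {
    float occluder_depth;
    int lookups;
    explicit ShadowMap(float depth) : occluder_depth(depth), lookups(0) {}
    float term(float light_depth) {
        ++lookups;
        return light_depth <= occluder_depth ? 1.0f : 0.0f;
    }
};

// Reference version: one shadow lookup per sample.
template <std::size_t N>
float shade_per_sample(const std::array<Sample, N>& samples, ShadowMap& map) {
    float sum = 0.0f;
    for (const Sample& s : samples) sum += s.lit * map.term(s.light_depth);
    return sum / static_cast<float>(N);
}

// Shortcut: one shadow lookup per pixel, applied to all samples.
template <std::size_t N>
float shade_shared_shadow(const std::array<Sample, N>& samples, ShadowMap& map) {
    const float shadow = map.term(samples[0].light_depth);
    float sum = 0.0f;
    for (const Sample& s : samples) sum += s.lit;
    return shadow * sum / static_cast<float>(N);
}
```

For a pixel whose samples all lie on the same side of the shadow boundary, both functions return the same value, but the shared version reads the shadow map once instead of N times. On pixels where the shadow boundary crosses the samples the two can differ, which is the approximation this shortcut accepts.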
Update: there is ARB_texture_multisample (now part of OpenGL core) that should do the same thing and be more portable. I'm going to check differences between this and nv_explicit_multisample soon!
Comments:
-
Reavenk:
Wait, how do you specify which sample of the pixel you want to read from in the texelFetchRenderbuffer call? I'm trying to follow it, but it looks like you're just reading the exact same pixel 4 times and then dividing it by 4.
17.09.2009 14:50:06
-
You're right. I made a typo while formatting the code, and I've now corrected it.
You pass the sample ID as the 3rd parameter of texelFetchRenderbuffer. So the proper code is:
diffuse = texelFetchRenderbuffer(sampler_diffuse, texcoord2, i);
(i is number 0...3)
Thank you for pointing it out.
17.09.2009 15:49:49
-
oscar:
Hi,
this exe crashes at init on ATI 4850 with Catalyst 9.9.
They have nv_explicit_multisample and changed shaders to #version 140
in Nvidia goes well with #version 140 (3.1) and opengl 3.1 Ati supports it
Please share the code so I can fix it!!
18.09.2009 00:58:58
-
Ido Ilan:
Hi,
Could you please share the source code + executable for this tutorial.
Thanks,
Ido
23.09.2009 16:55:44
-
OK, I will post source code. Stay tuned!
23.09.2009 17:49:02
-
John Aughey:
Any more information on ARB_texture_multisample comparison?
24.09.2009 17:07:23
-
Read my latest post about texture_multisample. :)
24.09.2009 17:24:07
-
maxest:
To be honest, I don't actually see how this overcomes supersampling. In fragment shader, you do calculations for 4 samples, each individually. So what is the difference between this solution and using 4x greater framebuffer?
29.09.2009 16:20:45
-
Well, we save a lot of fill-rate. Compute power (ALU) is getting more and more powerful but fill-rate is still a problem. There are also other advantages of multisampling:
* samples distribution isn't uniform, so the final result has better quality than box-filter-resized 4x greater framebuffer
* you can use alpha to coverage
* using AA 8x with 1920x1080 is possible on high-end graphic card. Try the same with 15360 x 8640. ;)
29.09.2009 16:44:58
-
maxest:
"Well, we save a lot of fill-rate."
But with explicit_multisample you do 4 times more computation with each pixel. With 4 times greater framebuffer you got 4 times more pixels to process of course, but 4 times shorter pixel shader.
btw: in your fragment shader you use 4 samples but you create renderbuffer with 8. Was it your intention or just an omission? :)
29.09.2009 21:27:28
-
I repeat: computation is cheap. Operating on memory is not.
And about 4/8 samples: this is just a mistake, thanks for pointing it out.
29.09.2009 21:41:41
-
maxest:
Huh, right. I was thinking only about computations and forgot the textures. But is accessing them so expensive?
When I finally get Vista installed I'll check super- vs multi- sampling :)
29.09.2009 22:32:33
-
You don't need Vista to use OpenGL 3.2. Windows XP is just fine.
And yes, memory-related operations are more likely than ALU work to become the bottleneck on recent GPUs (such as NV80).
29.09.2009 22:37:37
-
maxest:
"You don't need Vista to use OpenGL 3.2. Windows XP is just fine. "
I know but I'm "DX fanboy" ;). And currently I don't have time to play with deferred shading :(
29.09.2009 23:20:03
-
did:
Hello, very interesting subject you have there!
I am trying to run your sample programs (31nv_2, 32gl and 32nv).
All of them don't work, they go full screen and they exit immediately without showing anything on the screen. I tried running them from the command and no output there also.
The system is running Vista x64, the card is an 8800 Ultra, and I have tried two driver versions already; the last one I tried is 190.89. Both of the drivers I tried expose OpenGL 3.2 and GL_NV_explicit_multisample.
01.10.2009 21:43:44