Why tessellation is not the cure for AIDS

25.05.2010 23:17 in 3D graphics

Recently everyone has gone insane about tessellation. As with every other shitstorm, there is a lot of BS around it.

tess.png

So to be clear: tessellation itself can only smooth objects, not make them more detailed. The "magic" comes from displacement mapping -- and that is not so easy.
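To make the split concrete, here is a minimal CPU-side sketch in C++. Everything in it (`Vec3`, `tessellate_edge`, `sample_height`, `displace`) is a name made up for illustration, and the sine bump only stands in for a real heightmap lookup. It uses plain linear subdivision; a smoothing scheme would bend the edge using vertex normals, but would still add no texture-scale detail.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Tessellate the edge a-b into `segments` equal pieces by linear interpolation.
// Every new vertex lies on the original edge, so no detail has been added yet.
std::vector<Vec3> tessellate_edge(Vec3 a, Vec3 b, int segments) {
    assert(segments > 0);
    std::vector<Vec3> out;
    for (int i = 0; i <= segments; ++i) {
        float t = static_cast<float>(i) / static_cast<float>(segments);
        out.push_back({a.x + (b.x - a.x) * t,
                       a.y + (b.y - a.y) * t,
                       a.z + (b.z - a.z) * t});
    }
    return out;
}

// Hypothetical stand-in for a heightmap lookup: one bump, values in [0,1].
float sample_height(float u) {
    return std::sin(u * 3.14159265f);
}

// Displacement mapping: push each vertex along the (unit) normal by height * scale.
// This is the step that actually changes the shape.
void displace(std::vector<Vec3>& verts, Vec3 normal, float scale) {
    const int last = static_cast<int>(verts.size()) - 1;
    for (int i = 0; i <= last; ++i) {
        float u = last > 0 ? static_cast<float>(i) / static_cast<float>(last) : 0.0f;
        float h = sample_height(u) * scale;
        verts[i].x += normal.x * h;
        verts[i].y += normal.y * h;
        verts[i].z += normal.z * h;
    }
}
```

The quality of the result depends entirely on what `sample_height` returns, which is exactly where the heightmap authoring problem comes in.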

For example, today I was browsing Gamedev.net's Image of the Day forum and found this example of tessellation:

ess.jpg

Can you tell me that, in the picture above, tessellation is a state-of-the-art technology allowing extreme 3D experiences? Would you believe yourself?

I would split all bump mapping effects into 2 categories: flat surfaces & others. For flat surfaces we already have wonderful raytrace-in-pixel-shader solutions, like relief mapping or my favorite, Cone Step Mapping. Look:

t1.jpg t2.jpg
from Bruno Evangelista Detailed Surface

Apart from the lighting, which doesn't match (probably due to some shader errors), can you tell the difference? I mean, can you tell which one is tessellated and displacement mapped and which one is raytraced in a pixel shader?
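For reference, the raytrace-in-pixel-shader idea boils down to marching the view ray through the heightfield until it dips below the surface. Below is a rough CPU-side sketch in C++ of the linear search plus binary refinement used by relief mapping (not Cone Step Mapping, which takes smarter steps). It works on a 1D depth field to stay short, and `Hit`, `march_heightfield` and its parameters are hypothetical names, not taken from any of the demos above.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct Hit { float u; float depth; bool found; };

// depth_at(u) returns the surface depth in [0,1]: 0 = top of the relief, 1 = deepest point.
// The ray enters at (u0, depth 0) and moves du_per_depth in u for every unit of depth.
Hit march_heightfield(const std::function<float(float)>& depth_at,
                      float u0, float du_per_depth,
                      int linear_steps, int binary_steps) {
    assert(linear_steps > 0 && binary_steps >= 0);
    const float step = 1.0f / static_cast<float>(linear_steps);

    // Linear search: walk down in fixed steps until the ray is below the surface.
    float d = 0.0f;
    bool found = false;
    for (int i = 0; i < linear_steps; ++i) {
        d = static_cast<float>(i + 1) * step;
        if (d >= depth_at(u0 + du_per_depth * d)) { found = true; break; }
    }
    if (!found) return {u0 + du_per_depth * d, d, false};

    // Binary refinement between the last point above the surface and the first one below.
    float above = d - step;
    float below = d;
    for (int j = 0; j < binary_steps; ++j) {
        float mid = 0.5f * (above + below);
        if (mid >= depth_at(u0 + du_per_depth * mid)) below = mid;
        else above = mid;
    }
    return {u0 + du_per_depth * below, below, true};
}
```

In a real shader the returned `u` would be the texture coordinate used for the color and normal lookups. Too few linear steps at grazing angles make the ray skip thin features, which is one source of the artifacts these techniques are known for.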

But flat surfaces are the (very) easy case. What about complex meshes? The truth is: both parallax and displacement mapping will help us very little. Have you ever tried to bake heightmaps to use with those techniques? There just aren't any tools available that would create good heightmaps. For this reason most games stick to normal mapping for complex meshes (or at most do parallax with very little height, which doesn't improve quality much but can produce nice artifacts -- and those artifacts grow along with the parallax strength). Sure, the Unigine graphics team could spend a lot of time polishing their dragon's textures -- well, thorns are quite easy, as you can grab a flat polygon and draw a circle with a gradient on it:

spike.jpg

But what next? How long does it take to create a ready-to-use mesh with a real displacement map? I'm not saying that tessellation is bad. It's just greatly overrated.

BTW: Displacement mapping seems to have an advantage when it comes to silhouette modification. But look at this:

ccc.jpg

It's from Nvidia's SDK -- pure parallax mapping with fin extrusion. And look at how shadow mapping works on the parallax-mapped surface. Now, this is something!

SSAO revisited

02.05.2010 22:30 in 3D graphics, programming

There are 2 big problems with SSAO. The first is locality. The lower the radius, the higher the quality. But set the radius too low and instead of occlusion you get a very slow edge detection filter. The second, and the main disadvantage of SSAO, is how it looks on flat surfaces. This is a typical implementation (similar to Crysis') and it looks OK as long as you want to use it as the only shading method. If you want to combine it with normal lighting & shadows, it's just too messy.


First screenshot of our FPS game

04.01.2010 02:14 in 3D graphics, apps, FPSGame

fps_small.jpg
click to visit gamedev.pl

Initially I didn't want to publish any screens from this FPS game, but sometimes you need a little motivation to keep things running. Especially when after a few months the team still doesn't have the game ready.

This is a pretty old screen, from the beginning of October 2009. It's basically dynamic-only lighting (but no deferred lighting at that time) with SSAO. The HUD was added as a separate layer (we were testing a lot of different HUDs at that time). The map was created in Valve Hammer and fine-tuned in Blender.

My simple raytracer is 10x faster than NVidia OptiX!

21.10.2009 05:20 in raytracing, 3D graphics

rt-dabroz.jpg rt-nvidia.jpg

It's frustrating. I coded this sample about a month ago. But the very next day I upgraded my system to 64-bit Windows 7 and my raytracer magically refused to work. Finally I've got some time to fix it and post it.

Basically it's the same demo that NVidia is using to promote OptiX. First of all, you can run OptiX samples on GeForce cards (not only QuadroFX) by disabling the video card name check (very sneaky, NVidia!). So I did. The NVidia sample was running at 3-4 FPS, and it needs a lot of frames to produce the final result (it's similar to progressive JPEG). On the other hand, my very simple raytracer coded in CUDA easily hits VSync (60 FPS) at a higher resolution and with instant multisampling. Sure, my raytracer is not pixel-accurate with OptiX, but that's a matter of setting up lights, object positions and other stuff. And why does OptiX perform so badly? I have no idea.

You will probably have a lot of problems running these demos (I have had no luck with release builds). But if you have Visual Studio and the CUDA SDK, you have a chance. The x86 build runs fine on 32-bit Windows XP. The x64 build runs on 64-bit Windows 7 (and probably Vista). Visuals in the x64 version are somewhat different from the image in my post, because I needed to backport some code.

PS. I love the Windows 7 driver model. NVidia drivers (191.07 in my case) are still very buggy (or rather unstable) when running CUDA code, and an automatic driver restart is much better than a total system crash.

PS2. I'll try to find some more time and finally post my OpenGL 3.2 multisample demo source code.

What is explicit_multisample and why do we need it? (or how to do real antialiasing in deferred shading)

16.09.2009 12:08 in 3D graphics, OpenGL

Deferred shading has lately become extremely popular. I’m not a huge fan of it, but depending on the typical scene in a game (preferably indoor, lots of lights) it can be a great advantage. However, antialiasing is a real pain in the DS case. Most games use an edge filter combined with blur, but the result is visually horrible (especially at low resolutions, where AA is a must). But why can’t we use multisampling (MSAA/CSAA) with deferred shading?

Let’s see how multisampling works. Up to now, we:

  • render the scene
  • downsample AA buffer to texture
  • render full-screen quad with texture (and probably some postprocess)

This of course won’t do the right thing with deferred shading. Why? Because it will downsample each G-buffer individually. See the following picture.

p1.png

We have 4 pixels, 4 samples each (I won’t go into multisample details, let’s keep it simple) - a normal vector is stored in each sample. We downsample the AA buffer and poof! The normals have gone wrong. Everything else follows the same routine, so at edges we get blurred normals, diffuse values and other data. Using AA this way will probably only add visual artifacts.
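A tiny CPU-side sketch in C++ shows why the order matters. It assumes a simple clamped Lambert term and a box-filter resolve; `Vec3`, `resolve`, `light_after_resolve` and `light_per_sample` are names invented for this illustration. Lighting is not linear in the G-buffer values (clamping, normalization, specular powers), so "average, then light" and "light, then average" disagree on edge pixels.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Clamped Lambert term for a directional light (light_dir points towards the light).
float lambert(Vec3 normal, Vec3 light_dir) {
    return std::max(0.0f, dot(normal, light_dir));
}

// What a plain resolve does to a G-buffer: average the stored samples (no renormalization).
Vec3 resolve(const std::vector<Vec3>& samples) {
    assert(!samples.empty());
    Vec3 sum = {0.0f, 0.0f, 0.0f};
    for (const Vec3& s : samples) { sum.x += s.x; sum.y += s.y; sum.z += s.z; }
    const float n = static_cast<float>(samples.size());
    return {sum.x / n, sum.y / n, sum.z / n};
}

// Wrong for deferred shading: downsample the normals first, then light once.
float light_after_resolve(const std::vector<Vec3>& normals, Vec3 light_dir) {
    return lambert(resolve(normals), light_dir);
}

// What we actually want: light every sample, then average the lit results.
float light_per_sample(const std::vector<Vec3>& normals, Vec3 light_dir) {
    assert(!normals.empty());
    float sum = 0.0f;
    for (const Vec3& n : normals) sum += lambert(n, light_dir);
    return sum / static_cast<float>(normals.size());
}
```

For an edge pixel where two samples face the light and two face away, the first function returns black while the second returns half intensity, which is the antialiased value we expect.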

But OpenGL 3.0 (through the NV_explicit_multisample extension) and DirectX 10 have a new feature called explicit multisample (or custom resolve). It allows us to access each sample in a multisample buffer. In this scenario, we don’t downsample the AA buffer - we use it like a texture, so in the lighting shader we have access to every normal/diffuse sample, and our computations look like the second picture.

p2.png

And we still benefit from multisampling (instead of supersampling). Time for some C++.

What do we need to do to upgrade our rendering? First, creating the buffers:

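// Texture object that will expose the multisample renderbuffer to shaders
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_RENDERBUFFER_NV, tex);
// 8-sample floating-point renderbuffer, 1024x768
glGenRenderbuffers(1, &buffer);
glBindRenderbuffer(GL_RENDERBUFFER, buffer);
glRenderbufferStorageMultisample(GL_RENDERBUFFER, 8, GL_RGBA32F, 1024, 768);
// Attach the renderbuffer to the texture (NV_explicit_multisample entry point) and to the FBO
glTexRenderbufferNV(GL_TEXTURE_RENDERBUFFER_NV, buffer);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, buffer);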

And then, binding the texture for the full-screen quad (FSQ):

// Bind the renderbuffer texture to unit 0 and point the sampler uniform at it
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_RENDERBUFFER_NV, tex);
glTexRenderbufferNV(GL_TEXTURE_RENDERBUFFER_NV, buffer);
glUniform1i(sampler, 0);

Finally, let’s fix the shader code. Assume we have the following code:

#version 150
uniform sampler2D sampler_diffuse, sampler_position, sampler_normal;
in vec2 texcoord; // [0,1]x[0,1]
out vec4 result;
vec4 compute_lighting(vec3 diffuse, vec3 position, vec3 normal)
{
  ...
}
void main()
{
  // texture() replaces the deprecated texture2D() in GLSL 1.50
  vec3 diffuse = texture(sampler_diffuse, texcoord).rgb;
  vec3 position = texture(sampler_position, texcoord).xyz;
  vec3 normal = texture(sampler_normal, texcoord).xyz;
  result = compute_lighting(diffuse, position, normal);
} 

We upgrade it to:

#version 150 
#extension GL_EXT_gpu_shader4 : enable
#extension GL_NV_explicit_multisample : enable
uniform samplerRenderbuffer sampler_diffuse, sampler_position, sampler_normal;
in vec2 texcoord; // [0,1]x[0,1]
out vec4 result;
vec4 compute_lighting(vec3 diffuse, vec3 position, vec3 normal)
{
  ...
}
void main()
{
  const int samples = 8;
  result = vec4(0); 
  ivec2 texcoord2 = ivec2(textureSizeRenderbuffer(sampler_diffuse) * texcoord);
  for (int i = 0; i < samples; i++)
  {
    // AA renderbuffers are addressed with integers
    vec3 diffuse = texelFetchRenderbuffer(sampler_diffuse, texcoord2, i).rgb;
    vec3 position = texelFetchRenderbuffer(sampler_position, texcoord2, i).xyz;
    vec3 normal = texelFetchRenderbuffer(sampler_normal, texcoord2, i).xyz;
    result += compute_lighting(diffuse, position, normal);
  }
  result /= float(samples); // GLSL uses constructor syntax, not C-style casts
} 

That’s it! There are various improvements we can make. For example, if we use shadow mapping, we can calculate the shadow term per pixel and then apply it to all samples. And we must hope that ATI will implement OpenGL 3.2 (and explicit multisample) soon.
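The shadow-term idea can be sketched on the CPU in C++ like this. It is only an illustration under simplifying assumptions: `Sample`, `shade_pixel_shared_shadow` and the `shadow_lookup` callback are hypothetical names, the lighting is reduced to diffuse times N·L, and the first sample is used as the representative for the shadow test.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// One multisample entry of the G-buffer, reduced to what this sketch needs.
struct Sample { float diffuse; float ndotl; float depth; };

// shadow_lookup stands in for the (expensive) shadow-map test; it returns 0 (shadowed) to 1 (lit).
float shade_pixel_shared_shadow(const std::vector<Sample>& samples,
                                const std::function<float(float)>& shadow_lookup) {
    assert(!samples.empty());
    // One shadow lookup per pixel, using sample 0 as the representative...
    const float shadow = shadow_lookup(samples[0].depth);
    // ...then reuse it for every sample while the cheap lighting stays per-sample.
    float sum = 0.0f;
    for (const Sample& s : samples) sum += s.diffuse * s.ndotl * shadow;
    return sum / static_cast<float>(samples.size());
}
```

The trade-off is that shadow edges crossing a pixel are no longer antialiased, so this only pays off when the shadow test dominates the cost of the lighting loop.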

Update: there is ARB_texture_multisample (now part of OpenGL core) that should do the same thing and be more portable. I'm going to check the differences between this and NV_explicit_multisample soon!