Monday, 18 November 2013

Reconstructing Camera Space Position from Depth

Memory bandwidth is a precious resource on current-generation GPU hardware. So precious that storing less data and packing/unpacking it on the fly is often much faster than storing fat data in unpacked form. The most common example that comes to my mind is deferred shading/lighting.

In deferred shading we usually need to store, in some offscreen buffer, positions of pixels that we are going to light. The simplest form is to use the coordinates of positions directly, that is, three floats per pixel. That forces us to use either RGBA32F format or, if we can get away with lower precision, RGBA16F format. In either case that is a lot of memory.
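To put "a lot of memory" into numbers, here is a minimal sketch that computes the size of one full-screen buffer. The 1920x1080 resolution and the 32-bit depth format used for comparison are assumptions made for illustration only, and `buffer_size_mib` is a hypothetical helper name.

```python
# Bytes per pixel for the formats discussed above. DEPTH32 (a single 32-bit
# depth value per pixel) is an assumed format, included only for comparison.
BYTES_PER_PIXEL = {
    "RGBA32F": 4 * 4,  # four 32-bit floats
    "RGBA16F": 4 * 2,  # four 16-bit floats
    "DEPTH32": 4,      # one 32-bit value
}


def buffer_size_mib(width, height, fmt):
    """Size in MiB of one full-screen buffer stored in the given format."""
    return width * height * BYTES_PER_PIXEL[fmt] / (1024 * 1024)


# Assumed resolution: 1920x1080.
for fmt in BYTES_PER_PIXEL:
    print(fmt, round(buffer_size_mib(1920, 1080, fmt), 1), "MiB")
```

Whatever the resolution, a position buffer in RGBA32F is four times the size of a single 32-bit depth value per pixel, and every lighting pass has to read it back.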

Can we do better? Well, yes. A common solution to this problem is to use, for instance, the depth buffer and reconstruct the pixels' positions from it. We should be able to do that, right? If we can get from camera space to projection space and store values in the depth buffer, then we can also do the reverse. To do that we just need to take the value from the depth buffer and find the $x$ and $y$ normalized device coordinates of the pixel we're processing (those are fairly easy to find). Once we have those values, we compose a vector from them (plugging in $w=1$) and multiply it by the inverse projection matrix. That gives us the position of the pixel in camera space.

So are we done? Technically, yes. There are some flaws to this approach, however. The first one is that the depth buffer is not available for reading on all hardware (mobile devices, for instance). That is not much of a problem, as we can generate a "custom" depth buffer that will most likely also be used for other effects, like soft particles and such. The second one is performance. The vector-matrix multiplication must take place in the pixel shader, where every single instruction counts. The vector-matrix mul alone is four dot products. We might also need to normalize the output vector so that its $w=1$, which is one division.
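For reference, the straightforward approach can be sketched on the CPU like this. This is only an illustration, not shader code: the helper names are hypothetical, the matrix entries follow gluPerspective, and the matrix is written for multiplication by a row vector. The inverse is written out by hand, which works only for a matrix of this particular shape.

```python
import math


def mul_row_vector(v, m):
    """Row vector times 4x4 matrix: four dot products, as counted above."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


def projection_matrix(fovy_deg, aspect, z_near, z_far):
    """gluPerspective-style matrix, laid out for row-vector multiplication."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (z_far + z_near) / (z_near - z_far), -1.0],
        [0.0, 0.0, 2.0 * z_far * z_near / (z_near - z_far), 0.0],
    ]


def inverse_projection_matrix(m):
    """Closed-form inverse, valid only for the matrix shape built above."""
    a, b, c, d = m[0][0], m[1][1], m[2][2], m[3][2]
    return [
        [1.0 / a, 0.0, 0.0, 0.0],
        [0.0, 1.0 / b, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0 / d],
        [0.0, 0.0, -1.0, c / d],
    ]


def project(point_view, m):
    """Camera space -> NDC (includes the perspective divide)."""
    clip = mul_row_vector(list(point_view) + [1.0], m)
    return [clip[i] / clip[3] for i in range(3)]


def unproject(ndc, m_inv):
    """NDC -> camera space: one vector-matrix mul plus one division by w."""
    v = mul_row_vector(list(ndc) + [1.0], m_inv)
    return [v[i] / v[3] for i in range(3)]
```

Projecting a camera-space point and unprojecting the result should give back the original point up to floating-point error. The cost of `unproject` is exactly what we would like to avoid paying per pixel.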

Fortunately, there is a better way to reconstruct the camera space position from depth that will take barely one mul in the pixel shader! Let's have a look at the following picture:

The picture shows a side view of a camera's frustum. Point $(y, z)$ lies on the near plane. Our task is to find $y'$ of some point of interest. The value of $z'$ is known - it is just the linear depth value in camera space of that point. We can get it either by converting the normalized device coordinate (NDC) $z$ from the depth buffer to linear space (a word on this at the end of the post) or we can utilize the separate depth buffer we mentioned above. This separate buffer (which we will have to use when we don't have access to the hardware depth buffer) does not have to store the NDC $z$ values but rather linear camera space $z$ values. This is actually even more desirable for many practical applications.

So, we know $z'$. We want to find $y'$, but first we also need to find the $y$ and $z$ values. This pair represents the coordinates of our point of interest on the camera's near plane. Given the point's position in NDC, we can calculate the coordinates on the near plane like this (in the vertex shader):
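  // Perspective divide, then scale x and y from [-1, 1] to [-0.5, 0.5].
  vec2 position_ndc_normalized = 0.5 * position_ndc.xy / position_ndc.w;
  // Stretch to the near plane's size; z is the distance to the near plane.
  varying_positionOnNearPlane = vec3(nearPlaneSize.xy * position_ndc_normalized, zNear);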
First, we scale the NDC position so that its $x$ and $y$ coordinates are in the $[-0.5, 0.5]$ range. Then we multiply those coordinates by the size (width and height) of the near plane and plug the camera's distance to the near plane, zNear, into the $z$ coordinate. The resulting vector (varying_positionOnNearPlane) can now be interpolated and read in the pixel shader.
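The snippet above takes nearPlaneSize as given. Here is a minimal CPU-side sketch of how it could be computed, assuming a symmetric frustum described by a vertical field of view and an aspect ratio (both are assumptions, as are the helper names). As in the shader, $z$ is treated as a positive distance.

```python
import math


def near_plane_size(fovy_deg, aspect, z_near):
    """Width and height of the near plane of a symmetric frustum."""
    height = 2.0 * z_near * math.tan(math.radians(fovy_deg) / 2.0)
    return (height * aspect, height)


def position_on_near_plane(ndc_xy, size, z_near):
    """Mirrors the vertex shader: NDC x, y in [-1, 1] -> near-plane point."""
    half_x = 0.5 * ndc_xy[0]  # now in [-0.5, 0.5]
    half_y = 0.5 * ndc_xy[1]
    return (size[0] * half_x, size[1] * half_y, z_near)
```

For example, with a 90 degree vertical field of view, an aspect ratio of 2 and the near plane at distance 1, the near plane is about 4 units wide and 2 units high, so the NDC corner $(1, 1)$ lands at roughly $(2, 1, 1)$.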

As we now know $y$, $z$ and $z'$, evaluating $y'$ is easy - we just employ similar triangles:
$$y' = \frac{y \cdot z'}{z}$$
The pixel shader code:
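  // Linear camera-space depth of the pixel, read from the custom depth buffer.
  float depth_view = texture2DProj(gbufferLinearDepthTexture, varying_screenTexCoord).x;
  // Similar triangles: scale the near-plane point by depth_view / zNear.
  vec3 position_view = varying_positionOnNearPlane * depth_view / varying_positionOnNearPlane.z;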
We're almost there. I promised there would be only one mul, but at the moment there is one mul and one div. It might not be obvious at first glance, but the $z$ coordinate of varying_positionOnNearPlane is constant for all points - it's just the distance from the camera to the near plane. This division can therefore be moved to the vertex shader, which leaves a single multiplication by depth_view in the pixel shader.
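The split between the two shader stages can be checked numerically with a small CPU-side sketch. It is not shader code, and `view_ray` and `reconstruct_position` are hypothetical names. It assumes a symmetric frustum given by a vertical field of view and an aspect ratio, and treats depth as a positive distance, as the snippets above do.

```python
import math


def view_ray(ndc_xy, fovy_deg, aspect, z_near):
    """Vertex-shader part: near-plane position already divided by its z."""
    height = 2.0 * z_near * math.tan(math.radians(fovy_deg) / 2.0)
    width = height * aspect
    x = width * 0.5 * ndc_xy[0]
    y = height * 0.5 * ndc_xy[1]
    # The division by the constant z_near happens here, once per vertex.
    return (x / z_near, y / z_near, 1.0)


def reconstruct_position(ray, depth_view):
    """Pixel-shader part: a single multiply by the linear depth."""
    return tuple(component * depth_view for component in ray)
```

Because the divisor is the same for every vertex, dividing before interpolation gives the same result as dividing per pixel. Taking a known point, computing its NDC $x$ and $y$, and feeding its depth to `reconstruct_position` should return the original point.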

And that's it. With a few simple observations and math "tricks" we managed to write an optimal pixel shader for camera space position reconstruction from linear depth.

One last thing that remains untold is how to recover linear depth from the hardware depth buffer. As we know, to get from camera space to projection (NDC) space we use a projection matrix. We can use this fact to derive a simple formula that transforms only the $z$ coordinate from projection space back to camera space. Say we have a vector $[x, y, z, 1]$ defining a point in camera space and we multiply it by a projection matrix (taken from http://www.opengl.org/sdk/docs/man2/xhtml/gluPerspective.xml; note that we multiply a row vector by the matrix, so the matrix is transposed with respect to that page, and that the entries' formulas are collapsed into $A$, $B$, $C$ and $D$) to get the vector in projection space:
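$$
\begin{bmatrix}
x & y & z & 1
\end{bmatrix}
\begin{bmatrix}
A & 0 & 0 & 0 \cr
0 & B & 0 & 0 \cr
0 & 0 & C & -1 \cr
0 & 0 & D & 0
\end{bmatrix}
=
\begin{bmatrix}
xA & yB & zC + D & -z
\end{bmatrix}
\sim
\begin{bmatrix}
\frac{xA}{-z} & \frac{yB}{-z} & -(C + \frac{D}{z}) & 1
\end{bmatrix}
$$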
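The term that goes into the depth buffer is $-(C + \frac{D}{z})$, where $z$ is the camera-space $z$ coordinate (the linear depth; with this matrix it is negative in front of the camera). Note that in OpenGL with the default glDepthRange the hardware stores this NDC value remapped from $[-1, 1]$ to $[0, 1]$, so a value $d$ read from the depth texture first has to be expanded to $2d - 1$. Assuming the resulting NDC value is $z'$ (not to be confused with the $z'$ from the frustum picture), we just need to solve the following equation for $z$: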
$$
\begin{align}
-(C + \frac{D}{z}) &= z' \cr
z &= \frac{-D}{z' + C}
\end{align}
$$
The constants $C$ and $D$ can be extracted from the projection matrix and passed to the pixel shader.
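As a sanity check of the formula, here is a minimal CPU-side sketch. It assumes the gluPerspective expressions for $C$ and $D$ and, in the last helper, the default $[0, 1]$ depth range. The function names are hypothetical.

```python
def depth_coefficients(z_near, z_far):
    """C and D as they appear in the gluPerspective matrix."""
    c = (z_far + z_near) / (z_near - z_far)
    d = 2.0 * z_far * z_near / (z_near - z_far)
    return c, d


def ndc_depth(z_view, c, d):
    """Forward mapping: z' = -(C + D / z)."""
    return -(c + d / z_view)


def view_z_from_ndc_depth(z_ndc, c, d):
    """Inverse mapping: z = -D / (z' + C)."""
    return -d / (z_ndc + c)


def view_z_from_depth_buffer(depth01, c, d):
    """Assumes the stored value is in [0, 1]; expands it to [-1, 1] first."""
    return view_z_from_ndc_depth(2.0 * depth01 - 1.0, c, d)
```

With the near plane at 0.1 and the far plane at 100, a stored depth of 0 maps back to $z = -0.1$ and a stored depth of 1 maps back to $z = -100$. The result is negative because the camera looks down the negative $z$ axis with this matrix; negate it if a positive linear depth is needed for the reconstruction described earlier.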

DEDICATION: I dedicate this post to Karolina P., who is currently struggling a bit with abstract algebra and C++ programming, but I am sure that her persistence and character will get her through both these classes and the whole semester :).