## środa, 23 września 2015

### Deferred Lighting on Android

This post is a bit not on time. I made a demo of deferred lighting running on Android (tested on Tegra 2, Tegra 3 and Adreno 220 devices) back three years ago when it could still be considered realtively fresh. It's probably not today but nevertheless I wanted to share it with you.

The demo is here; the package also contains README.txt file with steps needed to run it. Basically you will need Eclipse, Android NDK and possibly other stuff. To get everything up and running when developing Android apps I use this https://developer.nvidia.com/AndroidWorks.

Now a few words on how my implementation works.

The demo uses OpenGL ES 2.0. This API allows to barely render into one render target at a time and as far as I know does not provide floating point formats. How do you generate the g-buffer with that limitation? Well, you could go with a few passes but yeah, that is insane in terms of performance for Android devices. That's why I settled with one pass and fit everything into a single RGBA8 buffer. All I need for basic deferred lighting is position and normals. Instead of postition we store (linear) depth using 16 bits of the buffer (position is reconstructed later in the lighting pass). In case of normals I store $x$ and $y$ components in view-space and reconstruct $z$ in the lighting pass, hence only two (8-bit) channels suffice. All fits into a 32-bit buffer. Please examine shaders code to see how it all works.

The demo can use two common optimizations for the lights in the lighting pass. One is using scissor test and the other using stencil buffer. Both can be turned on/off in code using macros USE_SCISSOR and USE_STENCIL.

The demo renders 18 small-to-medium lights on Tegra 3 with 21 FPS which I consider not bad. Here goes a picture:

## sobota, 19 września 2015

### As Simple As Possible Explanation of Depth of Field in Ray Tracer

Recently I was working on a ray tracer and came to implement depth of field. I looked at a few "tutorials" on how to do it and must admit I had some difficulties understanding them. They often introduce the concept of a lens and try to explain how actual physics work. Sort of. Well, I didn't like these explanations so I decided to share my own.

Have a look at the picture:

It shows a side view of the view frustum. There is the eye along with the "plane" (let's call it eye plane) the eye is on, the near plane at near plane distance (you could also call it film plane) and the focal plane at focal plane distance.

The point of depth of field is to make the image appear blurry everywhere where the generated rays don't intersect scene at focal plane distance. To achieve this, instead of ray tracing only one regular pinhole ray (the black ray in the picture), we generate a bunch of rays (let's call them depth of field rays) that originate at random places on the eye plane (one of these rays is the red ray in the picture). So, we know where the new ray originates. Now what we need to know is its direction. Since our image cannot be blurry at the focal plane, all of the depth of field rays must meet at the same point on the focal plane. This intersection point is simply calculated by finding the intersection point of the regular pinhole ray (the black ray in the picture) with the focal plane. So now, for each depth of field ray you have its origin (random point on the eye plane; the bigger the variance of those rays the bigger the DOF effect) and point it passes through (on the focal plane) hence the ray's direction is easy to be calculated. Once you've traced all DOF rays and got their radiances you just average them and get a nice depth of field effect.

That was it!

## wtorek, 19 maja 2015

### Spherical Harmonic Lighting - Analytical Lights

Recently I decided to finally get the hang of spherical harmonics (in the context  of computer graphics, of course). I started with the standard reading on the subject, which is Robin Green's paper. It's nice but I found an even better reading in A Gentle Introduction to PRT (and this paper as a bit of basis for this note). Both papers nicely explain how to integrate numerically the lighting environment and project it into SH. What they don't talk about is how to project analytical, directional lights into SH basis. One article I found on this subject is chapter 2.15 in ShaderX3 but (to me) it's not thorough enough, or at least it doesn't discuss some details I was interested in. A lot of light was shed on me here and here.

## SH Formulas

Here come the first three bands of SH functions (both in spherical and cartesian coords, with Condon-Shortley phase). I've seen them in various places but often with typos. The formulas here are taken from chapter 3.2 from GPU Pro 2.
\begin{eqnarray}
Y_0^0 &=& \frac{1}{2} \sqrt{\frac{1}{\pi}} &=& \frac{1}{2} \sqrt{\frac{1}{\pi}} \cr
Y_1^{-1} &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} \sin(\theta) \sin(\phi) &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} y \cr
Y_1^0 &=& \frac{1}{2} \sqrt{\frac{3}{\pi}} \cos(\theta) &=& \frac{1}{2} \sqrt{\frac{3}{\pi}} z \cr
Y_1^1 &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} \sin(\theta) \cos(\phi) &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} x \cr
Y_2^{-2} &=& \frac{1}{2} \sqrt{\frac{15}{\pi}} \sin^2(\theta) \sin(2\phi) &=& \frac{1}{2} \sqrt{\frac{15}{\pi}} xy \cr
Y_2^{-1} &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} \sin(\theta) \cos(\theta) \sin(\phi) &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} yz \cr
Y_2^0 &=& \frac{1}{4} \sqrt{\frac{5}{\pi}} (3 \cos^2(\theta) - 1) &=& \frac{1}{4} \sqrt{\frac{5}{\pi}} (3z^2 - 1) \cr
Y_2^1 &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} \sin(\theta) \cos(\theta) \cos(\phi) &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} zx \cr
Y_2^2 &=& \frac{1}{4} \sqrt{\frac{15}{\pi}} \sin^2(\theta) \cos(2\phi) &=& \frac{1}{4} \sqrt{\frac{15}{\pi}} (x^2 - y^2) \cr
\end{eqnarray}
We'll work only with the first three bands as this is enough for accurately representing directional diffuse (Lambertian BRDF) lighting.

## Projecting Light and Transfer Functions into SH

Remember we're dealing with directional lights acting on Lambertian surfaces. The light function $L$ is simply constant (light) color ($lc$), projected into SH in the direction of the light ($ld$). Thus, $L$ projected into SH gives $L_l^m$:
\begin{eqnarray}
L_0^0 &=& \frac{1}{2} \sqrt{\frac{1}{\pi}} lc \cr
L_1^{-1} &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} ld_y lc \cr
L_1^0 &=& \frac{1}{2} \sqrt{\frac{3}{\pi}} ld_z lc \cr
L_1^1 &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} ld_x lc \cr
L_2^{-2} &=& \frac{1}{2} \sqrt{\frac{15}{\pi}} ld_x ld_y lc \cr
L_2^{-1} &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} ld_y ld_z lc \cr
L_2^0 &=& \frac{1}{4} \sqrt{\frac{5}{\pi}} (3ld_z^2 - 1) lc \cr
L_2^1 &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} ld_z ld_x lc \cr
L_2^2 &=& \frac{1}{4} \sqrt{\frac{15}{\pi}} (ld_x^2 - ld_y^2) lc \cr
\end{eqnarray}
This is just light. We also need to project into SH the transfer function $A$, which in case of our Lambertian BRDF function is simply $\cos(\theta)$. The formula for this function projected into SH is derived here; have a look at equations (19) (this equation has two typos which are fixed here, thanks to KriS) and (26) in particular. Formulas:
\begin{eqnarray}
l = 1: & A_l &=& \frac{\sqrt{\pi}}{3} N_l \cr
l > 1, \mbox{odd}: & A_l &=& 0 \cr
l, \mbox{even}: & A_l &=& 2 \pi \sqrt{\frac{2l+1}{4 \pi}} \frac{(-1)^{l/2-1}}{(l+2)(l-1)} \frac{l!}{2^l ((n/2)!)^2} N_l \cr
\mbox{where} & N_l &=& \sqrt{\frac{4 \pi}{2l + 1}} \cr
\end{eqnarray}
Note that $A$ varies per SH band index. $A$ values for the first three SH bands are:
\begin{eqnarray}
A_0 &=& \pi \cr
A_1 &=& \pi \frac{2}{3} \cr
A_2 &=& \frac{\pi}{4} \cr
\end{eqnarray}
The final formula for each SH coefficient of light and transfer functions, projected into SH and convolved is:
\begin{eqnarray}
E_l^m = L_l^m A_l
\end{eqnarray}

## Final Vertex Color

To finally calculate the lighting for a vertex, given its normal $n$, we need to project the vertex's color ($vc$) in the direction of the normal into SH. This is pretty much the same what we did for the light function:
\begin{eqnarray}
V_0^0 &=& \frac{1}{2} \sqrt{\frac{1}{\pi}} vc \cr
V_1^{-1} &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} n_y vc \cr
V_1^0 &=& \frac{1}{2} \sqrt{\frac{3}{\pi}} n_z vc \cr
V_1^1 &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} n_x vc \cr
V_2^{-2} &=& \frac{1}{2} \sqrt{\frac{15}{\pi}} n_x n_y vc \cr
V_2^{-1} &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} n_y n_z vc \cr
V_2^0 &=& \frac{1}{4} \sqrt{\frac{5}{\pi}} (3n_z^2 - 1) vc \cr
V_2^1 &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} n_z n_x vc \cr
V_2^2 &=& \frac{1}{4} \sqrt{\frac{15}{\pi}} (n_x^2 - n_y^2) vc \cr
\end{eqnarray}
Convolved with $E_l^m$ it gives the final vertex color $C_l^m$ for each band:
\begin{eqnarray}
C_l^m &=& V_l^m E_l^m
\end{eqnarray}
To retrieve the final vertex color $C$ all we need is to sum all SH components:
\begin{eqnarray}
C = \sum\limits_{l=0}^2 \sum\limits_{m=-l}^l C_l^m
\end{eqnarray}

## środa, 1 kwietnia 2015

### Data-oriented Design and Fourier Transform

For some time now I've been thinking about this whole data-oriented design/programming idea that, among others, BitSquid guys have been preaching. To put it simple, the idea is to have data and functions that operate on this data declared separately. So it's closer to functional programming than to standard object-oriented programming. On the other end we have the just-mentioned object-oriented design. In this case we have some sort of an object that stores its state (data) and has methods (functions) that operate on that data. This is a simplification but should give you the general idea of the concept.

I'm not going to go into details of how to code in data-oriented manner. I'm a very adept of this technique. What I want is to share with you some of my old code (2010-2011 or something) and its refactored counter-part (2015); refactored with data-oriented design in mind. How I see it.

The code consists mostly of a rather simple implementation of math and image library. There is also a sample that performs Discrete Fourier Transform using the image library.

fourier-old
The math library code is in math/include/blossom_math/ and math/src/.
The general image code is in common/include/blossom_common/image.hpp and common/src/image.cpp.
The sample's code is in samples/fourier/src/main.cpp. Also, the Fourier functions can be found here.

fourier-new
The math code is in include/math/ and src/math/.
The image code is in include/image/ and src/image/. Also, the Fourier functions are here.
The sample's code is in samples/fourier/src/main.cpp.

Both archives come with binaries (for Windows). In samples/fourier/data/ there is run.bat file that will run the sample and perform Fourier transform on selected image using specified filter.

Feel free to express what you think about my data-oriented understanding :).

## wtorek, 17 marca 2015

### Stepping Towards Target Value

When you have some value, name it current, and want to increment it with some constant, name it step, you'd normally do something as simple as this:
current = current + step;

If you wanted to make sure it doesn't step further once it has reached some target value you would do this:
float current = 0.0f;
float target = 1.0f;
float step = 0.1f;

for (;;)
{
current = current + step;
if (current > target)
current = target;
}

Not a big deal. But once I stumbled upon such piece of code:
float current = 0.0f;
float target = 1.0f;
float step = 0.1f;

for (;;)
{
float delta = fabs(target - current) + 0.0001f;
float t = Clamp(step / delta);
current = Lerp(current, target, t);
}

Turns out this code, at first glance looking quite complicated, does the same thing. So, why did the author of that code preferred it this way instead of something more simple as the solution presented in the beginning of this post? I'm not sure. Maybe because this code works correctly when we arbitrarily change target value? For instance, the first solution won't work when target is smaller than current. But we can easily handle this case as well.

Anyway, I was curious to find out that if from mathematical standpoint these two approaches were identical. Here's the proof:
\begin{eqnarray}
a &=& current \cr
b &=& target \cr
s &=& step \cr
t &=& \frac{s}{b-a} \cr
\end{eqnarray}
\begin{eqnarray}
Lerp(a, b, t) = (1-t)a + tb &=& \cr
(a - \frac{s}{b-a}a) + (\frac{s}{b-a}b) &=& \cr
\frac{(b-a)a - sa + sb}{b-a} &=& \cr
\frac{(b-a)a+s(b-a)}{b-a} &=& \cr
a + s
\end{eqnarray}
So, to sum up, the lerp way results in simple incrementation of a with s.

## niedziela, 15 marca 2015

### Normal Map Generator (from diffuse maps)

In recent days I had a need to generate a normal map from diffuse map. Actually, I had to generate a couple of hundreds normal maps. So I quickly googled for a normal map generator. It's a common "problem" so I was sure a ready-to-use solution had already been there. Turned out I was wrong. Indeed I had stumbled upon a lot of normal map generators but they turned to either be too complex to use, costed money or - what is particularly incomprehensible - none of the programs I had found could run in command-line mode! Not even libs and headers were provided so I could write my own tool. Did they really expect me to click over a couple of hundreds diffuse maps? So eventually I came up with my own program. To my surprise it turned out to be easier than I had initially thought. I bet there are many algorithms for tackling this problem but I really needed anything.

You can get my program's code here. It's written in C# and runs in command-line. The program takes four arguments:
• input diffuse map file's path
• output normal map file's path (can be the same as first argument)
• scale - the bigger this value the bigger the "contrast" of the normal map will be
• border - width of border of the image filled with (0, 0, 1) normal
I'm not going to describe the algorithm here because the source code is simple enough to easily get a grasp of what's going on.

## piątek, 9 stycznia 2015

### A Better Way To Use Lerp To Smooth Things Out

In video games, or other areas that involve some movement and interactive actions, there is often a need to smooth out movement of some object. As a video game programmer we often stumble across an article or real-life code that tackles the problem in a way that is not satisfactory enough for us. Many people misuse linear interpolation (lerp) call for this purpose. Lerp is defined as follows: $$lerp(A, B, t) = A + (B-A)t.$$ The problem goes like this: we want to move an object from current position to $X$ smoothly. What some people do, e.g. Unity developers in their Mecanim Animation Tutorial is they call the following code every frame: $$\label{wrong} newPosition = lerp(oldPosition, X, a \cdot \Delta t),~~~~~~~~~~~~~~~~(1)$$ where $\Delta t$ is the time last frame lasted measured in seconds and $a$ is a constant that state how fast the transition to $X$ should occur.

Note that given scenario allows $X$ to be changed while the time progresses, so that the object follows the $X$ position. Also, even though the code uses lerp function call, the movement is not linear in any sense of that word.

The thing that caught our attention and made us wonder is that interpolation parameter (the last argument of lerp function) can sometimes be greater than one, e.g. when $\Delta t$ is big because last frame lasted a long time for some reasons. We don't want that.

Given solution is wrong because the movement coded in such way is not independent from the frame rate. To see it clearly, let us rewrite equation $1$ by parametrizing it with $\Delta t$ and make it recursive sequence of positions at $n$-th frame. $$\label{wrong2} P^{\Delta t}_n = lerp(P_{n-1}, X, a \cdot \Delta t)~~~~~~~~~~~~~~~(2)$$ We assume for a moment for simplicity that $\Delta t$ and $X$ do not change during execution of the code. For our purposes, we can also assume that $P^{\Delta t}_0=0$ and compute the explicit form of equation $2$ (using our math skills or Wolfram Alpha). $$P^{\Delta t}_n = X(1 - (1-a\cdot\Delta t)^n)$$ Now let us consider two special cases. Let $a=1$. One were code runs at $60$ FPS for $60$ frames and second one that runs at $10$ FPS for $10$ frames. $$P^{1/60}_{60} = X(1 - (1-\frac{1}{60})^{60}) \approx 0.6352 X$$ $$P^{1/10}_{10} = X(1 - (1-\frac{1}{10})^{10}) \approx 0.6513 X$$ Both cases last one second and it would be desirable if those values were the same. Thus, by example above, we see that this method is not frame rate independent.

Generally speaking, we want the following equality to be met $$\label{req} P^{\Delta t}_{n} = P^{\Delta t \cdot 1/\beta}_{n \cdot \beta}~~~~~~~~~~~~~~~(3)$$ for any value of $\beta$.

Once we have described and analysed the problem, we can try to come up with a better solution. We propose to use the following code: $$newPosition = lerp(oldPosition, X, 1 - e^{-a \cdot \Delta t})$$ where $e$ is the base of the natural logarithm and $a$ is a constant that denote how fast we want to interpolate the movement. Rewriting to a more concise form as we did before, we get $$P^{\Delta t}_n = lerp(P_{n-1}, X, 1 - e^{-a \cdot \Delta t}).$$ Assuming initial conditions $P^{\Delta t}_0=0$, the explicit version can be computed (we used some math software again). $$\label{good} P^{\Delta t}_n = X (1 - e^{-a\cdot\Delta t\cdot n})~~~~~~~~~~~~~~~(4)$$

Equation $4$ meets the requirement given by equation $3$: $$P^{\Delta t \cdot 1/\beta}_{n \cdot \beta} = X(1 - e^{-a\cdot\frac{1}{\beta}\cdot\Delta t \cdot n \cdot \beta}) = X (1 - e^{-a\cdot\Delta t\cdot n}) = P^{\Delta t}_n$$ and therefore is independent from the frame rate.

To sum up, we propose to use the $1-e^{-a\cdot\Delta t}$ instead of $a\cdot\Delta t$ as the interpolation factor. It makes much more sense and the movement becomes more predictable when the frame rate varies.