tag:blogger.com,1999:blog-91103904134899675932018-03-21T09:16:39.499+01:00A(nother) Game ProgrammerWojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.comBlogger22125tag:blogger.com,1999:blog-9110390413489967593.post-53147573165891394682018-03-20T12:10:00.001+01:002018-03-21T09:16:39.638+01:00Solving for Quadratic Equation with TensorFlow and Custom C++ CodeLet's just start with what is our problem. The problem we want to solve is to find quadratic equation's $Ax^2 + Bx + C$ coefficients $A$, $B$ and $C$ that best approximate a given set of 2D points $(x, y)$. This is a linear problem that can easily be solved with Least Squares Fitting [1]. However, here we will solve it using Stochastic Gradient Descent (SGD), which is a general algorithm that lets you solve for both linear and non-linear problems. This is a basic algorithm that all neural networks frameworks and based on.<br /><br />Some background for this blog post. Initally I started doing my first neural networks work using TensorFlow and found it very difficult to 1) work with that framework (I find PyTroch much more intuitive to work with) and 2) to force it to use GPU. Because of that I thought that since SGD is not a hard algorithm to implement I thought that I could write my own implementation of it using DirectCompute. But before doing that I had to see if I correctly understand what TensorFlow is doing under the hood. I was able to verify that. I wrote a simple quadratic equation solver with TensorFlow and then a simple quadratic solver in C++ using SGD. Now, the huge difference between the two is that TensorFlow computes derivatives (which are base for SGD) symbolically whereas my C++ solver does so numerically. Obviously, the first approach - symbolic differentiation - is the preferred way as it provides more stable results. But for quick verification numerical differentation was enough for me.<br /><br />Now a little bit on SGD itself. The idea is that we have some cost function as a starting point. This cost function could be the sum of distances of all points to the quadratic curve. Usually our aim is to minimize that cost function, that is, find input arguments for that function (in our case $A$, $B$ and $C$ of the quadratic equation) such that the cost function has value as small as possible (i.e. minimized). We also know there this mathematical entity called <i>gradient</i> which, when computed, tells us what values we should <i>add </i>to our input arguments so that the value of the cost function increases (yes, increases, not decreases) by more than it would have increased if we had chosen different values to add. If we repeat this step a few times we might eventually reach the top of the function, i.e. such input arguments where the value of the cost function is the biggest. Now since we're interested in finding the bottom of the function, not the top, we just need to <i>subtract</i> the gradient from our input arguments, not add it.<br /><br />As stated, we'll be finding $A$, $B$ and $C$ of the quadratic equation given a set of points $(x, y)$ that the quadratic equation is supposed to approximate. We will use here a short, predefined list of points: $(1, 0), (2, -1), (3, -2), (4, -3)$. Yeah, you've probably noticed that they all lie on a straight line so straight line would be what best approximates that set. But we want a quadratic. Because that's why :).<br /><br />TensorFlow code that solves this comes here:<br /><pre class="brush: python;">import tensorflow as tf<br />import numpy<br /><br /><br /># TensorFlow model<br /><br />tf_a = tf.Variable([0.0], dtype=tf.float32)<br />tf_b = tf.Variable([0.0], dtype=tf.float32)<br />tf_c = tf.Variable([0.0], dtype=tf.float32)<br /><br />tf_x = tf.placeholder(tf.float32)<br />tf_y = tf.placeholder(tf.float32)<br /><br />tf_cost = tf.reduce_sum(<br /> tf.square(tf_a*tf_x*tf_x + tf_b*tf_x + tf_c - tf_y))<br />tf_optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(tf_cost)<br /><br /><br /># run the model<br /><br />points = [ [1, 2, 3, 4], [0, -1, -2, -3] ]<br /><br />sess = tf.Session()<br />sess.run(tf.global_variables_initializer())<br />for i in range(1000):<br /> sess.run(tf_optimizer,<br /> { tf_x: points[0], tf_y: points[1] })<br /><br />a, b, c, cost = sess.run(<br /> [tf_a, tf_b, tf_c, tf_cost],<br /> { tf_x: points[0], tf_y: points[1] })<br />print("A: ", a, " B: ", b, " C: ", c, " cost: ", cost);<br /></pre>As you can see, we're first building a TensorFlow model of the computational graph and then run it with data provided (1000 iterations, learning rate set to $0.001$). We define the cost function as the sum of differences between the quadratic equation with $x$ provided for all points, and actual $y$ provided for all points. For anyone who's worked at least a little bit with TensorFlow this code should be pretty self-explanatory. For those of you for which it is not I highly recommend going to [2].<br /><br />Okay. So there was the magic of TensorFlow. We will now see how to write code that does exactly the same calculations but with C++ (but using numerical differentation for ease of implementation and generalization). Let's just see the whole code:<br /><pre class="brush: cpp;">float Cost(const SVector2& pt, float a, float b, float c)<br />{<br /> float value = a*pt.x*pt.x + b*pt.x + c - pt.y;<br /> return value * value;<br />}<br /><br />...<br /><br />vector< SVector2 > points;<br />points.push_back(VectorCustom(1.0f, 0.0f));<br />points.push_back(VectorCustom(2.0f, -1.0f));<br />points.push_back(VectorCustom(3.0f, -2.0f));<br />points.push_back(VectorCustom(4.0f, -3.0f));<br /><br />const float epsilon = 0.0001f;<br /><br />float a = 0.0f;<br />float b = 0.0f;<br />float c = 0.0f;<br />float cost = 0.0f;<br />for (uint i = 0; i < 1000; i++)<br />{<br /> cost = 0.0;<br /> for (uint j = 0; j < points.size(); j++)<br /> cost += Cost(points[j], a, b, c);<br /><br /> float dCost_dA = 0.0f;<br /> float dCost_dB = 0.0f;<br /> float dCost_dC = 0.0f;<br /> for (uint j = 0; j < points.size(); j++)<br /> {<br /> dCost_dA += ( Cost(points[j], a + epsilon, b, c) -<br /> Cost(points[j], a - epsilon, b, c) ) /<br /> (2.0f * epsilon);<br /> dCost_dB += ( Cost(points[j], a, b + epsilon, c) -<br /> Cost(points[j], a, b - epsilon, c) ) /<br /> (2.0f * epsilon);<br /> dCost_dC += ( Cost(points[j], a, b, c + epsilon) -<br /> Cost(points[j], a, b, c - epsilon) ) /<br /> (2.0f * epsilon);<br /> }<br /><br /> a -= 0.001f * dCost_dA;<br /> b -= 0.001f * dCost_dB;<br /> c -= 0.001f * dCost_dC;<br />}<br /><br />cout << "A: " << a << " B: " << b << " C: " << c << " cost: " << cost << endl;</pre>We start with defining a cost function for a single point (note that TensorFlow, as it operates on tensors, computes the error for all points in single expression). This cost function is further used in iterations (also 1000) loop where costs for all points are summed together. The next step is doing SGD using numerical differentiation. Values <span style="font-family: "courier new" , "courier" , monospace;">dCost_<span style="font-family: "courier new" , "courier" , monospace;">dA</span></span>, <span style="font-family: "courier new" , "courier" , monospace;">dCost_<span style="font-family: "courier new" , "courier" , monospace;">dB</span></span> and <span style="font-family: "courier new" , "courier" , monospace;">dCost_<span style="font-family: "courier new" , "courier" , monospace;">dC</span></span> represent partial derivatives, the gradient. In the points loop we are using well-known two-point differentation formula ([3], see the one with $2h$ in the denominator) to calculate the partial derivatives $\frac{dCost}{dA}$, $\frac{dCost}{dB}$ and $\frac{dCost}{dC}$.<br />Once we have looped through all the points we have a joint gradient for all the points that tells us the direction of the greatest rate of increase of the cost function. Since we want to go towards the bottom of the function, not the top, we need to subtract the gradient (using learn rate of $0.001$, just like in TensorFlow example) from the model parameters $A$, $B$ and $C$.<br /><br />And that's it. Really. I can't be sure if TensorFlow doesn't do anything more sophisticated with the derivatives but the results I got from both TensorFlow code above and the C++ code above are very similar.<br />TensorFlow:<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-qLybi3VVOfE/WrDqQi8xgAI/AAAAAAAACfA/WGngy1eARQkpt6WanOEZsvReqiS8nS33ACLcBGAs/s1600/tensorflow.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="65" data-original-width="642" src="https://2.bp.blogspot.com/-qLybi3VVOfE/WrDqQi8xgAI/AAAAAAAACfA/WGngy1eARQkpt6WanOEZsvReqiS8nS33ACLcBGAs/s1600/tensorflow.png" /></a></div>C++:<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-1NOEQFgtUfk/WrDqatP20WI/AAAAAAAACfE/LcQQgVzXS1sBJI6pg-SHCv0YKXSphgTPACLcBGAs/s1600/cpp.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="18" data-original-width="639" src="https://4.bp.blogspot.com/-1NOEQFgtUfk/WrDqatP20WI/AAAAAAAACfE/LcQQgVzXS1sBJI6pg-SHCv0YKXSphgTPACLcBGAs/s1600/cpp.png" /></a></div>The reason that the results are not identical could be that TensorFlow uses doubles whereas I used floats. Another reason could be that TensorFlow uses symbolic differentiation while I used numerical.<br /><br />Now one mind wonder... Since as input we gave points that lie on a straight line why didn't we end up with $A=0$ and $0$ cost? A straight line should approximate these points best, right? That is correct. And in fact we can end up with $0$ cost once we have performed a sufficient number of iterations. In both cases above we made a thousand of them. But what coefficients we would end up with if we did 100k iterations? Here's the C++ code's answer:<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-xgAVTxpeNzY/WrDxX1WPLcI/AAAAAAAACfY/LPzgViGSCNA5pKcihvpIsusw55rOJZijQCLcBGAs/s1600/cpp2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="18" data-original-width="639" src="https://4.bp.blogspot.com/-xgAVTxpeNzY/WrDxX1WPLcI/AAAAAAAACfY/LPzgViGSCNA5pKcihvpIsusw55rOJZijQCLcBGAs/s1600/cpp2.png" /></a></div>We got almost straight line equation $-x + 1$.<br />For such a simple linear problem least squares algorithm would be a much better (cheaper and more numerically stable) option. Using SGD for that is like using heavy artillery to shoot a mosquito. But its great advantage is that you can shoot... solve many other kinds of functions. That's what it is for.<br /><br />The C++ code I wrote for SGD was to be the first step towards my custom GPU-based neural network implementation. For the time being I have ditched that idea because I found it surprisingly easy to work with PyTorch and that is what I'm exploring now with regard to GPU-based neural nets.<br /><br />[1] <a href="http://www.mn.uio.no/ifi/english/services/knowledge/cs/forelesn.kap5.pdf">http://www.mn.uio.no/ifi/english/services/knowledge/cs/forelesn.kap5.pdf</a><br />[2] <a href="https://www.youtube.com/watch?v=wuo4JdG3SvU">https://www.youtube.com/watch?v=wuo4JdG3SvU</a><br />[3] <a href="https://en.wikipedia.org/wiki/Numerical_differentiation">https://en.wikipedia.org/wiki/Numerical_differentiation</a>Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-12629143739165665792018-02-26T15:00:00.001+01:002018-02-26T15:00:50.731+01:00DirectX 11, HLSL, GatherRedEvery once in a while I am in need to use one of those <a href="https://msdn.microsoft.com/en-us/library/windows/desktop/ff471558(v=vs.85).aspx">Gather </a>functions from DirectX 11's HLSL library. <span style="font-family: "courier new" , "courier" , monospace;">GatherRed</span><span style="font-family: inherit;"> </span>in this case. This function is useful because it allows you take four samples with just one instruction and store them all in <span style="font-family: "courier new" , "courier" , monospace;">float4</span>. As the name indicates, of the four texels that are sampled simultaneously, only the values from the red channels will be stored in <span style="font-family: "courier new" , "courier" , monospace;">float4</span>. If you need data from other channels you can use respective functions. It is really worth using this function(s) if you only need data from one channel as calling one gather is faster than taking four samples individually.<br /><br />If instead of using your original UV coordinates to take one regular sample with <span style="font-family: "courier new" , "courier" , monospace;">Sample</span><span style="font-family: inherit;"> </span>or <span style="font-family: "courier new" , "courier" , monospace;">SampleLevel</span> you call <span style="font-family: "courier new" , "courier" , monospace;">GatherRed</span> which four samples (their red channels) exactly will be taken? This is something the DirectX's documentation doesn't specify so this short blog post is here to fill this gap. You can also stumble on that information in various DirectX 11 presentations.<br /><br />Take a look at the picture:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-RQv5B6fiTTI/WpIBORZIZbI/AAAAAAAACek/GlNHS729ufcG7U68F_faQEVugOpUvE5lwCLcBGAs/s1600/gather_red.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="200" data-original-width="200" src="https://1.bp.blogspot.com/-RQv5B6fiTTI/WpIBORZIZbI/AAAAAAAACek/GlNHS729ufcG7U68F_faQEVugOpUvE5lwCLcBGAs/s1600/gather_red.png" /></a></div><br />The grey pixel is the one whose UV coordinates we have in the shader (the very center of that texel, to be more specific). If you call <span style="font-family: "courier new" , "courier" , monospace;">GatherRed</span> you will get the four labeled samples (again, only their red channel's values). Probably a little bit counter-intuitively the value of the "base" sample is not stored in return value's $x$ component but $w$ as the image above shows. For better picture the two following snippets are equivalent:<br /><pre class="brush: cpp;">float myValueR = myTexture.Sample(mySampler, uv).x;<br />float myValueG = myTexture.Sample(mySampler, uv).y;<br />float myValueB = myTexture.Sample(mySampler, uv).z;<br />float myValueA = myTexture.Sample(mySampler, uv).w;</pre>and:<br /><pre class="brush: cpp;">float myValueR = myTexture.GatherRed(mySampler, uv).w;<br />float myValueG = myTexture.GatherGreen(mySampler, uv).w;<br />float myValueB = myTexture.GatherBlue(mySampler, uv).w;<br />float myValueA = myTexture.GatherAlpha(mySampler, uv).w;</pre>And these as well: <br /><pre class="brush: cpp;">float r1 = myTexture.Sample(mySampler, uv).x;<br />float r2 = myTexture.Sample(mySampler, uv, int2(1, 0)).x;<br />float r3 = myTexture.Sample(mySampler, uv, int2(0, 1)).x;<br />float r4 = myTexture.Sample(mySampler, uv, int2(1, 1)).x;<br /></pre>and: <br /><pre class="brush: cpp;">float4 samples = myTexture.GatherRed(mySampler, uv);<br />float r1 = samples.w;<br />float r2 = samples.z;<br />float r3 = samples.x;<br />float r4 = samples.y;<br /></pre>There. I hope you won't have to wonder anymore the order of samples returned by gathers :). At least I know I won't.Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-63192693276293537612016-09-15T00:22:00.002+02:002016-09-17T00:35:58.647+02:00Conversion Between std::string and std::wstringWhile developing my brand new (being over a year old at least...) framework I decided I want to use ANSI chars (<span style="font-family: "courier new" , "courier" , monospace;">std::string</span> or <span style="font-family: "courier new" , "courier" , monospace;">char</span>) as often as possible, instead of more complex encodings like Unicode (stored in <span style="font-family: "courier new" , "courier" , monospace;">std::wstring</span> or <span style="font-family: "courier new" , "courier" , monospace;">wchar_t</span>). As file names for instance. And for tons of other stuff. However, many functions, most notably from Windows SDK, expect wide chars instead of pure chars. This is the case with <span style="font-family: "courier new" , "courier" , monospace;">D3DCompileFromFile</span> function that I used. My custom function called <span style="font-family: "courier new" , "courier" , monospace;">CompileShaderFromFile</span> calls this function but takes as input file name to a shader file stored in <span style="font-family: "courier new" , "courier" , monospace;">std::string</span>, which cannot be passed to <span style="font-family: "courier new" , "courier" , monospace;">D3DCompileFromFile</span>. At least not without conversion. So I started to search Google for conversion functions between the two, chars and wide chars. To my surprise I found a lot of code that did not work straight away, was very complex or was platform-specific. Eventually, I thought that since built-in STL library is so packed with various conversion functions then maybe there is something to convert between <span style="font-family: "courier new" , "courier" , monospace;">string</span> and <span style="font-family: "courier new" , "courier" , monospace;">wstring</span>. Turns out there is. You can add a <span style="font-family: "courier new" , "courier" , monospace;">wchar_t</span> to <span style="font-family: "courier new" , "courier" , monospace;">string</span> and a <span style="font-family: "courier new" , "courier" , monospace;">char</span> to <span style="font-family: "courier new" , "courier" , monospace;">wstring</span>, and the append function (<span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: "courier new" , "courier" , monospace;">+=</span></span> operator in my code) will perform proper conversion.<br />Here goes the code:<br /><pre class="brush: cpp;">inline wstring StringToWString(const string& s)<br />{<br /> wstring temp = L"";<br /><br /> for (uint i = 0; i < s.length(); i++)<br /> temp += (wchar_t)s[i];<br /><br /> return temp;<br />}<br /><br />inline string WStringToString(const wstring& s)<br />{<br /> string temp = "";<br /><br /> for (uint i = 0; i < s.length(); i++)<br /> temp += (char)s[i];<br /><br /> return temp;<br />}<br /></pre>Or equivalently:<br /><pre class="brush: cpp;">inline wstring StringToWString(const string& s)<br />{<br /> return wstring(s.begin(), s.end());<br />}<br /><br />inline string WStringToString(const wstring& s)<br />{<br /> return string(s.begin(), s.end());<br />}<br /></pre>And that's it. With these two functions (or actually only the first one) I was able to pass <span style="font-family: "courier new" , "courier" , monospace;">string</span> to all my custom functions and convert it to <span style="font-family: "courier new" , "courier" , monospace;">wstring</span> when necessary. Simple, clean and elegant solution.<br /><br /><b>IMPORTANT EDIT (17.09.2016):</b><br /><br />In a GameDev.net thread (<a href="http://www.gamedev.net/topic/682185-conversion-between-stdstring-and-stdwstring/">http://www.gamedev.net/topic/682185-conversion-between-stdstring-and-stdwstring/</a>) it was pointed out to me that the code above will only work for ISO-8859-1 encoding and that I'm actually not converting chars and wide chars but simply cast them. This is fine for the just-mentioned encoding but might cause troubles in others.<br /><br />Anyway, my problem has been solved with this simple solution so if you don't have funky chars in your paths and just want to pass wide chars which you know are ANSI (and stored in chars) then casting chars to wide chars will do the trick.Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-44578073800079445332015-09-23T18:05:00.004+02:002015-09-23T18:06:15.089+02:00Deferred Lighting on AndroidThis post is a bit not on time. I made a demo of deferred lighting running on Android (tested on Tegra 2, Tegra 3 and Adreno 220 devices) back three years ago when it could still be considered realtively fresh. It's probably not today but nevertheless I wanted to share it with you.<br /><br />The demo is <a href="http://maxest.gct-game.net/blog/DeferredLighting.zip">here</a>; the package also contains README.txt file with steps needed to run it. Basically you will need Eclipse, Android NDK and possibly other stuff. To get everything up and running when developing Android apps I use this <a href="https://developer.nvidia.com/AndroidWorks">https://developer.nvidia.com/AndroidWorks</a>.<br /><br />Now a few words on how my implementation works.<br /><br />The demo uses OpenGL ES 2.0. This API allows to barely render into one render target at a time and as far as I know does not provide floating point formats. How do you generate the g-buffer with that limitation? Well, you could go with a few passes but yeah, that is insane in terms of performance for Android devices. That's why I settled with one pass and fit everything into a single RGBA8 buffer. All I need for basic deferred lighting is position and normals. Instead of postition we store (linear) depth using 16 bits of the buffer (position is reconstructed later in the lighting pass). In case of normals I store $x$ and $y$ components in view-space and reconstruct $z$ in the lighting pass, hence only two (8-bit) channels suffice. All fits into a 32-bit buffer. Please examine shaders code to see how it all works.<br /><br />The demo can use two common optimizations for the lights in the lighting pass. One is using scissor test and the other using stencil buffer. Both can be turned on/off in code using macros <span style="font-family: "Courier New",Courier,monospace;">USE_SCISSOR</span> and <span style="font-family: "Courier New",Courier,monospace;">USE_STENCIL</span>.<br /><br />The demo renders 18 small-to-medium lights on Tegra 3 with 21 FPS which I consider not bad. Here goes a picture:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-a9uoncbOOWQ/VgLNqV8NcnI/AAAAAAAACG0/AcM8JbMEJSw/s1600/img.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="http://2.bp.blogspot.com/-a9uoncbOOWQ/VgLNqV8NcnI/AAAAAAAACG0/AcM8JbMEJSw/s640/img.jpg" width="360" /></a></div>Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-75044152198533216612015-09-19T18:46:00.002+02:002015-09-23T18:09:04.241+02:00As Simple As Possible Explanation of Depth of Field in Ray TracerRecently I was working on a ray tracer and came to implement depth of field. I looked at a few "tutorials" on how to do it and must admit I had some difficulties understanding them. They often introduce the concept of a lens and try to explain how actual physics work. Sort of. Well, I didn't like these explanations so I decided to share my own.<br /><br />Have a look at the picture: <br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-W1sBUNLqaxY/Vf2N2KB6koI/AAAAAAAACGk/1SEptg4C9TI/s1600/dof_ray.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-W1sBUNLqaxY/Vf2N2KB6koI/AAAAAAAACGk/1SEptg4C9TI/s1600/dof_ray.png" /></a></div><br />It shows a side view of the view frustum. There is the eye along with the "plane" (let's call it eye plane) the eye is on, the near plane at near plane distance (you could also call it film plane) and the focal plane at focal plane distance.<br /><br />The point of depth of field is to make the image appear blurry everywhere where the generated rays don't intersect scene at focal plane distance. To achieve this, instead of ray tracing only one regular pinhole ray (the black ray in the picture), we generate a bunch of rays (let's call them depth of field rays) that originate at random places on the eye plane (one of these rays is the red ray in the picture). So, we know where the new ray originates. Now what we need to know is its direction. Since our image cannot be blurry at the focal plane, all of the depth of field rays must meet at the same point on the focal plane. This intersection point is simply calculated by finding the intersection point of the regular pinhole ray (the black ray in the picture) with the focal plane. So now, for each depth of field ray you have its origin (random point on the eye plane; the bigger the variance of those rays the bigger the DOF effect) and point it passes through (on the focal plane) hence the ray's direction is easy to be calculated. Once you've traced all DOF rays and got their radiances you just average them and get a nice depth of field effect.<br /><br />That was it!Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-54974988519285244632015-05-19T11:55:00.000+02:002016-09-17T00:41:16.489+02:00Spherical Harmonic Lighting - Analytical LightsRecently I decided to finally get the hang of spherical harmonics (in the context of computer graphics, of course). I started with the standard reading on the subject, which is <a href="http://www1.cs.columbia.edu/~cs4162/slides/spherical-harmonic-lighting.pdf">Robin Green's paper</a>. It's nice but I found an even better reading in <a href="http://www.inf.ufrgs.br/~oliveira/pubs_files/Slomp_Oliveira_Patricio-Tutorial-PRT.pdf">A Gentle Introduction to PRT</a> (and this paper as a bit of basis for this note). Both papers nicely explain how to integrate numerically the lighting environment and project it into SH. What they don't talk about is how to project analytical, directional lights into SH basis. One article I found on this subject is chapter 2.15 in ShaderX3 but (to me) it's not thorough enough, or at least it doesn't discuss some details I was interested in. A lot of light was shed on me <a href="https://seblagarde.wordpress.com/2012/01/08/pi-or-not-to-pi-in-game-lighting-equation/">here</a> and <a href="http://www.gamedev.net/topic/668161-spherical-harmonics-shortcuts/">here</a>.<br /><h2>SH Formulas</h2>Here come the first three bands of SH functions (both in spherical and cartesian coords, with Condon-Shortley phase). I've seen them in various places but often with typos. The formulas here are taken from chapter 3.2 from GPU Pro 2.<br />\begin{eqnarray}<br />Y_0^0 &=& \frac{1}{2} \sqrt{\frac{1}{\pi}} &=& \frac{1}{2} \sqrt{\frac{1}{\pi}} \cr<br />Y_1^{-1} &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} \sin(\theta) \sin(\phi) &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} y \cr<br />Y_1^0 &=& \frac{1}{2} \sqrt{\frac{3}{\pi}} \cos(\theta) &=& \frac{1}{2} \sqrt{\frac{3}{\pi}} z \cr<br />Y_1^1 &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} \sin(\theta) \cos(\phi) &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} x \cr<br />Y_2^{-2} &=& \frac{1}{2} \sqrt{\frac{15}{\pi}} \sin^2(\theta) \sin(2\phi) &=& \frac{1}{2} \sqrt{\frac{15}{\pi}} xy \cr<br />Y_2^{-1} &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} \sin(\theta) \cos(\theta) \sin(\phi) &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} yz \cr<br />Y_2^0 &=& \frac{1}{4} \sqrt{\frac{5}{\pi}} (3 \cos^2(\theta) - 1) &=& \frac{1}{4} \sqrt{\frac{5}{\pi}} (3z^2 - 1) \cr<br />Y_2^1 &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} \sin(\theta) \cos(\theta) \cos(\phi) &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} zx \cr<br />Y_2^2 &=& \frac{1}{4} \sqrt{\frac{15}{\pi}} \sin^2(\theta) \cos(2\phi) &=& \frac{1}{4} \sqrt{\frac{15}{\pi}} (x^2 - y^2) \cr<br />\end{eqnarray}<br />We'll work only with the first three bands as this is enough for accurately representing directional diffuse (Lambertian BRDF) lighting. <br /><h2>Projecting Light and Transfer Functions into SH</h2>Remember we're dealing with directional lights acting on Lambertian surfaces. The light function $L$ is simply constant (light) color ($lc$), projected into SH in the direction of the light ($ld$). Thus, $L$ projected into SH gives $L_l^m$:<br />\begin{eqnarray}<br />L_0^0 &=& \frac{1}{2} \sqrt{\frac{1}{\pi}} lc \cr<br />L_1^{-1} &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} ld_y lc \cr<br />L_1^0 &=& \frac{1}{2} \sqrt{\frac{3}{\pi}} ld_z lc \cr<br />L_1^1 &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} ld_x lc \cr<br />L_2^{-2} &=& \frac{1}{2} \sqrt{\frac{15}{\pi}} ld_x ld_y lc \cr<br />L_2^{-1} &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} ld_y ld_z lc \cr<br />L_2^0 &=& \frac{1}{4} \sqrt{\frac{5}{\pi}} (3ld_z^2 - 1) lc \cr<br />L_2^1 &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} ld_z ld_x lc \cr<br />L_2^2 &=& \frac{1}{4} \sqrt{\frac{15}{\pi}} (ld_x^2 - ld_y^2) lc \cr<br />\end{eqnarray}<br />This is just light. We also need to project into SH the transfer function $A$, which in case of our Lambertian BRDF function is simply $\cos(\theta)$. The formula for this function projected into SH is derived <a href="http://cseweb.ucsd.edu/~ravir/papers/invlamb/josa.pdf">here</a>; have a look at equations (19) (this equation has two typos which are fixed here, thanks to <a href="https://knarkowicz.wordpress.com/">KriS</a>) and (26) in particular. Formulas:<br />\begin{eqnarray}<br />l = 1: & A_l &=& \frac{\sqrt{\pi}}{3} N_l \cr<br />l > 1, \mbox{odd}: & A_l &=& 0 \cr<br />l, \mbox{even}: & A_l &=& 2 \pi \sqrt{\frac{2l+1}{4 \pi}} \frac{(-1)^{l/2-1}}{(l+2)(l-1)} \frac{l!}{2^l ((n/2)!)^2} N_l \cr<br />\mbox{where} & N_l &=& \sqrt{\frac{4 \pi}{2l + 1}} \cr <br />\end{eqnarray}<br />Note that $A$ varies per SH band index. $A$ values for the first three SH bands are:<br />\begin{eqnarray}<br />A_0 &=& \pi \cr<br />A_1 &=& \pi \frac{2}{3} \cr<br />A_2 &=& \frac{\pi}{4} \cr <br />\end{eqnarray}<br />The final formula for each SH coefficient of light and transfer functions, projected into SH and convolved is:<br />\begin{eqnarray}<br />E_l^m = L_l^m A_l<br />\end{eqnarray}<br /><h2>Final Vertex Color</h2>To finally calculate the lighting for a vertex, given its normal $n$, we need to project the vertex's color ($vc$) in the direction of the normal into SH. This is pretty much the same what we did for the light function:<br />\begin{eqnarray}<br />V_0^0 &=& \frac{1}{2} \sqrt{\frac{1}{\pi}} vc \cr<br />V_1^{-1} &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} n_y vc \cr<br />V_1^0 &=& \frac{1}{2} \sqrt{\frac{3}{\pi}} n_z vc \cr<br />V_1^1 &=& \frac{-1}{2} \sqrt{\frac{3}{\pi}} n_x vc \cr<br />V_2^{-2} &=& \frac{1}{2} \sqrt{\frac{15}{\pi}} n_x n_y vc \cr<br />V_2^{-1} &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} n_y n_z vc \cr<br />V_2^0 &=& \frac{1}{4} \sqrt{\frac{5}{\pi}} (3n_z^2 - 1) vc \cr<br />V_2^1 &=& \frac{-1}{2} \sqrt{\frac{15}{\pi}} n_z n_x vc \cr<br />V_2^2 &=& \frac{1}{4} \sqrt{\frac{15}{\pi}} (n_x^2 - n_y^2) vc \cr<br />\end{eqnarray}<br />Convolved with $E_l^m$ it gives the final vertex color $C_l^m$ for each band:<br />\begin{eqnarray}<br />C_l^m &=& V_l^m E_l^m<br />\end{eqnarray}<br />To retrieve the final vertex color $C$ all we need is to sum all SH components:<br />\begin{eqnarray}<br />C = \sum\limits_{l=0}^2 \sum\limits_{m=-l}^l C_l^m<br />\end{eqnarray}Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-91468212920122143262015-04-01T17:52:00.000+02:002015-04-07T13:30:02.381+02:00Data-oriented Design and Fourier TransformFor some time now I've been thinking about this whole data-oriented design/programming idea that, among others, <a href="http://bitsquid.blogspot.com/">BitSquid</a> guys have been preaching. To put it simple, the idea is to have data and functions that operate on this data declared separately. So it's closer to functional programming than to standard object-oriented programming. On the other end we have the just-mentioned object-oriented design. In this case we have some sort of an object that stores its state (data) and has methods (functions) that operate on that data. This is a simplification but should give you the general idea of the concept.<br /><br />I'm not going to go into details of how to code in data-oriented manner. I'm a very adept of this technique. What I want is to share with you some of my old code (2010-2011 or something) and its refactored counter-part (2015); refactored with data-oriented design in mind. How I see it.<br /><br />The code consists mostly of a rather simple implementation of math and image library. There is also a sample that performs Discrete Fourier Transform using the image library.<br /><br /><a href="http://maxest.gct-game.net/blog/fourier-old.zip">fourier-old</a><br />The math library code is in <span style="font-family: "Courier New",Courier,monospace;">math/include/blossom_math/</span> and <span style="font-family: "Courier New",Courier,monospace;">math/src/</span>.<br />The general image code is in <span style="font-family: "Courier New",Courier,monospace;">common/include/blossom_common/image.hpp</span> and <span style="font-family: "Courier New",Courier,monospace;">common/src/image.cpp</span>.<br />The sample's code is in <span style="font-family: "Courier New",Courier,monospace;">samples/fourier/src/main.cpp</span>. Also, the Fourier functions can be found here.<br /><br /><a href="http://maxest.gct-game.net/blog/fourier-new.zip">fourier-new</a><br />The math code is in <span style="font-family: "Courier New",Courier,monospace;">include/math/</span> and <span style="font-family: "Courier New",Courier,monospace;">src/math/</span>.<br />The image code is in <span style="font-family: "Courier New",Courier,monospace;">include/image/</span> and <span style="font-family: "Courier New",Courier,monospace;">src/image/</span>. Also, the Fourier functions are here.<br />The sample's code is in <span style="font-family: "Courier New",Courier,monospace;">samples/fourier/src/main.cpp</span>.<br /><br />Both archives come with binaries (for Windows). In <span style="font-family: "Courier New",Courier,monospace;">samples/fourier/data/</span> there is <span style="font-family: "Courier New",Courier,monospace;">run.bat</span> file that will run the sample and perform Fourier transform on selected image using specified filter.<br /><br />Feel free to express what you think about my data-oriented understanding :).Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-63028383957189637332015-03-17T17:10:00.003+01:002015-03-17T17:12:04.540+01:00Stepping Towards Target ValueWhen you have some value, name it <span style="font-family: "Courier New",Courier,monospace;">current</span>, and want to increment it with some constant, name it <span style="font-family: "Courier New",Courier,monospace;">step</span>, you'd normally do something as simple as this:<br /><pre class="brush: cpp;">current = current + step;<br /></pre>If you wanted to make sure it doesn't step further once it has reached some <span style="font-family: "Courier New",Courier,monospace;">target</span> value you would do this:<br /><pre class="brush: cpp;">float current = 0.0f;<br />float target = 1.0f;<br />float step = 0.1f;<br /><br />for (;;)<br />{<br /> current = current + step;<br /> if (current > target)<br /> current = target;<br />}<br /></pre>Not a big deal. But once I stumbled upon such piece of code:<br /><pre class="brush: cpp;">float current = 0.0f;<br />float target = 1.0f;<br />float step = 0.1f;<br /><br />for (;;)<br />{<br /> float delta = fabs(target - current) + 0.0001f;<br /> float t = Clamp(step / delta);<br /> current = Lerp(current, target, t);<br />}<br /></pre>Turns out this code, at first glance looking quite complicated, does the same thing. So, why did the author of that code preferred it this way instead of something more simple as the solution presented in the beginning of this post? I'm not sure. Maybe because this code works correctly when we arbitrarily change <span style="font-family: "Courier New",Courier,monospace;">target</span> value? For instance, the first solution won't work when <span style="font-family: "Courier New",Courier,monospace;">target</span> is smaller than <span style="font-family: "Courier New",Courier,monospace;">current</span>. But we can easily handle this case as well.<br /><br />Anyway, I was curious to find out that if from mathematical standpoint these two approaches were identical. Here's the proof:<br />\begin{eqnarray}<br />a &=& current \cr<br />b &=& target \cr<br />s &=& step \cr <br />t &=& \frac{s}{b-a} \cr<br />\end{eqnarray}<br />\begin{eqnarray}<br />Lerp(a, b, t) = (1-t)a + tb &=& \cr<br />(a - \frac{s}{b-a}a) + (\frac{s}{b-a}b) &=& \cr<br />\frac{(b-a)a - sa + sb}{b-a} &=& \cr<br />\frac{(b-a)a+s(b-a)}{b-a} &=& \cr<br />a + s <br />\end{eqnarray}<br />So, to sum up, the lerp way results in simple incrementation of <span style="font-family: "Courier New",Courier,monospace;">a</span> with <span style="font-family: "Courier New",Courier,monospace;">s</span>.Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-38114209201557777862015-03-15T21:20:00.000+01:002015-03-15T21:20:05.983+01:00Normal Map Generator (from diffuse maps)In recent days I had a need to generate a normal map from diffuse map. Actually, I had to generate a couple of hundreds normal maps. So I quickly googled for a normal map generator. It's a common "problem" so I was sure a ready-to-use solution had already been there. Turned out I was wrong. Indeed I had stumbled upon a lot of normal map generators but they turned to either be too complex to use, costed money or - what is particularly incomprehensible - none of the programs I had found could run in command-line mode! Not even libs and headers were provided so I could write my own tool. Did they really expect me to click over a couple of hundreds diffuse maps? So eventually I came up with my own program. To my surprise it turned out to be easier than I had initially thought. I bet there are many algorithms for tackling this problem but I really needed anything.<br /><br />You can get my program's code <a href="http://maxest.gct-game.net/blog/NormalMapGenerator.cs">here</a>. It's written in C# and runs in command-line. The program takes four arguments:<br /><ul><li>input diffuse map file's path</li><li>output normal map file's path (can be the same as first argument)</li><li>scale - the bigger this value the bigger the "contrast" of the normal map will be</li><li>border - width of border of the image filled with <span style="font-family: "Courier New",Courier,monospace;">(0, 0, 1)</span> normal</li></ul>I'm not going to describe the algorithm here because the source code is simple enough to easily get a grasp of what's going on.Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-50753333273572021222015-01-09T22:03:00.000+01:002015-01-09T22:17:25.720+01:00A Better Way To Use Lerp To Smooth Things OutIn video games, or other areas that involve some movement and interactive actions, there is often a need to smooth out movement of some object. As a video game programmer we often stumble across an article or real-life code that tackles the problem in a way that is not satisfactory enough for us. Many people misuse linear interpolation (lerp) call for this purpose. Lerp is defined as follows: $$ lerp(A, B, t) = A + (B-A)t. $$ The problem goes like this: we want to move an object from current position to $X$ smoothly. What some people do, e.g. Unity developers in their <a href="https://www.youtube.com/watch?v=Xx21y9eJq1U">Mecanim Animation Tutorial</a> is they call the following code every frame: \begin{equation} \label{wrong} newPosition = lerp(oldPosition, X, a \cdot \Delta t),~~~~~~~~~~~~~~~~(1) \end{equation} where $\Delta t$ is the time last frame lasted measured in seconds and $a$ is a constant that state how fast the transition to $X$ should occur. <p>Note that given scenario allows $X$ to be changed while the time progresses, so that the object follows the $X$ position. Also, even though the code uses lerp function call, the movement is not linear in any sense of that word. </p><p>The thing that caught our attention and made us wonder is that interpolation parameter (the last argument of lerp function) can sometimes be greater than one, e.g. when $\Delta t$ is big because last frame lasted a long time for some reasons. We don't want that. </p><p>Given solution is wrong because the movement coded in such way is not independent from the frame rate. To see it clearly, let us rewrite equation $1$ by parametrizing it with $\Delta t$ and make it recursive sequence of positions at $n$-th frame. \begin{equation} \label{wrong2} P^{\Delta t}_n = lerp(P_{n-1}, X, a \cdot \Delta t)~~~~~~~~~~~~~~~(2) \end{equation} We assume for a moment for simplicity that $\Delta t$ and $X$ do not change during execution of the code. For our purposes, we can also assume that $P^{\Delta t}_0=0$ and compute the explicit form of equation $2$ (using our math skills or <a href="http://http://www.wolframalpha.com/">Wolfram Alpha</a>). $$ P^{\Delta t}_n = X(1 - (1-a\cdot\Delta t)^n) $$ Now let us consider two special cases. Let $a=1$. One were code runs at $60$ FPS for $60$ frames and second one that runs at $10$ FPS for $10$ frames. $$ P^{1/60}_{60} = X(1 - (1-\frac{1}{60})^{60}) \approx 0.6352 X $$ $$ P^{1/10}_{10} = X(1 - (1-\frac{1}{10})^{10}) \approx 0.6513 X $$ Both cases last one second and it would be desirable if those values were the same. Thus, by example above, we see that this method is not frame rate independent. </p><p>Generally speaking, we want the following equality to be met \begin{equation} \label{req} P^{\Delta t}_{n} = P^{\Delta t \cdot 1/\beta}_{n \cdot \beta}~~~~~~~~~~~~~~~(3) \end{equation} for any value of $\beta$. </p><p>Once we have described and analysed the problem, we can try to come up with a better solution. We propose to use the following code: $$ newPosition = lerp(oldPosition, X, 1 - e^{-a \cdot \Delta t}) $$ where $e$ is the base of the natural logarithm and $a$ is a constant that denote how fast we want to interpolate the movement. Rewriting to a more concise form as we did before, we get $$ P^{\Delta t}_n = lerp(P_{n-1}, X, 1 - e^{-a \cdot \Delta t}). $$ Assuming initial conditions $P^{\Delta t}_0=0$, the explicit version can be computed (we used some math software again). \begin{equation} \label{good} P^{\Delta t}_n = X (1 - e^{-a\cdot\Delta t\cdot n})~~~~~~~~~~~~~~~(4) \end{equation} </p><p>Equation $4$ meets the requirement given by equation $3$: $$ P^{\Delta t \cdot 1/\beta}_{n \cdot \beta} = X(1 - e^{-a\cdot\frac{1}{\beta}\cdot\Delta t \cdot n \cdot \beta}) = X (1 - e^{-a\cdot\Delta t\cdot n}) = P^{\Delta t}_n $$ and therefore is independent from the frame rate. </p><p>To sum up, we propose to use the $1-e^{-a\cdot\Delta t}$ instead of $a\cdot\Delta t$ as the interpolation factor. It makes much more sense and the movement becomes more predictable when the frame rate varies. </p>Bartosz Chodorowskihttp://www.blogger.com/profile/00338238119649979793noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-76937184524449632572014-09-15T01:12:00.001+02:002016-09-17T00:42:00.352+02:00Finding Two Max Values in an Array in ShaderOne of the problems I was faced was to, given four floats, find the two highest values and their indices in the array. This had to be done in a shader program so ideally no loops, arrays indexing or things like that. Fortunately my sort-of-a-boss, <a href="https://knarkowicz.wordpress.com/">KriS</a>, quickly found a solution that I'm presenting here.<br /><br />Here comes the code:<br /><pre class="brush: cpp;">void GetTwoMaxValues( in float4 values, out float2 weights, out float2 indices )<br />{<br /> float v1 = max( max( max( values.x, values.y ), values.z ), values.w );<br /> float i1 = 0;<br /> i1 = CndMask( values.y == v1, 1, i1 );<br /> i1 = CndMask( values.z == v1, 2, i1 );<br /> i1 = CndMask( values.w == v1, 3, i1 );<br /><br /> //<br /><br /> values.x = CndMask( i1 == 0, 0, values.x );<br /> values.y = CndMask( i1 == 1, 0, values.y );<br /> values.z = CndMask( i1 == 2, 0, values.z );<br /> values.w = CndMask( i1 == 3, 0, values.w );<br /><br /> float v2 = max( max( max( values.x, values.y ), values.z ), values.w );<br /> float i2 = 0;<br /> i2 = CndMask( values.y == v2, 1, i2 );<br /> i2 = CndMask( values.z == v2, 2, i2 );<br /> i2 = CndMask( values.w == v2, 3, i2 );<br /><br /> //<br /><br /> weights.x = v1;<br /> weights.y = v2;<br /> indices.x = i1;<br /> indices.y = i2;<br />}</pre>The code finds the first maximum value simply by "iterating" over elements and "max'ing" them. That was simple. But we also need to know the index of the value in the array. We assume it's the 0th index and compare the found value with other values from the array using <span style="font-family: "courier new" , "courier" , monospace;">CndMask</span> statement. If the compared value is equal to the value of the element in the array, we update the index.<br /><br />To find the second highest values we first zeroe the actual highest element found in the previous pass and do the same procedure as in the previous pass to find the second highest element.<br /><br />The code can actually be understood much quicker by actually reading it rather than reading explanation trying to explain it :). It doesn't do anything magical in algorithmic terms but shows how to solve (efficiently) this simple problem of two max values in a shader program.Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com1tag:blogger.com,1999:blog-9110390413489967593.post-69748440479678132862014-07-30T01:14:00.001+02:002016-09-17T00:42:24.901+02:00Why Are Kids Happy?Yes. This blog post is way different from what you (like three or four of you?) have accustomed to see here but I just have the need to share some of my life philosophies/thoughts, not just math/game/graphics-related stuff :).<br /><br />So, the subject of this post goes "why are kids happy?". Today I loosely pondered a bit on that with my flatmates (who by the way are some of my favorites and closest friends :)). For whatever it may sound, it started with Moomins and Smurfs... Think of these books or TV series or whatever form they assumed. You might have forgotten them but you remember them. Those hippo-shaped humanoids and blue gnoms. What do you think of them? Do you think how high their creators must have been at the time they made their concepts arts? What stuff did these guys inhale to make a story of humanoidal hippos with occasional humans appearing in form of an always-wandering Snufkin or girl/lady(?) called Little My? Not to mention the blue Smurf creatures who were almost always males, with one Smurfette lady who probably had to satisfy all of the other male Smurfs sexual needs. Are any of these or something similar on your mind right now? If yes, that is not a big deal. That's just what adults do. They analyze a whole lot. And then more and more thinking of how the worlds created in Smurfs or Moomins could exist in "real-life". And that's the problem. Kids <b>don't</b> do that. They don't analyze. They don't wonder on whether a village full of male blue gnoms and a single lady makes sense and what they do when lights go off. They just <b>experience this world as it is</b>. Sure, it's easy to manipulate kids this way because they are so trustful, but try from time to time to observe a child or a kid in a casual situation and see what fun he or she has from experiencing it, indulging in the moment.<br /><br />Kids just happen to live for the moment more often than adults, who spent more time on living somewhere else, whether it be the past or future. I want a new car, a new house, a good-looking partner, a thriving business... Don't you spend too much time on thinking (and worrying!) on all those things? What was the last time you walked barefoot on wet grass? Or just did something that did not quite fit to your daily routine? So next time you catch yourself on thinking what your life is going to be in a year from now, take a bike and go on a trip outside of city. There is so many beautiful green areas around! See them. Feel them. Experience them. Like this little kid who indeed does this, even though it might happen for him/her on a more subconscious than conscious level :).Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com2tag:blogger.com,1999:blog-9110390413489967593.post-80581422117369453132014-04-09T22:17:00.001+02:002016-09-17T00:42:33.890+02:00Tangent Basis in Compact FormRecently at <a href="http://flyingwildhog.com/">Flying Wild Hog</a> my task was to change the vertex format of meshes. Both static and skinned meshes used 12 bytes for storing normal, tangent, bitangent and color of the vertex. This looked something like this:<br /><pre class="brush: cpp;">UByte4N normal;<br />UByte4N tangent;<br />UByte4N bitangent;</pre>Each tangent basis vector is stored explicitly on 24 bits and the last 8 bits of each vector are intended to store one component of the vertex color (red stored in normal's alpha, green in tangent's alpha and blue in bitangent's alpha).<br /><br />The <span style="font-family: "courier new" , "courier" , monospace;">UByte4N</span> type internally stores one 32-bit unsigned integer (hence the <span style="font-family: "courier new" , "courier" , monospace;">U</span> prefix) and has some helper functions for packing and unpacking four 8-bit values. Obviously, as tangent basis vector's components are floats in normalized $[-1, 1]$ interval (hence the <span style="font-family: "courier new" , "courier" , monospace;">N</span> postfix meaning normalized), there are also functions which convert it to decimal $[0, 255]$ interval.<br /><br />This form of the tangent basis with color packed together is actually quite nice. The precision is not bad and all channels are fully utilized. The biggest problem is that in order to get color we must read three vectors. We can do better though.<br /><br />The first thing is to get rid off the bitangent vector altogether. The whole tangent basis can nicely be packed with just normal, tangent and one value that indicates the handedness of the entire basis. To get the handedness of the basis you can compute the determinant of a matrix that is formed of that basis - it will be either $1$ or $-1$ for an orthonormal basis. This way we save two values (one handedness value as opposed to three defining the bitangent vector) and we can extract the color to a separate vector:<br /><pre class="brush: cpp;">UByte4N normal; // .w unused<br />UByte4N tangent; // .w stores handedness<br />UByte4N color;</pre>It looks like now the color has been enriched with the alpha channel. Also, the $w$ component of the normal vector is unused. And hey, do we really need to waste 8 bits for the handedness? One value would suffice. Fortunately, GPUs are equipped with RGB10A2 format, where 10 bits are devoted to RGB channels and 2 bits for the alpha channel. This way we can store the tangent vector with greater precision. So, our new tangent basis and color part of the vertex structure is this:<br /><pre class="brush: cpp;">UX10Y10Z10W2N normal; // .w unused<br />UX10Y10Z10W2N tangent; // .w stores handedness<br />UByte4N color;</pre>Now this one looks scary at first glance. The <span style="font-family: "courier new" , "courier" , monospace;">U</span> prefix again means that the stored values are unsigned - decimal $[0, 1023]$ interval for the $x$, $y$, and $z$ components and $[0, 3]$ for the $w$ component. Also, the <span style="font-family: "courier new" , "courier" , monospace;">N</span> postfix means the that input float values to the structure must be normalized, that is in $[-1, 1]$ interval.<br /><br />Finally, some piece of working code:<br /><pre class="brush: cpp;">struct UX10Y10Z10W2N<br />{<br /> UInt32 m_value;<br /><br /> void Set( const float* xyz, int w )<br /> {<br /> m_value = 0;<br /><br /> for ( int i = 0; i < 3; i++ )<br /> {<br /> float f = ( xyz[ i ] + 1.0f ) * 511.5f;<br /><br /> if ( f < 0.0f )<br /> f = 0.0f;<br /> else if ( f > 1023.0f )<br /> f = 1023.0f;<br /><br /> m_value |= ( ( int )f ) << 10*i;<br /> }<br /><br /> m_value |= w << 30;<br /> }<br /><br /> void Get( float& x, float& y, float& z ) const<br /> {<br /> x = ( float )( ( m_value >> 0 ) & 1023 );<br /> y = ( float )( ( m_value >> 10 ) & 1023 );<br /> z = ( float )( ( m_value >> 20 ) & 1023 );<br /><br /> x = ( x - 511.0f ) / 512.0f;<br /> y = ( y - 511.0f ) / 512.0f;<br /> z = ( z - 511.0f ) / 512.0f;<br /> }<br /><br /> void Get( float& x, float& y, float& z, int& w ) const<br /> {<br /> Get( x, y, z );<br /> w = ( ( m_value >> 30 ) & 3 );<br /> }<br />};</pre>I'm leaving you without much explanation as it should be quite easy to read. One thing worth noting is that the $w$ component is not passed or read as float but rather as int. That is because I determined it does not make much sense dealing with float here as we have barely 2 bits. I pass here $0$ or $3$ (which will be converted to $1$ in the shader) depending on the handedness of the tangent basis.Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-64192122019315746142014-02-22T02:33:00.002+01:002016-09-17T00:42:32.658+02:00Edge Detection from Depths and NormalsEdge detection is an algorithm that comes handy from time to time in every graphics programmer's life. There are various approaches to this problem and various tasks that we need edge detection for. For instance, we might use edge detection to perform anti-aliasing (like in here <a href="http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html">http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html</a>) or to draw some stylish outlines. So how can we detect the edges? Some algortihms compare color differences between the pixel we are currently processing and its neighbours (like antialiasing algorithms FXAA and MLAA). Others, like the one I linked a few words before, use the rendered scene's geometry information, like the depth-buffer and normals, to do so. In this post I present a variant of this algorithm.<br /><br />The algorithm I use needs three pixels. The one we're processing and some two "different" neighbours. You can take the one to the left along with the one to the top for instance. Now, we read normal vectors (either in view or world space) at those pixels and depths. We also recover view or world space positions from the sampled depths (I showed how to do this here <a href="http://maxestws.blogspot.com/2013/11/recovering-camera-position-from-depth.html">http://maxestws.blogspot.com/2013/11/recovering-camera-position-from-depth.html</a>).<br /><br />So, to sum up at this point, we have three (I assume view space) normals ($N$, $N_{left}$ and $N_{top}$) and positions ($P$, $P_{left}$ and $P_{top}$). Here comes the shader code:<br /><pre class="brush: cpp;">bool IsEdge(<br /> float3 centerNormal,<br /> float3 leftNormal,<br /> float3 topNormal,<br /> float3 centerPosition,<br /> float3 leftPosition,<br /> float3 topPosition,<br /> float normalsDotThreshold,<br /> float distanceThreshold)<br />{<br /> // normals dot criterium<br /> <br /> float centerLeftNormalsDot = dot(centerNormal, leftNormal);<br /> float centerTopNormalsDot = dot(centerNormal, topNormal);<br /> <br /> if (centerLeftNormalsDot < normalsDotThreshold ||<br /> centerTopNormalsDot < normalsDotThreshold)<br /> {<br /> return true;<br /> }<br /> <br /> // distances difference criterium<br /> <br /> float3 v1 = leftPosition - centerPosition;<br /> float3 v2 = topPosition - centerPosition;<br /> <br /> if (abs(dot(centerNormal, v1)) > distanceThreshold ||<br /> abs(dot(centerNormal, v2)) > distanceThreshold)<br /> {<br /> return true;<br /> } <br /> <br /> //<br /> <br /> return false;<br />}</pre>Good starting value for <span style="font-family: "courier new" , "courier" , monospace;">normalsDotThreshold</span><span style="font-family: "courier new" , "courier" , monospace;"><span style="font-family: inherit;"> is $0.98$ and for </span></span><span style="font-family: "courier new" , "courier" , monospace;">distanceThreshold</span><span style="font-family: "courier new" , "courier" , monospace;"><span style="font-family: inherit;"> it's $0.01$.</span></span><br /><br />The code uses two criterions to determine whether we are on an edge. The first checks dots of the normals. It's purpose is to detect edges that appear on the continous surface but with varying normals, where, for instance, two perpendicular planes meet (think of a side of a box standing on a floor).<br /><br />The second criterium checks for distances. Imagine two planes parallel to each other but at different heights. When viewed in such a way that an edge of the top plane appears in front of the bottom plane we clearly have, well, an edge to detect. The normals dot product won't work because normals of all pixels here are the same (the planes are parallel). So in this case we need to track vectors from the center pixel to the neighbouring pixels. If they are too far away then we have an edge. Note here that we actually dot those vectors with the center pixel's normal. This is to avoid false positive cases for pixels who are on the same plane but are far away from each other. In that case, obviously, we don't care they are far away. They lie on the same plane so there is no edge.<br /><br />I know that a picture is worth more than a thousand words. That's why I preferred to described the code with words instead of putting images here. Wait... what? That's right. Your's best bet is to just take this simple piece of code and try it in practice to see exactly how it works. I highly recommend doing the following modifications and see what happens:<br />1. Leave just the normals dot criterium.<br />2. Leave just the distances difference criterium.<br />3. Normalize vectors $v_1$ and $v_2$ in the distances difference criterium.<br /><br />Have fun :). Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-11522407213237287182013-11-18T23:49:00.000+01:002016-10-15T18:16:44.413+02:00Reconstructing Camera Space Position from DepthMemory bandwidth is a precious resource on the current generation GPU hardware. So precious that it is often the case when storing less data and using packing/unpacking algorithms can be much faster than storing fat data in unpacked form. The most common example of this issue that comes to my mind are deferred shading/lighting algorithms.<br /><br />In deferred shading we usually need to store, in some offscreen buffer, positions of pixels that we are going to light. The simplest form is to use the coordinates of positions directly, that is, three floats per pixel. That forces us to use either RGBA32F format or, if we can get away with lower precision, RGBA16F format. In either case that is a lot of memory.<br /><br />Can we do better? Well, yes. A common solution to this problem is to use, for instance, the depth buffer and reconstruct pixels' positions from it. We should be able to do that, right? If we could get from camera space to projection space and store values in the depth buffer, then we can do the reverse. To do that we just need to take the value from the depth buffer and find $x$ and $y$ normalized device coordinates of the pixel we're processing (those are fairly easy to find). Once we have those values we just need to multiply a vector composed of these values (plugging $w=1$) and multiply it by the inverse projection matrix. That would give us position of the pixel in camera space.<br /><br />So are we done? Technically, yes. There are some flaws to this approach however. The first one is that the depth buffer is not always available for reading on all hardware (mobile devices for instance). That is not so much of a problem as we can generate a "custom" depth buffer that will most likely also be used for other effects, like soft particles and such. The second one is performance. The vector-matrix multiplication must take place in the pixel shader where every single instruction counts. The sole vector-matrix mul is four dot products. We might also need to normalize the output vector so that its $w=1$, which is one division.<br /><br />There is fortunately a better way to reconstruct the camera space position from depth the will take barely one mul in the pixel shader! Let's have a look at the following picture:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-N01BwHlRNnY/UK0FH72D0zI/AAAAAAAAAUQ/XEdQdjOBwtc/s1600/projection.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="171" src="https://2.bp.blogspot.com/-N01BwHlRNnY/UK0FH72D0zI/AAAAAAAAAUQ/XEdQdjOBwtc/s320/projection.png" width="320" /></a></div><br />The picture shows a side view of a camera's frustum. Point $(y, z)$ lies on the near plane. Our task is to find $y'$ of some point of interest. The value of $z'$ is known - it is just the linear depth value in camera space of that point. We can get it either by converting the normalized device coordinate (NDC) $z$ from the depth buffer to linear space (a word on this at the end of the post) or we can utilize the separate depth buffer we mentioned above. This separate buffer (which we will have to use when we don't have access to the hardware depth buffer) does not have to store the NDC $z$ values but rather linear camera space $z$ values. This is actually even more desirable for many practical applications.<br /><br />So, we know $z'$. We want to find $y'$ but first we also need to find $y$ and $z$ values. This couple represents the coordinates of our point of interest on the camera's near plane. Given point's position in NDC we can calculate the coordinates on the near plane like this (in the vertex shader)<br /><pre class="brush: cpp;"> vec2 position_ndc_normalized = 0.5 * position_ndc.xy / position_ndc.w;<br /> varying_positionOnNearPlane = vec3(nearPlaneSize.xy * position_ndc_normalized, zNear);<br /></pre>First, we scale the NDC position so that all coordinates are in $[-0.5, 0.5]$ range. Then we multiply those coordinates by the size (width and height) of the near plane and plug the camera's distance to the near plane <span style="font-family: "courier new" , "courier" , monospace;">zNear</span> to the $z$ coordinate. This resulting vector (<span style="font-family: "courier new" , "courier" , monospace;">varying_positionOnNearPlane</span>) can now be interpolated and read in the pixel shader.<br /><br />As we now know $y$, $z$ and $z'$, evaluating $y'$ is easy - we just employ similiar triangles:<br />\begin{eqnarray}<br />y' = \frac{y * z'}{z}<br />\end{eqnarray}<br />The pixel shader code:<br /><pre class="brush: cpp;"> float depth_view = texture2DProj(gbufferLinearDepthTexture, varying_screenTexCoord).x;<br /> vec3 position_view = varying_positionOnNearPlane * depth_view / varying_positionOnNearPlane.z;<br /></pre>We're almost there. I promised there would be only one mul but at the moment there is one mul and one div. It might not be so obvious at first glance but the $z$ coordinate of <span style="font-family: "courier new" , "courier" , monospace;">varying_positionOnNearPlane</span> is constant for all points - it's just the distance from the camera to the near plane so this division can be moved to the vertex shader.<br /><br />And that's it. With a few simple observations and math "tricks" we managed to write an optimal pixel shader for camera space position reconstruction from linear depth.<br /><br />One last thing that remains untold is how to recover linear depth from the hardware depth buffer. As we know, to get from camera space to projection (NDC) space we use some sort of projection matrix. We can use this fact to derive a simple formula that only transforms the $z$ coordinate from projection space to camera space. So let's say we have a vector $[x, y, z, 1]$ defining a point in camera space and we multiply it by a projection matrix (matrix taken after <a href="http://www.opengl.org/sdk/docs/man2/xhtml/gluPerspective.xml">http://www.opengl.org/sdk/docs/man2/xhtml/gluPerspective.xml</a>; note that we use row-major order and collapsed the entries' formulas) to get the vector in the projection space:<br />\[<br />\begin{bmatrix}<br />x & y & z & 1<br />\end{bmatrix}<br />\begin{bmatrix}<br />A & 0 & 0 & 0 \cr<br />0 & B & 0 & 0 \cr<br />0 & 0 & C & -1 \cr<br />0 & 0 & D & 0 <br />\end{bmatrix}<br />=<br />\begin{bmatrix}<br />xA & yB & zC + D & -z<br />\end{bmatrix}<br />\sim<br />\begin{bmatrix}<br />\frac{xA}{-z} & \frac{yB}{-z} & -(C + \frac{D}{z}) & 1<br />\end{bmatrix}<br />\]<br />The term that goes into the depth buffer is $-(C + \frac{D}{z})$, where $z$ is the linear depth. Assuming the value stored in the depth buffer that we read is $z'$, we just need to solve the following equation for $z$:<br />\begin{eqnarray}<br />-(C + \frac{D}{z}) = z' \cr<br />z = \frac{-D}{z' + C}<br />\end{eqnarray}<br />The constants $C$ and $D$ can be extracted from the projection matrix and passed to the pixel shader.<br /><br />DEDYKACJA: tÄ™ notkÄ™ dedykujÄ™ Karolinie P., ktÃ³ra w chwili obecnej nieco zmaga siÄ™ z algebrÄ… abstrakcyjnÄ… oraz programowaniem w C++, jednak jestem pewien, Å¼e jej upÃ³r i charakter pozwolÄ… jej przebrnÄ…Ä‡ przez zarÃ³wno te zajÄ™cia jak i caÅ‚y semestr :).Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-46318376043665556232012-10-03T15:08:00.000+02:002016-09-17T00:43:04.773+02:00Fast Texture Projection onto Geometry on the GPUWhile implementing various graphical algorithms, a problem that must be solved from time to time is projection of texture onto some geometry. This happens, for instance, when we want to make some water reflection or cast g-buffer textures onto light's geometry.<br /><br />The procedure is pretty straightforward. If we multiply a vertex's position by the whole world-view-projection matrix and perform the perspective division, we get vertex's position in normalized device coordinates, which all are in $[-1, 1]$ interval. If we wanted to project a texture onto geometry, we would need to do exactly the same thing (but use world-view-projection matrix of some sort of "projector", not the game's "camera") and remap the vertex's final $x$ and $y$ coordinates from $[-1, 1]$ interval to $[0, 1]$ interval, thus successfully generating texture coordinates, which can be used to sample a texture.<br /><br />Let's say that we have a vertex shader like this:<br /><pre class="brush: cpp;">uniform mat4 worldViewProjTransform;<br />uniform mat4 projectorWorldViewProjTransform;<br /><br />attribute vec4 attrib_position;<br /><br />varying vec4 varying_projectorTexCoord;<br /><br />void main()<br />{<br /> gl_Position = attrib_position * worldViewProjTransform;<br /><br /> varying_projectorTexCoord = attrib_position * projectorWorldViewProjTransform;<br />}<br /></pre>In this vertex shader we simply multiply the vertex's position by the projector's matrix. Perspective division and coordinates remap does not occur here yet. It will in the fragment shader: <br /><pre class="brush: cpp;">uniform sampler2D projectorSampler;<br /><br />varying vec4 varying_projectorTexCoord;<br /><br />void main()<br />{<br /> vec4 projectorTexCoord = varying_projectorTexCoord;<br /> <br /> projectorTexCoord /= projectorTexCoord.w;<br /> projectorTexCoord.xy = 0.5 * projectorTexCoord.xy + 0.5;<br /> <br /> gl_FragColor = texture2D(projectorSampler, projectorTexCoord.xy);<br />}<br /></pre>In this pixel shader we first do the perspective division and then, when we know we are in $[-1, 1]$ interval, we can do the remapping part.<br /><br />The title of this post contains word "fast". Is our approach fast? Well, with today's hardware it actually is but we can do better. The very first thing that should capture your attention is the division in the pixel shader. Can we do something about it? Yup. There is a function in basically any real-time shading language like <span style="font-family: "courier new" , "courier" , monospace;">texture2DProj</span> which, before taking a sample from the texture, will first divide all components of the second argument by the 4th component, $w$. So, we can try doing this: <br /><pre class="brush: cpp;">uniform sampler2D projectorSampler;<br /><br />varying vec4 varying_projectorTexCoord;<br /><br />void main()<br />{<br /> vec4 projectorTexCoord = varying_projectorTexCoord;<br /><br /> projectorTexCoord.xy = 0.5 * projectorTexCoord.xy + 0.5;<br /> <br /> gl_FragColor = texture2DProj(projectorSampler, projectorTexCoord);<br />}<br /></pre>Will this work? Of course not. That is because we switched the order of operations. In this version of code the projection happens <b>after </b>remapping whereas it should occur <b>before</b>. So, we need to fix the remapping code.<br /><br />Remember that the projected coordinates $[x, y, z, w]$ that come into the pixel shader are all in $[-w, w]$ interval. This means that we want to remap the coordinates from $[-w, w]$ to $[0, w]$. Then, the divison can be performed either by dividing manually and calling <span style="font-family: "courier new" , "courier" , monospace;">texture2D</span> or by calling <span style="font-family: "courier new" , "courier" , monospace;">texture2DProj</span> directly. So, our new pixel shader can look like this: <br /><pre class="brush: cpp;">uniform sampler2D projectorSampler;<br /><br />varying vec4 varying_projectorTexCoord;<br /><br />void main()<br />{<br /> vec4 projectorTexCoord = varying_projectorTexCoord;<br /><br /> projectorTexCoord.xy = 0.5 * projectorTexCoord.xy + 0.5 * projectorTexCoord.w;<br /> <br /> gl_FragColor = texture2DProj(projectorSampler, projectorTexCoord);<br />}<br /></pre>Are we done yet? Well, there is actually one more thing we can do. We can shift the "$0.5$" multiplication and addition to the vertex shader. So the final vertex shader is: <br /><pre class="brush: cpp;">uniform mat4 worldViewProjTransform;<br />uniform mat4 projectorWorldViewProjTransform;<br /><br />attribute vec4 attrib_position;<br /><br />varying vec4 varying_projectorTexCoord;<br /><br />void main()<br />{<br /> gl_Position = attrib_position * worldViewProjTransform;<br /> <br /> varying_projectorTexCoord = attrib_position * projectorWorldViewProjTransform;<br /> varying_projectorTexCoord.xy = 0.5*varying_projectorTexCoord.xy + 0.5*varying_projectorTexCoord.w;<br />}<br /></pre>and the final pixel shader is: <br /><pre class="brush: cpp;">uniform sampler2D projectorSampler;<br /><br />varying vec4 varying_projectorTexCoord;<br /><br />void main()<br />{<br /> vec4 projectorTexCoord = varying_projectorTexCoord;<br /><br /> gl_FragColor = texture2DProj(projectorSampler, projectorTexCoord);<br />}<br /></pre>Yes. Now we are done.<br /><br />One might wonder if we could do the perspective division in the vertex shader and call <span style="font-family: "courier new" , "courier" , monospace;">texture2D</span> instead of <span style="font-family: "courier new" , "courier" , monospace;">texture2DProj</span> in the pixel shader. The answer is: we can't. The hardware, during rasterization, interpolates all vertex-to-pixel shader varyings. Assuming that <span style="font-family: "courier new" , "courier" , monospace;">varying_projectorTexCoord</span> is $[x, y, z, w]$ vector, the value that is read in the pixel shader is $[\mbox{int}(x), \mbox{int}(y), \mbox{int}(z), \mbox{int}(w)]$, where $\mbox{int}$ is an interpolating function. If we performed the division in the vertex shader, the vector coming to the pixel shader would be of the form $[\mbox{int}(\frac{x}{w}), \mbox{int}(\frac{y}{w}), \mbox{int}(\frac{z}{w}), 1]$. And obviously $\mbox{int}(\frac{x}{w}) \neq \frac{\mbox{int}(x)}{\mbox{int}(w)}$ (same for other components). Note that the right hand side of this inequality is what happens when we do the perspective division in the pixel shader<br /><br />Instead of doing all this trickery we could include a special remapping matrix into the chain world-view-projector of the projector's matrix. What this matrix looks like can be found <a href="http://http.developer.nvidia.com/CgTutorial/cg_tutorial_chapter09.html">here</a> (this is also a great tutorial about texture projection in general by the way). The problem with this approach is that you might not always have this matrix injected into the whole chain and for some reason you either don't want or even can't do it. You could, naturally, create this remapping matrix in the vertex shader but that would raise the number of instructions. It is better to use the approach discussed throughout this post in that case, as it will make things running much quicker.Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-67903895255015930012012-06-22T00:53:00.000+02:002016-09-17T00:43:08.514+02:00Interactive Physics-based Grass AnimationRecently at <a href="http://www.madman-games.com/">Madman Theory Games</a> I had quite an interesting task to do. We're working on a Diablo-like hack & slash and the lead artist wanted grass (or actually some overgrown form of grass...) to deflect when, for instance, the player is wading through or when a fireball is sweeping over it. The general idea is rather trivial: compute deflected vertices positions based on some deflectors. My initial idea was to treat an entity (like the player) as a spherical deflector, and displace vertices positions based on the distance from the deflector. So when a grass's blade is right next to the player, it is displaced by some fixed maximum amount. At the border of the sphere it is not displaced at all. This worked just fine, but made the grass blades very stiff, with completely no spring-like motion when the deflector stops affecting the blade. So put simply, it was unnatural.<br /><br />To make blades swing a little like springs when a deflector stops affecting it I thought about adding some fake sinusoidal movement. That would however require tracking time telling how long a blade is not affected by deflectors at all, and scale the sinusoidal deflection based on that time. I didn't try this but I'm pretty sure that this approach would yield some problems, both with stability and realism.<br /><br />I eventually came to an inevitable conclusion - I need some simple, but physics-based model to control the blades. I thought that if I implemented a simple blades movement based on some external forces, it would be much easier and prdictable to control. So the first phase was to create a test square in a 2D world and try moving it by attaching external forces to it. This turned out to be piece of cake.<br /><br />The test square has a position $\mathbf{p}$ and velocity $\mathbf{v}$. We know that to move the square we should do this each frame:<br />\begin{eqnarray}<br />\mathbf{p} = \mathbf{p} + \mathbf{v} \Delta t<br />\end{eqnarray}<br />To model external forces affecting the square we need to find a way to modify $\mathbf{v}$. We know that the net force affecting the square has a value $\mathbf{F}$. We also know some of the most important formulas in the world:<br />\begin{eqnarray}<br />\mathbf{F} &=& m \mathbf{a} \cr<br />\mathbf{a} &=& \frac{\mathbf{F}}{m} \cr<br />\mathbf{a} &=& \frac{\Delta \mathbf{v}}{\Delta t} \cr<br />\Delta \mathbf{v} &=& \mathbf{a} \Delta t = \frac{\mathbf{F}}{m} \Delta t<br />\end{eqnarray}<br />So, we have a formula for $\Delta \mathbf{v}$. This is a value by which we should increase $\mathbf{v}$ each frame.<br /><br />After all this math, let's see some code:<br /><pre class="brush: cpp;">// do once<br /><br /> vec3 position = vec3(0.0f, 0.0f, 0.0f);<br /> vec3 velocity = vec3(0.0f, 0.0f, 0.0f);<br /><br />// do each frame <br /><br /> vec3 force = vec3(0.0f, 0.0f, 0.0f);<br /> float mass = 1.0f; <br /><br /> // modify force here<br /><br /> vec3 acceleration = force / mass;<br /> vec3 velocity_delta = acceleration * deltaTime;<br /> velocity += velocity_delta;<br /> position += velocity * deltaTime;</pre>This is all we need to have a simple force-based physical simulation.<br /><br />Given this simple system we enter the second phase: we need to add proper forces to model believable grass motion. First of all we need a force that will deflect the blades. To do this I simply check if a blade is in a spherical deflector and apply a force vector $bladePosition - deflectorPosition$, normalized and scaled by some strength parameter. I call this stimulate force, as it stimulates the movement of the blades.<br /><br />Stimulate force works very nice but it's not sufficient. You probably recall the first law of motion that says that if no force acts on a particle, it stands still or moves with a uniform speed. In our case the stimulate force causes movement and once we turn this force off, the blades would deflect into infinity because they are already in move, and no force acts on them! So we definitely need something that could stop them - some sort of friction is needed. My long-term friend and a colleague at work <a href="http://chomzee.gct-game.net/">chomzee</a>, who is much more skilled in physics than I am, suggested a simple solution - simply add a force that acts in the direction opposite to the velocity of a blade. This will look like this:<br /><pre class="brush: cpp;"> force -= friction * velocity;<br /></pre>Note here that we use the velocity of a blade/particle from the <b>previous </b>frame to modify the force. This force is <b>then </b>applied to compute new velocity in the <b>current </b>frame.<br /><br />Okay, so far we have two forces: stimulate and friction. Stimulate will make grass blades deflect, thus induce some movement, and the friction will bring the blades to a halt eventually. But there is one more piece to this puzzle. The grass blades can't stay deflected once a deflector stops acting on them. The blades need to get back to their initial positions. Here's where spring model comes handy. We can treat our grass blades as springs and use very simple <a href="http://en.wikipedia.org/wiki/Simple_harmonic_motion">spring force formula</a>:<br />\begin{eqnarray}<br />\mathbf{F} = -k \mathbf{x}<br />\end{eqnarray}<br />In this formula $k$ is the resiliency of the spring, and $\mathbf{x}$ is a displacement vector from the initial position. The code that simulates this is:<br /><pre class="brush: cpp;"> force -= resiliency * (position - position_initial);<br /></pre>Honestly, that is pretty much it! Using a couple lines of code and some basic understanding of Newton's laws of motion I was able to simulate a very, very realistic interactive grass animation. The game we are working on should have been finished somewhere by the end of this year so you will have the opportunity to see this system in a comercial product pretty quickly :).<br /><br />There is one last thing I would like to mention. In the beginning of this post I showed a formula that we use to accumulate (integrate, so-called Euler integration) particle's position based on velocity:<br />\begin{eqnarray}<br />\mathbf{p} = \mathbf{p} + \mathbf{v} \Delta t<br />\end{eqnarray}<br />This equation is actually just an approximation to the real position of the particle after a simulation step. To be more precise, this is the first order approximation. We can modify this formula to get slightly more accurate solution (but still an approximation!):<br />\begin{eqnarray}<br />\mathbf{p} = \mathbf{p} + \mathbf{v} \Delta t + \mathbf{a} \frac{{\Delta t}^2}{2}<br />\end{eqnarray}<br />Does it look familiar? This is the second-order approximation. I would love to say something more about but I'm still a newbie when it comes to physics so you would have to wait a couple of years to hear a more solid and credible explanation from me :).Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com3tag:blogger.com,1999:blog-9110390413489967593.post-22204551602166175602012-04-28T17:26:00.000+02:002016-09-17T00:43:50.147+02:00Managing Shaders CombinationsRecently, when working on the engine (BlossomEngine) for a game that I am co-creating (<a href="http://gct-game.net/">http://gct-game.net</a>) I encountered a problem that ultimtely every engine programmer must face - managing shader combinations.<br /><br />It is often the case that in the engine we have some super-duper shader(s) with many switches. Such shaders are called uber-shaders. The idea of an uber-shader is that we do not write a separate shader (file) for every single surface, but rather we have on big shader where we can turn some features on and off, like the use of specular lighting on the surface, soft shadows, or anything else that can describe the material.<br /><br />One way to accomplish this is by using static branching within the shader, using constant values supplied to the shader. This solution has some flaws, however. The first is that we generate more if-then instructions in the shader, which simply makes the shader longer. The other one is that we waste shader constants on feeding the shader with appropriate steering data.<br /><br />The second way to manage an uber-shader is to generate completely separate shaders by using <span style="font-family: "courier new" , "courier" , monospace;">#ifdef</span>s in the shader and thus avoiding the need to supply the shader with any constants steering data. The downside is that we will now generate a bunch of shaders which we have to manage somehow in the application. Note that the number of generated shaders grows exponentially with the number of <span style="font-family: "courier new" , "courier" , monospace;">#ifdef</span>s. For example, consider a situation where our shader contains two <span style="font-family: "courier new" , "courier" , monospace;">#ifdef</span>s, <span style="font-family: "courier new" , "courier" , monospace;">A</span> and <span style="font-family: "courier new" , "courier" , monospace;">B</span>. The possible combinations are:<br /><br /><span style="font-family: "courier new" , "courier" , monospace;">No #ifdef</span><br /><span style="font-family: "courier new" , "courier" , monospace;">A</span><br /><span style="font-family: "courier new" , "courier" , monospace;">B</span><br /><span style="font-family: "courier new" , "courier" , monospace;">AB</span><br /><br />Now, if we decide to add a third <span style="font-family: "courier new" , "courier" , monospace;">#ifdef C</span>, we will have to double all shaders that we have so far, with new <span style="font-family: "courier new" , "courier" , monospace;">#ifdef C</span> added to each mix:<br /><br /><span style="font-family: "courier new" , "courier" , monospace;">No #ifdef</span><br /><span style="font-family: "courier new" , "courier" , monospace;">A</span><br /><span style="font-family: "courier new" , "courier" , monospace;">B</span><br /><span style="font-family: "courier new" , "courier" , monospace;">AB</span><br /><span style="font-family: "courier new" , "courier" , monospace;">C</span><br /><span style="font-family: "courier new" , "courier" , monospace;">AC</span><br /><span style="font-family: "courier new" , "courier" , monospace;">BC</span><br /><span style="font-family: "courier new" , "courier" , monospace;">ABC</span><br /><br />So the general formula for the number of generated shaders is $2^n$, where n is the number of <span style="font-family: "courier new" , "courier" , monospace;">#ifdef</span>s.<br /><br />In BlossomEngine the most complex shader I have is a forward-shaded mesh sunlight shader. I must say that at the beginning of development, when I had a very, very few <span style="font-family: "courier new" , "courier" , monospace;">#ifdef</span>s (normal mapping, specular lighting) I didn't bother about having any sophisticated manager. So I simply declared each combination as a separate shader, like this:<br /><pre class="brush: cpp;">CShader meshSunlightPixelShader;<br />CShader meshSunlightPixelShader_NormalMapping;<br />CShader meshSunlightPixelShader_Specular;<br />CShader meshSunlightPixelShader_NormalMapping_Specular;<br /></pre>Obviously, as the game engine evolves, you are developing more and more complex shaders. So this way I ended up having 16 different combinations, what was really troublesome to manage in the application code. I decided to write some automatic system that would generate an array of shaders from given parameters. So I did that. My new class <span style="font-family: "courier new" , "courier" , monospace;">CShadersPack </span>was able to generate an array of <span style="font-family: "courier new" , "courier" , monospace;">CShader </span>objects from a single shader file that contains <span style="font-family: "courier new" , "courier" , monospace;">#ifdef</span>s. A call to init function of that class can look like this:<br /><pre class="brush: cpp;">meshSunLightPixelShadersPack.init("./BlossomEngine/shaders/mesh_sun_light.ps",<br /> stPixelShader, "-DSPECULAR -DCUBE_MAP_REFLECTIONS -DSOFT_SHADOWS <br />-DDIFFUSE_SHADING_NORMAL -DDIFFUSE_SHADING_NORMAL_MAPPED <br />-DAMBIENT_SHADING_NORMAL_MAPPED -DRIM_LIGHTING");<br /></pre>The list of all <span style="font-family: "courier new" , "courier" , monospace;">#ifdef</span>s is passed as one string, with all <span style="font-family: "courier new" , "courier" , monospace;">#ifdef</span>s separated with a space. The function splits that string and generates all combinations.<br /><br />Now the question is, when you already have those shaders generated, how to get the one you need from the <span style="font-family: "courier new" , "courier" , monospace;">CShadersPack </span>object. Say we have <span style="font-family: "courier new" , "courier" , monospace;">A</span>, <span style="font-family: "courier new" , "courier" , monospace;">B</span> and <span style="font-family: "courier new" , "courier" , monospace;">C</span> <span style="font-family: "courier new" , "courier" , monospace;">#ifdef</span>s, and we want to get <span style="font-family: "courier new" , "courier" , monospace;">AC</span> shader. One way to do this would be to search through the shaders array in <span style="font-family: "courier new" , "courier" , monospace;">CShadersPack </span>and find the one with name that matches the <span style="font-family: "courier new" , "courier" , monospace;">AC</span> combination. I think I don't have to point out how wasteful in CPU time that would be. There is a much better way. Let's again see how combinations are generated. First, we have <span style="font-family: "courier new" , "courier" , monospace;">#ifdef A</span>. Combinations are:<br /><br /><span style="font-family: "courier new" , "courier" , monospace;">No #ifdef</span><br /><span style="font-family: "courier new" , "courier" , monospace;">A</span><br /><br />Now we add <span style="font-family: "courier new" , "courier" , monospace;">B</span>:<br /><br /><span style="font-family: "courier new" , "courier" , monospace;">No #ifdef</span><br /><span style="font-family: "courier new" , "courier" , monospace;">A</span><br /><span style="font-family: "courier new" , "courier" , monospace;">B</span><br /><span style="font-family: "courier new" , "courier" , monospace;">BA</span><br /><br />Now <span style="font-family: "courier new" , "courier" , monospace;">C</span>:<br /><br /><span style="font-family: "courier new" , "courier" , monospace;">No #ifdef</span><br /><span style="font-family: "courier new" , "courier" , monospace;">A</span><br /><span style="font-family: "courier new" , "courier" , monospace;">B</span><br /><span style="font-family: "courier new" , "courier" , monospace;">BA</span><br /><span style="font-family: "courier new" , "courier" , monospace;">C</span><br /><span style="font-family: "courier new" , "courier" , monospace;">CA</span><br /><span style="font-family: "courier new" , "courier" , monospace;">CB</span><br /><span style="font-family: "courier new" , "courier" , monospace;">CBA</span><br /><br />So as you can see, there is some regularity. For instance, if we are starting with <span style="font-family: "courier new" , "courier" , monospace;">shaderIndex<span style="font-family: inherit;"><span style="font-family: inherit;"> = 0</span></span></span><span style="font-family: inherit;"> </span>and we want to pick a shader with feature <span style="font-family: "courier new" , "courier" , monospace;">A</span>, we will have to add $1$ to <span style="font-family: "courier new" , "courier" , monospace;">shaderIndex</span>. If we want also feature <span style="font-family: "courier new" , "courier" , monospace;">B</span>, we must add $2$, if we want feature <span style="font-family: "courier new" , "courier" , monospace;">C</span>, we must add $4$, and so on... So to get <span style="font-family: "courier new" , "courier" , monospace;">AC</span> combination we get <span style="font-family: "courier new" , "courier" , monospace;">shaderIndex</span> = 1 + 4 = 5. By looking at the list above we see that indeed, <span style="font-family: "courier new" , "courier" , monospace;">CA</span> is 5th on the list (0-based indexing). Adding a couple of integers and indexing the shaders array will probably be much faster than looping through the array and comparing strings. In real code this looks like this:<br /><pre class="brush: cpp;">int shaderIndex = 0;<br /><br />if (useSpecular)<br /> shaderIndex += 1;<br />if (useReflectionsCubeMap)<br /> shaderIndex += 2;<br />if (useSoftShadows)<br /> shaderIndex += 4;<br />if (meshesInstances[i]->material.diffuseShadingType != CMeshMaterial::dstNone)<br /> shaderIndex += 8;<br />if (meshesInstances[i]->material.diffuseShadingType == CMeshMaterial::dstNormalMapped)<br /> shaderIndex += 16;<br />if (meshesInstances[i]->material.ambientShadingNormalMapped)<br /> shaderIndex += 32;<br />if (useRimLighting)<br /> shaderIndex += 64;<br /></pre><br />Okay, so at this point I had a nice system to manage shader combinations. Unfortunately, I quickly discovered an annoying problem - with nested <span style="font-family: "courier new" , "courier" , monospace;">#ifdef</span>s. In my mesh sunlight shader I have <span style="font-family: "courier new" , "courier" , monospace;">#ifdef</span> called <span style="font-family: "courier new" , "courier" , monospace;">DIFFUSE_SHADING_NORMAL</span>. If it is turned on, the shader will perform classical NdotL lighting based on interpolated vertex normal. However, I want to have the ability to perform the shading using normal from a normal map. So inside <span style="font-family: "courier new" , "courier" , monospace;">DIFFUSE_SHADING_NORMAL</span> I have <span style="font-family: "courier new" , "courier" , monospace;">DIFFUSE_SHADING_NORMAL_MAPPED</span> which takes normal from the normal map into the computations. Does this cause a big problem? Yup, it does.<br /><br />Let's now assume we have a shader with <span style="font-family: "courier new" , "courier" , monospace;">#ifdef A</span>, <span style="font-family: "courier new" , "courier" , monospace;">B</span> and <span style="font-family: "courier new" , "courier" , monospace;">C</span>, but <span style="font-family: "courier new" , "courier" , monospace;">C</span> is located inside the <span style="font-family: "courier new" , "courier" , monospace;">#ifdef B</span>. Something like this:<br /><pre class="brush: cpp;">#ifdef A<br /> A code<br />#endif<br /><br />#ifdef B<br /> B code<br /> #ifdef C<br /> C code<br /> #endif<br /> B code<br />#endif<br /></pre>Combinations generated with my system would still be:<br /><br /><span style="font-family: "courier new" , "courier" , monospace;">No #ifdef</span><br /><span style="font-family: "courier new" , "courier" , monospace;">A</span><br /><span style="font-family: "courier new" , "courier" , monospace;">B</span><br /><span style="font-family: "courier new" , "courier" , monospace;">BA</span><br /><span style="font-family: "courier new" , "courier" , monospace;">C</span><br /><span style="font-family: "courier new" , "courier" , monospace;">CA</span><br /><span style="font-family: "courier new" , "courier" , monospace;">CB</span><br /><span style="font-family: "courier new" , "courier" , monospace;">CBA</span><br /><br />But hey, since <span style="font-family: "courier new" , "courier" , monospace;">C</span> is dependent on <span style="font-family: "courier new" , "courier" , monospace;">B</span>, the combination <span style="font-family: "courier new" , "courier" , monospace;">AC</span> will generate exactly the same code as combination <span style="font-family: "courier new" , "courier" , monospace;">A</span>. As we add more and more <span style="font-family: "courier new" , "courier" , monospace;">#ifdef</span>s we will have more and more duplicated shaders. This is wasteful in terms of compilation/assembling time, as well as in terms of wasted GPU memory to store those duplicated shaders.<br /><br />A way around this problem would be to, put simply, generate combinations in a more intelligent way. This raises some new problems, however, like computing a proper index to the shaders array in <span style="font-family: "courier new" , "courier" , monospace;">CShadersPack</span>, or passing the <span style="font-family: "courier new" , "courier" , monospace;">#ifdef</span>s list to <span style="font-family: "courier new" , "courier" , monospace;">CShadersPack::init</span>. The dependencies would have to marked in some way. This solution can certainly be done, but fortunately to me I figured a much easier to implement and manage way to handle this problem.<br /><br />First of all, the general idea of generating arguments combinations remains the same, so we always have $ 2^n$ combinations, where n is the number of arguments. However, I do not generate the shaders immediately. Instead, a particular shader combination is generated when a mesh with particular material is to be used. So let's say I want to use a shader that only has rim lighting (see "real" code above). In that case I look for an arguments combination under index $64$. I see that it has not been used, so I generate that shader, store the pointer in the shaders array, and hold index into that shaders array in the arguments combinations list. Here's the code for <span style="font-family: "courier new" , "courier" , monospace;">CShadersPack </span>class:<br /><pre class="brush: cpp;">class CShadersPack<br />{<br /> struct ShaderCombination<br /> {<br /> std::string args;<br /> int indexToShadersList;<br /> };<br /><br />private:<br /> std::string fileName;<br /> ShaderType shaderType;<br /><br /> int combinationsNum; // 2 raised to number of args<br /> int shadersNum; // number of actually generated shaders<br /><br /> ShaderCombination *combinationsList;<br /> std::vector< CShader* > shaders;<br /><br />public:<br /> void init(const std::string &fileName, ShaderType shaderType, const std::string &args = "", const std::string &argsToAffix = "");<br /> void free();<br /><br /> CShader* getShader(int index);<br />};<br /></pre>As you can see, there is <span style="font-family: "courier new" , "courier" , monospace;">ShaderCombination </span>structure. The list of that type (<span style="font-family: "courier new" , "courier" , monospace;">combinationsList</span>) is generated a priori, and has the size of $2^n$, where n is the number of arguments. <span style="font-family: "courier new" , "courier" , monospace;">indexToShadersList </span>is initialized with $-1$ value, indicating that this particular combination does not have a shader generated yet. Once the shader is generated, this value will point to shaders vector.<br /><br />Let's now see how <span style="font-family: "courier new" , "courier" , monospace;">CShadersPack::getShader</span> looks like:<br /><pre class="brush: cpp;">CShader* CShadersPack::getShader(int index)<br />{<br /> if (combinationsList[index].indexToShadersList == -1)<br /> {<br /> combinationsList[index].indexToShadersList = shaders.size();<br /><br /> CShader* shader = new CShader();<br /> shader->init(fileName, shaderType, combinationsList[index].args);<br /> shaders.push_back(shader);<br /> }<br /><br /> return shaders[combinationsList[index].indexToShadersList];<br />}<br /></pre>So, if <span style="font-family: "courier new" , "courier" , monospace;">indexToShadersList </span>is still $-1$, it means we have to compile/assemble the shader and store it in the shaders list. Finally, a proper pointer to the shader is returned.<br /><br />This solution is simple, robust and easy to implement and manage. One flaw is that we cannot generate all shaders (actual shaders) when the game is starting because we don't know what materials are used, so we have to generate them during actual gameplay. This will however introduce some slight jerking once an object with a new material combination shows up. But hey, can't we just cache already used arguments combinations on the disk, and generate shaders with those arguments once the game is starting? Yes, we can! And that is what my engine does.Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-11740279276190837732011-11-03T22:51:00.041+01:002016-09-17T00:43:53.132+02:00Moving between Coordinate FramesPerhaps the most common thing one does when works in the field of computer graphics is doing transformations, and one of them is moving objects from one frame of reference to another. A classical example is <span style="font-weight: bold;">camera transformation</span>. We have our objects defined in local space, or world space, and then, to make projection transformation simple, we move all objects to camera space. Camera space is a space where all objects are positioned in such a way as if the camera was positioned in point $(0, 0, 0)$ and looking directly toward $-Z$ (right-handed system) or $+Z$ (left-handed system) axis. When I read about how the view transform matrix is carried out, I wasn't too much surprised. The idea is rather simple. First, we want to translate the camera to the origin, and then we want to orient it such that it looks down the $Z$ axis. We can do so using two matrix transformations, the first one is a translation matrix (filled with the negative coordinates of the camera's position) and the second one is a rotation matrix (filled with the orientation vectors of the camera). The translation matrix was obviously straightforward to understand but I could never fully grasp why the rotation matrix looked the way it looked). I went through basic vector/matrix math in a couple of books but nowhere could find the answer, with one exception. I found the answer in <a href="http://www.amazon.com/Game-Engine-Design-Second-Interactive/dp/0122290631">3D Game Engine Design (2nd edition) by David H. Eberly</a>. He wrote some simple formula from which on everything became immediately clear. So, let's start.<br /><br />Let's first define our general problem of moving between coordinate frames (in 2D, for simplicity). Given some point $P$ we want to find its position $P'$ in some alternate frame of reference which is located at point $O$ and is spanned on two orthogonal vectors $R$ and $U$. The crucial condition here is that $P$, $O$, $R$ and $U$ are all defined in the same space, which is usually the case (if not, we can bring them using the very approach we're describing right now). The magic formula (which is nothing more than a simple linear combination of two vectors) then is:<br />\begin{eqnarray}<br />P = O + (P'_x*R) + (P'_y*U)<br />\end{eqnarray}<br />You can check empirically that this works. Just sketch a coordinate frame starting at $(0, 0)$ with two direction vectors $(1, 0)$ and $(0, 1)$. Then, choose some point $O$ in this coordinate frame, as well as vectors $R$ and $U$. And finally, pick some point $P$ and see yourself that the coordinates of $P$ in the starting coordinate frame correspond to the same point in the $(O, R, U)$ coordinate space. Here's some picture:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-FJwSjGozwzg/TrQHdC936nI/AAAAAAAAAEM/YyjPJ2zkOf0/s1600/coordinate_frames.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="288" src="https://3.bp.blogspot.com/-FJwSjGozwzg/TrQHdC936nI/AAAAAAAAAEM/YyjPJ2zkOf0/s320/coordinate_frames.png" width="320" /></a></div><br />So let's say that $P = (P_x, P_y) = (5, 4)$, $O = (4.2, 2)$, $R = (\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2})$, $U = (-\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2})$. Then, the coordinates $(P'_x, P'_y)$ in the coordinate frame $(O, R, U)$ are, more or less, $P' = (2, 0.85)$. Let's now see how to find $P' = (P'_x, P'_y)$.<br /><br />Okay, so we have one formula with two unknowns $P'_x$ and $P'_y$:<br />\begin{eqnarray}<br />P = O + (P'_x*R) + (P'_y*U)<br />\end{eqnarray}<br />Unsolvable? Of course it is. The equation above is actually not one but two equations:<br />\begin{eqnarray}<br />P_x &=& O_x + (P'_x*R_x) + (P'_y*U_x) \cr<br />P_y &=& O_y + (P'_x*R_y) + (P'_y*U_y)<br />\end{eqnarray}<br />We can now employ any technique to solve this system of equations for $P'_x$ and $P'_y$. However, we can use some assumptions that we have made to simplify the calculations. We assumed that $R$ and $U$ are orthogonal, which is usually the case, but doesn't have to be. If these vectors are not orthogonal then you need to solve the equations above as they are. But here, we take a shortcut, because we know that $R$ and $U$ are orthogonal (orthonormal, to be even more precise), so the following holds:<br />\begin{eqnarray}<br />R \cdot R &=& 1 \cr<br />R \cdot U &=& 0<br />\end{eqnarray}<br />Let's see what happens if we apply the dot product with $R$ on both sides:<br />\begin{eqnarray}<br />P &=& O + (P'_x*R) + (P'_y*U)\ \ \ / \cdot R \cr<br />P \cdot R &=& (O \cdot R) + (P'_x*R \cdot R) + (P'_y*U \cdot R) \cr<br />P \cdot R &=& (O \cdot R) + (P'_x*1) + (P'_y*0) \cr<br />P'_x &=& (P \cdot R) - (O \cdot R) \cr<br />P'_x &=& (P - O) \cdot R = (P_x - O_x) * R_x + (P_y - O_y) * R_y<br />\end{eqnarray}<br />Doing the same with $\cdot U$ applied to both sides will give:<br />\begin{eqnarray}<br />P'_y = (P - O) \cdot U = (P_x - O_x) * U_x + (P_y - O_y) * U_y<br />\end{eqnarray}<br />From these two equations we can now easily derive two matrices. The first one should translate the original point $P$ by $-O$. The second one is just filled with $R$ and $U$ vectors:<br />\[<br />\begin{bmatrix}<br />P'_x & P'_y & 1<br />\end{bmatrix}<br />=<br />\begin{bmatrix}<br />P_x & P_y & 1<br />\end{bmatrix}<br />\begin{bmatrix}<br />1 & 0 & 0 \cr<br />0 & 1 & 0 \cr<br />-O_x & -O_y & 1<br />\end{bmatrix}<br />\begin{bmatrix}<br />R_x & U_x & 0 \cr<br />R_y & U_y & 0 \cr<br />0 & 0 & 1<br />\end{bmatrix}<br />\]<br />Note that we're using homogenous coordinates here. Otherwise we wouldn't be able to "inject" translation into the matrix.<br /><br />That's pretty much it.<br /><br />EDIT 1:<br />Today is 7th of November. I actually put that note four days ago but have constantly modified it since then... mostly the LaTeX part. Seems that I finally got it working :).<br /><br />EDIT 2:<br />Today is 8th of November. I made some big mistake in this note because I thought that it was not possible to solve for $P'_x$ and $P'_y$ if vectors $R$ and $U$ were not orthogonal. Of course, it is possible, and now the note explains how to do that.Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-11114463141207165282011-03-21T17:40:00.029+01:002016-09-17T00:43:56.498+02:00Fourier Transform for Digital Image ProcessingI studied computer science (BSc.) at a technical university. The Fourier transform subject appeared at least three times on various courses (basics of electronics, calculus 2, networking technologies). I don't know, maybe it's just me, but I think the lecturers didn't pay enough attention to this subject and they didn't explain it. You know, they were either excited with formulas (which at that time had no sense to me) without letting us understand what is actually going on, or they though that we had already understood that subject before so they were doing short remainders, also without much explanation. The result was that I finished my studies and I didn't know what is Fourier transform actually for and, hold me, how to actually compute it numerically.<br /><br />Obviously, it's not only the fault of the lecturers that I didn't understand Fourier transform. It was also my fault because I didn't consider it important - I thought it was a concept that only people dealing with electronics are concerned. I was wrong. I eventually found out that Fourier transform plays a vital role in digital image processing. However, to better understand using Fourier transform on images, it's better to have some intuition with traditional 1-D functions.<br /><br />Ok, let's finally say what intuitively Fourier transforms does. Once upon a time, a fellow called Jean Fourier discovered, that a vast majority of functions can be approximated with a finite sum of sines and cosines (in inifnity, a function can be <span style="font-weight: bold;">exactly </span>expressed with sines and cosines) (<a href="http://www.files.chem.vt.edu/chem-ed/data/graphics/fourier-waves.gif">http://www.files.chem.vt.edu/chem-ed/data/graphics/fourier-waves.gif</a>). Take a look here: <a href="http://brokensymmetry.typepad.com/photos/uncategorized/2008/06/19/transform_pairs_3.gif">http://brokensymmetry.typepad.com/photos/uncategorized/2008/06/19/transform_pairs_3.gif</a>. Note the 3rd illustration, which depicts a sine function on the left in <span style="font-weight: bold;">time domain</span>. On the right we can see its Fourier transform (only its Real value - more on this in the next paragraph), that is, the sine function transformed to <span style="font-weight: bold;">frequency space</span>. Note that the function consists of exactly one frequency, namely $2\pi$ (present at point $f = \frac{1}{T} = \frac{1}{2\pi}$). This also explains why various forms of Fourier transform are used in data compression. Here, knowing a single value (the frequency) we can easily reconstruct the original function in time domain! Of course, various functions have various representations in frequency domain. Note for example how the boxcar's Fourier transform looks like.<br /><br />One basic thing we must remember is that Fourier transform operates not on real, but on complex numbers! That is, it takes as input a set of complex numbers and outputs a set of complex numbers. Naturally, since we deal with real functions in time domain, their complex values have imaginary parts equal to $0$. However, on the output, the Fourier transform will in general yield complex values. And this is good! This way we can know what the amplitudes are:<br />\begin{eqnarray}<br />amplitude = \sqrt{real*real + imaginary*imaginary}<br />\end{eqnarray}<br />and what the phases are:<br />\begin{eqnarray}<br />phase = \mbox{atan2}(imaginary, real)<br />\end{eqnarray}<br />And now yet another important thing. In this link <a href="http://www.dataq.com/images/article_images/fourier-transform/an11fig1.gif">http://www.dataq.com/images/article_images/fourier-transform/an11fig1.gif</a> on the right there are plots of power spectrum values (these are usually Ampltiude*Amplitude, but the sole Amplitude is also often plotted). We usually skip phases and plot amplitudes because we are usually more interested in how "big" particular frequencies are, and not how much they are offset.<br /><br />One intuition we need to have is how, more or less, the Fourier transform of a particular function can look like. Roughly speaking, if a function is smooth and don't have any sudden height variations, it will have only low/medium frequencies, but not high. On the other hand, if a function has sharp and sudden variations in values, its Fourier transform will depict high frequencues. Note that, for example, noises are high-frequency phenomenons. This is one of the most fundamental applications of Fourier transform - removal of noises. We can take the Fourier transform of a signal/function, eliminate high frequencies (noises mostly) and use inverse Fourier transform to get the input signal without noises.<br /><br />Okay, so how does it all translate to image processing? Roughly speaking, here we have a 2-dimensional function given in <span style="font-weight: bold;">spatial domain</span> (it's a counterpart of 1D <span style="font-weight: bold;">time domain</span>) where pixels' intensities form the input function (think of a whole bunch of functions, each one representing a single row of pixels). This can be used as input to the Fourier transform. Now I stop talking and point valuable Internet resources where you can grab a vast of useful information, including source codes showing implementation of Fourier transform:<br /><a href="http://www.cs.otago.ac.nz/cosc453/student_tutorials/fourier_analysis.pdf">http://www.cs.otago.ac.nz/cosc453/student_tutorials/fourier_analysis.pdf</a><br /><a href="http://homepages.inf.ed.ac.uk/rbf/HIPR2/fourier.htm">http://homepages.inf.ed.ac.uk/rbf/HIPR2/fourier.htm</a><br /><a href="http://www.cs.unm.edu/%7Ebrayer/vision/fourier.html">http://www.cs.unm.edu/~brayer/vision/fourier.html</a><br /><br />Finally, some of my pictures:<br /><br /><a href="http://4.bp.blogspot.com/-M9ZjDeEyaGY/TYePTj2EvyI/AAAAAAAAAAk/gUUr9tLMUeQ/s1600/fourier1.jpg"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5586591428978720546" src="https://4.bp.blogspot.com/-M9ZjDeEyaGY/TYePTj2EvyI/AAAAAAAAAAk/gUUr9tLMUeQ/s320/fourier1.jpg" style="cursor: pointer; height: 320px; width: 312px;" /></a><br /><br />Can you see the bra in the Fourier's amplitudes spectrum? :P<br /><br /><a href="http://4.bp.blogspot.com/-nc9w_4jsUBE/TYePm6hb6SI/AAAAAAAAAAs/gQN5K3VP_r4/s1600/fourier2.jpg"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5586591761483688226" src="https://4.bp.blogspot.com/-nc9w_4jsUBE/TYePm6hb6SI/AAAAAAAAAAs/gQN5K3VP_r4/s320/fourier2.jpg" style="cursor: pointer; height: 320px; width: 316px;" /></a><br /><br />Here I took the Fourier transform of the original image, cut off high frequencies, scaled those at the borders of the circle, and used the Inverse Fourier transform to get the final image on the left. As you can see, the image is blurred (no sudden variations in brightness due to high frequencies removal!).<br /><br />If you experience any problems or have any questions, let me know. I will try to resolve your doubts.Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-50127104352208214412011-03-03T00:10:00.013+01:002016-09-17T00:43:59.778+02:00Gimbal LockI have recently been thinking about gimbal lock problem. This article <a href="http://en.wikipedia.org/wiki/Gimbal_lock">http://en.wikipedia.org/wiki/Gimbal_lock</a> explains very clearly the problem, but I will describe it in my own words.<br /><br />Imagine a right-handed coordinate system, and the three cardinal axes $X$, $Y$ and $Z$ sticking out of the origin. Now imagine you are standing at position, say, $(0, 0, 5)$. So you are looking against $+Z$ axis. Now imagine you rotate the axes by $90$ degrees around $Y$ axis, counter-clockwise. At this moment $+Z$ is pointing at starting $+X$ direction, and $+X$ is pointing at starting $-Z$. The problem which arises now is that no matter whether you rotate around $X$ or $Z$ axis, the rotation will proceed around the starting $Z$ axis. At this configuration you cannot rotate around the starting $X$ axis. This problem is called gimbal lock, and roughly speaking it is about losing $1$ degree of freedom (we can no more rotate around $3$ axises, but $2$).<br /><br />What bothers me is that I always keep hearing that the solution to the problem is to use quaternions. People most often use matrices to perform transformations, but when gimbal lock is a problem then quaternions should be used. Well, rudely speaking it's bullshit ;). Gimbal lock can be avoided both with quaternions and matrices.<br /><br />I think that problem is that people mostly do not understand what the problem is. I must admit that I still get confused with all these rotation-related stuff, including gimbal lock. But what I know for sure is that gimbal lock is not a problem of, so to speak, a tool used to perform a rotation, but rather it is a problem of approach we take.<br /><br />Let's take a look at the following piece of code:<br /><pre class="brush: cpp;">mtx worldTransform *= mtx::rotateX(roll) * mtx::rotateY(yaw) * mtx::rotateZ(pitch)<br /></pre>This code performs rotation using matrices and using so-called Euler angles (roll, yaw, pitch). This code will suffer from gimbal lock. The problem is "nesting" of the transformations. If we perform some roation around $X$ axis only, we will get the result we want. But, if we additionally perform rotation around $Y$ axis, then this will also affect rotation around $X$ axis. As we have said before, doing rotation around $Y$ axis by $90$ degress will point the initial $+X$ direction towards $-Z$ direction. That's why rotation around $X$ axis in that configuration will result in doing roation around world space $Z$ axis. Moreover, note that in the formula we have additional outer-most rotation around world space $Z$ axis. Since this transform is outer-most it's not affected by any other transform and thus will always work as expected - do a rotation around world space $Z$ axis. So using the ordering as above, $XYZ$, and doing $90$-degrees rotation around $Y$ axis will cause that any rotation around $X$ axis and $Z$ axis will rotate "the world" around $Z$ axis (probably the ordering, clockowise/counter-clockwise, will only differ).<br /><pre class="brush: cpp;">objectQuat = quat::rotateX(roll) * quat::rotateY(yaw) * quat::rotateZ(pitch);<br />mtx worldTransform = objectQuat.toMatrix();<br /></pre>Well, we are using quaternions. But... this code will give exactly the same results! It really does not matter whether you use a matrix, quaternion, duperelion, or any other magical tool. You will have gimbal lock as long as your approach will try to construct the rotation in "one step". To actually get rid off gimbal lock you simply have to rotate your object "piece by piece", that is, not using one concretely-defined transformation matrix/quaternion. Take a look at this:<br /><pre class="brush: cpp;">worldTransform *= mtx::rotateX(rollDelta) * mtx::rotateY(yawDelta) * mtx::rotateZ(pitchDelta);<br /></pre>Note that here we do not assign the combined rotation matrix to <span style="font-family: "courier new" , "courier" , monospace;">worldTransform</span>, but rather we construct a "small rotation offset" (note we are using deltas here, not the definitive roll/yaw/pitch values!) and then we apply this "small rotation" to <span style="font-family: "courier new";">worldTransform </span>by multiplication. So in fact <span style="font-family: "courier new";">worldTransform </span>is not recreated each frame from scratch, but it is updated each frame by a delta rotation. So if you want to rotate your object around $Y$ axis in a frame, simply set <span style="font-family: "courier new";">yawDelta </span>to some value in that particular frame. Of course, keep in mind that in this case <span style="font-family: "courier new" , "courier" , monospace;">worldTransform </span>cannot ba a local variable recreated every frame, but rather a global or static variable.<br /><br />One thing that may be baffling you now is why can we have the same fixed $XYZ$ ordering and yet we do not get gimbal lock. Well, actually we do... but... I'm not completely sure about it, but for very small angles (and our delta angles are obviously small) the ordering is not important and the gimbal lock problem can be ignored. This is easy to test. If you use both of these approaches (the one without delta and the one with) and animate your object simultaneously around all three axes, you will get (visually, more or less) the same result even up to all all angles being smaller than $20$ degrees. So that means that your delta's can be actually quite big.<br /><br />The same can be achieved with quaternions (you had any doubts?):<br /><pre class="brush: cpp;">quat deltaQuat = quat::rotateX(rollDelta) * quat::rotateY(yawDelta) * quat::rotateZ(pitchDelta);<br />objectQuat *= deltaQuat;<br />mtx worldTransform = objectQuat.toMatrix();<br /></pre>This code does actually the same thing except for the fact it uses quaternions to combine the delta quaternion, which is then applied to (global or static) <span style="font-family: "courier new" , "courier" , monospace;">objectQuat</span>, which is further converted to a matrix.<br /><br />Huh, it was a pretty long note for the second one on this blog. Hope you found what you wanted here. Waiting for comments :).Wojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0tag:blogger.com,1999:blog-9110390413489967593.post-33131645737761439252011-03-02T23:57:00.000+01:002011-11-04T00:21:32.466+01:00The Very First NoteHey,<br /><br />This is an introduction note. My name is Wojtek and at the moment I actually do nothing :D (at least "officialy"). This blog will be about me, my thoughts, studies. But above all I will be posting notes about programming, mostly games and graphics.<br /><br />Hope you find something interesting here,<br />CheersWojciech Sternahttp://www.blogger.com/profile/13160846785717328316noreply@blogger.com0