Recently at Flying Wild Hog, my task was to change the vertex format of meshes. Both static and skinned meshes used 12 bytes for storing the normal, tangent, bitangent and color of the vertex. The layout looked something like this:

UByte4N normal;
UByte4N tangent;
UByte4N bitangent;

Each tangent basis vector is stored explicitly in 24 bits, and the last 8 bits of each vector store one component of the vertex color (red in the normal's alpha, green in the tangent's alpha and blue in the bitangent's alpha).

The UByte4N type internally stores one 32-bit unsigned integer (hence the U prefix) and has some helper functions for packing and unpacking four 8-bit values. Since the components of the tangent basis vectors are floats in the normalized $[-1, 1]$ interval (hence the N postfix, meaning normalized), there are also functions that convert them to the integer $[0, 255]$ interval.
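The engine's UByte4N is not shown here, so the following is only a minimal sketch of what such a type could look like. The name UByte4NSketch, the helper names, the byte order (x in the lowest byte) and the truncating conversion are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the engine's UByte4N (all names are assumptions).
struct UByte4NSketch
{
    std::uint32_t m_value = 0;

    // Map a float in [-1, 1] to an integer in [0, 255], clamping and truncating.
    static std::uint32_t Quantize( float f )
    {
        float scaled = ( f + 1.0f ) * 127.5f;
        if ( scaled < 0.0f )
            scaled = 0.0f;
        else if ( scaled > 255.0f )
            scaled = 255.0f;
        return static_cast< std::uint32_t >( scaled );
    }

    // Pack four normalized floats, x in the lowest byte and w in the highest.
    void Set( float x, float y, float z, float w )
    {
        m_value = Quantize( x )
                | ( Quantize( y ) << 8 )
                | ( Quantize( z ) << 16 )
                | ( Quantize( w ) << 24 );
    }

    // Unpack component i (0 = x ... 3 = w) back to a float in [-1, 1].
    float Get( int i ) const
    {
        assert( i >= 0 && i < 4 );
        const std::uint32_t byte = ( m_value >> ( 8 * i ) ) & 255u;
        return static_cast< float >( byte ) / 127.5f - 1.0f;
    }
};
```

With this mapping the end points $-1$ and $1$ survive a round trip exactly, while $0$ comes back slightly negative because of the truncation.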

This form of the tangent basis with color packed together is actually quite nice. The precision is not bad and all channels are fully utilized. The biggest problem is that in order to get color we must read three vectors. We can do better though.

The first thing is to get rid of the bitangent vector altogether. The whole tangent basis can be packed nicely with just the normal, the tangent and one value that indicates the handedness of the entire basis. To get the handedness you can compute the determinant of the matrix formed by that basis; it will be either $1$ or $-1$ for an orthonormal basis. This way we save two values (one handedness value instead of the three that define the bitangent vector) and we can move the color into a separate vector:

UByte4N normal; // .w unused
UByte4N tangent; // .w stores handedness
UByte4N color;
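As an aside on the handedness value kept in the tangent's .w, here is a rough sketch of how it could be computed and later used to rebuild the bitangent. The Vec3 type and the function names are hypothetical, and the sign convention (rows ordered normal, tangent, bitangent) is an assumption; an engine may well order the basis differently.

```cpp
// Hypothetical minimal vector type; the engine's own math types are not shown in the post.
struct Vec3 { float x, y, z; };

inline Vec3 Cross( const Vec3& a, const Vec3& b )
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

inline float Dot( const Vec3& a, const Vec3& b )
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Determinant of the 3x3 matrix with rows N, T, B, written as a triple product.
// Returns +1 or -1 (assumed convention: +1 when cross( N, T ) points along B).
inline float Handedness( const Vec3& n, const Vec3& t, const Vec3& b )
{
    return Dot( Cross( n, t ), b ) < 0.0f ? -1.0f : 1.0f;
}

// Rebuild the bitangent from the two stored vectors and the handedness.
inline Vec3 ReconstructBitangent( const Vec3& n, const Vec3& t, float handedness )
{
    const Vec3 c = Cross( n, t );
    return { c.x * handedness, c.y * handedness, c.z * handedness };
}
```

For an orthonormal basis the cross product of the normal and tangent is either the bitangent or its negation, which is why a single sign is enough to recover it.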

It looks like the color has now been enriched with an alpha channel. Also, the $w$ component of the normal vector is unused. And hey, do we really need to waste 8 bits on the handedness? One bit would suffice. Fortunately, GPUs support the RGB10A2 format, where 10 bits are devoted to each of the RGB channels and 2 bits to the alpha channel. This way we can store the normal and tangent vectors with greater precision. So, our new tangent basis and color part of the vertex structure is this:

UX10Y10Z10W2N normal; // .w unused
UX10Y10Z10W2N tangent; // .w stores handedness
UByte4N color;

Now this one looks scary at first glance. The U prefix again means that the stored values are unsigned: the integer $[0, 1023]$ interval for the $x$, $y$, and $z$ components and $[0, 3]$ for the $w$ component. Also, the N postfix means that the input float values to the structure must be normalized, that is, in the $[-1, 1]$ interval.
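To put a rough number on the precision gain: if $n$ bits cover the $[-1, 1]$ interval uniformly, neighbouring representable values are about $2 / (2^n - 1)$ apart. This is only a back-of-the-envelope estimate that ignores how a particular GPU rounds:

$$\Delta_8 = \frac{2}{255} \approx 0.0078, \qquad \Delta_{10} = \frac{2}{1023} \approx 0.0020$$

So each component of the normal and tangent is stored about four times more finely than in the 8-bit layout.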

Finally, some working code:

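struct UX10Y10Z10W2N
{
    UInt32 m_value;

    // Packs three floats from [-1, 1] into 10 bits each and w into the top 2 bits.
    void Set( const float* xyz, int w )
    {
        m_value = 0;
        for ( int i = 0; i < 3; i++ )
        {
            // Map [-1, 1] to [0, 1023] and clamp out-of-range input.
            float f = ( xyz[ i ] + 1.0f ) * 511.5f;
            if ( f < 0.0f )
                f = 0.0f;
            else if ( f > 1023.0f )
                f = 1023.0f;
            m_value |= ( ( UInt32 )f ) << ( 10 * i );
        }
        // Mask w to 2 bits and shift as unsigned to avoid signed overflow.
        m_value |= ( ( UInt32 )( w & 3 ) ) << 30;
    }

    // Unpacks x, y and z; the stored value 511 decodes to exactly 0.
    void Get( float& x, float& y, float& z ) const
    {
        x = ( float )( ( m_value >> 0 ) & 1023 );
        y = ( float )( ( m_value >> 10 ) & 1023 );
        z = ( float )( ( m_value >> 20 ) & 1023 );
        x = ( x - 511.0f ) / 512.0f;
        y = ( y - 511.0f ) / 512.0f;
        z = ( z - 511.0f ) / 512.0f;
    }

    // Unpacks x, y and z, plus the raw 2-bit w value in [0, 3].
    void Get( float& x, float& y, float& z, int& w ) const
    {
        Get( x, y, z );
        w = ( ( m_value >> 30 ) & 3 );
    }
};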

I'm leaving you without much explanation as it should be quite easy to read.
One thing worth noting is that the $w$ component is not passed or read as a float but rather as an int. I decided that dealing with floats makes little sense here, as we have barely 2 bits. I pass either $0$ or $3$ (the latter is converted to $1$ in the shader), depending on the handedness of the tangent basis.
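The round trip of the $w$ value could look roughly like the sketch below. The function names are hypothetical, and two things are assumptions rather than facts from the code above: that a negative handedness is the one encoded as $0$, and that the shader remaps the normalized value with a multiply-and-subtract. The division by $3$ mirrors how an unsigned normalized 2-bit channel is usually expanded to a float.

```cpp
// CPU side: choose the 2-bit value passed as w to Set().
// Assumption: negative handedness maps to 0, positive to 3.
inline int HandednessToW( float handedness )
{
    return handedness < 0.0f ? 0 : 3;
}

// What a shader would typically read from a 2-bit unsigned normalized channel.
inline float UnormFromW( int w )
{
    return static_cast< float >( w ) / 3.0f;
}

// Assumed shader-side remap from { 0, 1 } back to { -1, +1 }.
inline float WToHandedness( float unormW )
{
    return unormW * 2.0f - 1.0f;
}
```

Only the two extreme codes are used, so the values $1$ and $2$ of the 2-bit field never appear in practice.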