Tiny Vector Matrix library using Expression Templates

Some Notes ...

Contents:

... on optimizing

This depends heavily on compiler and the flags used. The code produced with -O could be better than with -O2 even on gcc-2.9x suite. To get the best results, you should examine the assembler code generated by your compiler. Maybe I will write a benchmark suite for different compiler options one day. (Maybe you could contribute?)

... on temporaries

The use of expression templates (ET) and meta templates (MT) allows the generated code to avoid the creation of many temporaries -- especially with standard mathematical and assignment operations. There are times that you have to use actual temporaries e.g. when swapping variables of type double -- with integer values you can use the XOR operator.

Some implementations are using a loop with temporaries even if there is a solution with ET. Than the loops are faster than MT.

See also:: ... about Matrix-Matrix-Vector and Matrix-Matrix-Matrix-operations

... on operators and namespace element_wise

Some operations on matrices and vectors are not available at first glance. These are defined in the namespace element_wise because they are element wise (and not strictly mathematical) operations.

But there is more: some functions do element wise operations per se (e.g. vector addition) and are NOT inside namespace element_wise. Furthermore, all comparison operators perform element wise operations.

See also:: Compare Vectors and Matrices

... about Threads

For performance reasons the individual matrix/vector expression objects has no internal protection against concurrent access from multiple threads, and thus, are not thread safe. The library, as a whole, is thread safe in the sense that multiple threads may execute matrix/vector expressions as long as the expression objects are not shared between those threads. If you need to share matrix/vector expression objects between threads, you must wrap them yourself with protecting code.

... on expressions

The first versions of tvmet had only one expression (Xpr) which was shared for both vectors and matrices. This was working fine, but limited tvmet's use for arithmetic expressions expressions on complex values. For this reason, I had to separate expression types for vectors and matrices. The same problem appeared when using the eval() function for evaluating these expressions. (Which index operator should handle it?) Unfortunately, the use of separate expression types vastly increases the number of operators and functions needed to make the library viable. Fortunately, most boundary checks are not necessary since they are done at compile time (such as those needed by the assignment operator, etc).

... on namespaces and Koenig Lookup

IMO, the cleanest way would be to put all functions into their own namespace Functional instead of naming them with the fncl_ prefix they currently have. (I did beforehand, and have thought better since). Unfortunately, this technique doesn't work well. I got compiler errors like:

   template <class T> Functional::xyt<T>' is not a function
   conflict with `template <class E> xyz(Xpr<E>)' in call to `xyz'

when trying:

   typedef Vector<double, 3> vector3d;
   vector3d t1(1,2,3);
   vector3d t2(t1);
   vector3d r;
   r = sqrt( t1 * t2 );

ADL (argument dependent lookup), aka Koenig Lookup, is causing the compiler to check for a match in namespace Functional, since the template instantiation is part of Functional (the Xpr stuff), it matches before the global namespace (the Vector stuff) is checked. Writing:

   r = ::sqrt( t1 * t2 );

seems to solve the problem at first glance. However, to force the user of the library into this syntax is painful and could probably run cause other problems with other namespaces (I haven't checked this). Therefore, all "Functionals" have the prefix fncl_.

... about aliasing

tvmet assumes that all matrices and vectors are alias free. These means that source and destination memory layout of matrices and vectors never overlaps (during an operation).

This is very easy to understood if you see a matrix-vector product. Both contain different data in different (unique, non-overlapping) memory regions -- hence, they are alias free. Contrast this with a matrix-matrix multiply which maybe can have an aliasing, e.g. $A = A * B$ . When source and destination memory regions are the same, the computed results may be wrong. (Probably they will be.) But, $C = A * B$ is alias free.

Let's see an example in detail:

Example:

   Matrix<double,3,3>   M1;
   M1 = 1,2,3,4,5,6,7,8,9;

   cout << "M1 = " << M1 << endl;
   M1 = trans(M1);
   cout << "M1 = " << M1 << endl;

Output:

   M1 = Matrix<d, 3, 3> = [
     [1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]
   ]
   M1 = Matrix<d, 3, 3> = [
     [1, 4, 7],
     [4, 5, 8],
     [7, 8, 9]
   ]

As you can see, the lower triangular matrix isn't what you expected due to the aliasing. These results depends on the compiler optimizations, too.

Unfortunately, to avoid the aliasing problem, you must use temporaries as shown here:

Example:

    matrix_type         temp_A(A);
    A = temp_A * B;
    cout << "matrix_type temp_A(A);\n"
         << "A = temp_A * B = " << A << endl;

Anyway, it seems there is a small exception (no guarantee, since it's compiler dependent I assume) for element wise operations with matrices or vectors on right hand side.

Starting with tvmet release 1.4.1 there is a new function alias. These function use a proxy to call special member functions of the Matrix/Vector class. These member functions introduce the temporary for you.

Example:

   typedef tvmet::Matrix<double, 3, 3>    matrix_type;
   matrix_type          M;
   std::generate(M.begin(), M.end(),
                 tvmet::util::Incrementor<matrix_type::value_type>());
   std::cout << "M = " << M << std::endl;

   alias(M) = trans(M);
   std::cout << "M = " << M << std::endl;

with the expected

Output:

   M = [
     [1, 4, 7],
     [2, 5, 8],
     [3, 6, 9]
   ]

These function/proxy will work for the element wise operators +=, -=, *= and /= with expressions, e.g. as trans() returns.

See also:: no match for ?= operator

... special Meta-Template Functions

From a principle point of view, there is no need for some special functions for Matrix and Vector functions, namely $M^T\, x$ , $M^T\,M$ , $M\,M^T$ , and $(M\,M)^T$ .

Unfortunately, the g++ compiler throws in the towel sometimes even on transposing matrices. Because of this, tvmet offers specialized functions which speed up at runtime (about factor 2 ... 3) using meta templates.

Example:

   using namespace tvmet;

   Matrix<double, 6, 3>     M1(0); // will be transposed to be conform to vector size
   Vector<double, 6>      v1(0);
   Vector<double, 3>      v2(0);

   M1 = ...
   v1 = ...

   v2 = Mtx_prod(M1, v1);   // equal to: v2 = trans(M1)*v1;

BTW, the Matrix-Matrix $M\,M$ and Matrix-Vector $M\,x$ products use Meta-Templates, too.

See also:

Mtx_prod

MMt_prod

MtM_prod

trans_prod

... about Matrix-Matrix-Vector and Matrix-Matrix-Matrix-operations

The problem is related to the optimizer - due to the expression and meta templates used.

Internally, an expression template may contain other expression templates (meta templates inside as well as) too - the compiler will unroll all of these expression into a single resultant expression (which is a hard job). Sometimes the code generated from this is worse (from a performance point of view) than just using simple temporaries.

You can chain matrix-matrix and matrix-vector operations without writing temporaries by yourself (if this is what you want).

from examples/hspiess.cc:

   tvmet::Matrix<double,3,2>  B;
   tvmet::Matrix<double,3,3>  D;
   tvmet::Matrix<double,2,2>  K;

   B =
     -0.05,  0,
      0,     0.05,
      0.05, -0.05;
   D =
      2000, 1000, 0,
      1000, 2000, 0,
      0,    0,    500;

   K = trans(B) * D * B;

The performance can be sub optimal due to the increasing complexity of operations. This can be reduced by a user specified temporary:

from examples/hspiess.cc:

   // as before

   K = tvmet::Matrix<double,2,3>(trans(B) * D) * B;

   K = prod(tvmet::Matrix<double,2,3>(prod(trans(B), D)), B);

At this moment an intelligent cache and pre-evaluating strategy is missing by tvmet.

See also:: ... special Meta-Template Functions
some notes ... on temporaries

Author: