# 7. Interfacing Lapack#

Lapack is the standard linear algebra library. It contains matrix operations, factorizations, linear systems, eigen-system calculation, and more. It is a quasi standard, highly optimized implementations are available for all major computer systems.

Some implementations are:

OpenBlas (open source, for many architectures)

Intel MKL (for Intel processors + compatible)

Accelerator framework (for Apple systems)

If not available, install some Lapack on your computer.

Switch to branch `lapack`

in ASC-bla.

There are these new lines in *CMakeLists.txt* file:

```
find_package(LAPACK REQUIRED)
add_executable (test_lapack tests/test_lapack.cc)
target_link_libraries (test_lapack PUBLIC ${LAPACK_LIBRARIES})
```

Run *cmake*, and look for output showing *BLAS* and *LAPACK* in the file *build/CMakeCacht.txt*. The `find_package(LAPACK REQUIRED)`

searches for a Lapack installation in the usual directories.

Lapack was written in Fortran, and automatically translated to C using the f2c compiler. There is one C-header file clapack.h which one includes like that:

```
#include <complex>
typedef int integer;
typedef integer logical;
typedef float real;
typedef double doublereal;
typedef std::complex<float> singlecomplex;
typedef std::complex<double> doublecomplex;
// Windows SDK defines VOID in the file WinNT.h
#ifndef VOID
typedef void VOID;
#endif
typedef int ftnlen;
typedef int L_fp; // ?
extern "C" {
#include <clapack.h>
}
```

maybe better: include also f2c.h.

## 7.1. The BLAS - functions#

Lapack contains basic functions for efficiently computing linar algebra operations (called **B**asic **L**inera **A**lebra **S**ubprograms BLAS). One refers to BLAS1, BLAS2, and BLAS3 operations according to computational complexity \(O(n)\), \(O(n^2)\), or \(O(n^3)\). Vector addition and inner products are BLAS1, matrix-vector products or outer products of vectors are BLAS2, and matrix-matrix multiplication belong to BLAS3.

### 7.1.1. The GEMM functions#

With origin in Fortran-times, all functions have a short name from abbreviations, like
**D**ouble-precission **GE**neral **M**atrix **M**atrix multiplication (see quick reference guide)

```
int dgemm_ (char *transa, char *transb, integer *m,
integer * n, integer *k, doublereal *alpha,
doublereal *a, integer *lda, doublereal *b,
integer *ldb, doublereal *beta, doublereal *c__,
integer *ldc);
```

Similarly, there are `sgemm`

, `cgemm`

and `zgemm`

for single precision, and single/double precision complex versions of matrix-matrix multiplication.
The functions are well documented, and google easily finds them, like
documentation dgemm

Since function calls with many arguments are not very clear, we prefer to wrap the low-level Lapack call into a function based on our `MatrixView`

classes. The matrices are provided by a pointer to the first element, and the leading distance (what is exactly our `dist`

variable).
In Fortran/Lapack matrices are stored column-major. The `dgemm`

function allows to specify whether the matrices \(A\) or \(B\) are transposed. If we have a row-major matrix, we set the transpose argument to ‘T’ (and otherwise ‘N’ for not):

```
// c = a*b
template <ORDERING OA, ORDERING OB>
void MultMatMatLapack (MatrixView<double, OA> a,
MatrixView<double, OB> b,
MatrixView<double, ColMajor> c)
{
char transa = (OA == ColMajor) ? 'N' : 'T';
char transb = (OB == ColMajor) ? 'N' : 'T';
integer n = c.Height();
integer m = c.Width();
integer k = a.Width();
if (n == 0 || m == 0) return 0;
double alpha = 1.0;
double beta = 0;
integer lda = a.Dist();
integer ldb = b.Dist();
integer ldc = c.Dist();
int errcode =
dgemm_ (&transa, &transb, &n, &m, &k, &alpha,
a.Data(), &lda, b.Data(), &ldb, &beta, c.Data(), &ldc);
if (errcode != 0)
throw exception ("Lapack-dgemm error "+to_string(errcode));
}
```

Only matrices \(A\) and \(B\) can be marked as transpose, not the result matrix \(C\). If we want a row-major result, we use

and transpose all three matrices. Since `Trans`

does not do actual computations (it only gives a different view on the memory), this additional wrapper has zero costs:

```
template <ORDERING OA, ORDERING OB>
void MultMatMatLapack (MatrixView<double, OA> a,
MatrixView<double, OB> b,
MatrixView<double, RowMajor> c)
{
MultMatMatLapack (Trans(b), Trans(a), Trans(c));
}
```

If we have to compute \(C = A^T B\) we can use the same technique:

```
MultMatMatLapack (Trans(a), b, c)
```

### 7.1.2. Syntactic sugar#

If people have the choice between coding like

```
C = A*B;
```

or

```
MultMatMatLapack (A, B, C);
```

they will still prefer the first option. It is also less prone to errors. Since efficient evaluation of the entry \((i,j)\) is not provided by Lapack, we cannot directly include Lapack into our hierarchy of expression templates. However, something like

```
C = A*B | Lapack;
```

can be easily provided. One overloads the *bit-wise or* operator `operator|`

acting on a `MultExpr<MatrixView,MatrixView>`

and an object `Lapack`

, which we may create like

```
class T_Lapack { };
static constexpr T_Lapack Lapack;
```

The `operator|`

returns an object of type `LapackMultExpr`

which contains the factors `A`

and `B`

. Finally, we define an assignment operator for the class `MatrixView`

taking such an `LapackMultExpr`

type, and calling the `MultMatMatLapack`

function.

## 7.2. LU-decomposition#

Lapack can compute an LU-decomposition of a matrix \(A\), what means a factorization of \(A\) of the form

where \(P\) is a permutation matrix, \(L\) a lower triangular matrix with unit diagonal elements, and \(U\) a upper triangular matrix. This is an \(O(n^3)\) algorithm.

Having such a factorization, the solution of a linear system \(A x = b\) can be performed via forward/backward substitution in \(O(n^2)\) operations. An \(LU\)-factorization is usually also the first step for computing the inverse \(A^{-1}\).

Lapack is very careful with memory usage. When computing the \(LU\) factorization, the original matrix \(A\) is overwritten with the \(L\) and \(U\) factors. The permutation matrix \(P\) is stored as an index vector. When we compute the inverse \(A^{-1}\) from the factors \(L\) and \(U\), the resulting matrix can be placed in the memory of the \(L\) and \(U\) factors, again.

A C++ - wrapper for the Lapack LU - factorization may look like this. The LU-Factorization is initialized by a matrix, which will be used to hold the LU-factors. Then we can use it to solve a linear system, and also to compute the inverse. The `Inverse`

function overwrites the factorization by the inverse matrix, and returns this matrix by move-semantics. The `LapackLU`

object will be destroyed.
Actually, we are computing the factorization \(A = PLU\) iff \(A\) is stored col-major. If \(A\) is stored row-major, we are computing \(A^T = PLU\), i.e. \(A = U^TL^TP^T\).

```
template <ORDERING ORD>
class LapackLU {
Matrix <double, ORD> a;
std::vector<integer> ipiv;
public:
LapackLU (Matrix<double,ORD> _a)
: a(std::move(_a)), ipiv(a.Height()) {
integer m = a.Height();
if (m == 0) return;
integer n = a.Width();
integer lda = a.Dist();
integer info;
/*
int dgetrf_(integer *m, integer *n, doublereal *a,
integer * lda, integer *ipiv, integer *info);
*/
dgetrf_(&n, &m, &a(0,0), &lda, &ipiv[0], &info);
}
// b overwritten with A^{-1} b
void Solve (VectorView<double> b) const {
char transa = (ORD == ColMajor) ? 'N' : 'T';
integer n = a.Height();
integer nrhs = 1;
integer lda = a.Dist();
integer ldb = b.Size();
integer info;
/*
int dgetrs_(char *trans, integer *n, integer *nrhs,
doublereal *a, integer *lda, integer *ipiv,
doublereal *b, integer *ldb, integer *info);
*/
dgetrs_(transa, &n, &nrhs, a.Data(), &lda, ipiv.data(), b.data(), &ldb, &info);
}
Matrix<double,ORD> Inverse() && {
double hwork;
integer lwork = -1;
integer n = a.Height();
integer lda = a.Dist();
integer info;
/*
int dgetri_(integer *n, doublereal *a, integer *lda,
integer *ipiv, doublereal *work, integer *lwork,
integer *info);
*/
dgetri_(&n, &a(0,0), &lda, ipiv.data(), &hwork, &lwork, &info);
lwork = integer(hwork);
std::vector<double> work(lwork);
dgetri_(&n, &a(0,0), &lda, ipiv.data(), &work[0], &lwork, &info);
return std::move(a);
}
Matrix<double,ORD> LFactor() const { ... }
Matrix<double,ORD> UFactor() const { ... }
Matrix<double,ORD> PFactor() const { ... }
};
```

The class `LapackLU`

can be used as follows: Copy matrix `A`

by copy-ctor before overwritten by `LU`

factors:

```
Matrix<> A;
Matrix<> invA = LapackLU(A).Inverse();
```

Reuse memory of matrix `A2`

for `LU`

factors, `A2`

has shape \((0,0)\) after calling `LapackLU`

:

```
// or reuse A2
Matrix<> A2
Matrix<> invA2 = LapackLU(std::move(A2)).Inverse(); // reuse
```

Other examples for useful matrix decompositions are QR-factorization, singular value decomposition (SVD), and more, see The Big Six Matrix Factorizations

## 7.3. Exercise:#

adjust the matrix-matrix multiplication function to your matrix class. Measure performance of Lapack matrix-matrix multiplication, and compare to your expression template matrix-matrix multiplication. Compare GFlop - rates, i.e. \(10^9\) mult-add operations per second, for various matrix sizes \(10 \times 10\), \(100 \times 100\), \(1000 \times 1000\).

```
#include <chrono>
size_t flops = n*n*n;
size_t runs = size_t (1e9 / flops) + 1;
auto start = std::chrono::high_resolution_clock::now();
for (size_t i = 0; i < runs; i++)
// call function
+auto end = std::chrono::high_resolution_clock::now();
double time = std::chrono::duration<double>(end-start).count();
cout << "n = " << n << ", time = " << time << " s, GFlops = "
<< (flops*runs)/time*1e-9 << endl;
```

Complete the

`LapackLU`

class. Look up documentation for functions`dgetrf`

,`dgetrs`

,`dgetri`

.

Pick some functions of your choice. Write C++ wrappers similar to `LapackLU`

for the following matrix operations:

QR decomposition using Householder reflections (

`dgeqrf`

,`dorgqr`

): \(A = QR\) with \(Q\) orthogonal, and \(R\) upper triangular. By storing \(Q\) as product of Householder reflections, \(Q\) and \(R\) together fit into the memory of \(A\).Eigenvalues and eigenvectors of a real-valued symmetric matrix (

`dsyev`

).Eigenvalues and eigenvector calculation (

`dgeev`

) of a real-valued general matrix. Even if the matrix is real, eigenvalues/eigenvectors are complex, in general.SVD decomposition of a matrix (

`dgesvd`

): \(A = U \Sigma V^T\) with \(U\) and \(V\) orthogonal, and \(\Sigma\) a (rectangular) diagonal matrix.

## 7.4. C++ libraries#

There are a couple of modern C++ linear algebra libraries:

In the core, most of them fall back to legacy Lapack for large matrices. Have a look into their documentations.