4. Pushing the limits#

from ngsolve import *
mesh = Mesh(unit_square.GenerateMesh(maxh=0.2))

# SetHeapSize(1000*1000*1000)
func = x
for i in range(100):
    func += x

# with TaskManager():
Integrate (func*dx(bonus_intorder=50), mesh)
50.49999999997998

4.1. Recursive evaluation#

  • Recursive evaluation requires a stack.

  • Stack-size is (seriously) bounded, depending on the system, and parallelization

  • Stack-overflow cannot be detected within C++

4.2. A possible solution: Compile#

A way out of recursive evaluation is to linearize the expression tree, this happens by the .Compile function.

We do not .Compile per default

  • since it does not work for some cases (mixed real/complex expressions) or

  • may even slow down in other situations (heavy Region-wise coefficients).

func = x
for i in range(1000): func += x
print ("Integral =", Integrate (func.Compile()*dx(bonus_intorder=50), mesh))
Integral = 500.49999999980145

4.3. Implementation:#

We use containers with optimized allocation to avoid dynamic memory allocation on the global heap.

Version 1: put array onto the stack. Use a macro for non-standard C++ definition of local variable-length arrays:

template <typename MIR, typename T, ORDERING ORD>
void ExpressionPlus::T_Evaluate (const MIR & ir, BareSliceMatrix<T,ORD> values) const 
{
  size_t np = ir.Size(); 
  size_t dim = Dimension();

  STACK_ARRAY(T, hmem, np*dim);
  FlatMatrix<T,ORD> temp(dim, np, &hmem[0]);
    
  c1->Evaluate (ir, values);
  c2->Evaluate (ir, temp);
  for (size_t i = 0; i < mydim; i++)
    for (size_t j = 0; j < np; j++)
      values(i,j) = values(i,j)+temp(i,j);
}

Version 2: Preserve a fixed size of memory, and use dynamic allocation for larger needs:

template <typename MIR, typename T, ORDERING ORD>
void ExpressionPlus::T_Evaluate (const MIR & ir, BareSliceMatrix<T,ORD> values) const 
{
  size_t np = ir.Size(); 
  size_t dim = Dimension();

  ArrayMem<100,T> hmem(np*dim);
  FlatMatrix<T,ORD> temp(dim, np, &hmem[0]);
    
  c1->Evaluate (ir, values);
  c2->Evaluate (ir, temp);
  for (size_t i = 0; i < mydim; i++)
    for (size_t j = 0; j < np; j++)
      values(i,j) = values(i,j)+temp(i,j);
}

New since NGSolve 2606:

  • A LocalHeap preallocates a big chunk of memory, and behaves like a stack. However size is user-defined, and over-flow can be detected by C++ code. Used for a long time within NGSolve. For many functions it is propagated by function arguments, but by far not for all.

  • Using one global variable does not work with multi-threading

  • Now, we introduced a thread_local LocalHeap, which is a global variable for each thread (thx Matthias).

template <typename MIR, typename T, ORDERING ORD>
void ExpressionPlus::T_Evaluate (const MIR & ir, BareSliceMatrix<T,ORD> values) const 
{
  size_t np = ir.Size(); 
  size_t dim = Dimension();

  LocalHeap &lh = TLHeap();
  HeapReset hr(lh);
  FlatMatrix<T,ORD> temp(dim, np, lh);
    
  c1->Evaluate (ir, values);
  c2->Evaluate (ir, temp);
  for (size_t i = 0; i < dim; i++)
    for (size_t j = 0; j < np; j++)
      values(i,j) = values(i,j)+temp(i,j);
}

Now, no big data is kept on the system-stack. Matrix entries are kept on the home-made LocalHeap stack. There are still limits, since control structures are kept on the system stack.