4. Pushing the limits#
from ngsolve import *
mesh = Mesh(unit_square.GenerateMesh(maxh=0.2))
# SetHeapSize(1000*1000*1000)
func = x
for i in range(100):
func += x
# with TaskManager():
Integrate (func*dx(bonus_intorder=50), mesh)
50.49999999997998
4.1. Recursive evaluation#
Recursive evaluation requires a stack.
Stack-size is (seriously) bounded, depending on the system, and parallelization
Stack-overflow cannot be detected within C++
4.2. A possible solution: Compile#
A way out of recursive evaluation is to linearize the expression tree, this happens by the .Compile function.
We do not .Compile per default
since it does not work for some cases (mixed real/complex expressions) or
may even slow down in other situations (heavy
Region-wise coefficients).
func = x
for i in range(1000): func += x
print ("Integral =", Integrate (func.Compile()*dx(bonus_intorder=50), mesh))
Integral = 500.49999999980145
4.3. Implementation:#
We use containers with optimized allocation to avoid dynamic memory allocation on the global heap.
Version 1: put array onto the stack. Use a macro for non-standard C++ definition of local variable-length arrays:
template <typename MIR, typename T, ORDERING ORD>
void ExpressionPlus::T_Evaluate (const MIR & ir, BareSliceMatrix<T,ORD> values) const
{
size_t np = ir.Size();
size_t dim = Dimension();
STACK_ARRAY(T, hmem, np*dim);
FlatMatrix<T,ORD> temp(dim, np, &hmem[0]);
c1->Evaluate (ir, values);
c2->Evaluate (ir, temp);
for (size_t i = 0; i < mydim; i++)
for (size_t j = 0; j < np; j++)
values(i,j) = values(i,j)+temp(i,j);
}
Version 2: Preserve a fixed size of memory, and use dynamic allocation for larger needs:
template <typename MIR, typename T, ORDERING ORD>
void ExpressionPlus::T_Evaluate (const MIR & ir, BareSliceMatrix<T,ORD> values) const
{
size_t np = ir.Size();
size_t dim = Dimension();
ArrayMem<100,T> hmem(np*dim);
FlatMatrix<T,ORD> temp(dim, np, &hmem[0]);
c1->Evaluate (ir, values);
c2->Evaluate (ir, temp);
for (size_t i = 0; i < mydim; i++)
for (size_t j = 0; j < np; j++)
values(i,j) = values(i,j)+temp(i,j);
}
New since NGSolve 2606:
A
LocalHeappreallocates a big chunk of memory, and behaves like a stack. However size is user-defined, and over-flow can be detected by C++ code. Used for a long time within NGSolve. For many functions it is propagated by function arguments, but by far not for all.Using one global variable does not work with multi-threading
Now, we introduced a
thread_local LocalHeap, which is a global variable for each thread (thx Matthias).
template <typename MIR, typename T, ORDERING ORD>
void ExpressionPlus::T_Evaluate (const MIR & ir, BareSliceMatrix<T,ORD> values) const
{
size_t np = ir.Size();
size_t dim = Dimension();
LocalHeap &lh = TLHeap();
HeapReset hr(lh);
FlatMatrix<T,ORD> temp(dim, np, lh);
c1->Evaluate (ir, values);
c2->Evaluate (ir, temp);
for (size_t i = 0; i < dim; i++)
for (size_t j = 0; j < np; j++)
values(i,j) = values(i,j)+temp(i,j);
}
Now, no big data is kept on the system-stack. Matrix entries are kept on the home-made LocalHeap stack. There are still limits, since control structures are kept on the system stack.