Itoyori  v0.0.1
Pitfalls

Itoyori enables concise and straightforward expression of parallel algorithms, but Itoyori-style programming has some (perhaps nonintuitive) pitfalls. If you encounter segmentation faults or other unexpected behavior with Itoyori, consult this document to check that your program does not violate any of the following rules.

Lifetime of Global Objects

Global memory must be freed before finalization.

Bad example:

int main() {
  ityr::init();
  ityr::global_vector<int> gv(100);
  /* ... */
  ityr::fini();
  // The global vector is destroyed after finalization!
}

Good example:

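int main() {
  ityr::init();
  {
    ityr::global_vector<int> gv(100);
    /* ... */
  } // The global vector is destroyed here, before finalization
  ityr::fini();
}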
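
The ordering rule above can be modelled in plain C++ without Itoyori. The following is a minimal sketch under stated assumptions: the `mock` namespace, `global_buffer`, and both `run_*` functions are hypothetical names invented for illustration and are not part of the Itoyori API. It only shows why an inner scope forces the destructor to run before finalization.

```cpp
#include <cassert>

// Hypothetical stand-ins for a runtime with explicit init/fini calls.
// None of these names belong to Itoyori.
namespace mock {
bool runtime_active = false;
int late_releases = 0;

void init() { runtime_active = true; }
void fini() { runtime_active = false; }

// Owns a resource that must be released while the runtime is active.
struct global_buffer {
  ~global_buffer() {
    if (!runtime_active) ++late_releases;  // released after finalization
  }
};
}  // namespace mock

// Mirrors the bad example: the buffer outlives the fini() call.
// The outer braces only make the destructor run before the return statement.
int run_without_scope() {
  mock::late_releases = 0;
  {
    mock::init();
    mock::global_buffer buf;
    mock::fini();
  }  // buf is destroyed here, after fini()
  return mock::late_releases;
}

// Mirrors the good example: an inner scope ends the buffer's lifetime first.
int run_with_scope() {
  mock::late_releases = 0;
  mock::init();
  {
    mock::global_buffer buf;
  }  // buf is destroyed here, while the runtime is still active
  mock::fini();
  return mock::late_releases;
}
```

In this model, `run_without_scope()` reports one late release and `run_with_scope()` reports none.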

Captures in Lambda Expressions for Forking

In Itoyori, a thread cannot hold raw pointers or references to the data of any other thread, including its parent. Therefore, function arguments should be passed to child threads by value, not by reference. This means lambda expressions for fork operations should capture variables by copy.

However, this imposes copy semantics on captured objects. For example, this can be problematic when ityr::global_vector is used for parallel execution.

Bad example:

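ityr::global_vector<int> gv({.collective = true}, 100);
/* ... */
ityr::root_exec([=] { // A copy of the entire global vector is created
  ityr::for_each(
    ityr::execution::par,
    ityr::make_global_iterator(gv.begin(), ityr::checkout_mode::read),
    ityr::make_global_iterator(gv.end(), ityr::checkout_mode::read),
    [=](int x) { /* ... */ });
});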

The above program clones all of the data managed by the global vector across the ityr::root_exec() call. To avoid this unnecessary copy, use ityr::global_span instead, which does not own the memory region.

Good example:

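ityr::global_vector<int> gv({.collective = true}, 100);
ityr::global_span<int> gs(gv.begin(), gv.end());
/* ... */
ityr::root_exec([=] { // Only the span is copied, not the data
  ityr::for_each(
    ityr::execution::par,
    ityr::make_global_iterator(gs.begin(), ityr::checkout_mode::read),
    ityr::make_global_iterator(gs.end(), ityr::checkout_mode::read),
    [=](int x) { /* ... */ });
});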

Alternatively, create the global vector inside ityr::root_exec(), although this is allowed only in the root thread. Even then, watch for vector copies across lambda expressions (e.g., in ityr::parallel_invoke()).
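
The cost difference between capturing an owning container and capturing a non-owning view can be observed in plain C++. This is a minimal sketch, assuming two hypothetical types (`counted_buffer` and `buffer_view`) that merely stand in for an owning vector and a span. It does not use Itoyori and says nothing about how ityr::global_vector is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical owning container that counts how often it is deep-copied.
struct counted_buffer {
  static int copies;
  std::vector<int> data;

  explicit counted_buffer(std::size_t n) : data(n) {}
  counted_buffer(const counted_buffer& other) : data(other.data) { ++copies; }
};
int counted_buffer::copies = 0;

// Hypothetical non-owning view: copying it copies a pointer and a size only.
struct buffer_view {
  const int* ptr;
  std::size_t size;
};

// Capturing the owner by copy duplicates all of its elements.
int copies_when_capturing_owner() {
  counted_buffer buf(100);
  counted_buffer::copies = 0;
  auto task = [=] { return buf.data.size(); };  // deep copy happens here
  task();
  return counted_buffer::copies;
}

// Capturing the view by copy leaves the underlying elements untouched.
int copies_when_capturing_view() {
  counted_buffer buf(100);
  buffer_view view{buf.data.data(), buf.data.size()};
  counted_buffer::copies = 0;
  auto task = [=] { return view.size; };  // copies only the view
  task();
  return counted_buffer::copies;
}
```

The first function records one deep copy and the second records none. The same reasoning motivates passing a span, rather than the vector itself, into a copy-capturing lambda.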

Another pitfall exists in using lambda expressions inside a class/struct. To demonstrate this problem, suppose that an additional parameter cutoff is added to the Fibonacci example, so that sufficiently small leaf computations run sequentially.

Bad example:

struct fib {
  fib(int c) : cutoff(c) {}
  long calc(int n) const {
    if (n <= cutoff) {
      return calc_serial(n);
    } else {
      auto [x, y] =
        ityr::parallel_invoke(
          [=] { return calc(n - 1); },
          [=] { return calc(n - 2); });
      return x + y;
    }
  }
  long calc_serial(int n) const { /* ... */ }
  int cutoff;
};

With the default copy capture [=], the this pointer is implicitly captured, so the object *this is effectively captured by reference. (This implicit capture is deprecated since C++20 but still behaves the same way.) The children therefore refer to the parent's *this object, which is not allowed in Itoyori. To prevent that, *this must be explicitly copy-captured (e.g., [=, *this], available since C++17) when making fork/join calls inside a class/struct.

Good example:

struct fib {
  fib(int c) : cutoff(c) {}
  long calc(int n) const {
    if (n <= cutoff) {
      return calc_serial(n);
    } else {
      auto [x, y] =
        ityr::parallel_invoke(
          [=, *this] { return calc(n - 1); },
          [=, *this] { return calc(n - 2); });
      return x + y;
    }
  }
  long calc_serial(int n) const { /* ... */ }
  int cutoff;
};

Nevertheless, copy semantics will apply to the class object with this fix, which may not be desired in some cases. One option is to move the cutoff parameter outside the class by globalizing it.
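
The difference between capturing the this pointer and capturing a copy of *this can be checked with standard C++17 alone. This is a minimal sketch: the `counter` struct and both helper functions are hypothetical and exist only for illustration. The pointer capture is spelled `[this]` here to avoid the C++20 deprecation warning. Inside a member function, `[=]` captures the same pointer implicitly.

```cpp
#include <cassert>

// Hypothetical class used only to contrast the two capture forms (C++17).
struct counter {
  int value;

  // Captures the `this` pointer: the closure refers to the original object.
  // Inside a member function, [=] captures the same pointer implicitly.
  auto reader_by_pointer() {
    return [this] { return value; };
  }

  // Captures a copy of the object: the closure is independent of the original.
  auto reader_by_copy() {
    return [=, *this] { return value; };
  }
};

int observed_through_pointer() {
  counter c{1};
  auto read = c.reader_by_pointer();
  c.value = 2;    // visible through the captured pointer
  return read();
}

int observed_through_copy() {
  counter c{1};
  auto read = c.reader_by_copy();
  c.value = 2;    // not visible: the closure holds its own copy
  return read();
}
```

The pointer-capturing closure sees the later modification (it reads 2), while the copy-capturing closure keeps the value from the time of capture (it reads 1). Only the second form is independent of the parent's object.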

Usage of Heap Memory Across Thread Migration

As threads can be dynamically migrated to other processes in Itoyori, allocating objects in normal heap memory is not recommended. For example, standard containers such as std::vector will cause heap memory allocation.

Bad example:

ityr::root_exec([=] {
  std::vector<int> v(100);
  ityr::parallel_invoke(/* ... */);
  // The executing process can be different from the previous one
  for (auto&& x : v) {
    x = /* ... */;
  }
});

In the above example, the root thread allocates a std::vector in the local process's heap memory and then forks child threads. However, at fork/join calls, the thread can be dynamically migrated to another process that does not have access to the previous process's heap memory. To keep heap-allocated objects across fork/join calls, they must be allocated in global heaps (e.g., by using ityr::global_vector) and accessed with checkout/checkin calls.

Good example:

ityr::root_exec([=] {
  ityr::global_vector<int> v(100);
  ityr::parallel_invoke(/* ... */);
  auto vc = ityr::make_checkout(v.data(), v.size(), ityr::checkout_mode::read_write);
  for (auto&& x : vc) {
    x = /* ... */;
  }
});

Checkout/Checkin Across Thread Migration

Similar to the previous pitfall, checkout/checkin operations cannot go across fork/join calls.

Bad example:

ityr::root_exec([=] {
  ityr::global_vector<int> v(100);
  auto vc = ityr::make_checkout(v.data(), v.size(), ityr::checkout_mode::read_write);
  /* ... */
  ityr::parallel_invoke(/* ... */);
  // The checkin operation occurs here
});

With ityr::make_checkout(), a checkin operation is performed automatically when the returned checkout object's lifetime ends. If fork/join calls occur in between, however, the thread can be migrated to another process. Because a checkout and its matching checkin must be performed in the same process, checked-out memory must be returned to the system before any fork/join call.

Good example:

ityr::root_exec([=] {
  ityr::global_vector<int> v(100);
  auto vc = ityr::make_checkout(v.data(), v.size(), ityr::checkout_mode::read_write);
  /* ... */
  vc.checkin(); // explicit checkin
  ityr::parallel_invoke(/* ... */);
  // checkout again after fork/join if needed
  /* ... */
});

Note that, if the thread is eventually executed by the same process, the global memory is likely to still be cached in the local process, so checking out again after the fork/join call is likely to be cheap.
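
The rule that a checkout must end before a fork/join call can be modelled with an ordinary RAII guard. This is a minimal sketch under stated assumptions: `mock_region`, `mock_checkout`, and `held_at_fork` are hypothetical names and do not reflect how Itoyori implements checkouts. It shows two ways to end the checkout early: an explicit `checkin()` call or an enclosing block scope.

```cpp
#include <cassert>

// Hypothetical model of a checked-out region; not the Itoyori API.
struct mock_region {
  bool checked_out = false;
};

// RAII guard: checks in on destruction unless checkin() was called earlier.
class mock_checkout {
public:
  explicit mock_checkout(mock_region& r) : region_(&r) { r.checked_out = true; }
  mock_checkout(const mock_checkout&) = delete;
  mock_checkout& operator=(const mock_checkout&) = delete;
  ~mock_checkout() { checkin(); }

  void checkin() {
    if (region_) {
      region_->checked_out = false;
      region_ = nullptr;  // makes a later checkin (or the destructor) a no-op
    }
  }

private:
  mock_region* region_;
};

// Stand-in for a fork/join call; reports whether the region is still held.
bool held_at_fork(const mock_region& r) { return r.checked_out; }

// Bad pattern: the guard is still alive at the fork point.
bool fork_while_checked_out() {
  mock_region r;
  mock_checkout c(r);
  return held_at_fork(r);
}

// Good pattern 1: explicit checkin before the fork point.
bool fork_after_explicit_checkin() {
  mock_region r;
  mock_checkout c(r);
  c.checkin();
  return held_at_fork(r);
}

// Good pattern 2: a block scope ends the guard's lifetime before the fork point.
bool fork_after_scoped_checkout() {
  mock_region r;
  {
    mock_checkout c(r);
  }
  return held_at_fork(r);
}
```

Only the first function reaches the fork point with the region still held. The scoped variant is often easier to keep correct, because the checkin cannot be forgotten.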

Nested Parallelism with Global Iterators

Global iterators are convenient for automatically checking out global memory in high-level parallel patterns, but unfortunately they are incompatible with nested parallelism.

Bad example:

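ityr::root_exec([=] {
  ityr::global_vector<int> gv(100);
  /* ... */
  ityr::transform_reduce(
    ityr::execution::par,
    gv.begin(), gv.end(),
    ityr::reducer::plus<int>{},
    [=](int x) {
      /* ... */
      ityr::parallel_invoke(/* ... */);
      return x + /* ... */;
    });
});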

In the above example, ityr::transform_reduce() internally checks out memory for gv at some granularity, so performing fork/join operations in each iteration causes the above-mentioned problem of checkout/checkin across thread migration. If there is nested parallelism, automatic checkout should be disabled by specifying the ityr::checkout_mode::no_access mode.

Good example:

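ityr::root_exec([=] {
  ityr::global_vector<int> gv(100);
  /* ... */
  ityr::transform_reduce(
    ityr::execution::par,
    ityr::make_global_iterator(gv.begin(), ityr::checkout_mode::no_access),
    ityr::make_global_iterator(gv.end(), ityr::checkout_mode::no_access),
    ityr::reducer::plus<int>{},
    [=](auto&& x_ref) { // type: ityr::ori::global_ref<int>
      /* ... */
      ityr::parallel_invoke(/* ... */);
      return x_ref.get() + /* ... */;
    });
});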

With the ityr::checkout_mode::no_access mode, global references (ityr::ori::global_ref) are passed as arguments to the user-provided function. Calling checkout/checkin operations is then the user's responsibility. In the above case, the global reference x_ref is used to get the global value with ityr::ori::global_ref::get().

Unintentional Data Races

Itoyori's checkout mode (ityr::checkout_mode) must be specified so that no data race occurs. Even if the program does not actually modify the checked-out data, the runtime system treats the checked-out region as dirty if read_write or write mode is specified. This means that the checkout mode is different from access privilege, and specifying ityr::checkout_mode::read_write is not a conservative approach.

Bad example:

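ityr::parallel_invoke(
  [=] {
    // `gs` stands for a global span over the shared region
    auto cs = ityr::make_checkout(gs.data(), gs.size(), ityr::checkout_mode::read_write);
    /* read-only access for `cs` */
  },
  [=] {
    auto cs = ityr::make_checkout(gs.data(), gs.size(), ityr::checkout_mode::read_write);
    /* read-only access for `cs` */
  });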

The above program concurrently checks out the same region in read_write mode. Even if the program does not actually write to the region, this is not allowed in Itoyori because it can lead to unintended data updates.

Good example:

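ityr::parallel_invoke(
  [=] {
    // `gs` stands for a global span over the shared region
    auto cs = ityr::make_checkout(gs.data(), gs.size(), ityr::checkout_mode::read);
    /* read-only access for `cs` */
  },
  [=] {
    auto cs = ityr::make_checkout(gs.data(), gs.size(), ityr::checkout_mode::read);
    /* read-only access for `cs` */
  });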

The user should precisely specify the checkout mode for each checkout call.
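
The rule can be summarized as a small compatibility check. This is a deliberately simplified model, not Itoyori's implementation: the `mode` enum and both functions are hypothetical, and the model assumes that every mode other than read marks the region dirty, as described above.

```cpp
#include <cassert>

// Hypothetical mirror of the read, write, and read_write checkout modes.
enum class mode { read, write, read_write };

// Assumption taken from the text above: any mode other than read marks the
// region dirty, whether or not the code actually writes to it.
constexpr bool marks_dirty(mode m) { return m != mode::read; }

// Simplified rule: two concurrent checkouts of the same region are safe
// only if neither of them marks the region dirty.
constexpr bool may_run_concurrently(mode a, mode b) {
  return !marks_dirty(a) && !marks_dirty(b);
}
```

Under this model, two concurrent read checkouts are compatible, while any pair that involves write or read_write is not, even when the code only reads.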