With the increasing demand for artificial intelligence and big data applications, High-Performance Computing needs to make the transition from supercomputing centres to a wider audience. The growing popularity of frameworks such as TensorFlow, PyTorch, Matlab, and NumPy is witness to this development.
This research theme looks at application areas, domain-specific languages (DSLs), and language designs that help make HPC accessible to the masses. Some concrete topics of interest are:
Graph processing is typically based on the idea of pointers, making it ideally suited for an imperative setting. While acyclic graphs can easily be manipulated in a declarative style, cyclic graphs pose a bigger challenge. Some non-trivial approaches exist; however, they lack efficient implementations. Here, we look into finding simpler solutions as well as building a code generator for efficient parallel code.
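As a minimal illustration of handling a cyclic graph declaratively, the following Python sketch (all names are illustrative, not part of any existing implementation) represents the graph as an adjacency matrix rather than as pointers and computes transitive reachability as a fixed point of a data-parallel update, so cycles pose no problem:

```python
# Declarative handling of a cyclic graph: instead of chasing pointers,
# represent the graph as a boolean adjacency matrix and compute
# reachability as the fixed point of a data-parallel update rule.

def reachability(adj):
    """adj[i][j] is True iff there is an edge i -> j; cycles are fine."""
    n = len(adj)
    reach = [row[:] for row in adj]
    changed = True
    while changed:                      # iterate until the fixed point
        changed = False
        for i in range(n):
            for j in range(n):
                if not reach[i][j] and any(
                        reach[i][k] and reach[k][j] for k in range(n)):
                    reach[i][j] = True
                    changed = True
    return reach

# A 3-cycle 0 -> 1 -> 2 -> 0: every node reaches every node.
cycle = [[False, True,  False],
         [False, False, True],
         [True,  False, False]]
```

The inner update is a pure function of the previous iterate, which is what makes a parallel code generator plausible; the termination argument is simply the monotone growth of the boolean matrix.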
Libraries such as MKL or BLAS are often considered the gold standard. However, in several cases it has been shown that compilers which take the calling context into consideration can outperform such libraries. Here, we look at application areas such as vision or AI and investigate how whole-program compilers compete against tuned libraries.
Many modern applications are memory-bound, i.e., their performance is dominated not by the compute power that is thrown at them but by the throughput the memory subsystem can provide. Here, we look at new number formats such as posits instead of IEEE floating point, n-bit integers, or n-bit reals.
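A minimal sketch of the bandwidth argument, using a simple linear n-bit integer encoding (the format and helper names below are illustrative, not posits and not anything SaC provides): each value is stored in n bits plus a shared offset and scale, trading precision for memory traffic.

```python
# Sketch: trading precision for memory bandwidth by storing an array of
# reals as unsigned n-bit integers with a shared offset and scale.

def quantise(xs, bits=8):
    """Map floats in [min(xs), max(xs)] onto n-bit integers plus a scale."""
    lo, hi = min(xs), max(xs)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((x - lo) / scale) for x in xs], lo, scale

def dequantise(qs, lo, scale):
    return [lo + q * scale for q in qs]

xs = [0.0, 0.5, 1.0, 2.0]
qs, lo, scale = quantise(xs)
# Each element now needs 8 bits instead of 64; the round trip is
# accurate to within half a quantisation step.
```

For a memory-bound kernel, an 8-bit representation moves one eighth of the data of a 64-bit one, which is exactly the kind of trade-off the new formats aim to make systematic.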
Lately, we have explored the opportunity to encode data locality through the use of higher-dimensional arrays. In Rank-Polymorphism for Shape-Guided Blocking, we demonstrate, using matrix multiplication as an example, that this approach can achieve the highest levels of performance from very small generic specifications. As an added benefit, the genericity of the approach can also be leveraged to prove its correctness. In Parallel scan as a multidimensional array problem, the same approach has been used to define generic scan operations. In this project, we want to explore this approach further by applying the idea to other algorithms such as the FFT.
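To make the blocking idea concrete, here is a plain-Python sketch (far less generic than the rank-polymorphic SaC version, and with illustrative names): an n x n matrix kept as a flat row-major list is viewed as a grid of bs x bs blocks, and the multiply iterates block-wise so that each block product works on a cache-sized working set.

```python
# Shape-guided blocking, hand-written: the three outer loops walk the
# block grid, the three inner loops compute one bs x bs block product.

def matmul_blocked(a, b_mat, n, bs):
    c = [0.0] * (n * n)
    for ii in range(0, n, bs):            # result block row
        for jj in range(0, n, bs):        # result block column
            for kk in range(0, n, bs):    # accumulate one block product
                for i in range(ii, ii + bs):
                    for j in range(jj, jj + bs):
                        s = 0.0
                        for k in range(kk, kk + bs):
                            s += a[i * n + k] * b_mat[k * n + j]
                        c[i * n + j] += s
    return c
```

In the rank-polymorphic formulation, the six nested loops collapse into one generic specification over a higher-dimensional view of the operands; the sketch above only shows the iteration structure that specification gives rise to.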
Stencil operations on larger arrays are usually memory-bound, as their computational density is rather low. A key approach to dealing with this problem is to perform several stencil steps on a fraction of the overall data at a time. That way, caches are used much more effectively, delivering much better overall performance. This project looks at ways of formulating such algorithmic variants concisely and generically, again building on the rank-polymorphism provided by SaC.
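The following Python sketch shows the idea for a 1-D three-point stencil (names and boundary treatment are illustrative): instead of sweeping the whole array k times, each tile plus a halo of width k is advanced k steps while it is cache-resident, and the two variants produce identical results.

```python
# Temporal blocking for a 1-D three-point stencil.

def step(a):
    """One stencil sweep; the two outermost cells are kept fixed."""
    b = a[:]
    for i in range(1, len(a) - 1):
        b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return b

def sweep_global(a, k):
    """Reference: k full sweeps over the whole array."""
    for _ in range(k):
        a = step(a)
    return a

def sweep_tiled(a, k, tile):
    """k steps per tile; a halo of width k makes the tile self-contained."""
    n = len(a)
    out = a[:]
    for lo in range(1, n - 1, tile):              # interior tiles
        hi = min(lo + tile, n - 1)
        gl, gr = max(0, lo - k), min(n, hi + k)   # tile plus halo
        local = a[gl:gr]
        for _ in range(k):                        # k cache-resident steps
            local = step(local)
        out[lo:hi] = local[lo - gl:hi - gl]
    return out
```

The halo cells become stale at a rate of one cell per step, so a halo of width k is exactly enough for the inner tile to remain correct after k steps; this is the invariant a generic rank-polymorphic formulation would have to capture.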
When generating GPU code from data-parallel specifications, bijections between index spaces and thread spaces are the key ingredient to performance. They control the scheduling of parallel computations onto individual streaming processors. As such, the choice of mapping is responsible not only for ensuring that hardware limitations are respected but also for enabling favourable memory accesses, referred to as "coalescing". This project looks at ways to derive such mappings for given kernel codes. The work is done in the context of SaC, where bijections between indices and threads can be controlled via a small set of pragmas. If none is specified, the compiler applies a simple heuristic that ensures compliance with the target hardware. The aim is to improve this heuristic by taking into account the implications these pragmas have on coalescing.
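A small Python model of the effect at stake (purely illustrative; the real pragmas and mappings in SaC are richer): two bijections from a flat thread id onto a 2-D index space, and the memory stride that consecutive threads see on a row-major array. A stride of 1 between consecutive threads is what allows coalescing; a stride of a full row width defeats it.

```python
# Two thread-to-index bijections for a ROWS x COLS row-major array and
# the resulting memory stride between consecutive thread ids.

ROWS, COLS = 4, 8

def addr(i, j):                  # row-major address of element (i, j)
    return i * COLS + j

def map_rows_inner(tid):         # thread id varies fastest along a row
    return divmod(tid, COLS)     # -> (i, j); neighbouring tids: stride 1

def map_cols_inner(tid):         # thread id varies fastest down a column
    j, i = divmod(tid, ROWS)
    return i, j                  # -> (i, j); neighbouring tids: stride COLS

def stride(mapping):
    return addr(*mapping(1)) - addr(*mapping(0))
```

Both mappings are bijections onto the same index space, so a compiler is free to choose either; only the first yields coalesced accesses, which is exactly the information an improved heuristic would exploit.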
Thrust (https://thrust.github.io) offers a wide range of highly optimised computational kernels for multi-core and GPU systems. The flexibility of these templates makes them generically applicable while providing excellent parallel performance. Orchestrating Thrust programs by hand, however, is tedious and error-prone. Here, we look at ways to generate Thrust programs from more abstract high-level programs such as SaC programs.
Reference counting has so far not made it into the mainstream. Given its advantages in a parallel setting, however, interest in this area is on the rise. Combining techniques such as uniqueness inference and reference counting offers opportunities for further improvements. Another area of investigation in the context of reference counting is efficient support for nested data structures.
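One advantage worth spelling out is in-place reuse: a functional-style update on a value whose reference count is 1 can safely mutate the buffer, while a shared value must be copied. The Python wrapper below is an illustrative model only, not SaC's runtime:

```python
# Reference-counted array wrapper: updates reuse the buffer when the
# value is uniquely owned and copy otherwise.

class RCArray:
    def __init__(self, data):
        self.data = data
        self.rc = 1

    def share(self):
        self.rc += 1
        return self

    def release(self):
        self.rc -= 1

    def update(self, i, v):
        """Functional update; destructive only when uniquely owned."""
        if self.rc == 1:
            self.data[i] = v          # unique: in-place update is safe
            return self
        self.rc -= 1                  # shared: copy, leave original intact
        fresh = RCArray(self.data[:])
        fresh.data[i] = v
        return fresh
```

Uniqueness inference aims to decide the `rc == 1` test statically, removing both the counter traffic and the branch from the generated code.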
KNIME is an open-source tool for building commercial data-processing pipelines. In principle, it should be possible to create SaC-implemented data processors for such tool chains. However, this would have to rely on runtime specialisation of code to ensure parallel runtimes competitive with hand-written code. This project investigates the feasibility and performance of such an approach.
A language server enables hooking up programming-language tool chains to a variety of IDEs, including Visual Studio Code and others. However, building a language server for a full-fledged language is a non-trivial task which typically requires re-implementing many parts of any pre-existing compiler or interpreter. This project aims at building a generic language server that uses dedicated calls to pre-existing language tools to implement the actual language-server capabilities.
ONNX is an open-source standard for representing and executing dataflow graphs of tensor expressions on a wide range of modern compute platforms. In principle, it should be possible to compile SaC's tensor comprehensions into such graphs. This project investigates the feasibility and performance of such an approach.
NIR, the Neuromorphic Intermediate Representation, is another open-source standard for executing dataflow graphs of tensor expressions, in this case targeting modern neuromorphic platforms. In principle, it should be possible to compile SaC's tensor comprehensions into such graphs. This project investigates the feasibility and performance of such an approach.
Most languages offer an FFI (Foreign Function Interface) to communicate data between different managed or unmanaged runtime systems. Typically, these are either unsafe, i.e., if the programmer does not adhere to some specified behaviour, anything can happen, or they are safe but introduce a significant overhead. In this project, we try to develop an approach that combines both: a safe yet efficient FFI.
This project looks into generating code from SaC for microcontrollers such as the Raspberry Pi Pico. The key challenges are the need to support some form of I/O and the development of ideas on how to port a runtime system developed for Unix systems into the realm of microcontrollers.
This project investigates how efficiently MPI applications can run on very small systems such as the $15 Raspberry Pi Zero. The key challenge here is to evaluate the latency and bandwidth of the different possible communication channels between Raspberry Pi Zeros.
Array programming languages such as SaC offer high levels of programmer productivity. This theme looks at extending such languages in ways that further improve their expressiveness and, with it, programmer productivity. Some topics of interest are:
For some programs, it is possible to statically infer that not all data is needed for computing the overall result. Languages based on normal-order reduction or laziness guarantee that only those parts of a program are evaluated that are needed for the result. While this is conceptually nice, the implementation of lazy evaluation comes at a potentially very high price. We look at languages that are based on strict evaluation but which can nevertheless disregard terms that are known not to contribute to the overall result.
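The following Python sketch models one way this could work (all names and the representation are illustrative): a strict evaluator that first computes, statically, which bindings the demanded result transitively depends on, and then evaluates only those, in program order.

```python
# Strict evaluation that statically discards bindings which cannot
# contribute to the result. A binding maps a name to (dependencies, fn).

def needed(bindings, result):
    """Names transitively reachable from the demanded result."""
    seen, stack = set(), [result]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(bindings[name][0])
    return seen

def evaluate(bindings, order, result):
    live = needed(bindings, result)
    env = {}
    for name in order:                  # strict, in program order ...
        if name in live:                # ... but skipping dead bindings
            deps, fn = bindings[name]
            env[name] = fn(*[env[d] for d in deps])
    return env[result]

# "expensive" would fail under naive strict evaluation but is never
# demanded, so it is discarded statically.
program = {
    "x":         ([],    lambda: 2),
    "expensive": (["x"], lambda x: 1 / 0),
    "y":         (["x"], lambda x: x + 40),
}
```

Unlike laziness, this pays no per-value thunk overhead: the demand analysis happens once, before any evaluation.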
Streaming can be seen as computing with arrays of infinite size. Using the notion of Ordinal numbers as indices allows for a new form of multi-dimensional streaming. There are many challenges in this context.
Given an n-dimensional array, how many permutations of it can be formed by repeatedly applying reshapes and transpositions? And is there an efficient data structure to represent such permutations in memory? This is fundamentally a group-theoretic problem, but we can elegantly approach it using a bit of category theory, which lets such permutations be represented graphically as "string diagrams". This project may include finding proofs of special cases and some experimentation using the computer-algebra software GAP. (Suggested and co-supervised by Dario Stein!)
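A tiny Python model of the question (representation and names are illustrative): an array is a pair of flat data and a shape, reshape changes only the shape, and transposition genuinely permutes the data, so composing the two generates non-trivial permutations.

```python
# Arrays as (flat row-major data, shape). Reshape never moves data;
# a 2-D transpose does.

def reshape(arr, shape):
    data, _ = arr
    return (data, shape)

def transpose2d(arr):
    data, (r, c) = arr
    out = [data[i * c + j] for j in range(c) for i in range(r)]
    return (out, (c, r))

a = (list(range(6)), (2, 3))
b = transpose2d(a)          # a genuine data permutation
c = reshape(b, (2, 3))      # same permuted data, back at shape (2, 3)
```

Starting from shape (2, 3), `c` carries the permutation [0, 3, 1, 4, 2, 5] at the original shape; enumerating all permutations reachable this way, for general shapes, is the group-theoretic question.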
The shape of any n-dimensional array can be expressed by a shape vector. This implies a linear order of the indices. However, in several cases of program optimisation it would be convenient to express higher levels of neighbourhood. An example for this is the blocking of a 2-dimensional matrix. In this project, we look into higher-dimensional shapes that allow indices themselves to be multi-dimensional arrays.
Tensor comprehensions are a convenient way of defining data-parallel operations on arrays. In essence, they are map-like operations that compute arrays as manifestations of functions from index spaces to values in memory. Efficient parallel implementations can be derived rather straightforwardly, building on the non-recursive nature of classical tensor comprehensions. In many cases, such tensor comprehensions can be derived directly from mathematical specifications. Unfortunately, in mathematics these definitions sometimes contain dependencies. Translating them directly into code would require a form of recursive tensor comprehension. This project looks into how a semantics for such expressions can be defined and implemented within a compiler.
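The prefix sum illustrates the issue: the mathematical definition s[i] = s[i-1] + a[i] is a comprehension whose body refers to the array being defined. A compiler could serialise it directly or, for associative operators, rewrite it into a data-parallel scan; the Python sketch below shows both forms (the doubling scheme is the classic Hillis-Steele scan, used here purely as an example).

```python
# A dependent comprehension and one dependency-free reformulation.

def scan_recursive(a):
    """Direct transliteration of s[i] = s[i-1] + a[i] (sequential)."""
    s = [0] * len(a)
    for i in range(len(a)):
        s[i] = a[i] + (s[i - 1] if i > 0 else 0)
    return s

def scan_doubling(a):
    """Dependency-free version: log2(n) data-parallel sweeps."""
    s, d = a[:], 1
    while d < len(s):
        s = [s[i] + (s[i - d] if i >= d else 0) for i in range(len(s))]
        d *= 2
    return s
```

Defining a semantics for recursive tensor comprehensions amounts to deciding when such rewrites are legal and what the fallback serialisation should look like.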
This project looks into code generation for classical sparse-array problems. We start out from experiments encoding algorithms in SaC, trying to identify which sparsity formats can be implemented efficiently and to what extent sparse codes can be derived from specifications that also work for dense arrays.
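As one candidate format, here is a Python sketch of CSR (compressed sparse row) together with a sparse matrix-vector product over it; whether code like this can be derived automatically from a dense specification is precisely the question the project asks.

```python
# CSR: three flat arrays (values, column indices, row pointers).

def dense_to_csr(m):
    vals, cols, rowptr = [], [], [0]
    for row in m:
        for j, v in enumerate(row):
            if v != 0:
                vals.append(v)
                cols.append(j)
        rowptr.append(len(vals))       # end of this row's entries
    return vals, cols, rowptr

def csr_matvec(vals, cols, rowptr, x):
    """y[i] = sum over the stored entries of row i only."""
    return [sum(vals[k] * x[cols[k]]
                for k in range(rowptr[i], rowptr[i + 1]))
            for i in range(len(rowptr) - 1)]
```

The dense specification of the same operation is a plain sum over all columns; the CSR version differs only in which index set each row iterates over, which is what makes a derivation from the dense code plausible.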
This project looks into ways of expressing and verifying domain restrictions in the context of shape polymorphic array operations.
Starting from a pre-existing prototype, this project looks into improving the interactivity of array programming in SaC.
In this project, we investigate how well array languages such as SaC are suited for developing platform games.
In case you are interested in any of these or related topics, please talk to Sven-Bodo Scholz (M1.006).