MPI

Overview

Trident has basic support for using MPI to solve multiple scenarios in parallel.

The trident Python package comes with a default Python script for setting up an MPI cluster and a Communicator implementation for OpenMPI and MSMPI.

When running Trident in an MPI cluster, you must use Client to communicate with the ngtlm engine.

Currently, MPI support is limited to running scenarios in parallel, and all nodes must share the same physical file system. Both limitations will be removed in the future.

Dependencies

trident-mpi

The pip package trident-mpi is required when using the default way to run Trident with MPI. It can be downloaded from ngltm.no and installed with pip:

pip install <path-to-package>

OpenMPI (Linux)

OpenMPI can be installed through apt:

sudo apt install openmpi-bin

See https://www.open-mpi.org/ for more information about OpenMPI.

MSMPI (Windows)

Instructions on how to install MSMPI can be found here: https://learn.microsoft.com/en-us/message-passing-interface/microsoft-mpi
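Whichever MPI implementation is installed, it can be useful to confirm that the launcher and the Python package are visible before starting a cluster. The following is a minimal sketch, not part of Trident: the helper name check_mpi_prerequisites is hypothetical, and it assumes that the launcher is on PATH under the name mpiexec and that the package is importable as trident.

```python
import importlib.util
import shutil


def check_mpi_prerequisites():
    """Report whether the MPI launcher and the trident package can be found.

    Hypothetical helper for illustration; it only looks things up and does
    not start any MPI process.
    """
    return {
        # mpiexec is the launcher name used for both OpenMPI and MSMPI here.
        "mpiexec": shutil.which("mpiexec") is not None,
        # find_spec returns None when a top-level package is not installed.
        "trident": importlib.util.find_spec("trident") is not None,
    }


if __name__ == "__main__":
    for name, found in check_mpi_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If either entry is reported as missing, revisit the corresponding installation step above.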

Running

Trident provides a default way to start an MPI cluster: the trident.mpi module, which is part of the trident Python package.

For information about the possible options, run:

python -m trident.mpi --help

The command for starting an MPI cluster should be the same whether you are using OpenMPI or MSMPI:

mpiexec -n <num_nodes> python -m trident.mpi -d <data_path>
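When the cluster is started from a script rather than typed by hand, the command above can be assembled as an argument list. This is a sketch under stated assumptions: build_mpiexec_command is a hypothetical helper, the data path is a placeholder, and only the -n and -d options shown above are used. The actual launch is left commented out.

```python
import shlex
import sys


def build_mpiexec_command(num_nodes, data_path, python=sys.executable):
    """Build the argument list for the mpiexec command shown above.

    Hypothetical helper; it uses only the options that appear in this
    document (-n for mpiexec, -d for trident.mpi).
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return [
        "mpiexec", "-n", str(num_nodes),
        python, "-m", "trident.mpi",
        "-d", str(data_path),
    ]


if __name__ == "__main__":
    # "/shared/trident-data" is a placeholder for a path visible to all nodes.
    cmd = build_mpiexec_command(4, "/shared/trident-data")
    print(shlex.join(cmd))
    # To actually start the cluster (requires MPI and trident to be installed):
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the data path contains spaces.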

For an example of using Client to upload a data set and perform a run, see the notebook example basic_run_mpi, available at trident.no.

Performance

The primary limit on exploiting parallelization in Trident comes from the LP solver used, e.g. cplex or coin. LP solvers are highly memory-intensive, so the size of the L3 cache and the available memory bandwidth limit how many LP problems can be solved in parallel on a single CPU chip without performance degradation.

Based on our benchmarking, a rule of thumb is to set the number of MPI nodes per CPU chip to half the number of physical cores (not logical CPUs). We nevertheless strongly recommend that users benchmark on their specific hardware.

For example, on an AMD Ryzen Threadripper 5995WX CPU (64 cores, 128 threads/logical cores), num_nodes=32 will most likely be the best choice.
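The rule of thumb can be written as a small calculation. The sketch below is illustrative only: recommended_num_nodes is a hypothetical helper, and the physical core count must be supplied by the user, because Python's os.cpu_count() reports logical CPUs rather than physical cores. Treat the result as a starting point for benchmarking, not a guarantee.

```python
def recommended_num_nodes(physical_cores_per_chip, num_chips=1):
    """Apply the rule of thumb: half the physical cores on each CPU chip.

    Hypothetical helper; pass physical core counts, not logical CPUs.
    """
    if physical_cores_per_chip < 1 or num_chips < 1:
        raise ValueError("core and chip counts must be at least 1")
    # Use at least one node per chip, even on a single-core chip.
    nodes_per_chip = max(1, physical_cores_per_chip // 2)
    return nodes_per_chip * num_chips


if __name__ == "__main__":
    # Threadripper 5995WX: 64 physical cores on one chip.
    print(recommended_num_nodes(64))  # prints 32
```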