.. _mpi:

=================================
MPI
=================================

Overview
---------------------------------

Trident has basic support for using MPI to solve multiple scenarios in
parallel. The `trident` python package comes with a default python script for
setting up an MPI cluster, and a Communicator implementation for OpenMPI and
MSMPI.

When running Trident in an MPI cluster, one has to use :class:`Client` to
communicate with the ngtlm engine. Currently this is limited to running
scenarios in parallel, and the nodes have to share the same physical file
system. Both of these limitations will be removed in the future.

Dependencies
---------------------------------

trident-mpi
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The python pip package `trident-mpi` is required when using the default way to
run Trident with MPI. It can be downloaded from ngltm.no and installed through
pip:

.. code-block:: bash

   pip install trident-mpi

OpenMPI (Linux)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`OpenMPI` can be installed through `apt`:

.. code-block:: bash

   sudo apt install openmpi-bin

See https://www.open-mpi.org/ for more information regarding `OpenMPI`.

MSMPI (Windows)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Instructions on how to install `MSMPI` can be found here:
https://learn.microsoft.com/en-us/message-passing-interface/microsoft-mpi

Running
--------------------------------

Trident provides a default way to start up an MPI cluster with the
`trident.mpi` module that is part of the `trident` python package. For
information about the possible options, run:

.. code-block:: bash

   python -m trident.mpi --help

The command for starting up an MPI cluster should be the same whether you're
using `OpenMPI` or `MSMPI`, where ``<num_nodes>`` is the number of MPI
processes to start:

.. code-block:: bash

   mpiexec -n <num_nodes> python -m trident.mpi -d

For an example of using :class:`Client` to upload a data set and perform a
run, see the notebook example `basic_run_mpi` found at trident.no.

Performance
---------------------------------

The primary limitation on exploiting parallelization with Trident stems from
the LP solver used, e.g. `cplex` or `coin`. LP solvers are highly memory
intensive, so the size of the L3 cache and the available memory bandwidth
limit how many LP problems you can solve in parallel on a single CPU chip
without performance degradation.

Based on our benchmarking, a rule of thumb is to set the number of MPI nodes
per CPU chip to half the number of available cores (not logical CPUs), but we
strongly recommend that users benchmark on their specific hardware. For
example, on an AMD Ryzen Threadripper 5995WX (64 cores, 128 threads/logical
cores), `num_nodes=32` will most likely be the optimal number.
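
As a starting point before benchmarking, the rule of thumb above can be
expressed as a small helper. The sketch below is not part of the `trident`
package; it assumes `psutil` is installed and simply prints a suggested node
count (half the physical cores of the machine, ignoring multi-socket layouts):

.. code-block:: python

   """Minimal sketch (not part of trident): suggest an MPI node count.

   Rule of thumb from this section: half the physical cores per CPU chip.
   """
   import psutil

   # Physical cores (not logical CPUs); fall back to the logical count
   # if psutil cannot determine the physical core count on this platform.
   physical_cores = psutil.cpu_count(logical=False) or psutil.cpu_count()

   # Half the physical cores, but always at least one node.
   num_nodes = max(1, physical_cores // 2)

   print(f"Suggested number of MPI nodes: {num_nodes}")
   print(f"e.g.: mpiexec -n {num_nodes} python -m trident.mpi -d")

On the Threadripper 5995WX mentioned above, this yields `num_nodes=32`.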