DDT not connecting to multiserver

When we run with DDT (the Allinea parallel debugger), the procedure is usually the following: run DDT locally on the workstation (or laptop), and in the Slurm submission script prepend the ddt command to the MPI launch, as follows:

module load ddt/19.0.5
ddt --connect mpirun -np N_CORES ./nemo
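
For context, a minimal sketch of a complete submission script built this way (the job name, core count, and wall-clock time are illustrative, not taken from the original setup):

#!/bin/bash
#SBATCH --job-name=ddt_nemo
#SBATCH --ntasks=48
#SBATCH --time=01:00:00

module load ddt/19.0.5
# --connect attaches the job to the DDT client already running on the workstation
ddt --connect mpirun -np 48 ./nemo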

However, when trying to run with ORCA1 we hit the following problem: DDT was connecting only to the first n cores. At first I thought it was due to the Slurm directive

#SBATCH --ntasks-per-node 20

that we were using to be able to spawn the N OpenMP threads. After some tests it turned out that the problem was neither Slurm nor OpenMP. Oriol suggested that the problem appeared when running on more than one node, and he was right.
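
For reference, a sketch of the hybrid MPI+OpenMP Slurm header we were testing with (the node count and threads per task are illustrative):

#SBATCH --nodes=2
#SBATCH --ntasks-per-node 20
#SBATCH --cpus-per-task=2

# number of OpenMP threads per MPI task; should match --cpus-per-task
export OMP_NUM_THREADS=2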

After some research I found instructions on how to run on more than one node. I am documenting the steps here:

  1. Launch DDT locally

  2. Use "Remote launch" to connect to an MN4 login node

  3. Select "Run and debug a program"

  4. In the "Application" box, put the absolute path to your executable

  5. In the "Working directory" box, put the path to the run folder with the input data and namelists

  6. Select "Configure" in the "Submit to queue" section

  7. In the "Submission template file" field, put the path to the following script:

    #!/bin/bash
    # NUM_NODES_TAG and WALL_CLOCK_LIMIT_TAG are placeholders that DDT
    # fills in at submission time
    #SBATCH --nodes=NUM_NODES_TAG
    #SBATCH --time=WALL_CLOCK_LIMIT_TAG
    #SBATCH --job-name="ddt"
    #SBATCH --output=allinea.stdout
    #SBATCH --error=allinea.stdout
    
    # load the environment used for the experiment
    source /gpfs/scratch/bsc32/bsc32402/RUN/ORIOL_OpenMP/param.cfg
    set_env
    # replaced by DDT with the command that launches the program under the debugger
    AUTO_LAUNCH_TAG
  8. Close the tab and hit the "Submit" button

The script is submitted to the "normal" queue by default, so if you don't want to wait in the queue for a long time, change the queue manually from an MN4 login node, as sketched below.
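
One way to do this from the login node is with scontrol; this assumes MN4 queues map to Slurm QOS and that the site lets users modify the QOS of their own pending jobs (the job ID is a placeholder):

scontrol update jobid=<JOBID> qos=debug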

Changing the submission template by adding a line specifying the debug queue would probably avoid this.
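
For example, adding this line to the template should do it (assuming the debug queue on MN4 is selected via a QOS named "debug"; check the exact name in the MN4 user guide):

#SBATCH --qos=debug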

@oduran @gutrera
