
[DestinE] Communication between two or more different workflows

DestinE phase 2

Cc: @mcastril

This issue collects ideas and implementation proposals for communication between different workflows.

  • Dependencies: complex (like a workflow graph) or simple (run workflow B when workflow A emits the signal)?
  • Should the launch command be autosubmit launch launch_suite.yml or autosubmit launch a001,a002,a003,a004?
  • Should the signal be file-based? How is the signal generated?
  • How are configured signals set and read?
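Both launch forms listed above could share one entry point. A minimal sketch, assuming a hypothetical helper resolve_launch_targets (not an existing Autosubmit function) that treats a .yml/.yaml argument as a suite file and anything else as a comma-separated list of experiment IDs:

```python
from __future__ import annotations


def resolve_launch_targets(arg: str) -> tuple[str, list[str]]:
    """Classify the argument of a hypothetical multi-workflow launch command.

    Returns ("suite", [path]) for a YAML suite file, otherwise
    ("expids", [...]) with the comma-separated experiment IDs.
    """
    if arg.endswith((".yml", ".yaml")):
        return "suite", [arg]
    # Split "a001,a002,a003,a004" and drop empty entries from stray commas.
    expids = [e.strip() for e in arg.split(",") if e.strip()]
    if not expids:
        raise ValueError("no experiment IDs given")
    return "expids", expids
```

Supporting both forms would let a quick ad hoc run use the expid list, while the suite file carries the cross-workflow dependencies.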

Simple

a000

JOBS:
  SECTION_A:
    FILE:
    ...
    SUITE:
      METHOD: "ON_COMPLETED"    
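One possible reading of the SUITE: METHOD: "ON_COMPLETED" setting is that a section emits its signal once all of its jobs are COMPLETED. The sketch below assumes that reading; the function sections_ready_to_signal and its job_status input (section name mapped to the statuses of its jobs) are hypothetical, and only ON_COMPLETED is supported because it is the only method proposed here.

```python
from __future__ import annotations

# Only ON_COMPLETED appears in the proposal; other methods would be added here.
METHOD_TO_STATUS = {"ON_COMPLETED": "COMPLETED"}


def sections_ready_to_signal(jobs_conf: dict, job_status: dict[str, list[str]]) -> list[str]:
    """Return the job sections whose SUITE condition is satisfied.

    jobs_conf: the parsed JOBS mapping of one experiment (e.g. a000).
    job_status: hypothetical input, section name -> statuses of its jobs.
    """
    ready = []
    for section, conf in jobs_conf.items():
        suite = (conf or {}).get("SUITE")
        if not suite:
            continue  # this section does not take part in cross-workflow signalling
        method = suite.get("METHOD")
        if method not in METHOD_TO_STATUS:
            raise ValueError(f"unsupported SUITE METHOD: {method!r}")
        target = METHOD_TO_STATUS[method]
        statuses = job_status.get(section, [])
        # Signal only once every job of the section has reached the target status.
        if statuses and all(s == target for s in statuses):
            ready.append(section)
    return ready
```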

The signal would be set up as follows:

  • Similar to the "checkpoint" function, we add a generate_workflow_signal function to all job scripts (cmds).
  • Users add %WORKFLOW_SIGNAL% to the templates where they want the signal and code the surrounding logic themselves.
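If the signal is file-based, generate_workflow_signal could simply drop a marker file that the launcher polls for. The actual function would be a shell function inside each .cmd script; this Python sketch only illustrates the mechanism, and the signal directory and the <expid>_<job_name>.signal naming are assumptions. The file is written under a temporary name and then renamed, so a reader never sees a half-written signal.

```python
import os
import tempfile
from pathlib import Path


def generate_workflow_signal(signal_dir: str, expid: str, job_name: str) -> Path:
    """Write a marker file announcing that expid/job_name emitted its signal."""
    directory = Path(signal_dir)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{expid}_{job_name}.signal"
    # Write to a temporary file in the same directory first...
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        fh.write(f"{expid} {job_name}\n")
    # ...then rename it into place (atomic when both paths are on the same filesystem).
    os.replace(tmp, target)
    return target


def signal_received(signal_dir: str, expid: str, job_name: str) -> bool:
    """Check, e.g. from the launcher's polling loop, whether the signal exists."""
    return (Path(signal_dir) / f"{expid}_{job_name}.signal").is_file()
```

A file-based signal works across machines only if the signal directory is on a filesystem both sides can see; otherwise it would have to be copied back by the launcher.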

Complex

A launch_suite.yml file, located somewhere outside the experiment directories.

For each experiment, use ASconfigparser to read as_conf.experiment_data["JOBS"] and store it as as_conf.experiment_data["JOBS_%EXPID%"]; afterwards, read launch_suite.yml.
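The namespacing step could look like this, assuming each experiment's configuration has already been loaded (for example with ASconfigparser) into a plain dict shaped like as_conf.experiment_data. The helper merge_suite_jobs is hypothetical and uses no ASconfigparser API; the expid is upper-cased to match the JOBS_A000 keys in the suite example.

```python
from __future__ import annotations

import copy


def merge_suite_jobs(experiments: dict[str, dict]) -> dict:
    """Collect each experiment's JOBS under a JOBS_<EXPID> key.

    experiments: expid -> experiment_data dict as loaded for that experiment.
    """
    merged: dict = {}
    for expid, experiment_data in experiments.items():
        key = f"JOBS_{expid.upper()}"
        if key in merged:
            # e.g. "a000" and "A000" would collide after upper-casing
            raise ValueError(f"duplicate experiment {expid!r}")
        # Deep-copy so later edits (e.g. injected DEPENDENCIES) do not alter the source config.
        merged[key] = copy.deepcopy(experiment_data.get("JOBS", {}))
    return merged
```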

JOBS_A000:
  DEPENDENCIES:
    jobs_a000.section_a:
      job_names: (list)...
      # or
      DATE: ... [n:m], any, all
      MEMBER: ... [n:m], any, all
      CHUNK: ... [n:m], any, all
      SPLIT: ... [n:m], any, all
      FROM_STATUS: "COMPLETED" or "RUNNING"
    jobs_a000.section_b: # equivalent to setting everything to ALL
    jobs_a001.section_a:
    jobs_a002.section_a:
JOBS_A001:
  ...
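The DATE/MEMBER/CHUNK/SPLIT selectors and FROM_STATUS could be evaluated roughly as follows. This sketch rests on assumptions the proposal does not fix: [n:m] is read as a Python-style half-open slice over the section's values, an explicit list selects exactly those values, a missing selector behaves like all (as the jobs_a000.section_b comment suggests), and a COMPLETED parent also satisfies FROM_STATUS: "RUNNING". The names select and dependency_met are hypothetical.

```python
from __future__ import annotations

import re

# Matches "[n:m]", "[n:]", "[:m]" with optional negative indices.
_SLICE = re.compile(r"^\[(-?\d+)?:(-?\d+)?\]$")
# Assumed ordering: a job that has COMPLETED has necessarily been RUNNING.
_RANK = {"RUNNING": 1, "COMPLETED": 2}


def select(spec, values: list) -> tuple[list, str]:
    """Resolve one DATE/MEMBER/CHUNK/SPLIT selector to (selected values, mode)."""
    if spec is None or spec == "all":
        return list(values), "all"
    if spec == "any":
        return list(values), "any"
    if isinstance(spec, list):
        # Explicit list: keep the section's order, drop unknown values.
        return [v for v in values if v in spec], "all"
    match = _SLICE.match(str(spec).replace(" ", ""))
    if match:
        start, stop = (int(g) if g is not None else None for g in match.groups())
        return list(values)[start:stop], "all"
    raise ValueError(f"unsupported selector: {spec!r}")


def dependency_met(statuses: list[str], mode: str, from_status: str = "COMPLETED") -> bool:
    """Check the selected parent jobs' statuses against FROM_STATUS."""
    hits = [_RANK.get(s, 0) >= _RANK[from_status] for s in statuses]
    if mode == "any":
        return any(hits)
    return bool(hits) and all(hits)
```

With "all", every selected parent must reach FROM_STATUS; with "any", one is enough, which is what distinguishes the two keywords in this reading.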

The autosubmit launch command needs:

  • A way to detect which workflows can be created and run (by reading the YAML).
  • A way to set the dependencies between jobs of different workflows (by reading the YAML).
  • A way to detect that a workflow has failed jobs.
    • What should happen then? Should all related experiments be stopped?
  • A way to stop the launch and resume it from the previous state.
  • A way to detect finished workflows so they are not run again.
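The launcher requirements listed above map onto a small amount of bookkeeping over a workflow-level dependency graph. A minimal sketch, assuming each workflow is tracked by expid with one of the hypothetical states PENDING, RUNNING, COMPLETED, FAILED or STOPPED, and that "stop all related experiments" means stopping every workflow downstream of the failed one; the function names and the JSON state file are illustrative, not Autosubmit features.

```python
from __future__ import annotations

import json
from pathlib import Path


def runnable(deps: dict[str, set[str]], state: dict[str, str]) -> list[str]:
    """Pending workflows whose upstream workflows have all COMPLETED.

    Workflows already COMPLETED (or FAILED/STOPPED) are never returned,
    so finished workflows are not launched again.
    """
    return sorted(
        wf for wf, upstream in deps.items()
        if state.get(wf, "PENDING") == "PENDING"
        and all(state.get(u) == "COMPLETED" for u in upstream)
    )


def stop_downstream(deps: dict[str, set[str]], state: dict[str, str], failed: str) -> None:
    """Mark every workflow that transitively depends on `failed` as STOPPED."""
    children: dict[str, set[str]] = {}
    for wf, upstream in deps.items():
        for u in upstream:
            children.setdefault(u, set()).add(wf)
    stack = [failed]
    while stack:
        for child in children.get(stack.pop(), ()):
            if state.get(child) not in ("COMPLETED", "FAILED", "STOPPED"):
                state[child] = "STOPPED"  # a real launcher would also cancel its jobs
                stack.append(child)


def save_state(path: str, state: dict[str, str]) -> None:
    """Persist workflow states so a stopped launch can be resumed."""
    Path(path).write_text(json.dumps(state, indent=2, sort_keys=True))


def load_state(path: str) -> dict[str, str]:
    """Reload saved states; a missing file means a fresh launch."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}
```

Because runnable() only returns PENDING workflows, reloading the saved state after a stop skips workflows that already completed, and independent branches (e.g. a002 when a001 fails) keep running.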

I may have missed something.

Edited by dbeltran