
Add the option to do lazy loading and use Compute()

This issue is to explore ideas and track the development of lazy loading (calling Start() with retrieve = FALSE) combined with Compute() to handle operations on large datasets.

Some considerations so far:

  1. The current modules need to be adapted to work with both arrays and s2dv_cube objects. This will require some changes to what we have right now:
  • CST_ functions do not work with arrays (Solution: define the function dynamically and use do.call())
  • Input parameters cannot be a list (Solution: change the input parameters of each module)
  • The logger cannot work inside Compute() (Solution: add an 'if' condition)
  • The data cannot be saved inside Compute() (Solution: add an 'if' condition)
  • In the Skill module, $data is used directly by the functions, but the element does not exist in the startR_array (Solution: use do.call()?)
  • Some modules require 'obs' to have the 'ensemble' dimension (Solutions: Add the dimension in a pre-processing module; change the functions in the packages to eliminate this requirement)
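A minimal sketch of how the do.call() dispatch, the 'if' condition for logging, and the 'ensemble' dimension fix could look. Everything here is hypothetical: `apply_module()`, `add_ensemble_dim()` and the `verbose` argument are illustrative names, `array_fun` stands in for whatever array-level function sits behind a CST_ function, and the s2dv_cube is mocked as a plain list with a `data` element.

```r
# Hypothetical module wrapper: accepts an s2dv_cube-like list or a bare array.
apply_module <- function(x, array_fun, args = list(), verbose = TRUE) {
  is_cube <- inherits(x, "s2dv_cube")
  # Inside Compute() the input is a plain array, so there is no $data element.
  arr <- if (is_cube) x$data else x
  # Only log when a full s2dv_cube is available (assumed to mean we are
  # outside Compute()).
  if (is_cube && verbose) message("Running module on an s2dv_cube")
  # Call the array-level function with the array plus the extra arguments.
  res <- do.call(array_fun, c(list(arr), args))
  if (is_cube) {
    x$data <- res
    return(x)
  }
  res
}

# Add a length-1 'ensemble' dimension to an array if it is missing.
add_ensemble_dim <- function(arr, name = "ensemble") {
  if (name %in% names(dim(arr))) return(arr)
  dim(arr) <- c(setNames(1L, name), dim(arr))
  arr
}

# Toy data with named dimensions.
obs <- array(1:6, dim = c(time = 3, lat = 2))
obs_ens <- add_ensemble_dim(obs)

# Same call works for a bare array and for a mocked s2dv_cube.
subtract <- function(a, offset) a - offset
anom <- apply_module(obs_ens, subtract, args = list(offset = 1))
cube <- structure(list(data = obs_ens), class = "s2dv_cube")
cube_out <- apply_module(cube, subtract, args = list(offset = 1),
                         verbose = FALSE)
```

Keeping the array/cube branching in one wrapper would mean the modules themselves only need an array-level function, but this is just one possible design.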
  2. Some ideas (testing them!):
  • Create a function ('compute_fun()') to be placed inside Compute(), which is basically the sequence of the modules that need to be used, in the correct order. I am exploring how to do this dynamically from the recipe, but it could also be done by the user.
  • Create a wrapper function that takes compute_fun() and builds the calls to Step(), AddStep() and Compute() (and if possible/necessary, Collect()). This function would be called from the main script.
  • Create a pre-processing module to perform several basic operations, such as changing units, eliminating or adding array dimensions, and performing preliminary checks on the data.
  • Create a function (as.s2dv_cube() wrapper) to convert the result back to s2dv_cube with the correct metadata. Then the workflow can continue.
  • The user could define the chunk dimensions and the number of chunks for each dimension in the recipe. Then they are added dynamically to the Step() call.
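A rough sketch of building compute_fun() dynamically from the recipe, together with the chunk specification. The recipe layout (`modules`, `chunking`), the `module_registry` list and `build_compute_fun()` are all assumptions made for illustration, and the module bodies are toy placeholders. The startR calls are shown only as comments because they need a real Start() object; as far as I know, startR takes the chunk list in Compute(chunks = ...).

```r
# Hypothetical recipe fragment; the field names are illustrative only.
recipe <- list(
  modules  = c("calibration", "anomalies"),
  chunking = list(latitude = 2, longitude = 4)
)

# Hypothetical registry of array-level module functions (toy bodies).
module_registry <- list(
  calibration = function(x) x * 2,
  anomalies   = function(x) x - mean(x)
)

# Build compute_fun() as the ordered composition of the requested modules.
build_compute_fun <- function(module_names, registry) {
  unknown <- setdiff(module_names, names(registry))
  if (length(unknown) > 0) {
    stop("Unknown module(s): ", paste(unknown, collapse = ", "))
  }
  funs <- registry[module_names]
  # Apply each module to the output of the previous one, in recipe order.
  function(x) Reduce(function(acc, f) f(acc), funs, x)
}

compute_fun <- build_compute_fun(recipe$modules, module_registry)
chunks <- recipe$chunking

# The wrapper would then do something along these lines (not run here;
# 'data' would be the result of Start(..., retrieve = FALSE)):
#   step     <- Step(compute_fun, target_dims = ..., output_dims = ...)
#   workflow <- AddStep(data, step)
#   result   <- Compute(workflow, chunks = chunks)
```

With this layout the user could still write compute_fun() by hand and skip `build_compute_fun()` entirely.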
  3. Other potential challenges:
  • Compute() + Autosubmit
  • Saving the outputs
  • ...
  4. Restrictions:
  • The dimensions to chunk by need to be the same for all the modules within a workflow.
  • ...
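One possible way to enforce the chunking restriction before launching the workflow: check that none of the chunk dimensions is needed in full by any module. This is only a sketch of one reading of the restriction. `module_target_dims`, its contents and `check_chunk_dims()` are hypothetical.

```r
# Hypothetical: dimensions each module needs in full (so they cannot be chunked).
module_target_dims <- list(
  calibration = c("ensemble", "syear"),
  skill       = c("ensemble", "syear")
)

# Stop if any chunk dimension is required in full by some module.
check_chunk_dims <- function(chunk_dims, target_dims_by_module) {
  conflicts <- lapply(target_dims_by_module, intersect, chunk_dims)
  conflicts <- conflicts[lengths(conflicts) > 0]
  if (length(conflicts) > 0) {
    stop("Cannot chunk along dimensions required by: ",
         paste(names(conflicts), collapse = ", "))
  }
  invisible(TRUE)
}

# Chunking by latitude/longitude is compatible with both modules.
ok <- check_chunk_dims(c("latitude", "longitude"), module_target_dims)
```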
Edited by vagudets