Add the option to do lazy loading and use Compute()
This issue is to explore ideas and track the development of lazy loading (Start() call with retrieve = F) + using Compute() to handle operations on large datasets.
Several considerations for now.
- The current modules need to be adapted to work with both arrays and s2dv_cube objects. This will require some changes to what we have right now:
- CST_ functions do not work with arrays (Solution: define fun dynamically and use `do.call()`)
- Input parameters cannot be a list (Solution: change the input parameters of each module)
- logger cannot work inside Compute() (Solution: add 'if' condition)
- The data cannot be saved inside Compute() (Solution: add 'if' condition)
- In the Skill module, `$data` is used directly by the functions, but the element does not exist in the startR_array (Solution: use `do.call()`?)
- Some modules require 'obs' to have the 'ensemble' dimension (Solutions: add the dimension in a pre-processing module; change the functions in the packages to eliminate this requirement)
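To make the `do.call()` idea concrete, here is a minimal sketch in base R. Everything in it is hypothetical: `apply_module()` and `anomaly()` are illustrative names, not existing functions in the modules, and the "cube" is just a list with a `$data` element standing in for a real s2dv_cube. The point is only to show one way a module could accept either a plain array (as it would receive inside Compute()) or a cube, without the inner function touching `$data` itself.

```r
# Hypothetical helper: run an array-level function on either a plain array
# or an s2dv_cube-like list (an object that stores its array in `$data`).
apply_module <- function(x, fun, ...) {
  is_cube <- is.list(x) && !is.null(x$data)
  arr <- if (is_cube) x$data else x
  # Build the call dynamically so the same code path serves both input types
  res <- do.call(fun, c(list(arr), list(...)))
  if (is_cube) {
    x$data <- res  # keep the rest of the object (metadata) untouched
    return(x)
  }
  res
}

# Toy stand-in for a module operation (assumption): remove the overall mean
anomaly <- function(data, na.rm = TRUE) data - mean(data, na.rm = na.rm)

arr <- array(1:6, dim = c(time = 2, lat = 3))
cube <- structure(list(data = arr), class = "s2dv_cube")

out_arr <- apply_module(arr, anomaly)    # returns a plain array
out_cube <- apply_module(cube, anomaly)  # returns the same kind of object
```

The same pattern could carry an extra flag to skip logging and saving when the function runs inside Compute(), which is what the 'if' conditions above would do.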
- Some ideas (testing them!):
- Create a function (`compute_fun()`) to be placed inside Compute(), which is basically the sequence of the modules that need to be used, in the correct order. I am exploring how to do this dynamically from the recipe, but it could also be done by the user.
- Create a wrapper function that takes `compute_fun()` and builds the calls to `Step()`, `AddStep()` and `Compute()` (and if possible/necessary, `Collect()`). This function would be called from the main script.
- Create a pre-processing module to perform several basic operations, like changing units, eliminating or adding array dimensions, and performing preliminary checks on the data.
- Create a function (`as.s2dv_cube()` wrapper) to convert the result back to s2dv_cube with the correct metadata. Then the workflow can continue.
- The user could define the chunk dimensions and the number of chunks for each dimension in the recipe. Then they are added dynamically to the `Step()` call.
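As a rough illustration of building `compute_fun()` dynamically from the recipe, the sketch below uses only base R. The recipe layout (`modules`, `chunks`), the `module_registry` list and `build_compute_fun()` are all assumptions made up for this example, and the two "modules" are toy array-level functions rather than the real ones. The commented lines at the end show how the wrapper might hand the result to startR, based on my understanding of its API; they are not run here.

```r
# Hypothetical recipe fragment: ordered module names plus chunking settings
recipe <- list(
  modules = c("units", "anomalies"),
  chunks = list(latitude = 2, longitude = 2)
)

# Toy array-level stand-ins for the real modules (assumptions for illustration)
module_registry <- list(
  units = function(data) data - 273.15,         # K to degrees C
  anomalies = function(data) data - mean(data)  # remove the overall mean
)

# Build compute_fun(): apply the requested modules in recipe order
build_compute_fun <- function(recipe, registry) {
  unknown <- setdiff(recipe$modules, names(registry))
  if (length(unknown) > 0) {
    stop("Unknown module(s) in recipe: ", paste(unknown, collapse = ", "))
  }
  steps <- registry[recipe$modules]
  function(data) {
    # Feed the output of each module into the next one
    Reduce(function(acc, step) do.call(step, list(acc)), steps, data)
  }
}

compute_fun <- build_compute_fun(recipe, module_registry)

# Stand-in for one chunk of data as it would arrive inside Compute()
chunk <- array(c(280, 282, 284, 286), dim = c(time = 4))
result <- compute_fun(chunk)

# With startR, the wrapper could then do something along these lines
# (not run; `data` would come from Start(..., retrieve = FALSE)):
#   step <- Step(compute_fun, target_dims = "time", output_dims = "time")
#   wf   <- AddStep(data, step)
#   res  <- Compute(wf, chunks = recipe$chunks)
```

Keeping the modules in a named list means the recipe only has to carry module names, and an unknown name fails early, before any data is loaded.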
- Other potential challenges:
- Compute() + Autosubmit
- Saving the outputs
- ...
- Restrictions:
- The dimensions to chunk by need to be the same for all the modules within a workflow.
- ...
Edited by vagudets