DDBB sync improvements
While addressing issue #49 (closed) and following up on issue #34, I found that the DDBB tables can be improved to give better results when searching for experiments.
To achieve this, the following points have to be handled:
- Avoid deletion: First, in the `populate_details` background task, the `details` table could use an `INSERT` or `UPDATE` strategy instead of deleting the whole table and repopulating it from scratch. This matters because, if there are many experiments or the process breaks partway through, data might be unavailable for searching until the next run of the background task (4 hours). The same strategy must be applied to the `experiment_status` table, because the status record is deleted after the experiment finishes instead of being updated.
- Single Source of Truth (SSOT): Each data concept in the tables must have a Single Source of Truth. That is, a single function should be used to get each piece of data, and it should be mapped somewhere in the documentation, if possible (a data catalog might help). For example, there should be just one way to obtain the user of an experiment, used by every endpoint and also to update the DDBB. That information can then be read from the table (as a cached snapshot) or, if it is needed in real time, obtained by calling the same function that was used to update it.
- Reactive updating: As explained above, every time an SSOT function is called, the data should be updated in the DDBB (if this is feasible and scalable). The DDBB can then provide more recent data directly without calling the SSOT function (assuming that reading from the DDBB is faster than calling the SSOT function).
- Extend data available in DDBBs: As Autosubmit grows, other data concepts might be included in the `details` table (e.g., wrapper type, job status counters) or in additional one-to-many tables (e.g., metadata, as @bdepaula suggested). This will enrich the search while using only the data available in the DDBB, which is the ideal case.
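A minimal sketch of the insert-or-update idea from the first point, assuming the tables live in SQLite (version 3.24 or later, which supports `ON CONFLICT ... DO UPDATE`). The schema, the column names `expid`, `username` and `hpc`, and the helper `upsert_details` are hypothetical and only illustrate the approach; the real `details` table may look different.

```python
import sqlite3

# Hypothetical, simplified schema; the real `details` table may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS details (
    expid    TEXT PRIMARY KEY,
    username TEXT,
    hpc      TEXT
)
"""

# Insert a new row, or update the existing one when the expid is already present.
UPSERT = """
INSERT INTO details (expid, username, hpc)
VALUES (:expid, :username, :hpc)
ON CONFLICT(expid) DO UPDATE SET
    username = excluded.username,
    hpc      = excluded.hpc
"""


def upsert_details(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Write rows without clearing the table first."""
    # `with conn` wraps the statements in one transaction:
    # commit on success, roll back if an exception is raised.
    with conn:
        conn.executemany(UPSERT, rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# First run of the (hypothetical) worker.
upsert_details(conn, [{"expid": "a000", "username": "alice", "hpc": "local"}])

# Later run: a000 changed and a001 is new. Existing rows stay searchable throughout.
upsert_details(conn, [
    {"expid": "a000", "username": "alice", "hpc": "marenostrum"},
    {"expid": "a001", "username": "bob", "hpc": "local"},
])

rows = conn.execute(
    "SELECT expid, username, hpc FROM details ORDER BY expid"
).fetchall()
```

Because the table is never emptied, a failure in the middle of a run leaves the previous snapshot in place. Rows for experiments that no longer exist are not removed by this approach and would need a separate, targeted cleanup.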
As a first draft, one way to handle these improvements might be the following:
1. Declare SSOT functions for each relevant data concept
2. Refactor workers (background tasks) to call the SSOT functions and populate tables
3. Decide which endpoints (`/v4`) need to call SSOTs or DDBBs and refactor them
4. Apply reactive updates to the ones that call SSOTs, if feasible
5. Extend DDBB data and repeat 1-4 for the new relevant data
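The SSOT and reactive update ideas could be sketched roughly as follows. Everything here is an assumption made for illustration: the function names `ssot_experiment_owner` and `get_experiment_owner`, the `FAKE_SOURCE` dictionary standing in for the real source of the data, the simplified `details` schema, and the use of SQLite (3.24 or later).

```python
import sqlite3

# Stand-in for the real source of the data; purely illustrative.
FAKE_SOURCE = {"a000": "alice"}
ssot_calls = 0  # counts how often the SSOT function is actually invoked


def ssot_experiment_owner(expid: str) -> str:
    """Hypothetical SSOT function: the only place that computes the owner."""
    global ssot_calls
    ssot_calls += 1
    return FAKE_SOURCE[expid]


def get_experiment_owner(conn: sqlite3.Connection, expid: str,
                         realtime: bool = False) -> str:
    """Return the owner from the table snapshot, or from the SSOT if requested."""
    if not realtime:
        row = conn.execute(
            "SELECT username FROM details WHERE expid = ?", (expid,)
        ).fetchone()
        if row is not None:
            return row[0]  # cached snapshot, no SSOT call needed
    owner = ssot_experiment_owner(expid)
    # Reactive update: every SSOT call also refreshes the table.
    with conn:
        conn.execute(
            "INSERT INTO details (expid, username) VALUES (?, ?) "
            "ON CONFLICT(expid) DO UPDATE SET username = excluded.username",
            (expid, owner),
        )
    return owner


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE details (expid TEXT PRIMARY KEY, username TEXT)")

first = get_experiment_owner(conn, "a000")   # table empty: SSOT call, then stored
second = get_experiment_owner(conn, "a000")  # served from the table
FAKE_SOURCE["a000"] = "bob"                  # the underlying data changes
stale = get_experiment_owner(conn, "a000")   # snapshot still holds the old value
fresh = get_experiment_owner(conn, "a000", realtime=True)  # SSOT call refreshes it
after = get_experiment_owner(conn, "a000")   # snapshot is now up to date
```

An endpoint would then choose between the snapshot and the real-time path through a single entry point, and a background worker could call the same SSOT function to populate the table, so both paths agree on how the value is computed.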