
Split inner dimension while loading data

Hi @mlotto @allabres

Following our discussion about splitting the time dimension into two with Start(), I explored the different usages and would like to summarize them here. So far we have only loaded a single file and split the time dimension into c(week, day). When loading more than one file (e.g., year = c("2015", "2016") and month = c("06", "07")), that Start() call does not work well. Fortunately, there are other ways to do it. I don't want to overwhelm you right now, but whenever you need it, you can go through the scripts and resources and we can discuss further.

First, we load the data without reshaping. The reshaped results will be compared against it.

path1 <- "/esarchive/recon/ecmwf/era5/daily_mean/$var$_f1h/$var$_$year$$month$.nc"
variable <- "prlr"

# Without reshaping
data1 <- Start(dat = path1,
             var = variable,
             year = c('2015'), month = c('06', '07'),
             time = 'all',
             latitude = values(list(0, 6)), latitude_reorder = Sort(decreasing = TRUE),
             longitude = values(list(0, 5)), longitude_reorder = CircularSort(0, 360),
             synonims = list(latitude = c('lat', 'latitude'), longitude = c('lon', 'longitude')),
             return_vars = list(latitude = 'dat', longitude = 'dat',
                                time = c('year', 'month')),
             retrieve = TRUE)
dim(data1)
#      dat       var      year     month      time  latitude longitude 
#        1         1         1         2        30        21        18 
time1 <- attr(data1, 'Variables')$common$time
dim(time1)
# year month  time 
#    1     2    30 

[Method 1: time selector is an array of indices; split]
I said earlier that the array must contain time values, but I was wrong: it can contain indices as well (thanks for this use case, I didn't know Start() could work like this!).

time_arr_ind <- array(1:30, dim = c(day = 10, week = 3))
data3 <- Start(dat = path1,
             var = variable,
             year = c('2015'), month = c('06', '07'),
             time = indices(time_arr_ind), # [day, week]
             latitude = values(list(0, 6)), latitude_reorder = Sort(decreasing = TRUE),
             longitude = values(list(0, 5)), longitude_reorder = CircularSort(0, 360),
             synonims = list(latitude = c('lat', 'latitude'), longitude = c('lon', 'longitude')),
             return_vars = list(latitude = 'dat', longitude = 'dat',
                                time = c('year', 'month')),
             split_multiselected_dims = TRUE,  #*reshape
             retrieve = TRUE)
dim(data3)
#      dat       var      year     month       day      week  latitude longitude 
#        1         1         1         2        10         3        21        18 

time3 <- attr(data3, 'Variables')$common$time
dim(time3)
# year month   day  week 
#    1     2    10     3 

identical(as.vector(data1), as.vector(data3))
#[1] TRUE
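
The check above returns TRUE because the split only changes the dimensions; R stores arrays in column-major order, so the element order stays the same. Here is a minimal base-R sketch of that idea. It does not call Start(), and the array `x` and its sizes are made up for illustration:

```r
# Synthetic stand-in for loaded data: [month = 2, time = 30, lat = 3]
x <- array(seq_len(2 * 30 * 3), dim = c(month = 2, time = 30, lat = 3))

# Split 'time' (30) into [day = 10, week = 3]; the other dims stay in place
x_split <- array(x, dim = c(month = 2, day = 10, week = 3, lat = 3))

# The element order is untouched; only the dimensions differ
same_order <- identical(as.vector(x), as.vector(x_split))

# Time step 14 of month 2 is day 4 of week 2, since (2 - 1) * 10 + 4 = 14
same_elem <- x[2, 14, 1] == x_split[2, 4, 2, 1]
```

The new dimensions must take the place of the split one, in the same position, with the fastest-varying one (day) first. Otherwise the values would no longer line up with the unsplit data.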

[Method 2: time selector is an array of values; merge & split]

merge_across_dims and split_multiselected_dims are used together to reshape the data. This usage is more complicated, but it is useful for loading exp and obs with a consistent structure (see use case 1_7). Note that year and month need to be combined into a single file dimension, because the time_across parameter accepts only one dimension.

## Use time1 as the following time selector
time_arr <- array(time1, dim = c(yr_m = 2, time = 10, week = 3))
time_arr <- as.POSIXct(time_arr, origin = '1970-01-01', tz = 'UTC')

path2 <- "/esarchive/recon/ecmwf/era5/daily_mean/$var$_f1h/$var$_$dates$.nc"  # use $dates$ instead of $year$$month$
data2 <- Start(dat = path2,
             var = variable,
             dates = c('201506', '201507'),
             time = time_arr,  #[yr_m, time, week]  # must have 'time' dim
             time_across = 'dates',  #*reshape
             merge_across_dims = TRUE,  #*reshape
             split_multiselected_dims = TRUE,  #*reshape
             latitude = values(list(0, 6)), latitude_reorder = Sort(decreasing = TRUE),
             longitude = values(list(0, 5)), longitude_reorder = CircularSort(0, 360),
             synonims = list(latitude = c('lat', 'latitude'), longitude = c('lon', 'longitude')),
             return_vars = list(latitude = 'dat', longitude = 'dat',
                                time = c('dates')),
             retrieve = TRUE)
dim(data2)
#      dat       var      yr_m      time      week  latitude longitude 
#        1         1         2        10         3        21        18 

time2 <- attr(data2, 'Variables')$common$time
dim(time2)
#yr_m time week 
#   2   10    3 

identical(as.vector(data1), as.vector(data2))
#[1] TRUE
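
To see how the time values end up arranged in the selector array, here is a small base-R sketch with synthetic timestamps. It does not call Start(); the names `june`, `july`, `t_num` and `t_arr` are hypothetical, and 30 daily steps per month are assumed to mirror the dimensions of time1:

```r
# Synthetic daily timestamps (UTC): 30 steps for each of June and July 2015
june <- seq(as.POSIXct("2015-06-01", tz = "UTC"), by = "day", length.out = 30)
july <- seq(as.POSIXct("2015-07-01", tz = "UTC"), by = "day", length.out = 30)

# Mimic the layout of time1 as plain numbers: [month = 2, time = 30]
t_num <- rbind(as.numeric(june), as.numeric(july))

# Reshape to [yr_m = 2, time = 10, week = 3]; the values are seconds since 1970
t_arr <- array(t_num, dim = c(yr_m = 2, time = 10, week = 3))

# Element [2, 5, 2] is July, step 5 of week 2, i.e. the 15th step of July
picked <- as.POSIXct(t_arr[2, 5, 2], origin = "1970-01-01", tz = "UTC")
picked_day <- format(picked, "%Y-%m-%d")
```

Because array() returns plain numbers here, the date-time class has to be restored afterwards, which is what the as.POSIXct() call on time_arr does in the script above.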

I'm going to create a use case showing the first method, and then I'll close this issue. Let me know if you want to know more at some point.

Best,
An-Chi