Computing resources and granularity

By default, all operations in SCL / AMML are performed either locally (instant and deferred evaluation) or on default resources (workflow evaluation). How the default resources are defined depends on the computing platform used for the evaluation. In most cases, local evaluation runs interactively in a single process with a single thread on one processor core.

Some of the operations might require computing resources that are not available locally or differ from the default resources.

In workflow evaluation mode it is possible to specify the computing resources required for a specific statement. This is achieved by adding a special annotation.

Granularity

Currently, the granularity of evaluation in the workflow executor is determined by the amount of computation in individual variables. The reason is that the interpreter maps variables to tasks within a task graph, which is used to construct a workflow.

Rationale: If a variable statement has several parameters that are not references to other variables, these parameters are all evaluated on the same resources. This is beneficial only if at most one parameter requires a large amount of computation. If two or more parameters require large and different amounts of computation, the granularity has to be decreased by splitting the statement into two or more statements, such that in every statement at most one parameter requires large computing resources (for example >95%).

Example:

Consider two functions f() and g() that, for different inputs (denoted here by ... for simplicity), may behave differently, i.e. require different amounts of computing resources. Two typical cases are shown below:

b = f(...) # 1 hour on 10 cores
c = g(...) # 5 hours on 5 cores
a = h(b, c) # recommended
# a = h(f(...), g(...)) # not recommended

In this case f(...) and g(...) can be evaluated on different resources because they are mapped to two different tasks. This is more efficient than evaluating f(...) and g(...) sequentially on the same resources, as in the commented statement, which maps to a single task. The single-task version would request 10 cores for 6 hours in total, of which 5 cores would remain idle for 5 hours.
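
The resource accounting behind this recommendation can be checked with a short calculation. The Python sketch below is only an illustration and not part of SCL; it assumes that the single task reserves the larger of the two core counts for the summed run time, and the helper name core_hours is hypothetical.

```python
def core_hours(cores, hours):
    """Core-hours reserved by one task."""
    return cores * hours

# Two tasks, each reserving only what it needs (figures from the example).
split = core_hours(10, 1) + core_hours(5, 5)

# One task: assumed to reserve max(10, 5) cores for 1 + 5 hours.
single = core_hours(max(10, 5), 1 + 5)

# Core-hours reserved but left unused in the single-task version.
idle = single - split
print(split, single, idle)  # 35 60 25
```

The 25 idle core-hours correspond to the 5 cores that remain unused during the 5 hours in which only g(...) is running.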

As a second case, consider the following snippet:

# b = f(...) # 1 second on 1 core
# c = g(...) # 3 seconds on 1 core
# a = h(b, c) # not recommended
a = h(f(...), g(...)) # recommended

In this case f(...) and g(...) should be written in one statement because of their low computing resource requirements. Splitting the evaluation into two statements, i.e. two tasks, would only increase the overhead due to latencies.

Here, we only consider computing resource requirements of parameters to adjust granularity of statements. In addition, we may want to reuse parameters or to increase readability of the model source. For example, in the first case above, the computationally expensive parameters f(...) and g(...) can be reused elsewhere in the model through the variables b and c, and this is desirable, whereas in the second case these parameters are not accessible elsewhere in the model.

Resource annotation

The resource annotations apply only to variable statements and are interpreted only in workflow mode. They are placed at the end of the statement.

Syntax:

[for <computing time> [seconds|minutes|hours|...]]
[on <integer> cores | core [with <memory> [KiB|MiB|...]]]

The computing time and memory are positive integer or floating-point numbers. The order of the time and cores specifications can be reversed. The memory can be specified only after the number of cores, and only if the number of cores is specified. The memory must amount to a whole number of bytes.

Example:

f(x) = x
a1 = f(1) on 2 cores with 2 [GB] for 2.0 [minutes]
a2 = f(1) on 4 cores for 1.0 [minute]
b = (numbers: 1, 2, 3, 4)

Note: Slurm or other batch systems may not support arbitrary memory size specifications. For example, 2 GiB cannot be expressed in any form that Slurm accepts, because only the decimal units GB, MB and KB are supported and plain bytes are not.
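
The arithmetic behind this note can be sketched in Python. This is an illustration only and not the interpreter's actual conversion code: the names UNITS, to_bytes and decimal_forms are hypothetical, and the restriction to whole numbers of decimal KB, MB or GB is taken from the note above.

```python
from fractions import Fraction

# Byte multipliers for decimal (SI) and binary (IEC) units.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9,
         "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}

def to_bytes(value, unit):
    """Convert a memory size to bytes; reject non-positive or fractional byte counts."""
    size = Fraction(str(value)) * UNITS[unit]
    if size <= 0 or size.denominator != 1:
        raise ValueError(f"{value} {unit} is not a positive whole number of bytes")
    return int(size)

def decimal_forms(n_bytes):
    """Decimal units in which n_bytes is a whole number."""
    return [u for u in ("KB", "MB", "GB") if n_bytes % UNITS[u] == 0]

print(to_bytes(2, "GB"), decimal_forms(to_bytes(2, "GB")))
print(to_bytes(2, "GiB"), decimal_forms(to_bytes(2, "GiB")))
```

2 GiB equals 2147483648 bytes, which is not divisible by 1000, so no whole number of KB, MB or GB represents it exactly.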

Policies for interactive and batch execution

Currently, if a statement has resource annotations, the interpreter creates a task for batch execution through a batch queuing system such as Slurm (see the glossary below for further details). If no resource requirements are specified, the interpreter creates a task for interactive execution.

Number of chunks

Some operations are data-parallel, i.e. the same operation is performed uniformly on many elements of the same type. By default, these operations are performed sequentially on the same resources within one task. SCL provides an annotation to enable parallel execution of this type of operation. It is placed at the end of the statement, before the optional resource specifications.

Syntax:

in <integer> chunks [resources]

As a result, the data input parameters of these functions are split into the specified number of chunks, which are as equal in size as possible. The interpreter then creates a separate task for each chunk, so that the chunks can be evaluated on different resources and also in parallel. Currently, the number-of-chunks annotation is interpreted only in workflow evaluation mode and only for the map, filter and reduce functions.

Example:

c = map((x: x**2), b) in 2 chunks for 1.0 [hour] on 1 core with 3 [GiB]

In this example, b (which must be of Series type) is split into 2 chunks and the map operation is performed on each chunk in 2 independent tasks. After all tasks are completed, their outputs are merged into one output that can be used through a reference to the variable c. Note that the chunks (input and output) cannot be accessed in the language.
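
The split, map and merge steps can be mimicked in plain Python. The sketch below is not the interpreter's implementation: the helper split_into_chunks is hypothetical, a Python list stands in for the Series b, and the chunks are processed in a simple loop rather than as separate workflow tasks.

```python
def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous parts of near-equal size."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(items[start:start + size])
        start += size
    return chunks

b = [1, 2, 3, 4]
chunks = split_into_chunks(b, 2)                         # one task per chunk
partial = [[x**2 for x in chunk] for chunk in chunks]    # chunks are independent
c = [y for part in partial for y in part]                # merge in original order
print(chunks, c)  # [[1, 2], [3, 4]] [1, 4, 9, 16]
```

Because the function is applied element by element, the merged result is the same as applying map to the whole of b in one task.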

Load balancing

The interpreter splits the input data into chunks that are as equal in size as possible. Nevertheless, if the number of data elements is not divisible by the number of chunks, some chunks will contain a different number of elements than others.
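
A minimal sketch of how such chunk sizes can be computed is given below. It assumes that the remainder is spread one element at a time over the first chunks; the source does not state which chunks receive the extra elements, and the function name chunk_sizes is hypothetical.

```python
def chunk_sizes(n_elements, n_chunks):
    """Sizes of n_chunks near-equal chunks covering n_elements."""
    base, extra = divmod(n_elements, n_chunks)
    # `extra` chunks hold one element more than the remaining ones.
    return [base + 1] * extra + [base] * (n_chunks - extra)

print(chunk_sizes(4, 2))   # [2, 2]
print(chunk_sizes(10, 3))  # [4, 3, 3]
```

With this scheme the chunk sizes never differ by more than one element.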

Resource configuration

Resource annotations enable the interpreter to construct a qadapter object fully automatically. For this, the interpreter uses the resource configuration created by the VRE Middleware. The configurations of computing resources and environment modules are captured fully automatically by running the resconfig tool, but some settings specific to the run-time environment are not. These are:

  • environment variables

  • shell commands

  • launch directory

These and all other configurations can be set or modified (using Python) as described here.

Glossary

  • A computing resource requirement can be, for example, the number of processors, the size of memory or an estimate of the job running time.

  • Granularity (or grain size) of a task is a measure of the amount of work (or computation) which is performed by that task.

  • A batch queuing system helps to manage jobs on computing clusters, in particular to manage the computing resources and to schedule jobs onto resources according to their specific requirements.