# Computing resources and granularity

By default, all operations in textS / textM are performed either locally ([instant and deferred evaluation](tools.md#instant-and-deferred-modes)) or on *default* resources ([workflow evaluation](tools.md#workflow-mode)). How the default resources are defined depends on the computing platform used for the evaluation. In most cases, local evaluation runs interactively in one process and one thread on one processor core.

Some of the operations might require computing resources that are not available locally or differ from the default resources.

In the [*workflow* evaluation mode](tools.md#workflow-mode) it is possible to specify the computing resources required for a specific statement. This is achieved by adding special annotations.

## Resource annotations

The resource annotations apply only to [variable statements](scl.md#variables) and are interpreted only in [workflow mode](tools.md#workflow-mode). They are placed at the end of the statement.

Syntax:

```
[for <computing time> [seconds|minutes|hours|...]]
[on <integer> cores | core [with <memory> [KiB|MiB|...]]]
```

The computing time and memory are positive integer or floating-point numbers. The time and cores clauses may appear in either order. The memory can only be specified after the number of cores, i.e. only if the number of cores is given. After unit conversion, the memory must be a whole number of bytes.

Example:

```
f(x) = x
a1 = f(1) on 2 cores with 2 [GB] for 2.0 [minutes]
a2 = f(1) on 4 cores for 1.0 [minute]
b = (numbers: 1, 2, 3, 4)
```

**Note:** Slurm and other batch systems may not support arbitrary memory size specifications. For example, 2 GiB cannot be expressed in a form that Slurm accepts, because only the decimal units GB, MB and KB are supported and plain bytes are not.
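
The arithmetic behind this note can be checked with a short Python sketch. It takes the statement above at face value (only decimal KB, MB and GB are accepted, as whole numbers); the helper name `exact_decimal_forms` is hypothetical and not part of textS or Slurm.

```python
# Sizes in bytes: binary (IEC) prefix versus decimal (SI) prefixes.
GIB = 2**30
DECIMAL_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9}


def exact_decimal_forms(n_bytes):
    """Return {unit: count} for every decimal unit that divides n_bytes exactly."""
    return {unit: n_bytes // size
            for unit, size in DECIMAL_UNITS.items()
            if n_bytes % size == 0}


print(exact_decimal_forms(2 * GIB))    # {} : 2 GiB has no exact decimal form
print(exact_decimal_forms(2 * 10**9))  # {'KB': 2000000, 'MB': 2000, 'GB': 2}
```

Under that assumption, a request such as `2 [GB]` is representable exactly, whereas `2 [GiB]` (2147483648 bytes) is not.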

## Policies for interactive and batch execution

If a statement has resource annotations, the interpreter creates a task for *batch* execution through a batch queuing system such as Slurm (see the [glossary](#glossary) below for further details). If no resource requirements are specified, the interpreter creates a task for *interactive* execution.

## Granularity

The granularity of evaluations in workflow mode is determined by the amount of computation in individual [variables](scl.md#variables). This is because the interpreter maps variables to *tasks* within a task graph, which is used to construct a workflow. By default, the interpreter maps each variable to one task and places only this task in a workflow node. However, the interpreter also uses alternative mappings to achieve parallelization and better performance and/or to avoid unnecessary evaluations.
The following table summarizes these mappings.

Statements -> tasks | Tasks -> nodes | Implementations
--------------------|----------------|----------------
 one-to-one         | one-to-one     | standard [variables](scl.md#variables)
 one-to-many        | one-to-one     | [parallel map, filter, reduce](#number-of-chunks), [lazy evaluation](evaluation.md#order-of-evaluation-and-short-circuiting)
 one-to-one         | many-to-one    | None

Packing several tasks into one node (the last option in the table, which is not implemented) would be needed if 1) one task produces large data used by only one other task or 2) the model contains a very large number of strictly sequential statements with very short evaluation times.

**Rationale:** If a variable has several parameters other than references to other variables, these parameters are all evaluated on the same resources. This is beneficial only if no parameter, or only one parameter, requires a large amount of computation. If two or more parameters require large and different amounts of computation, the granularity has to be decreased by splitting the statement into two or more statements such that in every statement at most one parameter requires large computing resources (for example, more than 95% of the total).

**Example:**

Consider two functions `f()` and `g()` that, for different inputs (denoted here by `...` for simplicity), may behave differently, i.e. require different amounts of computing resources. Two typical cases are shown below:

```
b = f(...) # 1 hour on 10 cores
c = g(...) # 5 hours on 5 cores
a = h(b, c) # recommended
# a = h(f(...), g(...)) # not recommended
```

In this case `f(...)` and `g(...)` can be evaluated on different resources because they are mapped to two different tasks. This is more efficient than evaluating `f(...)` and `g(...)` sequentially on the same resources, as in the commented-out statement, which maps to a single task. That single task would request 10 cores for 6 hours in total, and 5 of those cores would remain idle for 5 hours.
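
The core-hour accounting for this case can be written out as a small Python sketch. It uses only the figures from the example and assumes, as stated above, that a single task reserves the larger core count for the summed run time; all variable names are illustrative.

```python
# Figures from the example: f needs 10 cores for 1 hour, g needs 5 cores for 5 hours.
f_cores, f_hours = 10, 1
g_cores, g_hours = 5, 5

# Core-hours actually used by the computations themselves.
used = f_cores * f_hours + g_cores * g_hours

# Two statements: each task reserves exactly what it needs.
reserved_two_tasks = used

# One statement: a single task reserves the maximum core count for the total time.
reserved_single_task = max(f_cores, g_cores) * (f_hours + g_hours)

idle_single_task = reserved_single_task - used

print(used)                  # 35
print(reserved_single_task)  # 60
print(idle_single_task)      # 25, i.e. 5 cores idle for 5 hours
```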

As a second case, consider the following snippet:

```
# b = f(...) # 1 second on 1 core
# c = g(...) # 3 seconds on 1 core
# a = h(b, c) # not recommended
a = h(f(...), g(...)) # recommended
```

In this case `f(...)` and `g(...)` should be written in one statement because of their low computing resource requirements. Splitting the evaluation into separate statements, i.e. separate tasks, would only increase the overhead due to latency.

Here, we only consider computing resource requirements of parameters to adjust granularity of statements. In addition, we may want to *reuse* parameters or to increase *readability* of the model source. For example, in the first case above, the computationally expensive parameters `f(...)` and `g(...)` can be reused elsewhere in the model through the variables `b` and `c`, and this is desirable, whereas in the second case these parameters are not accessible elsewhere in the model.


## Number of chunks

Some operations are data-parallel, i.e. the same operation is applied uniformly to many elements of the same type. By default, these operations are performed sequentially on the same resources within one task. The textS language provides an annotation that enables parallel execution of such operations. The syntax is `in <integer> chunks [resources]`, placed at the end of the statement, before the optional resource specifications. The number-of-chunks annotation is interpreted in workflow evaluation mode for the [`map`](scl.md#map-function), [`filter`](scl.md#filter-function-and-filter-expression) and [`reduce`](scl.md#reduce-function) functions. The input data parameters of these functions that are of type [Series](scl.md#series) or [Table](scl.md#table) are split into the specified number of *chunks*, sized as equally as possible. The interpreter then creates a separate task for each chunk and assigns it to an individual node. This enables parallel evaluation of the chunks on different resources.

**Example:**

```
b = (s: 1, 2, 3)
c = map((x: x**2), b) in 2 chunks for 1.0 [hour] on 1 core with 3 [GiB]
```

In this example, the input data parameter `b` is split into 2 chunks and the map operation is performed on each chunk in 2 independent tasks. After all tasks have completed, their outputs are merged into one output that can be accessed through a reference to the variable `c`.
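
The split, map and merge steps of this example can be mimicked in plain Python. This is only a sketch of the idea, not the interpreter's implementation: the helper `split_into_chunks` and the way the remainder is distributed are assumptions, and the chunks are processed sequentially here rather than as separate tasks.

```python
def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous chunks of nearly equal size."""
    if not 1 <= n_chunks <= len(items):
        raise ValueError("number of chunks must be between 1 and the number of elements")
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more element
        chunks.append(items[start:start + size])
        start += size
    return chunks


b = [1, 2, 3]
chunks = split_into_chunks(b, 2)

# Each chunk would be handled by its own task; here they are simply mapped in a loop.
partial_results = [[x**2 for x in chunk] for chunk in chunks]

# Merge the per-chunk outputs back into one result, preserving the original order.
c = [y for part in partial_results for y in part]

print(chunks)  # [[1, 2], [3]]
print(c)       # [1, 4, 9]
```

The merged result equals the result of mapping over `b` in one piece, which is what makes the chunked evaluation transparent to statements that reference `c`.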

**Note:** The number of chunks may not be larger than the number of elements in the input data parameter(s).

**Note:** Statements in COMPLETED state that include a parallel `map`, `filter` or `reduce`, as well as all their ancestor statements, may not be updated using the [:= operator](scl.md#dealing-with-failures) and may not be rerun using the [%rerun magic](tools.md#specific-features).

### Load balancing

The interpreter splits the input data into chunks that are as equal in size as possible. If the number of data elements is not divisible by the number of chunks, some chunks will contain a different number of elements than others.
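
One common way to obtain such nearly equal chunk sizes is to distribute the remainder of the integer division over the first chunks. The Python sketch below illustrates this scheme; the function `chunk_sizes` is hypothetical, and the interpreter may distribute the remainder differently.

```python
def chunk_sizes(n_elements, n_chunks):
    """Return chunk sizes that differ by at most one element."""
    if not 1 <= n_chunks <= n_elements:
        raise ValueError("number of chunks must be between 1 and the number of elements")
    base, extra = divmod(n_elements, n_chunks)
    # The first `extra` chunks receive one additional element.
    return [base + 1] * extra + [base] * (n_chunks - extra)


print(chunk_sizes(12, 4))  # [3, 3, 3, 3] : divisible, all chunks equal
print(chunk_sizes(10, 4))  # [3, 3, 2, 2] : not divisible, sizes differ by one
```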

## Resource configuration

[Resource annotations](#resource-annotations) enable the interpreter to construct a [*qadapter* object](https://vre-middleware.readthedocs.io/en/latest/qadapter.html) fully automatically. For this, the interpreter uses the resource configuration created by the [VRE Middleware](https://vre-middleware.readthedocs.io/en/latest/resconfig.html). The configurations of computing resources (see the [glossary](#glossary)) and [environment modules](https://lmod.readthedocs.io/) are captured fully automatically by running the [resconfig tool](https://vre-middleware.readthedocs.io/en/latest/resconfig_cli.html), but some settings specific to the *run-time environment* are not. These are:
* custom *environment variables* not set during resconfig generation
* custom *environment modules* not found in the default `$MODULEPATH`
* any *shell commands* needed to be executed before actual evaluation starts
* custom *launch directory*

All configurations can be set or modified (using Python) as described [here](https://vre-middleware.readthedocs.io/en/latest/resconfig.html).


### Setting up the run-time environment by the interpreters

The procedure has the following stages:

1. Load the default resource requirements from the interpreter's internal *spec*, if specified.
2. Process custom requirements, if provided in the textS/textM input using [resource annotations](#resource-annotations), [number of chunks](#number-of-chunks) or a [calculator](amml.md#calculator) version specification. These override the defaults.
3. Check the resulting requirements against the resconfig. In instant, deferred and interactive mode, a warning is issued in case of a mismatch. In batch mode, a mismatch causes a "Resource configuration error".
4. Depending on the mode:
   * In instant, deferred and interactive mode, the required resources are checked in the current environment. An attempt is made to set the environment variables and load the environment modules immediately before the evaluation starts.
   * In batch mode, a [custom qadapter](tools.md#custom-qadapter) is constructed, which is later used by the launcher.
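
The first three stages can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the interpreter's code: requirements are plain dictionaries, a "mismatch" is taken to mean that a requirement exceeds a configured limit, and the names `resolve_requirements`, `check_against_resconfig` and `ResourceConfigurationError` are hypothetical.

```python
import warnings


class ResourceConfigurationError(Exception):
    """Hypothetical stand-in for the "Resource configuration error" in batch mode."""


def resolve_requirements(defaults, custom):
    """Stages 1 and 2: start from the defaults, then let custom requirements override them."""
    return {**defaults, **custom}


def check_against_resconfig(required, available, batch):
    """Stage 3: return the mismatching keys; raise in batch mode, warn otherwise.

    Assumption: a mismatch means that a requirement exceeds the configured limit.
    """
    mismatches = sorted(key for key, value in required.items()
                        if key in available and value > available[key])
    if mismatches:
        message = "requirements exceed resconfig for: " + ", ".join(mismatches)
        if batch:
            raise ResourceConfigurationError(message)
        warnings.warn(message)
    return mismatches


defaults = {"cores": 1, "memory_gb": 1, "hours": 1.0}    # hypothetical default spec
custom = {"cores": 4, "hours": 2.0}                      # e.g. from resource annotations
required = resolve_requirements(defaults, custom)
print(required)  # {'cores': 4, 'memory_gb': 1, 'hours': 2.0}

available = {"cores": 2, "memory_gb": 8, "hours": 24.0}  # hypothetical resconfig limits
print(check_against_resconfig(required, available, batch=False))  # warns, then ['cores']
```

With `batch=True`, the same mismatch would raise the error instead of issuing a warning, mirroring the stricter behavior of batch mode.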


## Glossary

* A [**computing resource**](https://en.wikipedia.org/wiki/System_resource) requirement can be, for example, the number of processors, the amount of memory or an estimate of the job running time.

* [**Granularity**](https://en.wikipedia.org/wiki/Granularity_(parallel_computing)) (or grain size) of a task is a measure of the amount of work (or computation) which is performed by that task.

* A [**batch queuing system**](https://en.wikipedia.org/wiki/Job_scheduler) helps to manage jobs on computing clusters, in particular to manage the computing resources and to schedule jobs onto resources according to their specific requirements.