Evaluation methods

As noted in the basics section, the order of the statements in a program carries no information for the interpreter, and the output does not depend on this order. Rather, the dependencies between variables and their parameters determine the execution order.
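The following is a minimal sketch, written in Python rather than in the modeling language. It treats a program as a hypothetical mapping from each variable to the variables its right-hand side depends on, and it derives an execution order with the standard-library `graphlib` module. The actual interpreters may determine the order differently. The sketch only shows that the order follows from the dependencies and not from the order of the statements.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical program, listed "bottom-up": each variable maps to the variables
# its right-hand side depends on, e.g. d = c + 1, c = a * b, b = a + 1, a = 2.
program: Dict[str, Set[str]] = {
    'd': {'c'},
    'c': {'a', 'b'},
    'b': {'a'},
    'a': set(),
}


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every variable follows all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


# Reversing the order of the statements does not change the execution order.
assert execution_order(program) == execution_order(dict(reversed(list(program.items()))))
```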

Language interpreters and evaluation modes

Currently, three different interpreters can be used to execute programs. These interpreters do not perform eager evaluation, i.e. do not evaluate parameters (such as function calls, expressions etc.) as soon as they are interpreted. Rather the evaluation is triggered only when a certain value is actually needed. Thus, the order of evaluation is neither the order of statements in the source code nor the order of interpretation.

Another aspect of evaluation is its time. Evaluation can take place during the interpreter run (let us call it immediate evaluation) or after the interpreter has finished (deferred evaluation).

A third aspect is the location of evaluation: we distinguish between local and remote evaluation. Remote evaluation is used when additional computing resources are needed that are not available locally (e.g. a local HPC cluster with a batch system and a JupyterHub instance connected to it).

Each interpreter corresponds to an evaluation mode. The mode, and thus the interpreter, is selected with the --mode (or -m) command-line flag.

The evaluation of parameters is essentially governed by the evaluation policies of the evaluation modes and by the occurrence of print and view statements in the code. See the next section for more details.

Order of evaluation and short-circuiting

Some built-in function calls and some expression types are evaluated in the so-called normal order, i.e. only those input parameters that are actually needed are evaluated. The values of all input parameters are cached, so each is evaluated (if needed) only once. This is sometimes referred to as lazy evaluation, or call-by-need. Expressions such as if, or and and that are evaluated in this way are known as short-circuiting expressions. In contrast, other parameters may be evaluated in applicative order (implemented as call-by-value), i.e. their evaluation begins only after all their input parameters have been evaluated.
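The difference between call-by-need and call-by-value can be sketched in Python. The class `Lazy` and the functions `if_normal` and `if_applicative` are hypothetical names that model the behavior described above; they are not part of the interpreters. Each parameter is a cached thunk that records when it is actually computed. The example mirrors `c = 2*if(true, a, b/2)` from the table below.

```python
from typing import Any, Callable, List


class Lazy:
    """A call-by-need parameter: computed only when forced, then cached."""

    def __init__(self, name: str, func: Callable[[], Any], log: List[str]):
        self.name = name
        self.func = func
        self.log = log
        self._done = False
        self._value: Any = None

    def force(self) -> Any:
        if not self._done:
            self.log.append(self.name)  # record each actual evaluation
            self._value = self.func()
            self._done = True
        return self._value  # later uses are served from the cache


def if_normal(cond: Lazy, then: Lazy, other: Lazy) -> Any:
    # Normal order: only the branch selected by cond is evaluated.
    return then.force() if cond.force() else other.force()


def if_applicative(cond: Lazy, then: Lazy, other: Lazy) -> Any:
    # Applicative order: all parameters are evaluated before the call.
    c, t, o = cond.force(), then.force(), other.force()
    return t if c else o


def demo(if_func: Callable[[Lazy, Lazy, Lazy], Any]) -> List[str]:
    """Evaluate c = 2*if(true, a, b/2) and return the evaluation log."""
    log: List[str] = []
    a = Lazy('a', lambda: 3, log)
    b = Lazy('b', lambda: 4, log)
    cond = Lazy('cond', lambda: True, log)
    b_half = Lazy('b/2', lambda: b.force() / 2, log)
    c = 2 * if_func(cond, a, b_half)
    assert c == 6 and a.force() == 3  # second use of a hits the cache
    return log
```

With `if_normal`, the log contains only `cond` and `a`. With `if_applicative`, `b/2` and `b` are computed as well, although their values are discarded.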

The mode and order of evaluation do not affect the outputs of the model, only the behavior of the evaluation, i.e. its performance, location, resources used, time and costs. In particular, computational resources can be saved by avoiding unnecessary evaluations.

The order of evaluation in various cases is summarized in the table below. In the instant (--mode instant, default) and deferred (--mode deferred) evaluation modes, all parameters are evaluated in normal order without any exceptions, and the Boolean expressions (or, and) and the if function are short-circuiting.

In workflow mode (--mode workflow), the behavior is more complex. It depends on the selected evaluation policy, the type of statement and the level of nesting, as shown in the table below. Normal order is used only when the if function or a Boolean expression (such as one including or and and) is a parameter of a variable, print or view statement, i.e. it is at the top level and not nested. Variables annotated with ? are evaluated in normal order.

Evaluation mode	Evaluation policy	Evaluation order	Nesting	Statements	Example with `if` # resulting behavior
instant	on demand	normal	any	any	`c = 2*if(true, a, b/2) # b not evaluated`
deferred	on demand	normal	any	any	`c = 2*if(not true, a, b) # a not evaluated`
workflow	all (-r)	applicative	any	any	`print(if(true, a, b)) # a and b evaluated`
workflow	on demand (-rd)	applicative	any	variable without ?	`c = if(true, a, b) # a and b evaluated`
workflow	on demand (-rd)	normal	top-level	variable with ?	`c = if(true, a, b)? # b not evaluated`
workflow	on demand (-rd)	applicative	nested	variable	`c = 2*if(true, a, b) # a and b evaluated`
workflow	on demand (-rd)	normal	nested¹	print/view	`print(2*if(true, a, b)) # b not evaluated`
workflow	on demand (-rd)	normal	top-level	print/view	`print(if(true, a, b)) # b not evaluated`

¹ Except within the map, filter and reduce functions, where the expression evaluation (or function call) is performed in applicative order.

NOTE: Statements in COMPLETED state that contain the ? annotation, and all their ancestor statements, may not be updated using the := operator and may not be rerun using the %rerun magic.

NOTE: The nesting level up to which normal evaluation order applies in workflow mode may become unlimited in future interpreter implementations. The current implementation applies normal order only at the top level. To obtain normal-order evaluation of nested expressions, the user has to decompose the expressions and define a variable for every nested if or Boolean expression.

Examples

In the following example, the input parameter a of the if function is not evaluated. Only the second input parameter, a string literal, is needed and evaluated.

a = 'abc'
expr = true
b = if(expr, 'xyz', a)
print(b)

Swapping 'xyz' and a (or changing the first input, expr, to false) causes a to be evaluated but not the string literal 'xyz'. Only the instant and deferred modes exhibit this behavior. In workflow mode, the if function is a parameter of a variable statement. Therefore both 'xyz' and a are evaluated before the evaluation of the if function starts, regardless of the chosen evaluation policy. However, in workflow mode with the on-demand (--on-demand --autorun flags) or no-evaluation (no flags) policy, the if function is short-circuiting when it is a parameter of a print statement:

a = 'abc'
expr = true
print(if(expr, 'xyz', a))

In this mode it is also easy to check that a is in fact not computed. With the no-evaluation policy (omit the --autorun flag), the statement print(a) will output n.c. instead of a value.

Example of rewriting an expression to allow deeply nested normal-order evaluation:

d = (a or b and not c)?

Here a, b and c are Boolean variables that may have the values true or false but may also be null. In this example, only the top-level or is evaluated in normal order. Note that or has lower precedence than and, so the expression is parsed as a or (b and not c). The whole expression can be evaluated in normal order by defining an auxiliary variable d_:

d_ = (b and not c)?
d = (a or d_)?
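The effect of this decomposition can be modeled with a small Python sketch that assumes the rule stated above: normal order applies only at the top level of a ?-annotated statement. Each variable is a hypothetical thunk that logs its evaluation. Consider a = false, b = false and c = true. In the nested form, the inner and is evaluated in applicative order, so c is computed even though b is false. After decomposition, the and short-circuits and c is never computed. Null values are not modeled here.

```python
from typing import Callable, List, Tuple

Thunk = Callable[[], bool]


def make_var(name: str, value: bool, log: List[str]) -> Thunk:
    """Return a thunk that records its name in log when it is evaluated."""
    def thunk() -> bool:
        log.append(name)
        return value
    return thunk


def or_normal(x: Thunk, y: Thunk) -> bool:
    # Normal order: y is evaluated only if x is false.
    return x() or y()


def and_normal(x: Thunk, y: Thunk) -> bool:
    # Normal order: y is evaluated only if x is true.
    return x() and y()


def and_applicative(x: Thunk, y: Thunk) -> bool:
    # Applicative order: both operands are evaluated before combining.
    vx, vy = x(), y()
    return vx and vy


def evaluate_d(a_val: bool, b_val: bool, c_val: bool,
               decomposed: bool) -> Tuple[bool, List[str]]:
    log: List[str] = []
    a = make_var('a', a_val, log)
    b = make_var('b', b_val, log)
    c = make_var('c', c_val, log)
    not_c = lambda: not c()
    if decomposed:
        # d_ = (b and not c)?  and  d = (a or d_)?
        d_ = lambda: and_normal(b, not_c)
        result = or_normal(a, d_)
    else:
        # d = (a or b and not c)?  -- only the top-level or is normal order
        result = or_normal(a, lambda: and_applicative(b, not_c))
    return result, log
```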

Checkpoint and recovery

Evaluation may be interrupted for various reasons. In instant and deferred evaluation modes there is no persistence of completed evaluations and the model must be started from scratch. In contrast, there are several levels of checkpointing and recovery available in workflow evaluation mode.

Node level

In workflow evaluation mode, the workflow nodes that contain completed evaluations have COMPLETED state and their outputs are saved. This can be viewed as a checkpoint at the node level. This type of checkpointing is performed automatically by default and recovery at this level is performed when the %rerun magic is used.

Task level

Usually, one workflow node includes the evaluation of one statement. However, other mappings are also possible, for example nodes that include several statements. The evaluation of such statements is mapped to a list of tasks within a single workflow node. The tasks are evaluated sequentially using the same computing resources. If some tasks have been completed but others have not, due to a failure, the already completed tasks can be recovered using the %recover magic. The recovery evaluation of the node then starts by repeating the evaluation of the first failed task.
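Task-level recovery can be sketched in Python as sequential evaluation that stores every completed task result and skips stored results on the next run. The names `run_node` and `checkpoint` are hypothetical. The in-memory dictionary stands in for the database and launch-directory storage listed in the table below. It is not the interpreter's actual mechanism.

```python
from typing import Callable, Dict, List, Tuple


def run_node(tasks: List[Callable[[], object]],
             checkpoint: Dict[int, object]) -> List[object]:
    """Evaluate tasks in order, reusing results of already completed tasks."""
    results = []
    for i, task in enumerate(tasks):
        if i not in checkpoint:
            checkpoint[i] = task()  # may raise; earlier results stay saved
        results.append(checkpoint[i])
    return results


def simulate_failure_and_recovery() -> Tuple[List[str], List[object]]:
    calls: List[str] = []
    fail = {'t1': True}  # the second task fails on its first attempt

    def make_task(name: str, value: object) -> Callable[[], object]:
        def task() -> object:
            calls.append(name)
            if fail.get(name):
                fail[name] = False
                raise RuntimeError(f'{name} failed')
            return value
        return task

    tasks = [make_task('t0', 1), make_task('t1', 2), make_task('t2', 3)]
    checkpoint: Dict[int, object] = {}
    try:
        run_node(tasks, checkpoint)  # first run: t1 fails
    except RuntimeError:
        pass
    results = run_node(tasks, checkpoint)  # recovery: resumes at t1
    return calls, results
```

In the simulated recovery, t0 runs once and t1 runs twice, so the recovered run starts with the first failed task.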

Step level

Some iterable parameters, such as Property, are evaluated in steps that are mapped to Custodian Job objects. At this level the evaluation of every single step is atomic. In the current implementation, the computing resources of the step evaluations are shared and the order of evaluation is sequential, similar to the tasks. A recovery at this level is always activated when the %recover magic has been used.

Application level

Many external applications used as backends for Algorithms and Calculators may provide further levels of checkpointing and restart. A recovery at this level is always activated when the %recover magic has been used. Applications are notified of the recovery through the environment variable VRE_LANG_RECOVERY_LAUNCH; if it is set, its value indicates the task-level recovery number. Furthermore, the parameter restart is set to true when the Algorithm or Calculator used in the relevant iterable step supports this parameter. In addition, the launch directory is automatically restored to the state of the last completed task and the last completed step.
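Below is a minimal sketch of how a backend application might react to a recovery launch. It assumes a hypothetical iterative application that writes its own checkpoint file (`app_checkpoint.json`, an invented name) to the launch directory after every iteration. Only the environment variable VRE_LANG_RECOVERY_LAUNCH and the restart parameter come from the text above. Everything else, including the function `run_app`, is illustrative.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List

RECOVERY_VAR = 'VRE_LANG_RECOVERY_LAUNCH'  # set on recovery launches


def run_app(n_iter: int, workdir: Path, restart: bool = False) -> List[int]:
    """Hypothetical backend that checkpoints after every iteration.

    Returns the list of iterations actually computed in this launch.
    """
    ckpt = workdir / 'app_checkpoint.json'  # hypothetical app-specific file
    recovering = restart or os.environ.get(RECOVERY_VAR) is not None
    start = 0
    if recovering and ckpt.exists():
        start = json.loads(ckpt.read_text())['next_iteration']
    done = []
    for it in range(start, n_iter):
        done.append(it)  # placeholder for the real computation
        ckpt.write_text(json.dumps({'next_iteration': it + 1}))
    return done


def demo() -> List[int]:
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # Simulate a previous launch that was interrupted after 3 iterations.
        (workdir / 'app_checkpoint.json').write_text(
            json.dumps({'next_iteration': 3}))
        return run_app(5, workdir, restart=True)
```

Calling `demo()` resumes at iteration 3 instead of starting from iteration 0.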

Level	Node states	Recover with	Checkpoint storage	Evaluation starts with
Node	FIZZLED, RUNNING	%rerun	database	READY nodes
Task	FIZZLED, RUNNING	%recover	database & launch directory	first failed task
Step	FIZZLED, RUNNING	%recover	launch directory	first incomplete step
Application	FIZZLED, RUNNING	%recover	app-specific	app-specific

Typical use cases

There may be many reasons for failed evaluations. Some failed evaluations cannot be recovered because no valid output exists, e.g. evaluations that include a division by zero. The subsections below outline some typical scenarios in which a task-, step- or application-level recovery would be beneficial.

Node in FIZZLED state

If the node has one task and a Property with one step, then running %recover is recommended only if application-level recovery is possible, i.e. the Calculator or Algorithm supports the restart parameter. If application-level recovery is not possible, then using the %recover magic is only recommended for a Property with many steps.

One possible reason for a FIZZLED state is a ConvergenceError. In this case, recovery is only effective at the application level and requires checkpointing at the same level.

Node in permanent RUNNING state

If a node remains in RUNNING state for longer than a certain timeout, the node is marked as a lost run. Such a node can then be restarted in recovery mode with %recover.

A possible reason for lost runs is that an evaluation in the node has exceeded a wall-time limit imposed by the resource management system (e.g. Slurm).

NOTE: If a node in RUNNING state is not marked as a lost run, no restart or recovery action is performed; an error is issued instead.
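The timeout rule for lost runs can be sketched in Python. The `Node` class, its `last_update` timestamp and the node names are hypothetical. The actual timeout value and the way the workflow engine tracks node activity are not specified here. The sketch only shows that a RUNNING node counts as lost once its last recorded activity is older than the timeout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Node:
    name: str
    state: str
    last_update: datetime  # hypothetical time of the last recorded activity


def find_lost_runs(nodes: List[Node], timeout: timedelta,
                   now: datetime) -> List[str]:
    """Return names of RUNNING nodes whose last update is older than timeout."""
    return [n.name for n in nodes
            if n.state == 'RUNNING' and now - n.last_update > timeout]


def demo() -> List[str]:
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    nodes = [
        Node('relax', 'RUNNING', now - timedelta(hours=5)),  # silent for 5 h
        Node('scf', 'RUNNING', now - timedelta(minutes=2)),  # still active
        Node('band', 'COMPLETED', now - timedelta(days=1)),  # not RUNNING
    ]
    return find_lost_runs(nodes, timeout=timedelta(hours=4), now=now)
```

Only `relax` is reported as a lost run and would be eligible for %recover. `scf` is still within the timeout, so a recovery attempt on it would produce an error.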