Bulk processing
With bulk processing, a program can be run for a series of different inputs for a set of variables. This is common in use cases such as parameter sweeps or scans and high-throughput computing. Consider computing the function f(x) = x**2 for a series of values (a: 1, 2, 3). One can use the built-in map function to accomplish this:
a_ser = (a: 1, 2, 3)
result = map((x: x**2), a_ser)
print(result)
In workflow mode, the result is persisted in a database after the evaluation is performed. In this simple example, the statement with the map function is evaluated interactively, but in many practical cases it may have to be evaluated on a computing cluster in batch mode. In such cases, the map function can also be parallelized. However, to extend the list of values in the a_ser series with a_ser_further_values = (a: 4, 5), one cannot reuse this model and simply expand a_ser because it is immutable. Instead, the map code has to be written again for a_ser_further_values to compute result_further_values, which is then concatenated with result to produce the overall result. Apart from the bloated code lacking reuse, further extensions cannot run in parallel because a second extension can start only after the concatenated result from the first extension is complete.
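The limitation described above can be mirrored in plain Python. This is only an analogy under the assumption that a tuple stands in for the immutable series; the names follow the text, and nothing here is the tool's own implementation.

```python
# Immutable input series, analogous to a_ser = (a: 1, 2, 3).
a_ser = (1, 2, 3)
result = tuple(map(lambda x: x**2, a_ser))

# Extending the sweep: a_ser cannot be changed in place, so the map call
# is written again for the new values and the two results are concatenated.
a_ser_further_values = (4, 5)
result_further_values = tuple(map(lambda x: x**2, a_ser_further_values))
overall_result = result + result_further_values

print(overall_result)  # (1, 4, 9, 16, 25)
```

Every further extension repeats the same map call and depends on the previous concatenation, which is the code bloat and serialization the paragraph refers to.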
A more scalable way to manage this is to process every value in a separate model and to hold the related models in a group. The code
vary ((a: 1, 2, 3))
result = a**2
print(result)
will produce three models, one for each value of a. The printed result will be 1, 4 or 9, depending on the active model (default active model is 0).
Now, to extend the existing group with two more models for a = 4, 5 one can simply add another vary statement:
vary ((a: 4, 5))
print(result)
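As a rough mental model of the two vary statements above, a group can be pictured as a list of independent models that share the same code. The Python sketch below is hypothetical: the list-of-dicts representation and the helper names square and vary are assumptions for illustration, and the result is computed eagerly here for brevity.

```python
def square(a):
    """The shared model code: result = a**2."""
    return a**2

def vary(group, values):
    """Add one model per new value of a; existing models are left untouched."""
    for a in values:
        group.append({"a": a, "result": square(a)})
    return group

group = vary([], (1, 2, 3))      # vary ((a: 1, 2, 3))
active = 0                       # default active model
print(group[active]["result"])   # 1

vary(group, (4, 5))              # vary ((a: 4, 5))
print([m["a"] for m in group])   # [1, 2, 3, 4, 5]
```

Because each model is independent, the models added by the second call do not have to wait for, or recompute, the first three.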
Another very useful case is a model with a = 1
a = 1
result = a**2
print(result)
that has to be extended for a = 2, 3, 4, 5 while reusing the code and the result for a = 1. Extending this model with vary ((a: 2, 3, 4, 5)) produces the same effect as the two vary statements above.
Syntax and semantics of the vary statement
Syntax: The syntax of the vary statement is the vary keyword followed by a table.
Constraints: The vary statement can include either variables that are already used in the persistent models of the group (old variables), or variables that are not yet used (new variables). Several vary statements are merged using outer joins for old variables and Cartesian joins for new variables. All vary statements must be of the same kind, i.e. they must include either old variables only or new variables only. For vary statements of old variables, these additional constraints apply:
the table must include full tuples of all varied variables;
the types of the new values must match the types of already used values;
the units of the new values must be the same as the units already in use.
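The constraints listed above can be expressed as a small checker. This is a sketch under stated assumptions, not the tool's validation logic: check_vary, its arguments and the error texts are hypothetical, vary tables are modelled as Python dicts, and units are not modelled at all.

```python
def check_vary(tables, used, varied):
    """Return a list of constraint violations for a set of vary tables.

    tables: one dict per vary statement, mapping variable name -> list of values
    used:   dict mapping each variable already in the group to one existing value
    varied: set of variable names that are already varied in the group
    """
    errors = []
    names = [name for table in tables for name in table]
    old = [n for n in names if n in used]
    new = [n for n in names if n not in used]
    if old and new:
        errors.append("old and new variables must not be mixed")
    for table in tables:
        if any(name in used for name in table):
            # Old variables: full tuples of all varied variables are required.
            missing = sorted(varied - set(table))
            if missing:
                errors.append(f"incomplete tuple, missing: {missing}")
        for name, values in table.items():
            if name in used:
                # Exact type comparison, so a boolean does not pass as a number.
                for value in values:
                    if type(value) is not type(used[name]):
                        errors.append(f"type mismatch for {name}: {value!r}")
    return errors

print(check_vary([{"a": [2, 3]}], {"a": 1, "d": "x"}, set()))  # []
```

The exact type comparison is a deliberate choice for the sketch: in Python, bool is a subclass of int, so an isinstance check would wrongly accept false as a new value for a numeric variable.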
In the following, several examples for valid and invalid model extensions will be given based on this persistent model:
a = 1
d = 'x'
Examples for valid extensions
Further values for old variable a:
vary ((a: 2, 3))
This will add two more models to the group. The effective vary statement for the group will be ((a: 1, 2, 3)).
Values for a new variable b:
vary ((b: 1, 3))
This will add one model to the group. The total vary for the group will be ((a: 1, 1), (b: 1, 3)).
Values for two new variables (Cartesian join):
vary ((b: 1, 2))
vary ((c: true, false, true))
This will add three models to the group. The duplicated true value for c will be dropped. The total vary after the update of the model group will be ((a: 1, 1, 1, 1), (b: 1, 1, 2, 2), (c: true, false, true, false)).
Values for two new variables (inner join):
vary ((b: 1, 2), (c: true, false))
This will add one model to the group. The resulting vary of the group will be ((a: 1, 1), (b: 1, 2), (c: true, false)).
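The valid extensions above can be reproduced with a short Python model of the joins. This is an illustrative sketch only: the helper names rows, dedupe, extend_old and extend_new are hypothetical, a group is represented as a list of dicts holding just the varied variables, and columns of one table are assumed to have equal length.

```python
from itertools import product

def rows(table):
    """Turn one vary table {name: [values]} into row dicts (inner join of its columns)."""
    names = list(table)
    return [dict(zip(names, values)) for values in zip(*table.values())]

def dedupe(items):
    """Drop repeated rows while keeping the original order."""
    unique = []
    for item in items:
        if item not in unique:
            unique.append(item)
    return unique

def extend_old(group, table):
    """Old variables: append the new rows to the existing models."""
    return dedupe(group + rows(table))

def extend_new(group, *tables):
    """New variables: Cartesian join of the group with each vary table."""
    for table in tables:
        new_rows = dedupe(rows(table))
        group = [{**model, **row} for model, row in product(group, new_rows)]
    return group

start = [{"a": 1}]
cartesian = extend_new(start, {"b": [1, 2]}, {"c": [True, False, True]})
inner = extend_new(start, {"b": [1, 2], "c": [True, False]})

print(len(extend_old(start, {"a": [2, 3]})))  # 3
print([m["b"] for m in cartesian])            # [1, 1, 2, 2]
print([m["c"] for m in cartesian])            # [True, False, True, False]
print(len(inner))                             # 2
```

Two separate tables multiply the number of models, whereas two columns in one table are paired row by row, which is why the Cartesian case adds three models and the inner-join case adds only one.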
Examples for invalid extensions
Mixing new values for old variable a and values for a new variable b with inner join:
vary ((a: 2, 3), (b: 1, 3))
This vary statement is ambiguous because there is no value for b corresponding to a = 1. The solution is to extend the model in two cycles, i.e. in two subsequent input cells in Jupyter or in the interactive session, or using two model update scripts. Here is a possible solution:
Cycle 1:
vary ((b: 1, 3))
Result:
vary ((a: 1, 1), (b: 1, 3))
Cycle 2:
vary ((a: 2, 3), (b: 1, 3))
Result:
vary ((a: 1, 1, 2, 3), (b: 1, 3, 1, 3))
Mixing new values for old variable a and values for a new variable b with Cartesian join:
vary ((a: 2, 3))
vary ((b: 1, 3))
Also in this case, a possible solution is to perform two subsequent extensions, one for each variable. Starting with a, the result will be
vary ((a: 1, 2, 3, 1, 2, 3), (b: 1, 3, 1, 3, 1, 3))
Starting with b, the result will be
vary ((a: 1, 1, 2, 3), (b: 1, 3, 1, 3))
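The two orders give groups of different size, which a few lines of Python make explicit. This is a hedged illustration with the group modelled as a list of dicts; the order of the models in the first case may differ from the order listed in the text, so only the set of combinations is meaningful here.

```python
start = [{"a": 1}]

# Starting with a: first vary ((a: 2, 3)), then b as a new variable,
# which is joined with every existing model (Cartesian join).
with_a = start + [{"a": 2}, {"a": 3}]
a_first = [{**model, "b": b} for model in with_a for b in (1, 3)]

# Starting with b: first vary ((b: 1, 3)), then vary ((a: 2, 3), (b: 1, 3)),
# which appends full tuples of the old variables a and b.
with_b = [{**model, "b": b} for model in start for b in (1, 3)]
b_first = with_b + [{"a": 2, "b": 1}, {"a": 3, "b": 3}]

print(len(a_first), len(b_first))  # 6 4
```

Starting with a covers all six combinations of a and b, while starting with b yields only the four tuples that were stated explicitly.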
Incomplete tuples of old variables. Consider the persistent model group:
vary ((a: 1, 2), (b: 2, 1))
The following statement is invalid because there is no value specified for b corresponding to a = 3:
vary ((a: 3))
The following statement is a possible solution:
vary ((a: 3, 3), (b: 2, 1))
Mismatching types. Consider a model with this vary statement:
vary ((a: 1, 2))
The following extension is invalid because the type of a must be quantity and not boolean:
vary ((a: false))
Life cycle of vary statements
All vary statements in one script or in one Jupyter input cell are merged into one effective vary statement after applying the constraints. The vary statements in further input cells are interpreted independently.
Behavior of print and view statements
The print and view statements are applied only to the active model. The active model can be selected in the interactive tools by using the %uuid magic. The active model cannot be selected in the script tool, i.e. in texts script, where it is always the first model in the group. The behavior of print and view implies that, in case of on-demand evaluation, only parameters in the active model are evaluated.
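The consequence for on-demand evaluation can be pictured with a lazily evaluated property. The Python sketch below is purely hypothetical: the Model class, its evaluations counter and the integer index used to pick the active model are assumptions for illustration and do not reflect how the tool stores or selects models.

```python
class Model:
    """One model of the group with a lazily evaluated parameter 'result'."""

    def __init__(self, a):
        self.a = a
        self._result = None
        self.evaluations = 0  # counts how often result was actually computed

    @property
    def result(self):
        # Evaluate only on first access, then reuse the stored value.
        if self._result is None:
            self.evaluations += 1
            self._result = self.a**2
        return self._result

group = [Model(a) for a in (1, 2, 3)]

active = 0                             # first model in the group
print(group[active].result)            # 1
print([m.evaluations for m in group])  # [1, 0, 0]

active = 2                             # select another active model
print(group[active].result)            # 9
print([m.evaluations for m in group])  # [1, 0, 1]
```

Printing touches only the active model, so the parameters of the other models in the group stay unevaluated until one of them becomes active.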
Evaluation mode
The bulk processing feature is available only in workflow evaluation mode. Specifying a model UUID triggers the loading of the whole group of models.
Interactive sessions
In interactive sessions (Jupyter notebook or texts session), the %vary magic command prints the currently varied parameters, together with their pertinent model UUIDs, in table format. The %uuid magic displays the UUID of the currently active model and, in parentheses, the UUIDs of all models in the processed group. The active model can be selected by specifying the model UUID after the %uuid keyword.