# Input and output operations (I/O)

As noted in the [basics](scl.md#statements) section, the order of the statements in a program carries no information for the interpreter, and the output does not depend on this order. Instead, *the dependencies* between variables and their parameters determine the order of execution.

I/O statements, such as `view` and `print`, loading from a file/URL and exporting to a file/URL, introduce *side effects*. Without them it is impossible to interact with the program. These side effects create situations in which, under certain policies, a specific order of evaluation becomes necessary.

## Language interpreters and execution modes

Currently, three different interpreters can be used to execute programs. None of them performs *eager evaluation*, i.e. none evaluates parameters (function calls, expressions, etc.) as soon as it encounters them. Instead, evaluation is triggered only when a value is actually needed. This is commonly referred to as *lazy evaluation*. Thus, the **order of evaluation** is neither the order of the statements in the source code nor the order of interpretation.
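The idea of lazy evaluation can be made concrete with a minimal Python sketch. It is only an illustration and not how the interpreters are implemented; the `Lazy` class and the `evaluation_log` list are hypothetical names. Each parameter wraps a function that is called only when its value is first requested, and the result is cached.

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")

# Hypothetical log: records the order in which evaluations complete.
evaluation_log: List[str] = []


class Lazy(Generic[T]):
    """A parameter whose value is computed on first request and then cached."""

    def __init__(self, name: str, func: Callable[[], T]) -> None:
        self.name = name
        self._func = func
        self._value: Optional[T] = None
        self._evaluated = False

    def value(self) -> T:
        if not self._evaluated:          # evaluate only on first demand
            self._value = self._func()
            self._evaluated = True
            evaluation_log.append(self.name)
        return self._value  # type: ignore[return-value]


# Interpreting the "statements" creates the parameters but evaluates nothing;
# note that c refers to a and b, which are defined after it.
c = Lazy("c", lambda: a.value() + b.value())
a = Lazy("a", lambda: 2)
b = Lazy("b", lambda: 3 * a.value())
unused = Lazy("unused", lambda: 1 / 0)   # never needed, so never evaluated

print(evaluation_log)   # []
print(c.value())        # 8
print(evaluation_log)   # ['a', 'b', 'c']
```

The order in the log follows what `c` needs, not the order in which the parameters were written, and `unused` never raises because it is never requested.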

Another aspect is the **time of evaluation**: evaluation can take place while the interpreter runs (we call this *immediate* evaluation) or after the interpreter has finished (*deferred* evaluation).

A third aspect is the **location of evaluation**: we distinguish between *local* and *remote* evaluation. Remote evaluation is needed when additional computing resources are required that are not available locally (e.g. an HPC cluster with a batch system and a JupyterHub instance connected to it).

The choice of interpreter (executor) determines the *mode* of execution, which is selected with the `--mode` (`-m`) command-line flag.

## Order of evaluation

The `print` statements are evaluated in the order of their occurrence in the program, and their values (of string type) are concatenated and printed on the screen.

The `view` statements are executed in the order of their occurrence in the input. In the *instant* and *deferred* executors, `view` statements are blocked by the evaluation of their parameters. In the *workflow executor*, they are not blocked by evaluations: if all necessary evaluations have been completed, they are executed; otherwise a warning about values that have not been computed is issued. Furthermore, `view` statements are blocking when used in the command-line tools `texts` or `ipython`; in Jupyter Notebook or JupyterLab they are non-blocking.

All other statements, including those with file or network I/O operations, are executed in an order determined by their dependencies and by other factors, such as the evaluation policy (see below).
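How dependencies, rather than source order, fix the execution order can be sketched with Python's standard `graphlib` module (Python 3.9+). This is an illustration only: the model is a hypothetical mapping from each parameter to the parameters it depends on, and the interpreters may resolve dependencies differently.

```python
from graphlib import TopologicalSorter

# Hypothetical model: parameter -> set of parameters it depends on.
# The entries are deliberately listed in an order that cannot be executed as is.
dependencies = {
    "total": {"x", "y"},
    "y": {"x"},
    "x": set(),
    "report": {"total"},   # e.g. an export to a file that needs 'total'
}

# A valid execution order follows the dependency graph, not the listing order.
order = list(TopologicalSorter(dependencies).static_order())
print(order)   # ['x', 'y', 'total', 'report']
```

Here the dependency chain admits exactly one valid order; in general several orders can be valid, and which one is used is left to the executor and its policy.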

## Evaluation policies

In addition to the execution mode, we distinguish between three evaluation policies.

### Evaluate none

With this policy, the interpreter does not trigger the evaluation of any parameters. It is applied in the *workflow executor* with default flags.

### Evaluate all

This policy can be applied in the *workflow executor* by adding the `--autorun` flag. All parameters are evaluated, regardless of whether they are printed, and after all evaluations have completed, the interpreter executes the `print` statements in the order of their occurrence.

### Evaluate on demand

Evaluation is triggered by the `print` statements. This policy is applied in the *instant executor* and the *deferred executor* with default flags, and in the *workflow executor* when both the `--autorun` and `--on_demand` flags are added.
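The difference between the three policies can be illustrated with a small Python sketch. The model layout, the `run` function and the policy strings are hypothetical and only mimic the behavior described above; they are not the interpreters' API. Under the on-demand policy, a printed parameter also pulls in the parameters it depends on.

```python
from typing import Dict, List

# Hypothetical model: name -> (names of dependencies, function of their values)
Model = Dict[str, tuple]

model: Model = {
    "a": ((), lambda: 1),
    "b": (("a",), lambda a: a + 1),
    "unused": ((), lambda: 42),
}
printed = ["b"]   # parameters referenced in print statements


def run(model: Model, printed: List[str], policy: str) -> Dict[str, object]:
    """Return the values computed under the policy 'none', 'all' or 'on demand'."""
    values: Dict[str, object] = {}

    def evaluate(name: str) -> object:
        if name not in values:
            deps, func = model[name]
            values[name] = func(*(evaluate(d) for d in deps))
        return values[name]

    if policy == "all":
        for name in model:             # every parameter, printed or not
            evaluate(name)
    elif policy == "on demand":
        for name in printed:           # print statements trigger evaluation
            evaluate(name)
    elif policy != "none":
        raise ValueError(f"unknown policy: {policy}")
    return values


for policy in ("none", "all", "on demand"):
    print(policy, run(model, printed, policy))
```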

## Implementation of I/O calls

In the instant and deferred executors, all I/O operations are implemented as blocking calls. This means that the interpreter does not return control until all parameters of all `print` statements have been evaluated and printed in the order of the `print` statements.

In the workflow executor, the interpreter calls for all I/O statements other than `print` are non-blocking, i.e. the interpreter returns immediately after interpreting them. In contrast, all `print` statements are implemented as blocking calls, i.e. the interpreter returns control only after the `print` output has been written to the screen.
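The contrast between non-blocking I/O calls and blocking `print` calls can be sketched with Python's `concurrent.futures`. This is a hypothetical illustration, not the workflow executor's code: `export`, `interpret_export`, `interpret_print` and the `exported` dictionary (standing in for files or URLs) are invented names.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

background = ThreadPoolExecutor(max_workers=2)
exported: dict = {}   # stands in for files or URLs written by export statements


def export(target: str, data: str) -> None:
    """Hypothetical export operation, e.g. writing to a file or URL."""
    time.sleep(0.2)                    # simulate slow I/O
    exported[target] = data


def interpret_export(target: str, data: str) -> Future:
    # Non-blocking: submit the export and return to the interpreter immediately.
    return background.submit(export, target, data)


def interpret_print(text: str) -> None:
    # Blocking: return only after the text has been written to the screen.
    print(text, flush=True)


start = time.perf_counter()
pending = interpret_export("result.json", '{"x": 1}')
interpret_print("export submitted")
elapsed = time.perf_counter() - start
print(f"interpreter continued after {elapsed:.3f} s; export done: {pending.done()}")
pending.result()                       # wait here only to show that the export completes
background.shutdown()
```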

### Print and view operations blocked by parameter evaluation

With the *evaluate all* (`--autorun`) and *evaluate on demand* policies, the `print` and `view` statements are blocked by the evaluation of their parameters if the interpreter itself acts as the workflow engine, i.e. performs the actual execution. In these cases, the `print` and `view` statements return only after all their parameters have been evaluated.

### Print operation not blocked by parameter evaluation

With the *evaluate none* policy, or if an external workflow engine is used, the `print` statements return immediately. If the parameters have already been evaluated at the time of interpretation, their values are printed on the screen; otherwise `n.c.` (*not computed*) is printed.
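Both behaviors of a `print` statement, blocked and not blocked by parameter evaluation, can be sketched in Python. The function `format_print` is hypothetical, pending values are modeled as `concurrent.futures.Future` objects, and joining the parts without a separator is an assumption made for illustration.

```python
from concurrent.futures import Future
from typing import List, Union

Param = Union[str, Future]   # a literal string or a value computed elsewhere


def format_print(params: List[Param], block: bool) -> str:
    """Concatenate the string values of the parameters of one print statement.

    block=True waits for every pending value (interpreter acts as workflow engine);
    block=False never waits and shows 'n.c.' for values not computed yet.
    """
    parts = []
    for param in params:
        if isinstance(param, Future):
            if block or param.done():
                parts.append(str(param.result()))   # blocks only if block=True
            else:
                parts.append("n.c.")
        else:
            parts.append(param)
    return "".join(parts)


# Futures created directly, as is done for testing, to stand in for evaluations.
computed: Future = Future()
computed.set_result(8)
pending: Future = Future()          # never completed in this example

print(format_print(["a = ", computed], block=True))    # a = 8
print(format_print(["b = ", pending], block=False))    # b = n.c.
```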


The following table summarizes the properties of the three interpreters.

| interpreter       | location | time      | policy               | persistence                                      | print calls           |
|-------------------|----------|-----------|----------------------|--------------------------------------------------|-----------------------|
| instant executor  | local    | immediate | on demand            | no                                               | blocked               |
| deferred executor | local    | deferred  | on demand            | no                                               | blocked               |
| workflow executor | remote   | deferred  | none, all, on demand | all statements except `view`, `print` and `vary` | blocked / not blocked |

## Persistence

The instant and deferred executors do not persist the model. Therefore, neither the model nor parts of it can be reused, and the model cannot be extended. In addition, no data provenance information is saved.

The workflow executor interprets the model and stores the model instance as a workflow in a database (MongoDB). Provenance metadata is captured and saved, and deferred remote or distributed execution is enabled. All statements except `view`, `print` and `vary` are persisted. See the chapter [Bulk processing](bulk.md) for more about `vary` statements.

## Background I/O operations

In the workflow executor, a computed parameter value sometimes requires too much memory to be stored in the node or launch documents in the database. In this case, a fully transparent mechanism is triggered automatically to store the value in a file. To this end, a threshold for the maximum size in memory is set; the default is 100,000 bytes. In addition, one can select the store type (local file system or GridFS in MongoDB), the file format (currently only JSON is implemented), and data compression. These settings affect performance and storage volume but do not change the results or any other behavior.
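The offloading decision can be pictured with the following Python sketch. It is not the actual implementation: the functions `store_value` and `load_value`, the layout of the returned entry and the way the size is measured (length of the JSON encoding in bytes) are assumptions. Only the default threshold of 100,000 bytes, the JSON format, the local-file store type and the optional compression come from the description above.

```python
import gzip
import json
import os
import tempfile
import uuid

INLINE_THRESHOLD = 100_000   # bytes; the default stated in the text


def store_value(value, workspace: str, threshold: int = INLINE_THRESHOLD,
                compress: bool = True) -> dict:
    """Return what would go into the database document for one value.

    Assumption: the size is the length of the JSON encoding in bytes.
    """
    data = json.dumps(value).encode("utf-8")
    if len(data) <= threshold:
        return {"inline": value}                     # small: keep in the document
    suffix = ".json.gz" if compress else ".json"
    path = os.path.join(workspace, uuid.uuid4().hex + suffix)
    with open(path, "wb") as fh:
        fh.write(gzip.compress(data) if compress else data)
    # Only a reference plus the datastore metadata remain in the document.
    return {"offloaded": {"type": "file", "path": path, "format": "json",
                          "compress": compress}}


def load_value(entry: dict):
    """Transparent read-back: the caller gets the same value in both cases."""
    if "inline" in entry:
        return entry["inline"]
    meta = entry["offloaded"]
    with open(meta["path"], "rb") as fh:
        raw = fh.read()
    return json.loads(gzip.decompress(raw) if meta["compress"] else raw)


with tempfile.TemporaryDirectory() as workspace:
    small = store_value([1, 2, 3], workspace)
    large = store_value(list(range(50_000)), workspace)   # JSON is > 100,000 bytes
    print(sorted(small), sorted(large))                    # ['inline'] ['offloaded']
    assert load_value(large) == list(range(50_000))
```

Because `load_value` returns the same value whether it was stored inline or offloaded, the threshold and compression only change where and how the data is kept, which matches the statement that these settings do not change the results.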

To change the threshold, the store type, the file format or the compression, you can create a custom datastore configuration file with the following contents:

```yaml
type: file                          # can be 'gridfs' for database file object storage
inline-threshold: 100000            # threshold for offloading data, in bytes
path: /path/to/local/workspace      # directory used if type is 'file'
name: vre_language_datastore        # collection name used if type is 'gridfs'
launchpad: /path/to/launchpad.yaml  # path to custom launchpad file used if type is 'gridfs'
format: json                        # 'yaml' and 'hdf5' not implemented
compress: true                      # use compression
```

The `type: file` setting triggers storage in local files in `path`. Setting `type: null` deactivates the mechanism regardless of the other settings.
The default `path` is `$HOME/.fireworks/vre-language-datastore`. The `path` will be created automatically if it does not exist. The default `launchpad` is `LAUNCHPAD_LOC` as provided by FireWorks. All other default settings are shown in the example above.

The default path of the datastore configuration file is `$HOME/.fireworks/datastore_config.yaml`; the file is loaded automatically if it exists. If your datastore configuration file is in a different location, you must set the environment variable

```bash
export DATASTORE_CONFIG=/path/to/datastore_config.yaml
```

before running the `texts` tool. If the variable `$DATASTORE_CONFIG` is used, it must also be added to the `default_envvars` list of the relevant worker, as described in the [VRE Middleware documentation](https://vre-middleware.readthedocs.io/en/latest/resconfig.html#configuration-of-environment-variables).
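How the configuration file is located and combined with the defaults can be sketched in Python. The precedence of `$DATASTORE_CONFIG` over the default location and the key-by-key merge with the defaults are assumptions for illustration, and `find_datastore_config`, `effective_settings` and `DEFAULTS` are hypothetical names. Parsing the YAML file itself (e.g. with PyYAML) is left out to keep the sketch dependency-free, and the `launchpad` default is omitted because it is taken from FireWorks (`LAUNCHPAD_LOC`).

```python
import os
from typing import Optional

HOME = os.path.expanduser("~")

# Defaults as described in the text.
DEFAULTS = {
    "type": "file",
    "inline-threshold": 100000,
    "path": os.path.join(HOME, ".fireworks", "vre-language-datastore"),
    "name": "vre_language_datastore",
    "format": "json",
    "compress": True,
}


def find_datastore_config() -> Optional[str]:
    """Return the configuration file to load, or None if there is none.

    Assumption: $DATASTORE_CONFIG, if set, takes precedence over the default.
    """
    env = os.environ.get("DATASTORE_CONFIG")
    if env:
        return env
    default = os.path.join(HOME, ".fireworks", "datastore_config.yaml")
    return default if os.path.isfile(default) else None


def effective_settings(user_settings: dict) -> dict:
    """Values from the file override the defaults; missing keys keep defaults."""
    return {**DEFAULTS, **user_settings}


settings = effective_settings({"type": "gridfs", "inline-threshold": 500000})
print(find_datastore_config())
print(settings["type"], settings["inline-threshold"], settings["compress"])
```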

It is not recommended to set `inline-threshold` to very small values, unless you are testing.

If your model has a large number of parameters with moderate storage requirements, it is recommended to use `type: gridfs` and to adapt `inline-threshold` as needed. If your model has a small number of parameters with large storage requirements, it is recommended to use a workspace in the local file system (`type: file`).

The datastore metadata, i.e. `type`, `name`, `launchpad`, `path`, `format` and `compress`, is stored permanently with each individual object rather than once for all objects in a model. The datastore configuration can be changed at any time, but a change only affects objects created while the new configuration is active. The external resources used for storage, i.e. file paths, databases, collection names, etc., should remain available permanently so that the model data can be used later.
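The effect of storing the metadata per object can be sketched as follows. `StoredObject` and `active_config` are hypothetical names, not part of any real API: each object takes a snapshot of the configuration that is active when it is created, so a later change does not affect objects that already exist.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical active datastore configuration, changed by the user over time.
active_config: Dict[str, object] = {"type": "file", "path": "/data/ws1",
                                    "format": "json", "compress": True}


@dataclass
class StoredObject:
    name: str
    # Snapshot (copy) of the datastore metadata taken when the object is created.
    datastore: Dict[str, object] = field(
        default_factory=lambda: dict(active_config))


x = StoredObject("x")                                    # created under ws1
active_config.update(path="/data/ws2", compress=False)   # change the configuration
y = StoredObject("y")                                    # created under ws2

print(x.datastore["path"], x.datastore["compress"])      # /data/ws1 True
print(y.datastore["path"], y.datastore["compress"])      # /data/ws2 False
```

Reading `x` back still requires `/data/ws1` to exist, which is why the external resources should remain available permanently.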