# Input and output operations (I/O)

The statements for I/O operations, such as `view` and `print`, and loading from and storing to file/URL introduce *side effects*. Without them it is impossible to interact with the program. These effects create situations where a specific ordering of evaluations becomes necessary under certain policies.

The `print` statements are evaluated in their order of occurrence in the program, and their values (of string type) are concatenated and printed on screen. The `view` statements are likewise executed in their order of occurrence in the program.
All other statements, including those with file/network I/O operations, have an [order of evaluation](evaluation.md#order-of-evaluation-and-short-circuiting) that is determined by the dependencies and other factors, e.g. the evaluation policy (see below).

## Modes of evaluation

The behavior of `print` and `view` statements depends on the evaluation (execution) mode. In the [command-line tools](tools.md), the evaluation mode is selected by using the `--mode` switch and the keywords `instant`, `deferred` and `workflow`.


## Evaluation output

The `print`/`view` and `to file` statements show and store, respectively, the evaluation results. The `to file` operation always stores the eventual value of a variable, whereas the `print` and `view` display values depending on the current *state* of the evaluation of their parameters. In workflow evaluation mode with remote evaluation (selected by the flag `--async-run` in [command-line tools](tools.md)), or with evaluation turned off (see policies below), the `print` statement may print `n.c.` (not computed) if the evaluation has not been completed. Similarly, a warning about not computed values is issued by the `view` statement which returns no value itself.

## Evaluation policies

In addition to the evaluation mode, we distinguish three evaluation policies.

### Evaluate none

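This policy is applied in workflow evaluation mode with default flags. No evaluation is triggered, so `print` statements may show `n.c.` (not computed) for parameters whose values are not yet available.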

### Evaluate all

This policy can be applied in *workflow* mode by adding the flag `--autorun`. All parameters are evaluated, regardless of whether they are printed. After all evaluations are completed, the `print` and `view` statements are executed in the order of their occurrence.

### Evaluate on demand

Evaluation is triggered by `print` and `view` statements in *instant* and *deferred* modes, with default flags, and in *workflow* mode by adding two additional flags: `--autorun` and `--on_demand`.
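
The difference between the three policies can be illustrated with a small toy model in Python. This is only a sketch and not the interpreter's implementation: the names `Policy`, `Model` and `build_model`, as well as the `name: value` output format, are hypothetical and chosen for illustration. The sketch shows which parameters get evaluated under each policy and when a printed parameter is reported as `n.c.`.

```python
from enum import Enum


class Policy(Enum):
    NONE = "evaluate none"
    ALL = "evaluate all"
    ON_DEMAND = "evaluate on demand"


class Model:
    """Toy model (hypothetical): parameters defined as functions of other parameters."""

    def __init__(self):
        self.definitions = {}  # name -> (function, names of dependencies)
        self.values = {}       # name -> computed value
        self.printed = []      # names requested by print statements, in order

    def define(self, name, func, deps=()):
        self.definitions[name] = (func, tuple(deps))

    def request_print(self, name):
        self.printed.append(name)

    def evaluate(self, name):
        """Evaluate a parameter after recursively evaluating its dependencies."""
        if name not in self.values:
            func, deps = self.definitions[name]
            self.values[name] = func(*(self.evaluate(dep) for dep in deps))
        return self.values[name]

    def run(self, policy):
        """Apply the policy, then format the print requests in order of occurrence."""
        if policy is Policy.ALL:
            for name in self.definitions:
                self.evaluate(name)
        elif policy is Policy.ON_DEMAND:
            for name in self.printed:
                self.evaluate(name)
        # Policy.NONE evaluates nothing, so every printed value is 'n.c.'
        return [f"{name}: {self.values.get(name, 'n.c.')}" for name in self.printed]


def build_model():
    model = Model()
    model.define("a", lambda: 2)
    model.define("b", lambda a: a * 10, deps=["a"])
    model.define("c", lambda: 99)  # defined but never printed
    model.request_print("b")
    return model
```

In this sketch, `build_model().run(Policy.ON_DEMAND)` evaluates only `a` and `b`, because `c` is not needed by any print request, whereas `Policy.ALL` evaluates `c` as well and `Policy.NONE` leaves everything unevaluated.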


## Behavior of I/O calls

The behavior of I/O operations in the different evaluation modes is summarized in the following table.

| Mode     | Location       | Time      | Policy         | Persistence | `to file` | `from file` |
|----------|----------------|-----------|----------------|-------------|-----------|-------------|
| instant  | local          | immediate | on demand      | no          | blocking  | blocking    |
| deferred | local          | deferred  | on demand      | no          | blocking  | blocking    |
| workflow | -              | -         | none           | all except `view`, `print`, `vary`, `:=` | - | - |
| workflow | local (sync)   | deferred  | all, on demand | all except `view`, `print`, `vary`, `:=` | blocking | blocking |
| workflow | remote (async) | deferred  | all, on demand | all except `view`, `print`, `vary`, `:=` | non-blocking | non-blocking |

The blocking behavior of I/O calls described in the table is relative to the interpreter *process*. A `from file` operation blocks the *process* in which the evaluation (not the interpretation) of its descendants takes place, and a `to file` operation is blocked by the evaluation of its ancestors. In instant and deferred evaluation, evaluation and interpretation always take place in the same process. If the I/O operations run in a different process, as in the case of remote evaluation (selected by the flag `--async-run` in the [command-line tools](tools.md)), they are non-blocking.

In all cases the evaluation of `print` and `view` statements **blocks** the interpreter. This means that the interpreter does not return control until all `print` and `view` statements have been evaluated.


## Model persistence

The instant and the deferred evaluation modes provide no persistence of the model. Consequently, no data *provenance* information is saved, and the model (or parts of it) can be reused or extended only within the source code.

In workflow mode, the interpreter creates a workflow and stores it in a database (MongoDB). Provenance metadata is captured and saved, and deferred remote or distributed execution is enabled. All statements are persisted except for `view`, `print`, `vary` and update (`:=`) statements. See the chapter [Bulk processing](bulk.md) for more about `vary` statements.

## Background I/O operations

In workflow mode, a computed parameter value sometimes occupies too much memory to be stored in the node / launch documents in the database. In this case, a fully transparent mechanism is triggered automatically to store the value in a file. To this end, a threshold for the maximum size in memory is set; the default is 100,000 bytes. In addition, one can set different store types (local file system or GridFS in MongoDB), different file formats (JSON, YAML, HDF5 or application-specific), and data compression. All these settings affect performance and storage volumes but do not change the results or any other behavior.

If you wish to change these settings you can create a custom datastore configuration file with these contents:

```yaml
type: file                          # can be 'url' or 'gridfs' for database file object storage
inline-threshold: 100000            # threshold for offloading data, in bytes
path: /path/to/local/workspace      # directory used if type is 'file'
name: vre_language_datastore        # collection name used if type is 'gridfs'
launchpad: /path/to/launchpad.yaml  # path to custom launchpad file used if type is 'gridfs'
format: json                        # can be 'json', 'yaml', 'hdf5' or 'custom'
compress: true                      # use compression (true or false)
```

The `type: file` setting triggers storage in local files in `path`. Setting `type: null` deactivates the mechanism regardless of the other settings.
The default `path` is `$HOME/.fireworks/vre-language-datastore`. The `path` will be created automatically if it does not exist. The default `launchpad` is `LAUNCHPAD_LOC` as provided by FireWorks. All other default settings are shown in the example above.
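
The offloading decision can be pictured with the following minimal Python sketch. It is not the actual implementation: the function names `store_value` and `load_value` and the layout of the returned document are hypothetical, the size is measured on the serialized JSON rather than in memory, and whether a value exactly at the threshold stays inline is an assumption. Only the `file` store type with the `json` format is covered.

```python
import gzip
import json
import os
import uuid

INLINE_THRESHOLD = 100000  # bytes; mirrors the documented default


def store_value(value, path, inline_threshold=INLINE_THRESHOLD, compress=True):
    """Return a document that holds the value inline or points to a file (hypothetical layout)."""
    data = json.dumps(value).encode("utf-8")
    if len(data) <= inline_threshold:
        # Small enough (assumption: measured on serialized size): keep inline
        return {"inline": True, "value": value}
    os.makedirs(path, exist_ok=True)  # create the workspace if it does not exist
    suffix = ".json.gz" if compress else ".json"
    filename = os.path.join(path, uuid.uuid4().hex + suffix)
    opener = gzip.open if compress else open
    with opener(filename, "wb") as handle:
        handle.write(data)
    # Datastore metadata is kept per object, so the value can be found later
    return {"inline": False, "type": "file", "path": path, "filename": filename,
            "format": "json", "compress": compress}


def load_value(document):
    """Read the value back, regardless of whether it was stored inline or in a file."""
    if document["inline"]:
        return document["value"]
    opener = gzip.open if document["compress"] else open
    with opener(document["filename"], "rb") as handle:
        return json.loads(handle.read().decode("utf-8"))
```

Because `load_value` uses only the metadata saved with each object, a later change of configuration does not affect objects that were stored earlier, which mirrors the per-object storage of datastore metadata.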

The default path of the datastore configuration file is `$HOME/.fireworks/datastore_config.yaml`. The file is loaded automatically if it exists. If your datastore configuration is in a different location, you must set the environment variable

```bash
export DATASTORE_CONFIG=/path/to/datastore_config.yaml
```

before running the `texts` tool. If the variable `$DATASTORE_CONFIG` is used, it must also be added to the `default_envvars` list in the relevant worker, as described [here](https://vre-middleware.readthedocs.io/en/latest/resconfig.html#configuration-of-environment-variables).
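
The lookup of the configuration file can be sketched in Python as follows. The function `find_datastore_config` is hypothetical, and the precedence of the environment variable over the default location is an assumption made for this illustration, not a statement about the tool's internals.

```python
import os


def find_datastore_config(environ=None, home=None):
    """Return the datastore configuration path to use, or None (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    explicit = environ.get("DATASTORE_CONFIG")
    if explicit:
        # Assumption: an explicitly set variable takes precedence over the default
        return explicit
    home = os.path.expanduser("~") if home is None else home
    default = os.path.join(home, ".fireworks", "datastore_config.yaml")
    # The default file is used only if it exists
    return default if os.path.exists(default) else None
```

If the function returns `None`, no custom configuration is found, and in this sketch the default settings would apply.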

It is not recommended to set `inline-threshold` to very small values, unless you are testing.

If your model has a large number of parameters with moderate storage requirements, it is recommended to use `type: gridfs` and adapt `inline-threshold` as needed. If your model has a small number of parameters with large storage requirements, it is recommended to use a workspace in the local file system (`type: file`).

The datastore metadata, i.e. `type`, `name`, `launchpad`, `path`, `format` and `compress`, are stored permanently per individual object and not for all objects in a model. The datastore configuration can be changed at any time, but a change is effective only for the objects that are created while this configuration is active. The external resources used for the storage, i.e. the file paths, databases, collection names, etc., should remain available permanently to ensure that the model data can be used later.

## Data schema validation

All persisted data on all types of storage (i.e. `file`, `gridfs`, `url` and `null`)
have JSON schemas.

It is recommended to activate JSON schema validation. This detects errors early and
increases the security of loading data from files, databases or the web.

To activate JSON schema validation, the lines

```yaml
JSON_SCHEMA_VALIDATE: true
JSON_SCHEMA_VALIDATE_LIST: []
```

have to be added to `$HOME/.fireworks/FW_config.yaml`. If `JSON_SCHEMA_VALIDATE_LIST` is already in the file, it should not be changed; otherwise, an empty list should be added.