Input and output operations (I/O)

The basics section noted that the order of the statements in a program provides no information to the interpreter, and that the output does not depend on this order. Instead, the dependencies between variables and their parameters determine the execution order.
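The following Python sketch illustrates how an execution order can be derived from dependencies alone. It is not the interpreter's implementation; the variable names a, b, c and the dependency table are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical program: each variable is mapped to the variables it depends on.
# The entries are listed in "source order", with c written first.
source_order = {
    "c": {"a", "b"},   # c appears first in the source but needs a and b
    "b": {"a"},
    "a": set(),
}

def evaluation_order(dependencies):
    """Return a valid evaluation order derived only from the dependencies."""
    return list(TopologicalSorter(dependencies).static_order())

order = evaluation_order(source_order)
print(order)  # ['a', 'b', 'c']
```

Reordering the entries of the dictionary leaves the result unchanged, because only the dependency relation is consulted.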

The statements for I/O operations, such as view and print, loading from file/URL and exports to file/URL introduce side effects. Without them it is impossible to interact with the program. These effects create situations where a specific ordering of evaluations becomes necessary under certain policies.

Language interpreters and execution modes

Currently, three different interpreters can be used to execute programs. None of them performs eager evaluation, in which parameters (function calls, expressions etc.) are evaluated as soon as the interpreter encounters them. Instead, evaluation is triggered only when a value is actually needed. Thus, the order of evaluation is neither the order of statements in the source code nor the order of interpretation. This is sometimes referred to as lazy evaluation.
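As a rough illustration of lazy evaluation in general, and not of how these interpreters are built, the Python sketch below wraps each computation in an object that runs it only on first request. The class Lazy, the helper traced and all variable names are hypothetical.

```python
class Lazy:
    """A value that is computed only when it is first requested."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument callable
        self._done = False
        self._value = None

    def get(self):
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value

log = []  # records the order in which values are actually computed

def traced(name, fn):
    """Wrap fn in a Lazy value that logs its name once it has been computed."""
    def run():
        value = fn()
        log.append(name)
        return value
    return Lazy(run)

b = traced("b", lambda: a.get() * 10)     # refers to a, which is defined later
a = traced("a", lambda: 2)
unused = traced("unused", lambda: 1 / 0)  # never requested, so never evaluated

result = b.get()
print(result, log)  # 20 ['a', 'b']
```

Requesting b forces a first, although b was written before a, and the faulty definition of unused never runs because nothing asks for it.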

Another aspect of evaluation is its timing: evaluation can take place while the interpreter is running (let us call this immediate evaluation) or after the interpreter has finished (deferred evaluation).

A third aspect is the location of evaluation: we distinguish between local and remote evaluation. Remote evaluation is used when additional computing resources are needed that are not available locally (e.g. an HPC cluster with a batch system and a JupyterHub instance connected to it).

The choice of executor determines the mode of execution, which is selected with the --mode | -m command-line flag.

Order of evaluation

The print statements are evaluated in the order of their occurrence in the program, and their values (of string type) are concatenated and printed on screen. The view statements are executed in the order of their occurrence in the input. In the instant and deferred executors, the view statements are blocked by the evaluations of their parameters. In the workflow executor, they are not blocked by evaluations: if all necessary evaluations have been completed, they are executed; otherwise a warning about values that have not been computed is issued. The view statements are blocking when used in the command-line tools texts or ipython. In Jupyter Notebook or JupyterLab they are non-blocking.

All other statements, including those with file/network I/O operations, have an order of execution that is determined by the dependencies and other factors, e.g. the evaluation policy (see below).

Evaluation policies

In addition to the execution mode, we distinguish three evaluation policies.

Evaluate none

The policy is applied in all executors with default flags.

Evaluate all

The policy can be applied to the workflow executor by adding the flag --autorun. All parameters are evaluated regardless of whether they are printed, and after all evaluations are completed, the interpreter executes the print statements in the order of their occurrence.

Evaluate on demand

Evaluation is triggered by print statements. This policy applies in the instant and deferred executors with default flags, and in the workflow executor when two additional flags are added: --autorun and --on_demand.
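The difference between the three policies can be sketched with a small Python simulation. This is a simplified model under stated assumptions, not the interpreters' code: the function run, the policy names passed as strings and the placeholder text for values that were not computed are all hypothetical.

```python
def run(parameters, prints, policy):
    """Simulate one interpreter run.

    parameters: dict name -> (dependency names, function of dependency values)
    prints:     names to print, in source order
    policy:     "none", "all" or "on_demand"
    Returns (names in the order they were evaluated, printed lines).
    """
    values, evaluated = {}, []

    def evaluate(name):
        if name not in values:
            deps, fn = parameters[name]
            args = [evaluate(dep) for dep in deps]
            values[name] = fn(*args)
            evaluated.append(name)
        return values[name]

    if policy == "all":
        for name in parameters:      # everything, printed or not
            evaluate(name)
    elif policy == "on_demand":
        for name in prints:          # only what the print statements need
            evaluate(name)
    elif policy != "none":
        raise ValueError(f"unknown policy: {policy}")

    # Print statements run in source order; the placeholder is hypothetical.
    lines = [f"{name} = {values.get(name, '<not computed>')}" for name in prints]
    return evaluated, lines

parameters = {
    "a": ((), lambda: 1),
    "b": (("a",), lambda a: a + 1),
    "c": ((), lambda: 100),          # never printed
}
prints = ["b"]

for policy in ("none", "all", "on_demand"):
    print(policy, *run(parameters, prints, policy))
```

In this model, "all" evaluates a, b and c, whereas "on_demand" evaluates only a and b, because c is not needed by any print statement.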

Implementation of I/O calls

In the instant and deferred executors, all I/O operations are implemented as blocking calls. This means that the interpreter does not return control until all parameters of all print statements have been evaluated and printed in the order of the print statements.

In the workflow executor, the interpreter calls for all I/O statements other than print statements are non-blocking, i.e. the interpreter returns immediately after interpreting them. In contrast, the print statements are all implemented with blocking calls, i.e. the interpreter returns control only after the print output has been written to the screen.
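For readers unfamiliar with the terms, the generic difference between a non-blocking and a blocking call can be shown with Python's standard thread pool. This is only an analogy and says nothing about how the workflow executor is implemented; the function slow_export is a hypothetical stand-in for a file or network operation.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_export(value):
    """Hypothetical stand-in for a file/network I/O operation."""
    time.sleep(0.2)
    return f"exported {value}"

with ThreadPoolExecutor(max_workers=1) as pool:
    start = time.perf_counter()

    # Non-blocking: submit() returns at once with a Future object.
    future = pool.submit(slow_export, 42)
    submit_time = time.perf_counter() - start

    # Blocking: result() returns only after the operation has finished.
    message = future.result()
    total_time = time.perf_counter() - start

print(f"submit returned after {submit_time:.3f}s, result after {total_time:.3f}s")
print(message)
```

The caller regains control right after submit(), while result() makes it wait until the operation has completed.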

Persistence

The instant and deferred executors provide no persistence of the model. Therefore, the model and parts of it cannot be reused and the model cannot be extended. Also no data provenance information is saved.

The workflow executor interprets the model and stores the model instance as a workflow in a database (MongoDB). Provenance metadata is captured and saved, and deferred remote or distributed execution is enabled. All statements are persisted except for view, print and vary statements. See the chapter Bulk processing for more about vary statements.

Background I/O operations

In the workflow executor, a computed parameter value is sometimes too large to be stored in the node / launch documents in the database. In that case a fully transparent mechanism is triggered automatically to store the value in a file. To this end, a threshold for the maximum size in memory is set; the default is 100,000 bytes. In addition, one can set different store types (local file system or GridFS in MongoDB), different file formats (JSON is the only one currently implemented), and data compression. All these settings affect performance and storage volumes but do not change the results or any other behavior.
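The idea of threshold-based offloading can be sketched in Python as follows. This is a minimal sketch, not the actual datastore code. It assumes that the size is measured on the serialized JSON and that values at or below the threshold stay inline. The functions store_value and load_value and the layout of the returned document are hypothetical.

```python
import gzip
import json
import os
import tempfile
import uuid

INLINE_THRESHOLD = 100_000  # bytes, the documented default

def store_value(value, workspace, threshold=INLINE_THRESHOLD, compress=True):
    """Return a document holding the value inline or a reference to a file."""
    data = json.dumps(value).encode("utf-8")
    if len(data) <= threshold:            # assumption: size of serialized JSON
        return {"inline": True, "value": value}
    os.makedirs(workspace, exist_ok=True)
    suffix = ".json.gz" if compress else ".json"
    path = os.path.join(workspace, uuid.uuid4().hex + suffix)
    opener = gzip.open if compress else open
    with opener(path, "wb") as handle:
        handle.write(data)
    return {"inline": False, "path": path, "format": "json", "compress": compress}

def load_value(document):
    """Inverse of store_value: fetch the value from wherever it was stored."""
    if document["inline"]:
        return document["value"]
    opener = gzip.open if document["compress"] else open
    with opener(document["path"], "rb") as handle:
        return json.loads(handle.read().decode("utf-8"))

workspace = tempfile.mkdtemp()
small = store_value([1, 2, 3], workspace)
large = store_value(list(range(50_000)), workspace)
print(small["inline"], large["inline"])  # True False
```

Because load_value returns the same data in both cases, the caller does not need to know where a value was stored, which is what makes such a mechanism transparent.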

If you wish to change these settings you can create a custom datastore configuration file with these contents:

type: file                          # can be 'gridfs' for database file object storage
inline-threshold: 100000            # threshold for offloading data, in bytes
path: /path/to/local/workspace      # directory used if type is 'file'
name: vre_language_datastore        # collection name used if type is 'gridfs'
launchpad: /path/to/launchpad.yaml  # path to custom launchpad file used if type is 'gridfs'
format: json                        # 'yaml' and 'hdf5' not implemented
compress: true                      # use compression

The type: file setting triggers storage in local files in the directory given by path. Setting type: null deactivates the mechanism regardless of the other settings. The default path is $HOME/.fireworks/vre-language-datastore; it is created automatically if it does not exist. The default launchpad is LAUNCHPAD_LOC as provided by FireWorks. All other default settings are shown in the example above.

The default path of the datastore configuration file is $HOME/.fireworks/datastore_config.yaml. If this file exists, it is loaded automatically. If your datastore configuration is in a different location, you must set the environment variable

export DATASTORE_CONFIG=/path/to/datastore_config.yaml

before running the texts tool. If the variable $DATASTORE_CONFIG is used, it must also be added to the default_envvars list in the relevant worker as described here.
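The lookup described above can be sketched in Python as follows. This is a hedged sketch, not the tool's code: the function find_datastore_config is hypothetical, and it assumes that DATASTORE_CONFIG, when set, takes precedence over the default location.

```python
import os

# Default location named in the text, written relative to the home directory.
DEFAULT_CONFIG = os.path.join("~", ".fireworks", "datastore_config.yaml")

def find_datastore_config(environ=os.environ):
    """Return the configuration file path to load, or None if there is none."""
    explicit = environ.get("DATASTORE_CONFIG")
    if explicit:                      # assumption: the variable wins if set
        return explicit
    default = os.path.expanduser(DEFAULT_CONFIG)
    return default if os.path.exists(default) else None

print(find_datastore_config({"DATASTORE_CONFIG": "/path/to/datastore_config.yaml"}))
# /path/to/datastore_config.yaml
```

Called without arguments, the function reads the real environment and returns None when neither the variable nor the default file is present.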

It is not recommended to set inline-threshold to very small values, unless you are testing.

If your model has a large number of parameters with moderate storage requirements, it is recommended to use type: gridfs and adapt inline-threshold as needed. If your model has a small number of parameters with large storage requirements, it is recommended to use a workspace in the local file system (type: file).

The datastore metadata, i.e. type, name, launchpad, path, format and compress, is stored permanently per individual object and not for all objects in a model. The datastore configuration can be changed at any time, but the change is effective only for objects created while the new configuration is active. The external resources used for the storage, i.e. the file paths, databases, collection names, etc., should remain available permanently to ensure that the model data can be used later.