Input and output operations (I/O)

Statements that perform I/O, such as view and print and those that load from or store to a file or URL, introduce side effects. Without them it is impossible to interact with the program. Because of these side effects, some evaluation policies require a specific order of evaluation.

The print statements are evaluated in the order of occurrence in the program and their values (of string type) are concatenated and printed on screen. The view statements are executed in the order of occurrence in the input. All other statements, including those with file/network I/O operations, have an order of evaluation that is determined by the dependencies and other factors, e.g. the evaluation policy (see below).
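The difference between dependency order and order of occurrence can be illustrated with a small Python sketch. It is not the interpreter's implementation: the variable names and the dependency table are hypothetical, and the standard-library graphlib module (Python 3.9 or later) stands in for whatever scheduling the interpreter actually uses.

```python
from graphlib import TopologicalSorter

# Hypothetical model: each variable maps to the set of variables it depends on.
dependencies = {
    "a": set(),
    "b": {"a"},
    "c": {"a", "b"},
}

# Hypothetical print statements, listed in the order they occur in the source.
prints_in_source_order = ["c", "a"]


def evaluation_order(deps):
    """Return one valid evaluation order: every variable after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


# Variables are evaluated in dependency order ...
order = evaluation_order(dependencies)
# ... whereas print statements keep their order of occurrence.
for name in prints_in_source_order:
    print(name, "is printed; it was evaluated at position", order.index(name))
```

In this sketch c is printed before a, although a has to be evaluated before c.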

Modes of evaluation

The behavior of print and view statements depends on the evaluation (execution) mode. In the command-line tools, the evaluation mode is selected by using the --mode switch and the keywords instant, deferred and workflow.

Evaluation output

The print and view statements display evaluation results, and the to file statement stores them. The to file operation always stores the eventual value of a variable, whereas print and view display values that depend on the current state of the evaluation of their parameters. In workflow evaluation mode with remote evaluation (selected by the flag --async-run in command-line tools), or with evaluation turned off (see policies below), the print statement may print n.c. (not computed) if the evaluation has not been completed. Similarly, the view statement, which returns no value itself, issues a warning about values that have not been computed.

Evaluation policies

In addition to the evaluation mode, we distinguish three evaluation policies.

Evaluate none

This policy is applied in workflow evaluation mode with default flags.

Evaluate all

This policy can be applied in workflow mode by adding the flag --autorun. All parameters are evaluated no matter whether they should be printed or not, and after all evaluations are completed, the print and view statements are executed in the order of their occurrence.

Evaluate on demand

Evaluation is triggered by print and view statements in instant and deferred modes, with default flags, and in workflow mode by adding two additional flags: --autorun and --on_demand.
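The relation between modes, flags and policies described above can be summarized in a short Python sketch. The function select_policy is hypothetical and not part of the command-line tools. Two cases are not covered by the text and are assumptions of the sketch: the flags are ignored in instant and deferred modes, and --on_demand without --autorun is treated like the default flags.

```python
def select_policy(mode, autorun=False, on_demand=False):
    """Map an evaluation mode and flags to an evaluation policy (hypothetical helper)."""
    if mode in ("instant", "deferred"):
        # Assumption: flags are ignored in these modes; the text covers default flags only.
        return "on demand"
    if mode == "workflow":
        if autorun and on_demand:
            return "on demand"
        if autorun:
            return "all"
        # Assumption: --on_demand without --autorun behaves like the default flags.
        return "none"
    raise ValueError(f"unknown evaluation mode: {mode!r}")


for mode, flags in [
    ("instant", {}),
    ("workflow", {}),
    ("workflow", {"autorun": True}),
    ("workflow", {"autorun": True, "on_demand": True}),
]:
    print(mode, flags, "->", select_policy(mode, **flags))
```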

Behavior of I/O calls

The behavior of I/O operations in the different evaluation modes is summarized in the following table.

Mode	Location	Time	Policy	Persistence	`to file`	`from file`
instant	local	immediate	on demand	no	blocking	blocking
deferred	local	deferred	on demand	no	blocking	blocking
workflow	-	-	evaluate none	all without `view`, `print`, `vary`, `:=`	-	-
workflow	local (sync)	deferred	all, on demand	all without `view`, `print`, `vary`, `:=`	blocking	blocking
workflow	remote (async)	deferred	all, on demand	all without `view`, `print`, `vary`, `:=`	non-blocking	non-blocking

Note that the blocking behavior of I/O calls described in the table is relative to the interpreter process. A from file operation blocks the process in which its descendants are evaluated (not interpreted), and a to file operation is blocked by the evaluation of its ancestors. In instant and deferred evaluation this is always the same process as the interpreter. If the I/O operations run in a different process, as in remote evaluation (selected by the flag --async-run in command-line tools), they are non-blocking.

In all cases the evaluation of print and view statements blocks the interpreter. This means that the interpreter does not return control until all print and view statements have been evaluated.

Model persistence

The instant and the deferred evaluation modes provide no persistence of the model. Correspondingly no data provenance information is saved. Therefore, the model (or parts of it) can be reused or extended only within the source code.

In workflow mode, the interpreter creates a workflow and stores it in a database (MongoDB). Provenance metadata is captured and saved, and deferred remote or distributed execution is enabled. All statements are persisted except for view, print, vary and update (:=) statements. See chapter Bulk processing for more about vary statements.

Background I/O operations

In workflow mode, a computed parameter value is sometimes too large to be stored in the node / launch documents in the database. In this case a fully transparent mechanism is triggered automatically to store the value in a file. To this end, a threshold for the maximum size in memory is set; the default is 100,000 bytes. In addition, one can set different store types (local file system or GridFS in MongoDB), different file formats (JSON, YAML, HDF5 or application-specific), and data compression. All these settings affect performance and storage volume but do not change the results or any other behavior.

If you wish to change these settings you can create a custom datastore configuration file with these contents:

type: file                          # can be 'url' or 'gridfs' for database file object storage
inline-threshold: 100000            # threshold for offloading data, in bytes
path: /path/to/local/workspace      # directory used if type is 'file'
name: vre_language_datastore        # collection name used if type is 'gridfs'
launchpad: /path/to/launchpad.yaml  # path to custom launchpad file used if type is 'gridfs'
format: json                        # can be 'json', 'yaml', 'hdf5' or 'custom'
compress: true                      # use compression (true or false)

The type: file setting triggers storage in local files in path. Setting type: null deactivates the mechanism regardless of the other settings. The default path is $HOME/.fireworks/vre-language-datastore. The path will be created automatically if it does not exist. The default launchpad is LAUNCHPAD_LOC as provided by FireWorks. All other default settings are shown in the example above.

The default path of the datastore configuration file is $HOME/.fireworks/datastore_config.yaml. It will be automatically loaded, if the file exists. If your datastore configuration has a different location then you must set the environment variable

export DATASTORE_CONFIG=/path/to/datastore_config.yaml

before running the texts tool. If the variable $DATASTORE_CONFIG is used, it must also be added to the default_envvars list in the relevant worker as described here.
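The lookup of the configuration file can be sketched in Python. The function datastore_config_path is hypothetical, and the sketch assumes that $DATASTORE_CONFIG, when set, takes precedence over the default location.

```python
import os


def datastore_config_path(environ=None, home=None):
    """Return the datastore configuration file to load, or None (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home
    # Assumption: the environment variable overrides the default location.
    explicit = environ.get("DATASTORE_CONFIG")
    if explicit:
        return explicit
    default = os.path.join(home, ".fireworks", "datastore_config.yaml")
    # The default file is loaded only if it exists.
    return default if os.path.exists(default) else None


print(datastore_config_path())
```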

It is not recommended to set inline-threshold to very small values, unless you are testing.

If your model has a large number of parameters with moderate storage requirements it is recommended to use type: gridfs and adapt inline-threshold as needed. If your model has a small number of parameters with large storage requirements then it is recommended to use a workspace in the local file system (type: file).

The datastore metadata, i.e. type, name, launchpad, path, format and compress, are stored permanently for each individual object and not for all objects in a model. The datastore configuration can be changed at any time, but a change affects only the objects that are created while that configuration is active. The external resources used for the storage, i.e. the file paths, databases, collection names, etc., should remain available permanently to ensure that the model data can be used later.

Data schema validation

All persisted data on all types of storage (i.e. file, gridfs, url and null) have JSON schemas.

It is recommended to activate JSON schema validation. This detects errors early and increases the security of loading data from files, from databases or from the web.

To activate JSON schema validation, add the following lines to $HOME/.fireworks/FW_config.yaml:

JSON_SCHEMA_VALIDATE: true
JSON_SCHEMA_VALIDATE_LIST: []

If JSON_SCHEMA_VALIDATE_LIST is already in the file, it should not be changed; otherwise an empty list should be added.
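The rule for the two settings can be sketched in Python as a plain text transformation. The function enable_schema_validation is hypothetical; it works line by line and assumes that both keys, if present, start at the beginning of a line. It is not a YAML parser.

```python
def enable_schema_validation(config_text):
    """Return FW_config.yaml text with JSON schema validation switched on."""
    lines = config_text.splitlines()
    has_list = any(line.startswith("JSON_SCHEMA_VALIDATE_LIST:") for line in lines)
    # Drop an existing on/off switch so that it can be rewritten as true.
    lines = [line for line in lines if not line.startswith("JSON_SCHEMA_VALIDATE:")]
    lines.append("JSON_SCHEMA_VALIDATE: true")
    if not has_list:
        # Add the empty list only if the key is absent; an existing list is kept.
        lines.append("JSON_SCHEMA_VALIDATE_LIST: []")
    return "\n".join(lines) + "\n"


print(enable_schema_validation("LAUNCHPAD_LOC: /path/to/launchpad.yaml\n"))
```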