# Extending the language

This section of the documentation is intended for developers interested in extending the Scientific Computing Language to other domains.

The [Scientific Computing Language (SCL)](scl.md) can be extended to cover other domains of scientific computing. Currently, the SCL extension [Atomic and Molecular Modeling Language (AMML)](amml.md) can be used as a blueprint. The modularity of the grammar and of the interpreter is still being improved, so this section may change frequently until the package structure is stable.

## Test cases (inputs)

Before starting, it is strongly recommended to have a clear idea of which abstractions from the new domain should be added to the language and which language elements (such as types and operations) will implement them.

The following two examples are past extensions of the SCL core language.

**Example 1**: Extend the language to support power in numeric expressions (supposing that only multiplication is currently supported). A simple test input will be something like: `a = 2 * 2; b = 2 ** 2; print(a, b)`.

**Example 2**: Extend the `print` statement to enable units conversion of numeric parameters. A good test input will be, for example, `a = 1 [m]; print(a [cm])`.

## General procedure

The following steps include all changes needed to extend the language. The SCL and its supporting tools are based on [textX](http://textx.github.io/textX/), a tool for creating domain-specific languages and their supporting tools using Python.

1. Add new rules to the grammar and modify existing rules. The new grammar should be checked for correctness by parsing it and visualizing it. 

2. Run the regression tests to ensure that the new grammar does not break existing test cases.

3. Parse the test input and visualize the created model.

4. Optional: Write functions to modify model objects and register them as either model or object processors.

5. Write functions to evaluate the type and the value of the new metamodel classes and register them.

6. Optional: Write functions to apply additional static constraints to the model and register them as either model or object processors.

7. Write a serializer class and a print formatter function for all new types (if any).

8. Integrate the test inputs into the set of regression tests.

9. Write documentation for the extension.


## Grammar, metamodel and model parser

### Grammar

The main component of the domain-specific language is the *grammar*. The grammar describes the syntax of the language in a formal and machine-readable way. In SCL a [textX grammar](https://textx.github.io/textX/grammar.html) is used. Developers should familiarize themselves with the textX grammar before starting a language extension. A good tutorial section can be found [here](https://textx.github.io/textX/tutorials/hello_world.html).

The SCL grammar consists of a set of rules that are used to match the textual model.

#### Grammar version

After significant changes in the grammar, the grammar version must be increased and added to `compatibility.py`. If the grammar changes break compatibility with the interpreter as it was before the changes, then the previous grammar versions must be removed from `compatibility.py`.

#### Use of references

While any reference allowed in textX can be used (for example `myref = [Variable:ID]`), only reference objects of class `GeneralReference` are mapped to links in FireWorks workflows. Therefore, to have a working model in workflow evaluation mode, only this type of reference should be used. The rules `IterableProperty` and `IterableQuery` can be used as blueprints for such uses.

#### Grammar location

The grammar is located in the folder `src/virtmat/language/grammar` where `virtmat.tx` is the top-level grammar file. The grammar correctness can be checked by the command

```bash
textx check src/virtmat/language/grammar/virtmat.tx
```

### Metamodel

By *parsing* the grammar, textX creates the so-called *metamodel* (see Figure 1). The textX metamodel is a set of Python classes with certain relationships, for example the *parent-child* relationship. Another important relationship is the *reference*. Every common rule in the grammar is used to generate one class in the metamodel with the same name as the grammar rule. The metamodel can be visualized using the `graphviz` package as described [here](https://textx.github.io/textX/visualization.html) so that the metamodel classes with their attributes and relationships can be inspected.

![Figure 1](figs/textx-overview.png "Figure 1. An overview of textX concepts")


### DSL parser

The second artifact created by parsing the grammar is the DSL *parser*, i.e. the code that will process a *textual model* written in the domain specific language. The DSL parser and the metamodel are not provided as source code but rather created in memory from the grammar *on-the-fly* every time a textual model is processed (see Figure 2).

![Figure 2](figs/textx-flowchart.png "Figure 2. A textX flow diagram")

## Run the regression tests

Grammar extensions always require changing existing rules, e.g. extending an ordered choice rule with a newly added rule. Therefore, after checking the grammar correctness, the regression tests must be run. The regression tests are located in the top-level folder `tests`. After changing to this directory, the tests can be started with the command `pytest`. If any regression tests fail due to the changes in the grammar, the grammar must be fixed so that all regression tests pass.

## Model

Using the option `--show-model` of the [CLI](tools.md#script-mode), the textual model is parsed and, if parsing is successful, i.e. the textual model has valid syntax, the *abstract model* (or simply the *model*) is created. In addition, a graphviz dot file is created with the same base name as the textual model file. This file can be used to create e.g. a PDF file displaying the model, for example:

```bash
texts script --show-model -f series.vm
dot -Tpdf series.dot -o series.pdf
```

## Enrich the metamodel

The generated metamodel needs certain extensions that are used within the interpreter stage.

### `type_` properties

The SCL is a statically typed language. For this reason, every new metamodel class must have a `type_` property method that evaluates and returns the Python type of a model object (an instance of the metamodel class). The mapping between Python types and SCL types is provided in the internal module `src/virtmat/language/utilities/typemap.py`. If the values of objects of newly added metamodel classes may have a type other than the already provided types, then the type map must be extended with this new type. If the type of an object cannot be inferred, `None` is returned. If the object has no `value` attribute, then the `type_` attribute is not defined either.

The type methods are located in the internal module `src/virtmat/language/constraints/typechecks.py`.

**NOTE**: The `value` property must not be used to evaluate the type.
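A `type_` property method can be sketched as follows. The sketch assumes a hypothetical metamodel class `Power` for the `**` operator of Example 1, with hypothetical attributes `base` and `exponent`. The classes `Number` and `Power` are plain Python stand-ins for the classes that textX would generate, and the inference rule is deliberately simplified (it ignores, for instance, negative integer exponents). The point it illustrates is that the type is derived only from the `type_` of the operands and never from `value`.

```python
class Number:
    """Stand-in for a literal model object; not a real SCL metamodel class."""
    def __init__(self, literal):
        self.literal = literal

    @property
    def type_(self):
        # the type follows from the literal itself, no evaluation is needed
        return type(self.literal)


class Power:
    """Stand-in for a hypothetical `Power` metamodel class."""
    def __init__(self, base, exponent):
        self.base = base
        self.exponent = exponent


def power_type(self):
    """Infer the type from the operand types only, without using `value`."""
    types = (self.base.type_, self.exponent.type_)
    if any(typ is None for typ in types):
        return None  # the type cannot be inferred
    if not all(issubclass(typ, (int, float)) for typ in types):
        raise TypeError('operands of ** must be numeric')
    # simplified rule: the result is float if any operand is float
    return float if float in types else int


# attach the property to the class after the class has been created
Power.type_ = property(power_type)
```

With these stand-ins, `Power(Number(2), Number(2)).type_` evaluates to `int` and `Power(Number(2), Number(0.5)).type_` to `float`.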

### `value` properties

Every metamodel class whose objects have values and a `type_` attribute must provide a `value` property method. These methods are located in the internal module `src/virtmat/language/interpreter/instant_executor.py`.
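As a minimal sketch, a `value` property for a hypothetical `Power` class (the `**` operator of Example 1) could look like the code below. The class names and the attributes `base` and `exponent` are assumptions made for illustration, and the two classes are plain Python stand-ins for generated metamodel classes.

```python
class Number:
    """Stand-in for a literal model object with a `value` property."""
    def __init__(self, literal):
        self.literal = literal

    @property
    def value(self):
        return self.literal


class Power:
    """Stand-in for a hypothetical `Power` metamodel class."""
    def __init__(self, base, exponent):
        self.base = base
        self.exponent = exponent


def power_value(self):
    """Evaluate both operands and return base raised to exponent."""
    return self.base.value ** self.exponent.value


# attach the property to the class after the class has been created
Power.value = property(power_value)

result = Power(Number(2), Number(2)).value
```

In instant evaluation the value of an object is computed by recursively requesting the `value` of its children, which is what `power_value` does here.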

### `func` properties

For deferred and workflow evaluation, every class with a `value` property must also provide a `func` property method. This method returns the `func` property, which is a tuple of two elements: a function that returns the object value when it is eventually called, and a flat tuple of model objects whose values are used as *call parameters*.

The definitions of `func` properties are located in the internal module `src/virtmat/language/interpreter/deferred_executor.py`.

**NOTE**: Only named objects (references to variables and imported objects) are allowed as call parameters. The `func` property method must not use the `value` property. In addition, the returned function (the first tuple element) must not contain object attributes or references to other model objects (`self.something`).

For example, if the model object of metamodel class `WeightedSum` has an attribute `vars` that is a list of variable references of scalar numeric types and `wgt` is a Python numeric scalar attribute (of type `int` or `float`) then the returned tuple can be defined as:

```python
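def weighted_sum_func(self):
    # copy the attribute to a local name: the lambda must not reference self
    wgt = self.wgt
    # first element: function of the parameter values; second: the parameters
    return (lambda *x: wgt*sum(x), tuple(self.vars))

# register the property with the metamodel class
metamodel['WeightedSum'].func = property(weighted_sum_func)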
```
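How such a tuple might be consumed can be sketched with plain Python stand-ins. The classes `Variable` and `WeightedSum` below only mimic the model objects and are not the generated metamodel classes; the last lines imitate, under this assumption, what a deferred evaluator does: it first obtains the values of the call parameters and only then calls the function.

```python
class Variable:
    """Stand-in for a named model object (a variable reference)."""
    def __init__(self, name, value):
        self.name = name
        self.value = value


class WeightedSum:
    """Stand-in for the metamodel class of the example."""
    def __init__(self, wgt, vars_):
        self.wgt = wgt
        self.vars = vars_


def weighted_sum_func(self):
    wgt = self.wgt  # local copy, so that the lambda does not capture self
    return (lambda *x: wgt*sum(x), tuple(self.vars))


WeightedSum.func = property(weighted_sum_func)

obj = WeightedSum(0.5, [Variable('a', 1), Variable('b', 3)])
func, params = obj.func
# nothing has been evaluated so far; evaluation happens only in this call
result = func(*(par.value for par in params))
```

Because the returned function depends only on the local `wgt` and on its arguments, it can be passed around independently of the model object it was created from.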

### Object processors

The textX [object processors](https://textx.github.io/textX/metamodel.html#object-processors) are useful if a constraint cannot be enforced by the grammar or a class attribute cannot be set by parsing, such as default attribute values or values implied by some convention, i.e. when a specification is missing in the input. For example, a complex number with a missing optional real part in the input will be an object whose real part attribute has the value `None`. An object processor finds this and replaces `None` with `0.0`.

The object processor is an (optional) function that allows processing (performing checks, adding or modifying attributes etc.) of the objects of a certain metamodel class. Every object processor is registered on a per-class basis in the internal module `src/virtmat/language/metamodel/processors.py` and run on a per-object basis as soon as an object of the class is instantiated.
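The complex number example can be sketched as below. The attribute names `real` and `imag` and the class name `ComplexNumber` in the comment are assumptions for illustration, and a `SimpleNamespace` stands in for the object that textX would create. The textX method `register_obj_processors` is only shown in a comment because no metamodel is built in this sketch.

```python
from types import SimpleNamespace


def complex_number_processor(obj):
    """Replace a missing real part with the default value 0.0."""
    if obj.real is None:
        obj.real = 0.0


# stand-in for an object created from an input without a real part
number = SimpleNamespace(real=None, imag=2.0)
complex_number_processor(number)

# With a textX metamodel the processor would be registered per class, e.g.:
# metamodel.register_obj_processors({'ComplexNumber': complex_number_processor})
```

After the processor has run, `number.real` is `0.0` while `number.imag` is unchanged.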

## Interpreter

The interpreter is implemented as a list of textX [model processors](https://textx.github.io/textX/metamodel.html#model-processors). The instant and the deferred evaluation is triggered by calling the `value` property of the top-level model object (named `Program`). The workflow evaluation is triggered by calling the model processor `workflow_model_processor(metamodel)`.

The model processors are called only by textX, automatically and in the order of registration, right after the model is fully instantiated and all object processors have run.

The only action required in the interpreter when extending the language is to write and register the relevant object and model processors.

### Constraints

The purpose of constraints is to introduce semantics into the model that are not included in the grammar. For example, a circular reference cannot be prevented by the grammar or, in the best case, such a grammar would decrease parser performance significantly due to necessary and potentially very long look-aheads. Therefore, a check for a circular reference can be done more efficiently after the parsing phase, when the whole model is completely constructed. Another type of constraint is the *type* constraint.
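The idea of checking for circular references after parsing can be sketched with a depth-first search over a dependency mapping. The function `find_cycle` and the dictionary representation of the references are assumptions made for illustration; this is not the check implemented in SCL, where the references would be collected from the model objects.

```python
def find_cycle(references):
    """Return a list of names forming a reference cycle, or None.

    `references` maps each variable name to the names it refers to.
    """
    visiting, done = [], set()

    def visit(name):
        if name in done:
            return None  # already checked, no cycle through this name
        if name in visiting:
            # the name is on the current path: a cycle has been closed
            return visiting[visiting.index(name):] + [name]
        visiting.append(name)
        for ref in references.get(name, ()):
            cycle = visit(ref)
            if cycle:
                return cycle
        visiting.pop()
        done.add(name)
        return None

    for name in references:
        cycle = visit(name)
        if cycle:
            return cycle
    return None


cyclic = find_cycle({'a': ['b'], 'b': ['c'], 'c': ['a']})
acyclic = find_cycle({'a': ['b'], 'b': []})
```

A model processor built around such a function would raise an error that names the variables in the returned cycle.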

All constraints in SCL are implemented as textX model processors that are registered in the internal module `src/virtmat/language/metamodel/processors.py`. The individual constraint processors are located in the folder `src/virtmat/language/constraints` and registered in the module `src/virtmat/language/constraints/processors.py`. One example of such constraints is to check validity of types in `check_types_processor(metamodel)`. Other kinds of constraints are defined in the same folder and registered in the same module.

## Write serialization classes for new types

It can happen that, for the language extension, some language *parameters*, i.e. textX model objects with a defined `value` property, have a new type. This new type has to be added to the internal module `src/virtmat/language/utilities/typemap.py` and mapped to the relevant Python type (class). For the Python class of such a new type, a serialization class has to be written. The location of the serialization classes is `src/virtmat/language/utilities/serializable.py`. The serialization class is a subclass of the relevant Python class, that is the value type of the corresponding textX object, and of the base class [`FWSerializable`](https://materialsproject.github.io/fireworks/fireworks.utilities.html#fireworks.utilities.fw_serializers.FWSerializable). It provides the attribute `_fw_name` and implementations of the methods `to_dict()` and `from_dict()`.

The `to_dict()` method is used to serialize the values of the relevant textX objects for use in the workflow management system or for storage in the database or in a file in JSON format. When this method is changed or new serialization classes are added then the `DATA_SCHEMA_VERSION` must be incremented. The method must be decorated with `@versioned_serialize`.

The `from_dict()` method is used to deserialize (reconstruct) the objects serialized in this way. If breaking changes are made in the serialization schema after some `version`, then the old method is kept under the name `from_dict_{version}` and `from_dict()` is decorated with `@versioned_deserialize` to maintain compatibility. A list of supported schema versions is maintained in `versions['data_schema']` in `compatibility.py`.

The `_fw_name` attribute is used for automatic recursive serialization and deserialization by the workflow management system using generic methods such as `load_object()`.
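The overall shape of a serialization class can be sketched without FireWorks as follows. This is a stand-in under several assumptions: the class `SerializableComplex`, the `_version` key, the value of `DATA_SCHEMA_VERSION` and the old schema handled by `from_dict_1` are hypothetical, and the explicit version dispatch only imitates what the decorators `@versioned_serialize` and `@versioned_deserialize` are described to do. A real class would additionally inherit from `FWSerializable`.

```python
import json

DATA_SCHEMA_VERSION = 2  # hypothetical current schema version


class SerializableComplex(complex):
    """Sketch of a serialization class for Python complex values."""
    _fw_name = 'SerializableComplex'  # hypothetical name

    def to_dict(self):
        # store only JSON-compatible data together with name and version
        return {'_fw_name': self._fw_name, '_version': DATA_SCHEMA_VERSION,
                'real': self.real, 'imag': self.imag}

    @classmethod
    def from_dict(cls, dct):
        # dispatch to a version-specific method for older schema versions
        version = dct.get('_version', DATA_SCHEMA_VERSION)
        if version != DATA_SCHEMA_VERSION:
            return getattr(cls, f'from_dict_{version}')(dct)
        return cls(dct['real'], dct['imag'])

    @classmethod
    def from_dict_1(cls, dct):
        # assumed older schema that stored the value as a pair
        return cls(*dct['data'])


original = SerializableComplex(1.0, -2.0)
# round trip through JSON, as for storage in a database or in a file
restored = SerializableComplex.from_dict(json.loads(json.dumps(original.to_dict())))
legacy = SerializableComplex.from_dict({'_version': 1, 'data': [1.0, -2.0]})
```

Keeping the old deserializer under a versioned name allows data written with an earlier schema to be read after the schema has changed.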

## Write print formatters for new types

For every new type, a print formatter has to be written in the module `src/virtmat/language/utilities/formatters.py`. The formatter returns a string representation of the model object value matching the common rule corresponding to the metamodel class of the object. For example, the value of a model object of class `Series` has `Series` type and is represented by the Python class `pandas.Series`. The formatter represents this value, a `pandas.Series` object, as a string like `(a: 1, 2, 3)`, and this is how the value is displayed on the screen.
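A formatter producing the string form shown above can be sketched in a few lines. The function name `format_series` is hypothetical, and a plain name and list are used in place of a `pandas.Series` object to keep the sketch free of dependencies; a real formatter would take the name and the elements from the series itself.

```python
def format_series(name, values):
    """Return a string such as '(a: 1, 2, 3)' for a named sequence."""
    return f"({name}: {', '.join(str(val) for val in values)})"


text = format_series('a', [1, 2, 3])
```

Since the returned string matches the grammar rule for the type, it can be parsed again as valid input.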

## Add the test inputs to the tests

The test inputs should be used to create test functions for `pytest` in the top-level folder `tests`. These tests are then run together with the other regression tests and ensure that the newly added features keep working after every change.

## Write documentation

Document the language extension in the top-level `docs` folder.