# Scientific Computing Language

## Hello world

To print a string on the screen the `print` statement is used:

```
print('Hello world!')
```

## Statements

The building blocks of the language are the *statements*. The statements are separated either by semicolons `;` or by new lines. A statement can be:

1. variable
1. print
1. view
1. import from a Python module
1. export of arbitrary parameter
1. function definition
1. variable update

**NOTE:** The order of evaluation does not depend on the order of statements. Rather the order of evaluation is determined by the dependencies between variables and their parameters.

**NOTE:** The `print` and `view` statements are evaluated in the same order as they are in the program. See more details [here](io.md).

### Variables

*Variables* are the most used statements. A variable is initialized immediately with its definition with a *variable name* on the left hand side and a *parameter* on the right hand side of the `=` sign. A variable may not be redefined, i.e. the *variable name* may not be used to define other parameters. The variable's parameter can be a string, integer, Boolean, a data structure, etc. A parameter can also be a reference to a variable. Parameters are *immutable*, i.e. they cannot be modified after having been created.

Example:

```
var_1 = 'Hello world!'
var_2 = var_1
var_3 = 0.5 [meter]
```

In the first line `var_1` is a variable name. However, in the second line, `var_1` is a parameter of the variable with name `var_2`, more specifically a reference to the variable with name `var_1`. The third statement is a variable with name `var_3` and a parameter of numeric floating point type with units meters.

### The `print` statement

The print statement displays the values of one or more parameters on the screen. The syntax is `print(par1[, par2[, ...]])` where the optional parameters are displayed in square brackets `[]`.

Example:

```
print(var_1)            # "print the value of variable with name var_1"
var_1 = 'Hello world!'
# initialize variable with name var_1 with a string literal
```

In this example the variable with name `var_1` is defined after the `print` statement. Because the value of variable with name `var_1` has to be printed on the screen it has to be evaluated first. Therefore, the variable with name `var_1` (i.e. the second statement) is evaluated first and only after that the `print` statement is evaluated.

**NOTE:** This is the behavior in the case of instant evaluation and deferred on-demand evaluation. In other forms of deferred evaluation, `var_1` may be printed without having been evaluated. In the latter case `n.c.` (*not computed*) is printed instead of the value of `var_1` that is not available yet.

**NOTE:** In the case of on-demand evaluation policy, only those variables or parameters are evaluated that are used in the [print](#the-print-statement), [view](#the-view-statement) or [export](#export-statement) statements. In the example above, the variable `var_1` would not be evaluated if there were no `print` statement.

A more detailed explanation of the behavior of input/output operations [is provided separately](io.md).

### The `view` statement

This statement displays its parameters graphically. The syntax is

```
view <mode> (parameter_1, parameter_2, ...)
```

Currently the modes `lineplot` and `scatterplot` are implemented and these will be explained below.

#### Plotting datasets in 2D

This is achieved with the modes `lineplot` and `scatterplot`, which have the meaning of plot types here. Further modes of 2D plotting will be implemented in the future, but all of them share common parameter patterns.

Data shape | Parameter 1 | Parameter 2 | Parameter 3 | Parameter 4
-----------|-------------|-------------|-------------|------------
long-form data | values, [Table](#table) | column to use as x axis, [String](#string-type) | column to use as y axis, [String](#string-type) | optional units for output, [Series](#series)([String](#string-type))
wide-form data | values, [Series](#series)(1D-[Array](#arrays)), shape of [values:array](#retrieve-the-array): (len(index), len(columns)) or (len(columns), len(index)) | index, [Series](#series)(scalar) or 1D-[Array](#arrays) | columns, [Series](#series)(scalar) or 1D-[Array](#arrays) or [Tuple](#tuple)(scalar) | not used
wide-form data | values, 2D-[Array](#arrays), shape: (len(index), len(columns)) or (len(columns), len(index)) | index, [Series](#series)(scalar) or 1D-[Array](#arrays) | columns, [Series](#series)(scalar) or 1D-[Array](#arrays) or [Tuple](#tuple)(scalar) | not used
xy data | values, [Series](#series)(scalar) or 1D-[Array](#arrays) | index, [Series](#series)(scalar) or 1D-[Array](#arrays) | not used | not used

**Examples**:

* Long-form data:

```
tab = (
  (number: 1, 2, 3, 4, 1, 2, 3, 4) [meter],
  (type_: 'square', 'square', 'square', 'square', 'cube', 'cube', 'cube', 'cube'),
  (value: 1, 4, 9, 16, 1, 8, 27, 64),
  (time: 0., 1., 2., 3., 4., 5., 6., 7.) [hour]
)
units = (units: 'cm', '', '', 'min')
view lineplot (tab, 'number', 'value', units)
```

* Wide-form data:

```
ind = (number: 1, 2, 3, 4) [meter]
val_ser = (values: [1, 4, 9, 16], [1, 8, 27, 64])
columns = (columns: 'square', 'cube')
view lineplot (val_ser, ind [cm], columns)
```

* XY-data:

```
ind = (number: 1, 2, 3, 4) [meter]
sqr = (square: 1, 4, 9, 16)
view lineplot (sqr, ind [mm])
view lineplot (sqr:array, ind:array)
view lineplot (sqr, ind:array)
view lineplot (sqr:array, ind)
```

### Units in `print` and `view` parameters

Parameters of numeric types in `print` and `view` statements are printed with their units.
The parameter's value can be printed in units other than the default by specifying the units in the `print` or `view` parameter. For example, to print a mass in grams we can use this:

```
mass = 1.5 [kg]
print(mass [g])
```

The output of this print statement is `1500.0 [gram]`.

Using the following special units keywords the units can be converted without specifying the target units explicitly:

Keyword | Comment | Example for `e = 1 [eV]`
-----------|--------------------------|--------------------------------------------------------------------------
`_reduced` | reduced units | `print(e [_reduced]); 1 [electron_volt]`
`_root` | root units | `print(e [_root]); 1.602176634e-16 [gram * meter ** 2 / second ** 2]`
`_base` | plain units | `print(e [_base]); 1.602176634e-19 [kilogram * meter ** 2 / second ** 2]`
`_compact` | human-readable magnitude | `print(e [_compact]); 1 [electron_volt]`

### Export statement

The *export* statement allows exporting a *value* to a file or a URL. The syntax of the export statement is `<value> to (file <filename> | url <url>)`, where `<value>` can be:

1. a reference to a variable

   ```
   a = 'Hello world!'
   a to file 'hello_world.yaml'
   ```

2. a reference to a variable with optional subscripting of [series](#subscripting), [table](#subscripting-1), or [array](#operations-with-array)

   ```
   b = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
   b[1][1] to file 'b11.json' # exports the array element `2`
   ```

3. operations on iterable objects such as [Series slice](#slice), [Series filter](#filter-function-and-filter-expression), [Table slice](#slice-1), [Table filter](#filter-expressions-applied-to-table), [Array slice](#operations-with-array), or [Tuple membership](#testing-membership-1).

   ```
   # Exporting the slice of a series:
   lens = (length: 1, 2, 3, 4, 5, 6) [m]
   lens[0:4:2] to file 'lens_042.json' # exports the series slice `(length: 1, 3) [meter]`
   # Exporting the output of a filter operation:
   tabtp = ((temp: 100., 200., 300.) [K], (pressure: 1., 2., 3.) [bar])
   tabtp where column:temp > 100 [K] to file 'tabtp_200-300K.json'
   # exports `((temp: 200.0, 300.0) [kelvin], (pressure: 2.0, 3.0) [bar])`
   ```

The file extension (the portion of the path after the `.`) indicates the format in which the value will be exported. Currently, YAML (extensions '.yml' or '.yaml'), JSON (extension '.json'), and HDF5 (extensions: '.hdf', '.h4', '.hdf4', '.he2', '.h5', '.hdf5', '.he5') formats are supported for all variable types. Domain formats are supported for some domain-specific types (see the relevant sections).

**NOTE:** If a file with the specified name already exists, the export statement fails, i.e. export never overwrites files.

**NOTE:** While *relative paths*, as in the example above, are supported it is strongly recommended to use *absolute paths*, especially in the [workflow evaluation mode](tools.md#workflow-mode).

### Other statements

Further statements are *imports* from external Python modules and *function definitions*. They are more advanced and are outlined [here](#imported-objects-and-functions) and [here](#internal-functions), respectively.

### Comments and white space

Comments are ignored and not interpreted. All input after the hash sign `#` up to the end of the same line is ignored. All input enclosed by a pair of three double quotes `"""` is ignored. White space is needed only to separate *keyword* inputs; otherwise it is ignored.

```
# this is a comment
a = 'Hello' # this is a comment
b = 1
"""
This is a multi-line comment.
"""
```

## Type

Variables and parameters have *type*. The type is *fixed* with the definition and checked before evaluation begins, i.e. it is *static*. The type determines in what *operations* a parameter or a variable can be used. If a statement contains operations on incompatible types a *Type error* is issued.
In most cases, type errors are issued before the evaluation begins, as long as the types of parameters and variables can be determined without computing their values.

### String type

A *string* literal is a Unicode string enclosed by single or double quotes. Empty strings are allowed.

```
hello = "Hello world!"
empty = '' # empty string
print(hello == empty) # string match, result: false
print(hello != empty) # string match, result: true
```

Currently, no operations on strings, except for string match, are available.

### Boolean type

Parameters and variables of *Boolean* type have values of either `true` or `false`. Unlike in other languages, parameters and variables of other types have no Boolean values. Also variables and parameters of Boolean type *cannot* be interpreted as *numeric* types, such as integer, and used in [numeric expressions](#numeric-expressions). Boolean literals are parameters matching either `true` or `false`:

```
bool_1 = true
bool_2 = false
```

#### Boolean expressions

Using the operators `and`, `or` and `not` and any parameters of Boolean type, arbitrary Boolean expressions can be composed. Expressions with `and` and `or` are currently not *short circuiting*.

```
a = true and (false or true)
print(not a) # result: false
```

Boolean expressions always have *Boolean type*.

### Numeric types

The parameters of numeric type can be *integer* (`Integer`), *floating point* (`Float`) or *complex* (`Complex`) quantities. Numeric types can be Quantity, Array and Series.

#### Numeric expressions

Using the operators `+`, `-`, `*`, `/` and `**`, and any *scalar* numeric parameters (of type `Quantity`), arbitrary numeric expressions can be composed.

```
a = 2
b = (2.0*a + 1)**2 - 1.5
```

Numeric expressions always have *numeric type*.

#### Physical units

All parameters of numeric type have *physical units* assigned.
Here are some examples:

```
number = 1 # dimensionless integer type quantity
length = 2.0 [meter] # floating point type quantity with units meter
s = number + length; print(s)
```

Because `number` is dimensionless it cannot be added to `length` and the following evaluation error occurs:

```
Dimensionality error: None:3:5 --> number + length <--
Cannot convert from 'dimensionless' (dimensionless) to 'meter' ([length])
```

In contrast to type, physical units are checked only during evaluation. This is why this error message will not be issued if we remove the `print(s)` statement.

**NOTE:** Dimensionless quantities also have units. This becomes evident in the error message above. These units are `[dimensionless]`. These can be optionally specified, for example `number = 1 [dimensionless]`.

#### Complex numbers

Complex numbers have the format `real [+-] imag [jJ]` where `real` and `imag` are the real and the imaginary part of the complex number, respectively. Complex numbers can be used as scalars, as well as in Series and Arrays. The real and the imaginary part of a complex scalar can be retrieved using the built-in functions `real()` and `imag()`, respectively. For example, one can define a function to compute the complex conjugate:

```
conjg(z) = real(z) - imag(z) * (0 + 1 j)
```

#### Comparison expressions

*Comparison expressions* are defined for scalar numeric types. They can include one of the operators `==`, `!=`, `>`, `<`, `>=`, `<=`. In comparisons with *String*, *Boolean* and *Complex* operands only the operators `==` and `!=` are allowed. String matching using the operators `==` and `!=` can be regarded as a comparison expression. Comparison expressions always have *Boolean type*.

```
b = 2 < 1
print(b) # result: false
```

#### Uncertainties

Parameters of scalar floating-point type (i.e. Quantity of Float datatype) can have an optional uncertainty specification.
The syntax of a Quantity literal with uncertainty is shown in the following examples:

```
time = 12.7 +/- 0.1 [seconds] # easy to type
distance = 2.56 ± 0.02 [angstrom] # easy to read
```

The number after the ± (or +/-) is the standard deviation and therefore it must be a non-negative floating point number. Using `0` as uncertainty is allowed but not recommended as it produces the warning "*Using UFloat objects with std_dev==0 may give unexpected results.*"

Quantities with uncertainties can be used e.g. in all [numeric expressions](#numeric-expressions), comparison expressions, user-defined [internal](#internal-functions) and many [external](#built-in-module-with-commonly-used-functions) functions.

Comparisons of quantities with uncertainties should be performed with caution. For more details see the [user guide](https://pythonhosted.org/uncertainties/user_guide.html#comparison-operators) and the [technical guide](https://pythonhosted.org/uncertainties/tech_guide.html#comparison-operators) of the uncertainties package.

## Data structures

In *data structures* several parameters of different or the same types can be combined to express a certain *type of interrelation*. The types that are not data structures will be called *scalar types*.

### Tuple

Parameters of any type can be combined in a fixed order using a *Tuple*. The syntax is like in this example: `t = (a, 1.3, 'abc', false); a = 2`. Tuples are most useful as parameters of tuples of variables, but also for passing bundled heterogeneous data. A tuple containing one parameter should contain a comma before the closing parenthesis, otherwise it may be parsed as an expression. For example, use `(1,)` or `(true,)` but not `(1)` or `(true)`. Empty tuples are not allowed as input.

### Series

The *Series* contains a list of parameters: *elements*. The elements must be of the same type unless they are all scalar numerical types.
In the latter case the *datatype* of the series will be the most generic type found in the series. For example, if the elements are floating-point and integer numbers then the series datatype will be Float.

The common syntax of Series is: `(name: e1[, e2[, e3[...]]]) [units for numeric type]`. The Series data structure must have a name. The Series *elements* are the items between the `:` and the `)` separated by commas. Series literals must have at least one element. Series of numeric types must have elements of the same units.

The syntax of Series literals is shown in the following example.

```
a = 3. [s]
s1 = (time: 1. [s], 2. [s], a)
s2 = (lengths: 1., 2., 3.) [nm]
s3 = (booleans: true, bval); bval = false
s4 = (numbers: 0, 3, -2)
```

The units of Series of numeric type can be specified if all elements are numeric literals, either after every single element, as shown for `s1`, or after the closing parenthesis `)` as shown for `s2`. If an element is another numeric parameter then units may not be specified, as shown for the parameter `a` in `s1` (the parameter holds the units itself). If the units specification is omitted, as in `s4`, then the Series still has units but it is *dimensionless*. One can also specify `[dimensionless]` but this is optional. Series of non-numeric types may not have units. Empty Series is not allowed as input.

### Table

The *Table* data structure consists of an ordered set of Series parameters of the same length that can be viewed as *columns*. There are two different syntaxes for Table literals:

```
t1 = ((numbers: 1, 2, 3), (lengths: 1., 2., 3.) [nm])
t2 = Table ((numbers: 1, 2, 3), s2)
s2 = (lengths: 1., 2., 3.) [nm]
```

The *rows* of the Table are Tuples of the Series elements at the same position (see subscripting operations below). Empty tables are not allowed.

### Dict

The *Dict* can be regarded as a tuple of key-value pairs

```
d = {key1: value1, key2: value2, ...}
```

where the values can be any parameters.
Dict is mostly used to define a Table row within the functions (the first parameters) of [`map`](#map-function) and [`reduce`](#reduce-function-applied-to-table) or to define [tags](query.md#the-tag-statement) and [search queries](query.md#the-find-command).

A *Dict* requires that each key appears only once within the same dictionary. A duplicate key is considered invalid and will result in an error. Note also that the input order of key-value pairs has no meaning and may not be preserved.

### Arrays

*Arrays* are data structures with fixed types. Compared to Series an Array has no name, may be multidimensional (whereas Series is one-dimensional) and may not be used as a column in Tables. In addition, an Array may only have Numeric, Boolean or String datatype whereas Series may have any datatype. Array literals have the following syntax:

```
pbc = [true, true, false] # switching boundary condition
cell = [[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]] [angstrom] # cubic unit cell
```

Empty arrays are not allowed as input.

## Internal functions

An *internal function*, or simply a *function*, is a named expression in which some variables are bound. The bound variables (also called dummy variables) are provided comma-separated in a list enclosed by parentheses. The function is called by specifying the parameters to be used for each of the dummy variables:

```
f(x) = x**3 - 2*x**2 + 3*x - 4 # function definition (a statement)
print(f(1)) # function call (a parameter)
b = f(2); print(b) # another call
g(x) = b*x # another function definition where b is unbound
```

**NOTE:** The dummy variables are bound to the scope of the relevant functions. This is why the same names can be reused as dummy variables in other functions.

**NOTE:** The list of dummy variables may not be empty, e.g. `f() = 2*a` is not valid. In all such cases, simply the expression `2*a` should be used instead of the call `f()`.
If the expression has to be used more than once, then a variable `b = 2*a` can be defined and a reference to `b` can be used.

## The `if` function and `if` expression

The value of the `if` *function* depends on the value of the first parameter that is always of Boolean type: if it is `true` then the value is equal to the value of the second parameter. If the value of the first parameter is `false` then the value of the function equals the value of the third parameter. All three parameters are mandatory.

```
c = if(true, 1, 2); print(c) # result: 1
...
b = f(...) # function call with boolean type
d = if(b, 'b was true', 'b was false'); print(d)
```

The `if` *expression* has a different syntax but the same meaning (semantics) as the `if` function. The same examples are shown below using the expression syntax.

```
c = 1 if true else 2; print(c) # result: 1
...
b = f(...) # function call with boolean type
d = 'b was true' if b else 'b was false'; print(d)
```

## Expression nesting

Expressions of the *same type* can be nested by using parentheses `()`. One typical use case is nesting comparison expressions in Boolean expressions.

```
print((3 > 4) or (-1 <= 0)) # result: true
```

## Map function

The `map()` function iterates over the tuples formed from the elements or rows of the second, third, etc. parameters, which must be Series or Tables of equal length, and calls the function given as first parameter once per iteration with that tuple.

The type of `map()` depends on the type of the first parameter. If the type of the first parameter is [Dict](#dict), then the type of `map()` is Table. Otherwise the type is Series with elements of the type of the first parameter.

The length of the returned Series or Table is the same as the length of the input data (second, third etc. parameters).
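
For readers who know Python, the iteration described above can be sketched with plain lists. This is only an analogy under simplifying assumptions, not how the language is implemented: the helper `map_series` is hypothetical, Series are represented by Python lists, units and Series names are ignored, and Table inputs are not modeled.

```python
def map_series(func, *columns):
    """Call func once per position with one element from each input list."""
    # All inputs must have equal length, as required for map().
    if len({len(c) for c in columns}) != 1:
        raise ValueError("all inputs must have equal length")
    results = [func(*row) for row in zip(*columns)]
    # Assumption: dict results are collected column-wise to mimic a Table.
    if results and all(isinstance(r, dict) for r in results):
        return {key: [r[key] for r in results] for key in results[0]}
    # Otherwise the results form a list, mimicking a Series.
    return results


sx = [0.1, 1.3, -1.2]
sy = [2.1, -3.7, 4.6]
series = map_series(lambda x, y: 3 * x + 2 * y - 1, sx, sy)

a = [1, 2, 3]
b = [4, 5, 6]
table = map_series(lambda x, y: {"a": x, "b": y, "c": x + y}, a, b)

print(series)
print(table)  # {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [5, 7, 9]}
```

In both calls the result has as many entries as the inputs, which mirrors the length rule stated above.
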

**Example** with Series as input data:

```
s = (length: 1, 2, 3) [m]
sqr(x) = x*x
area = map(sqr, s)
print(area) # result (area: 1, 4, 9) [meter ** 2]
```

The first parameter in `map()` can also be a so-called *lambda* function. A lambda function is [an internal function](#internal-functions) with no name.

```
s = (length: 1, 2, 3) [m]
area = map((x: x*x), s)
print(area) # result (area: 1, 4, 9) [meter ** 2]
```

A typical use case of `map()` is to apply operations element-wise to one or more series. In the following example an expression with the elements of two series is computed.

```
sx = (sx: 0.1, 1.3, -1.2)
sy = (sy: 2.1, -3.7, 4.6)
print(map((x, y: 3*x + 2*y - 1), sx, sy))
```

In this example, the returned type will be Series with elements of the type of the lambda function `(x, y: 3*x + 2*y - 1)`, i.e. Float type, because the elements of `sx` and `sy` are floating-point numbers.

**Example** with table and with series and table as input data:

```
t = ((a: 1, 2, 3), (b: 4, 5, 6))
print(map((x: {a: x.a, b: x.b, c: x.a + x.b}), t))
s = (b: true, false, true)
print(map((x, y: {c: x.a + x.b, b: not y}), t, s))
series = map((x: x.a + x.b), t)
print(series)
```

```
program output:
>>>
((a: 1, 2, 3), (b: 4, 5, 6), (c: 5, 7, 9))
((c: 5, 7, 9), (b: false, true, false))
(series: 5, 7, 9)
<<<
```

## Operations with Series

In the following, operations with parameters of type Series will be outlined.

### Slice

A *slice* of a Series is a new parameter of type Series containing a selection of elements from a parameter of type Series.

Syntax: `[start:stop]` or `[start:stop:step]`

In the first syntax the default step is 1.

```
lens = (length: 1, 2, 3, 4, 5, 6) [m]
print(lens[0:1]) # result: (length: 1) [meter]
print(lens[0:4:2]) # result: (length: 1, 3) [meter]
print(lens[6:0:-1]) # (length: 6, 5, 4, 3, 2) [meter]
print(lens[6::-1]) # invert the order, result: (length: 6, 5, 4, 3, 2, 1) [meter]
```

### Subscripting

Individual Series elements can be retrieved by *subscripting*. The syntax is `[index]`.
The type and, if appropriate, the units of the returned parameter are the same as those of the Series.

```
lens = (length: 1, 2, 3, 4, 5, 6) [m]
print(lens[0]) # first element: 1 [meter]
print(lens[1]) # second element: 2 [meter]
print(lens[-1]) # last element: 6 [meter]
print(lens[-2]) # second to last element: 5 [meter]
```

### Retrieve the name

The name of a parameter of Series type can be retrieved using `:name`:

```
lens = (length: 1, 2, 3, 4, 5, 6) [m]
print(lens:name) # result: 'length'
```

The type of Series name is string type.

### Retrieve the array

The array of a parameter of Series type can be retrieved using `:array`:

```
lens = (length: 1, 2, 3, 4, 5, 6) [m]
print(lens:array) # result: [1, 2, 3, 4, 5, 6] [meter]
```

The returned type is Array type.

### Reduce function

The `reduce()` function calls the function of two arguments provided as first parameter successively and cumulatively with the elements of the Series provided as second parameter.

Example:

```
s = (n: 1, 2, 3, 4)
prod(x, y) = x*y
print(reduce(prod, s)) # with internal function, result: 24
print(reduce((x, y: x*y), s)) # with lambda function
# with nested function calls:
print(prod(prod(prod(1, 2), 3), 4)) # equivalent
print(prod(prod(prod(s[0], s[1]), s[2]), s[3])) # equivalent
```

The type of the `reduce()` function is the same as the type of the first parameter of `reduce()`. In the example above, this will be the type of the lambda function `(x, y: x*y)` or of the function `prod(x, y)`.

By combining `map()` and `reduce()` various algorithms can be implemented, for example the scalar product of two series:

```
s1 = (s1: -1., 2., -3.); s2 = (s2: 1., -2., 3.)
print(reduce((x, y: x+y), map((u, v: u*v), s1, s2)))
```

### Functions `sum`, `all` and `any`

The `sum()` function has two syntaxes:

* If `sum()` has only one parameter, then it must be of type Series of numeric type and the function computes the sum of the parameter elements. In this case `sum(...)` is equivalent to `reduce((x, y: x+y), ...)`.
* If `sum()` has more than one parameter then these parameters must be of scalar numeric type and the function computes the sum of all parameters.

The `all()` function has two syntaxes:

* If `all()` has only one parameter, then it must be of type Series of Boolean type and the function has `true` value when *all* parameter elements have `true` values; otherwise it has `false` value. In this case `all(...)` is equivalent to `reduce((x, y: x and y), ...)`.
* If `all()` has more than one parameter then these parameters must be of scalar Boolean type and the function has `true` value when *all* parameters have `true` values; otherwise it has `false` value.

The `any()` function has two syntaxes:

* If `any()` has only one parameter, then it must be of type Series of Boolean type and the function has `true` value if *any* parameter element has `true` value; otherwise it has `false` value. In this case `any(...)` is equivalent to `reduce((x, y: x or y), ...)`.
* If `any()` has more than one parameter then these parameters must be of scalar Boolean type and the function has `true` value when *any* parameter has `true` value; otherwise it has `false` value.

### Filter function and filter expression

The *filter function* performs a selection of elements from a Series type parameter that satisfy a condition. The condition is provided as a Boolean type internal or lambda function.

```
print(filter((x: x > 2), (n: 1, 2, 3, 4)))
```

The *filter expression* is semantically equivalent to the filter function but has a different syntax (note particularly the variable reference).

```
s = (n: 1, 2, 3, 4); print(s where column:n > 2)
```

More complex conditions are possible. For example:

```
s = (n: 1, 2, 3, 4)
ffunc = filter((x: (x < 2) or (x > 3)), s)
fexpr = s where column:n < 2 or column:n > 3
print(ffunc, fexpr) # (ffunc: 1, 4) (n: 1, 4)
```

The type of filter functions and expressions is always the same as the type of the input Series.
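
The Series operations `reduce`, `sum`, `all`, `any` and `filter` described above have close counterparts in Python. The following sketch restates the documented equivalences with plain Python lists. It is an analogy only, not the language's implementation: units and Series names are ignored, and the variable names are chosen for illustration.

```python
from functools import reduce

n = [1, 2, 3, 4]  # stands in for the Series (n: 1, 2, 3, 4)

# reduce((x, y: x*y), s): cumulative application of a two-argument function
product = reduce(lambda x, y: x * y, n)

# sum(s) corresponds to reduce((x, y: x+y), s)
total = reduce(lambda x, y: x + y, n)

# all(s) and any(s) correspond to reducing with `and` and `or`
flags = [True, False, True]
all_true = reduce(lambda x, y: x and y, flags)
any_true = reduce(lambda x, y: x or y, flags)

# filter((x: x > 2), s), or equivalently: s where column:n > 2
greater_than_two = [x for x in n if x > 2]

# filter((x: (x < 2) or (x > 3)), s)
outer = [x for x in n if x < 2 or x > 3]

# scalar product of two series by combining map and reduce
s1 = [-1.0, 2.0, -3.0]
s2 = [1.0, -2.0, 3.0]
dot = reduce(lambda x, y: x + y, map(lambda u, v: u * v, s1, s2))

print(product, total, all_true, any_true)  # 24 10 False True
print(greater_than_two, outer, dot)        # [3, 4] [1, 4] -14.0
```
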

### Testing membership

The membership test is an expression with two input parameters separated by the `in` keyword: a parameter of any type and a parameter of Series type. The test returns `true` if the first parameter is an element of the Series, `false` otherwise. If the series contains a `null` element and the value of the first parameter is not found in the series, the test will return `null` only if the data types of both parameters match; otherwise, it will return `false`.

Example:

```
s = (s: 'a', 'b', null)
print('a' in s) # true
print('c' in s) # null
print(1 in s) # false
n = (n: 1, 2, 3)
print(4 in n) # false
```

The membership expression is semantically equivalent to applying `map` and `any` in a sequence:

```
s = (s: 'a', 'b', null)
print(any(map((x: x == 'a'), s))) # true
print(any(map((x: x == 'c'), s))) # null
print(any(map((x: x == 4), (n: 1, 2, 3)))) # false
```

The advantages of using the `in` operator are better code readability and brevity, and possibly shorter evaluation time due to short circuiting, which is not available in `map`.

## Operations with Table

In the following, operations including parameters of type Table will be outlined.

### Slice

Similarly to Series, a slice from a Table is a Table. The syntax and semantics are the same as with Series.

```
tab = ((bools: true, false, true), (numbers: 1, 2, 3))
print(tab[0:2]) # result: ((bools: true, false), (numbers: 1, 2))
print(tab[3::-1]) # result: ((bools: true, false, true), (numbers: 3, 2, 1))
```

### Subscripting

Individual Table rows can be retrieved by *subscripting*. The type of this operation is always a Tuple type. The syntax is `[index]`.

```
tab = ((bools: true, false, true), (numbers: 1, 2, 3))
print(tab[0]) # result: (true, 1)
```

### Retrieve a column

Individual Table columns can be retrieved using the syntax `<table>.<column name>` where `<column name>` is the column name.

```
tab = ((bools: true, false, true), (numbers: 1, 2, 3))
print(tab.numbers) # result: (numbers: 1, 2, 3)
```

The type of this operation is always Series type.

### Retrieve the list of column names

The type of this operation is Series type.

```
tab = ((bools: true, false, true), (numbers: 1, 2, 3))
print(tab:columns) # result: (columns: 'bools', 'numbers')
```

### Filter expressions applied to Table

Filter expressions can be used with parameters of Table type.

```
tab = ((temp: 100., 200., 300.) [K], (pressure: 1., 2., 3.) [bar])
print(tab where column:temp > 100 [K])
print(tab select pressure where column:temp > 100 [K])
```

After evaluation this result is printed:

```
((temp: 200.0, 300.0) [kelvin], (pressure: 2.0, 3.0) [bar])
((pressure: 2.0, 3.0) [bar])
```

### Filter function applied to Table

The *filter function* performs a selection of rows from a Table type parameter that satisfy a condition. The condition is provided as a Boolean type internal or lambda function.

Example:

```
tabl = ((numbers: 1, 2, 3), (strings: 'a', 'b', 'c'))
print(filter((x: x.numbers > 2), tabl))
```

This will return a new table with the same columns as `tabl` but only the rows where `numbers` is greater than 2. This is semantically equivalent to the [filter expression](#filter-expressions-applied-to-table) `tabl where column:numbers > 2`.

### Reduce function applied to Table

The `reduce()` function applies a function of two arguments provided as first parameter successively and cumulatively to the rows of the Table provided as second parameter.

Example: Sum the elements of the first column and multiply the elements of the second column.

```
t = ((a: 1, 2, 3), (b: 4, 5, 6))
print(reduce((x, y: {a: x.a + y.a, b: x.b * y.b}), t))
```

```
program output:
>>>
((a: 6), (b: 120))
<<<
```

The type of the `reduce()` function is a Table with one row and the same columns as the second parameter.

## Operations with Array

In the following, operations including parameters of type Array will be outlined.
The examples use an array of integer type but the operations can be used with all data types supported in [arrays](#arrays).

### Subscripting

The purpose of subscripting is to retrieve individual elements or sub-arrays.

#### Retrieving individual elements

The following example demonstrates the retrieval of a single array element:

```
a = [[1, 2], [3, 4]] [m]
print(a[1][0]) # result: 3 [meter]
```

To retrieve a single array element, the number of subscripts must be the same as the number of dimensions (axes) of the array. The returned type is a scalar type the same as the array data type.

#### Retrieving sub-arrays

If the number of subscripts is less than the number of array dimensions (axes) then a sub-array is returned:

```
a = [[1, 2], [3, 4]] [m]
print(a[0]) # result: [1, 2] [meter]
```

#### Subscripting errors

If the subscript is larger than the largest index in the relevant axis then an *Invalid index* error is issued:

```
a = [[1, 2], [3, 4]] [m]
b = a[2] # Index out of range, index: 2, data length: 2
c = a[1][3] # Index out of range, index: 3, data length: 2
```

If the number of subscripts exceeds the number of axes in the array then a *Type error* is issued:

```
a = [[1, 2], [3, 4]] [m]
b = a[0][0][0] # Invalid use of index in type Quantity
```

Because `a[0][0]` is a Quantity (a numerical scalar type) it cannot be subscripted with `[0]`.

### Slice

The slice returns selected elements within the same axis of an array. The slice syntax is the same as with [Series](#slice) and [Table](#slice-1). The slice can be applied only once per statement after all (optional) subscripts.
Example:

```
a = [[1, 2], [3, 4]] [m]
print(a[1][0:1:1]) # result: [3] [meter]
print(a[::]) # result: [[1, 2], [3, 4]] [meter]
print(a[0:1][0]) # Syntax error, intended result [1, 2] [meter]
print(a[0:1][0:1]) # Syntax error, intended result: [[1, 2]] [meter]
```

To chain multiple slices, or to combine slices with subscripts in arbitrary order, the current syntax requires auxiliary variables:

```
a = [[1, 2], [3, 4]] [m]
# print(a[0:1][0]) # Syntax error, intended result [1, 2] [meter]
b0 = a[0:1]; print(b0[0]) # result [1, 2] [meter]
# print(a[0:1][0:1]) # Syntax error, intended result: [[1, 2]] [meter]
e0 = a[0:1]; print(e0[0:1]) # result: [[1, 2]] [meter]
# print(a[1:][0:]) # Syntax error, intended result: [[3, 4]] [meter]
f0 = a[1:]; print(f0[0:]) # result: [[3, 4]] [meter]
```

## Operations with Tuple

### Subscripting

Individual elements or slices of tuples can be retrieved using subscripting.

Example:

```
tup = (1, (2, 3), true, 'abc', [4, 5])
print(tup[0]) # result: 1
print(tup[1]) # result: (2, 3)
print(tup[-1]) # result: [4, 5]
print(tup[3:4]) # result: ('abc',)
print(tup[0::2]) # result: (1, true, [4, 5])
```

### Testing membership

The membership expression consists of the operator `in` and two operands: a parameter of any type on the left and a Tuple literal on the right. It returns `true` if the left operand is equal to any of the elements of the tuple and `false` otherwise. If the tuple contains a `null` element and there is no match then `null` is returned.

Example:

```
print(5 in (1, 4, 5)) # true
print(0 in (1, 4, 5)) # false
print(3 in (-1, 5, null)) # null
```

## Range function and range expression

The *range function* with syntax `range(start, stop, step)` creates a Series in a given numeric range from `start` to (but not including) `stop`, incrementing by `step`.

```
lens = range(1 [m], 6 [m], 1 [m])
print(lens) # result: (lens: 1, 2, 3, 4, 5) [meter]
```

There is also a *range expression* with the same semantics as the range function.
```
lens = range from 1 [m] to 6 [m] step 1 [m]
print(lens) # result: (lens: 1, 2, 3, 4, 5) [meter]
```

The parameters `start`, `stop` and `step` must be of the same scalar numeric type, i.e. either integers or floating-point numbers.

## Imported objects and functions

External objects and functions from arbitrary Python modules can be imported and used in the language. There are three syntaxes:

1. `use <module>.<object>`
2. `use <object1>, <object2> from <module>`
3. `from <module> use <object1>, <object2>`

The first syntax is shorter, while the second and third allow several imports in the same statement and namespace, including modules with an arbitrary number of sub-modules separated by periods.

The imported objects and functions can be used as parameters (references) with the same names as in the imports. This means that no variables with these names can be defined.

### Examples

The example below shows the use of the imported function `len()` from the `builtins` module of the Python standard library using the first syntax:

```
use builtins.len
s = (numbers: 1, 2, 3)
print(len(s)) # result: 3
```

The following example demonstrates the use of the imported object `pi` and the functions `sin()` and `cos()` from the `numpy` module (package) using the second and third syntaxes.

```
use pi, sin from numpy
from numpy use cos
print(sin(2.*pi)) # result: -2.4492935982947064e-16
print(cos(2.*pi)) # result: 1.0
```

The name of an imported object cannot be reused as a variable name:

```
use math.pi
pi = 3.14
Initialization error: None:1:? --> pi = 3.14 <-- Repeated initialization of "pi"
```

### Imported functions

Using external Python functions in textS (with the `use` keyword) requires some knowledge of Python. While using functions from the `builtins` and `math` Python modules, or from the `numpy` package, can be straightforward, using self-written functions offers more options but also involves more pitfalls.
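When writing such a function, it helps to see what Python itself exposes about it. The following sketch is not the textS implementation. It only illustrates, in plain Python, how the positional parameters without defaults and the return annotation can be read from a call signature with the standard `inspect` module. The function `boltzmann_factor` and the helper `required_positional` are hypothetical names introduced for this illustration, and the plain `float` annotations are an assumption about what a self-written function might declare.

```python
import inspect
import math


def boltzmann_factor(energy: float, temperature: float,
                     k_b: float = 8.617333262e-5) -> float:
    """Hypothetical self-written function: energy in eV, temperature in K."""
    return math.exp(-energy / (k_b * temperature))


def required_positional(func):
    """Return the names of positional parameters that have no default value."""
    positional_kinds = (inspect.Parameter.POSITIONAL_ONLY,
                        inspect.Parameter.POSITIONAL_OR_KEYWORD)
    return [name
            for name, par in inspect.signature(func).parameters.items()
            if par.kind in positional_kinds
            and par.default is inspect.Parameter.empty]


# Parameters that a call must always provide
print(required_positional(boltzmann_factor))
# Return annotation, if any, as declared in the signature
print(inspect.signature(boltzmann_factor).return_annotation)
```

A function without a retrievable signature or without annotations offers less information of this kind, which is one reason why self-written functions need more care than well-annotated library functions.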
#### Function signature

If an imported function has a [call signature](https://docs.python.org/3/library/inspect.html#inspect.Signature), this is used to perform a static check. In particular, it is checked that the call includes the correct number of required parameters by extracting the positional arguments without defaults from the signature.

**NOTE:** Currently, imported Python functions can be called only with their positional arguments, i.e. calls with pure keyword arguments are not possible.

#### Type annotations

If type annotations are provided in the function call signature for any positional arguments, a type check is performed for these particular arguments. If a return-type annotation is available, it is used to perform a static type check. Currently, only types processed by the interpreter may be used in the type hints. More information about the valid types is provided in [this section](extending_scl.md#overview-of-used-types).

#### Built-in module with commonly used functions

The `virtmat.functions` module provides a set of commonly used functions. Compared to functions directly imported from Python's `math` module and the `numpy` package, the functions from the `virtmat.functions` module provide better support for typing, physical units and quantities with uncertainties.

Example:

```
use exp from virtmat.functions
use boltzmann_constant from virtmat.constants
f(e, T) = exp(-e/(boltzmann_constant*T))
print(f(0.01 [eV], 300 [K])) # result: 0.6792151960927103
```

Currently, all numpy functions and universal functions that are supported by the pint package are available in `virtmat.functions`. A list of names of all supported functions can be retrieved from the same module:

```
use FUNCTIONS from virtmat.functions
print(FUNCTIONS)
```

Some of these functions provide support for inputs with uncertainties via the `uncertainties.unumpy` module. This support is currently limited to `unumpy`-supported functions with one output (i.e.
no functions returning a tuple) and to scalar inputs only.

Additionally, some functions from Python's `math` module, adapted to textS, are available in the module `virtmat.functions.math`.

Example:

```
from virtmat.functions.math use sqrt
print(sqrt(4 [meter**2])) # result: 2.0 [meter]
```

## The built-in `info` function

The `info` function takes one argument that can be any parameter. It returns a [Table](#table) with information about the parameter, such as type, datatype (for Series and Arrays), and dimensionality / units if the parameter is numeric and has been evaluated. In [workflow evaluation mode](tools.md#workflow-mode), additional metadata is included in the table when the parameter is a variable.

Example with an expression:

Input:

```
print(info(prop.energy))
```

Output:

```
((name: null), (type: 'Series'), (scalar: false), (numeric: true), (datatype: 'float'), (dimensionality: '[mass] * [length] ** 2 / [time] ** 2'), (units: 'electron_volt'))
```

Example with a variable:

Input:

```
print(info(prop))
```

Output:

```
((name: 'prop'), (type: 'Property'), (scalar: false), (numeric: false), (datatype: null), ('group UUID': 'd9fe0968ed6740888750eb534aababa0'), ('model UUID': '181091fc725242258b3788ed797bf932'), ('node UUID': 'c68c476ef82141d29ae5be555722c27a'), ('node ID': 751), ('parent IDs': (752, 753, 754)), ('node state': 'COMPLETED'), ('created on': '2025-02-25T14:23:32+01:00'), ('updated on': '2025-02-25T14:23:44+01:00'), ('grammar version': 32), ('data schema version': 7), ('python version': '3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0]'), (category: 'interactive'), (fworker: null), (dupefinder: true), ('reservation ID': null), ('number of launches': 1), ('number of archived launches': 0), (launch_dir: ('/mnt/data/ubuntu/work/vre-language/examples/launcher_2025-02-25-13-23-33-167195',)), ('archived launch_dir': ()), (runtime_secs: 3.495358), ('runtime_secs total': 3.495358))
```

The output of `info` cannot be used as a parameter of a variable and
therefore is not persistent. Currently, `info` can only be used in [print statements](#the-print-statement) but **does not** trigger [on-demand evaluation](io.md#evaluate-on-demand) of the referenced variables.

The `info` function replaces the deprecated `type` function (grammar version 32 or newer). The `type` function in older compatible grammar versions produces the same output as `info`.

## Loading parameters from file or URL

Some parameters support an optional syntax for loading them from a file or downloading them from a URL. The common syntax is

```
<name> = <type> from (file <filename> | url <url>)
```

Current list of textS parameters supporting this syntax: `Quantity`, `Bool`, `String`, `Series`, `Table`, `BoolArray`, `StrArray`, `IntArray`, `FloatArray` and `ComplexArray`. Additionally, there are [textM](amml.md) parameters that also support this syntax.

The `<filename>` is a string that must contain the path to the input data file. While *relative paths* are supported, it is strongly recommended to use *absolute paths*, especially in the [workflow evaluation mode](tools.md#workflow-mode). By default the internal serialization format (in JSON) is used and the file type may be JSON (file name extension `json`) or YAML (file name extensions `yml` or `yaml`). Domain-specific parameters may support further domain-specific formats.

## Dealing with missing data or default values

Sometimes measurements include data gaps, i.e. some data elements may be missing. Furthermore, in modeling some parameters often have default values that should be used without specifying them. For these two use cases, Series allow the placeholders `null` and `default` to specify unknown elements. Note that `null` and `default` have no type because they are not parameters. The Series type is inferred from the other elements, which must all have the same type. If all elements are either `null` or `default` then the type of the series is `Any` (unknown type).
```
numbers = (numbers: 1, 2, null)
sqrs = map((x: x**2), numbers)
print(sqrs) # result: (sqrs: 1, 4, null)
print(sqrs[2]) # result: null
```

If a quantity or an element in a data structure critically depends on such elements, it gets the `null` value. For example, `print(any((bools: true, null)))` yields `true` and not `null` because the missing value is not critical for the value of the `any()` function. In contrast, `print(all((bools: true, null)))` yields `null`, i.e. undetermined between `true` and `false`, because the placeholder `null` can stand for either `true` or `false`.

### Implications for Boolean operations

Structures in which Boolean values are missing and denoted by `null` are processed using a [three-valued logic](https://en.wikipedia.org/wiki/Three-valued_logic). This means that the non-Boolean value `null` is returned when a Boolean output is ambiguous. This affects Boolean expressions and the `if`, `filter`, `all` and `any` built-in functions.

## Dealing with failures

Like all other parameters in textS, variables are immutable objects. This means that they cannot be modified (updated) once they have been defined and initialized. This behavior can lead to the following situation. Consider this model, run in an interactive session:

```
Input > a = 1
Input > b = a / 0
Input > c = 2
Input > f(x) = b * x
Input > %start
Input > print(f(c))
Arithmetic error: None:2:1 --> b = a / 0 <-- float division by zero
```

Obviously, we will never be able to use function `f` because of the run-time error in evaluating `b`. One workaround is to define a new variable `b_correct` and a new function `f_correct` that uses `b_correct` instead of `b`:

```
Input > b_correct = a / 2
Input > f_correct(x) = b_correct * x
Input > f_correct(c)
Output > 1.0
```

Though this is the recommended approach, there are some cases when this is not desirable or practical. One case is if the model contains a large number of descendants of variable `b`.
In this case all these statements have to be rewritten. Furthermore, the statements that are descendants of `b` will never be evaluated but also cannot be removed from the model, which is likely to lead to confusion. In another case, the evaluations have not failed but a mistake leading to wrong results is found in a statement. Thus the affected statement and all its descendants have to be invalidated or removed. Effectively, this can be accomplished by updating the statement with the error / mistake:

```
Input > b := a / 2
Input > print(f(c))
Output > 1.0
```

Using this approach, all descendants are found and reevaluated. Currently, there are some restrictions to this approach:

1. The set of references in the updated variable parameter must be identical with that in the parameter of the original variable. For instance, in the example above, the update `b := 1 / 2` is not valid because the reference to `a` is not used. Also, `b := c + 1` is not valid because it includes a reference to `c` that is not in the original version of `b`.
1. A variable can be updated only once per model extension. The update becomes ambiguous otherwise. For example, the update `b := 1 / a; b := a / 2` is not valid.
1. The variable may not be part of a [parameter variation](bulk.md#bulk-processing) across several models (a model group). For example, if variable `a` is in such a variation, added with the statement `vary ((a: 1, 2))`, then it cannot be updated with `a := 3` in further model extensions.
1. Variables containing `if` expressions or Boolean expressions with [`?` annotations](evaluation.md#order-of-evaluation-and-short-circuiting) as parameters, as well as the ancestors of such variables, cannot be updated.
1. Variables containing [parallel `map`, `filter` and `reduce`](resources.md#number-of-chunks) cannot be updated.

The approach described here is recommended if the evaluation error is caused by the model input.
If the error during evaluation is due to a failure of computing nodes, network or file system, or other similar failures, then the evaluation can be rerun in an interactive session by using the [%rerun magic](tools.md#specific-features).

## Using physical constants

Physical constants are provided in the module `constants` in the `virtmat` namespace.

Example:

```
use speed_of_light from virtmat.constants
print(speed_of_light) # result: 1.0 [speed_of_light]
print(speed_of_light) [m/s] # result: 299792458.0 [meter / second]
print(speed_of_light [_base]) # result: 299792458.0 [meter / second]
print(speed_of_light [_compact]) # result: 1.0 [speed_of_light]
```

A list of the names of all currently provided constants can be retrieved from the same module:

```
use CONSTANTS from virtmat.constants
print(CONSTANTS)
```

Additionally, the definitions of these constants can be found [here](https://github.com/hgrecco/pint/blob/master/pint/constants_en.txt).

## Using random numbers

Random number generators are automatically enabled in textS. To use a random sampling function, the collection in `virtmat.functions.random` can be used:

```
from virtmat.functions.random use <function name>
var = <function name>([Table|Dict])
```

The `function name` is the name of the relevant function for random sampling from `numpy.random`. All [sampling functions](https://numpy.org/devdocs/reference/random/generator.html) from `numpy.random` are supported except for `bytes` and `shuffle`. The table (or dictionary) contains the parameters to pass to the function, according to the `numpy.random` specifications. The table (or dictionary) is optional, i.e. a function can be called without parameters.

In addition to the parameters specified in the original `numpy.random` functions, all `virtmat.functions.random` functions accept two extra parameters: `rng` and `seed`.

The `rng` parameter is the name of one of the numpy [bit generators](https://numpy.org/devdocs/reference/random/bit_generators/index.html), i.e.
`MT19937`, `PCG64`, `PCG64DXSM`, `Philox`, or `SFC64`. If not specified, the default `PCG64` is chosen. Over time, numpy might select a different bit generator by default (see this [link](https://numpy.org/neps/nep-0019-rng-policy.html#nep19)); if the generator has not been explicitly specified, the result can then no longer be reproduced.

The `seed` parameter is a 128-bit integer in hexadecimal format, e.g. `8ed8c93f1db74abebc32d228a25f7628`. A random seed will be generated automatically if `seed` is not specified.

Reproducibility of the results is only guaranteed if `rng` and `seed` are explicitly specified, so it is recommended to always set these two parameters.

The [type](#type) of the returned (generated) value depends on the function `function name` and the keyword `size` that may be specified in the table (or dictionary), as explained in the following table. The datatype (Integer or Float) depends on the specific function.

| `size` | Returned type | Example | Returned value (sample) |
|-------------|-----------------|------------------------------|---------------|
| --- | scalar Quantity | `print(random())` | `0.94310827` |
| scalar | numeric Series | `print(random(((size: 3))))` | `(None: 0.91326699, 0.70248697, 0.38115418)` |
| Tuple | numeric Array | `print(uniform(((size: (3,)))))` | `[0.48435748, 0.88414761, 0.97680693]` |
| Tuple | numeric Array | `print(uniform(((size: (1, 3)))))` | `[[0.48716348, 0.245202471, 0.219273127]]` |

The [physical unit](#physical-units) of the returned value depends on the units of some numerical inputs specified in the table. If all numerical inputs are dimensionless or no inputs are specified then the returned value is dimensionless. For example, the generated quantity stored in the `distr` variable,

```
from virtmat.functions.random use normal
distr = normal {loc: 0 [nm], scale: 0.2 [nm]}
```

will be in nanometers.

**NOTE**: Though it is possible to import the same functions directly from `numpy.random`, this is not recommended.
For example, instead of using this code

```
from numpy.random use normal
s = normal(0.0, 0.1)
```

one should use

```
from virtmat.functions.random use normal
s = normal(((loc: 0.0), (scale: 0.1)))
```
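To make the role of an explicit bit generator and seed more concrete, the following plain numpy sketch shows why fixing both makes a sample reproducible. It is only an illustration: how `virtmat.functions.random` maps its `rng` and `seed` parameters onto numpy objects internally is an assumption here, and the helper `sample_normal` is a hypothetical name.

```python
import numpy as np

# Seed in hexadecimal format, converted to an integer for numpy
seed = int("8ed8c93f1db74abebc32d228a25f7628", 16)


def sample_normal(seed, size=3):
    """Draw `size` normal samples from a PCG64 generator seeded with `seed`."""
    rng = np.random.Generator(np.random.PCG64(seed))
    return rng.normal(loc=0.0, scale=0.1, size=size)


first = sample_normal(seed)
second = sample_normal(seed)

# Same bit generator and same seed give identical draws
print(np.array_equal(first, second))
```

If either the bit generator or the seed is left to a default, a later run can produce a different sample, which is why setting both explicitly is recommended above.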