This chapter describes LME (Lightweight Math Engine), the interpreter for numeric computing used by Sysquake.
An LME program, or a code fragment typed at a command line, is composed of statements. A statement can be either a simple expression, a variable assignment, or a programming construct. Statements are separated by commas, semicolons, or ends of lines. An end of line has the same meaning as a comma, unless the line ends with a semicolon. When simple expressions and assignments are followed by a comma (or an end of line), the result is displayed to the standard output; when they are followed by a semicolon, no output is produced. What follows programming constructs does not matter.
When typed at the command line, the result of simple expressions is assigned to the variable ans; this makes it easy to reuse intermediate results in successive expressions.
A statement can span over several lines, provided all the lines but the last one end with three dots. For example,
1 + ...
2
is equivalent to 1 + 2. After the three dots, the remainder of the line, as well as empty lines and lines which contain only spaces, are ignored.
Inside parentheses or braces, line breaks are permitted even if they are not escaped by three dots. Inside brackets, line breaks are matrix row separators, like semicolons, unless they follow a comma or a semicolon, in which case they are ignored.
Unless it is part of a string enclosed between single ticks, a single percent character or two slash characters mark the beginning of a comment, which continues until the end of the line and is ignored by LME. Comments must follow continuation characters, if any.
a = 2; % comment at the end of a line
x = 5; // another comment
% comment spanning the whole line
b = ... % comment after the continuation characters
a;
a = 3% no need to put spaces before the percent sign
s = '%'; % percent characters in a string
Comments may also be enclosed between /* and */; in that case, they can span several lines.
Pragmas are directives for the LME compiler. They can be placed at the same location as LME statements, i.e. in separate lines or between semicolons or commas. They have the following syntax:
_pragma name arguments
where name is the pragma name and arguments are additional data whose meaning depends on the pragma.
Currently, only one pragma is defined. Pragmas with unknown names are ignored.
Name | Arguments | Effect |
---|---|---|
line | n | Set the current line number to n |
_pragma line 120 sets the current line number as reported by error messages or used by the debugger or profiler to 120. This can be useful when the LME source code has been generated by processing another file, and line numbers displayed in error messages should refer to the original file.
Functions are fragments of code which can use input arguments as parameters and produce output arguments as results. They can be built in LME (built-in functions), loaded from optional extensions, or defined with LME statements (user functions).
A function call is the action of executing a function, possibly with input and/or output arguments. LME supports several syntaxes:
fun
fun()
fun(in1)
fun(in1, in2,...)
out1 = fun...
(out1, out2, ...) = fun...
[out1, out2, ...] = fun...
[out1 out2 ...] = fun...
Input arguments are enclosed between parentheses. They are passed to the called function by value, which means that they cannot be modified by the called function. When a function is called without any input argument, the parentheses may be omitted.
Output arguments are assigned to variables or parts of variables (structure field, list element, or array element). A single output argument is specified to the left of an equal sign. Several output arguments must be enclosed between parentheses or square brackets (arguments can simply be separated by spaces when they are enclosed in brackets). Parentheses and square brackets are equivalent as far as LME is concerned; parentheses are preferred in LME code, but square brackets are available for compatibility with third-party applications.
Output arguments can be discarded without assigning them to variables either by providing a shorter list of variables if the arguments to be discarded are at the end, or by replacing their name with a tilde character. For example, to get the index of the maximum value in a vector and to discard the value itself:
(~, index) = max([2, 1, 5, 3]);
Input arguments are usually recognized by their position. Some functions also differentiate them by their data type. This can lead to code which is difficult to write and to maintain. A third method to distinguish the input arguments of a function is to tag them with a name, with a syntax similar to an assignment. Named arguments must follow unnamed arguments.
fun(1, [2,3], dim=2, order=1);
For some functions, named arguments are an alternative to a sequence of unnamed arguments.
When a function has only literal character strings as input arguments, a simpler syntax can be used. The following conditions must be satisfied:
In that case, the following syntax is accepted; left and right columns are equivalent.
Command | Function |
---|---|
fun str1 | fun('str1') |
fun str1 str2 | fun('str1','str2') |
fun abc,def | fun('abc'),def |
Arguments can also be quoted strings; in that case, they may contain spaces, tabs, commas, semicolons, and escape sequences beginning with a backslash (see below for a description of the string data type). Quoted and unquoted arguments can be mixed:
fun 'a bc\n' | fun('a bc\n') |
fun str1 'str 2' | fun('str1','str 2') |
The command syntax is especially useful for functions which accept well-known options represented as strings, such as format loose.
Libraries are collections of user functions, identified in LME by a name. Typically, they are stored in a file whose name is the library name with a ".lml" suffix (for instance, library stdlib is stored in file "stdlib.lml"). Before a user function can be called, its library must be loaded with the use statement. use statements have an effect only in the context where they are placed, i.e. in a library, or the command-line interface, or a Sysquake SQ file; this way, different libraries may define functions with the same name provided they are not used in the same context.
In a library, functions can be public or private. Public functions may be called from any context which uses the library, while private functions are visible only from the library they are defined in.
The basic type of LME is the two-dimensional array, or matrix. Scalar numbers and row or column vectors are special kinds of matrices. Arrays with more than two dimensions are also supported. All elements of an array have the same type; the numeric types are described in the table below. Two non-numeric types exist for character arrays and logical (boolean) arrays. Cell arrays, which contain composite types, are described in a section below.
Type | Description |
---|---|
double | 64-bit IEEE number |
complex double | Two 64-bit IEEE numbers |
single | 32-bit IEEE number |
complex single | Two 32-bit IEEE numbers |
uint32 | 32-bit unsigned integer |
int32 | 32-bit signed integer |
uint16 | 16-bit unsigned integer |
int16 | 16-bit signed integer |
uint8 | 8-bit unsigned integer |
int8 | 8-bit signed integer |
uint64 | 64-bit unsigned integer |
int64 | 64-bit signed integer |
64-bit integer numbers are not supported by all applications on all platforms.
These basic types can be used to represent many mathematical objects:
Unless a conversion function is used explicitly, numbers are represented by double or complex values. Most mathematical functions accept as input any type of numeric value and convert them to double; they return a real or complex value according to their mathematical definition.
Basic element-wise arithmetic and comparison operators accept integer types directly ("element-wise" means the operators + - .* ./ .\ and the functions mod and rem, as well as operators * / \ with a scalar multiplicand or divisor). If their arguments do not have the same type, they are converted to the argument type which comes first in the following order:
double > single > uint64 > int64 > uint32 > int32 > uint16 > int16 > uint8 > int8
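The ordering above can be read as a simple ranking rule. The following Python sketch (not LME code) models only that rule: it picks, among the argument type names, the one that comes first in the list. The helper name common_type is hypothetical and introduced here for illustration; the sketch says nothing about how LME handles values that do not fit the resulting type.

```python
# Conversion order quoted from the text above, highest rank first.
ORDER = ["double", "single", "uint64", "int64", "uint32",
         "int32", "uint16", "int16", "uint8", "int8"]


def common_type(*type_names):
    """Return the type that comes first in ORDER among the given names."""
    return min(type_names, key=ORDER.index)


print(common_type("int8", "uint16"))    # uint16
print(common_type("int32", "single"))   # single
print(common_type("uint64", "int64"))   # uint64
```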
Literal two-dimensional arrays are enclosed in brackets. Rows are separated with semicolons or line breaks, and row elements with commas or spaces. Here are three different ways to write the same 2-by-3 double array.
A = [1, 2, 3; 4, 5, 6];
A = [1 2 3
     4 5 6];
A = [1, 2, 3;
     4, 5 6];
Functions which manipulate arrays (such as reshape which changes their size or repmat which replicates them) preserve their type.
To convert arrays to numeric, char, or logical arrays, use functions + (unary operator), char, or logical respectively. To convert the numeric types, use functions double, single, or uint8 and similar functions.
Double and complex numbers are stored as floating-point numbers, whose finite accuracy depends on the number magnitude. During computations, round-off errors can accumulate and lead to visible artifacts; for example, 2-sqrt(2)*sqrt(2), which is mathematically 0, yields -4.4409e-16. Integers whose absolute value is smaller than 2^52 (about 4.5e15) have an exact representation, though.
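The round-off figure quoted above is a property of IEEE 754 double arithmetic, not of LME in particular, so it can be reproduced in any language that uses the same 64-bit format. Here is a small Python check (Python is used only for illustration; it is not LME code):

```python
import math

# The same IEEE 754 double arithmetic as in the example above.
residue = 2 - math.sqrt(2) * math.sqrt(2)
print(f"{residue:.4e}")   # -4.4409e-16

# Integers below 2**52 are stored exactly, so adding 1 is never lost.
n = 2**52 - 1
print(float(n) + 1 == float(n + 1))   # True
```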
Literal double numbers (constant numbers given by their numeric value) have an optional sign, an integer part, an optional fractional part following a dot, and an optional exponent. The exponent is the power of ten which multiplies the number; it is made of the letter 'e' or 'E' followed by an optional sign and an integer number. Numbers too large to be represented by the floating-point format are changed to plus or minus infinity; too small numbers are changed to 0. Here are some examples (numbers on the same line are equivalent):
123 +123 123. 123.00 12300e-2
-2.5 -25e-1 -0.25e1 -0.25e+1
0 0.0 -0 1e-99999
inf 1e999999
-inf -1e999999
Literal integer numbers may also be expressed in hexadecimal with prefix 0x, in octal with prefix 0, or in binary with prefix 0b. The four literals below all represent 11, stored as double:
0xb 013 0b1011 11
Literal integer numbers stored as integers and literal single numbers are followed by a suffix to specify their type, such as 2int16 for the number 2 stored as a two-byte signed number or 0x300uint32 for the number whose decimal representation is 768 stored as a four-byte unsigned number. All the integer types are valid, as well as single. This syntax gives the same result as a call to the corresponding function (e.g. 2int16 is the same as int16(2)), except when the integer number cannot be represented exactly with a double. In that case, the argument of the function is a double literal, which is rounded to a value which can be represented with a double before the conversion takes place, whereas the suffixed literal keeps its exact value. Compare the expressions below:
Expression | Value |
---|---|
uint64(123456789012345678) | 123456789012345696 |
123456789012345678uint64 | 123456789012345678 |
Literal complex numbers are written as the sum or difference of a real number and an imaginary number. Literal imaginary numbers are written as double numbers with an i or j suffix, like 2i, 3.7e5j, or 0xffj. Functions i and j can also be used when there are no variables of the same name, but should be avoided for safety reasons.
The suffixes for single and imaginary can be combined as isingle or jsingle, in this order only:
2jsingle
3single + 4isingle
Command format is used to specify how numbers are displayed.
Strings are stored as arrays (usually row vectors) of 16-bit unsigned numbers. Literal strings are enclosed in single quotes:
'Example of string'
''
The second string is empty. For special characters, the following escape sequences are recognized:
Character | Escape seq. | Character code |
---|---|---|
Null | \0 | 0 |
Bell | \a | 7 |
Backspace | \b | 8 |
Horizontal tab | \t | 9 |
Line feed | \n | 10 |
Vertical tab | \v | 11 |
Form feed | \f | 12 |
Carriage return | \r | 13 |
Single tick | \' | 39 |
Single tick | '' (two ') | 39 |
Backslash | \\ | 92 |
Hexadecimal number | \xhh | hh |
Octal number | \ooo | ooo |
16-bit UTF-16 | \uhhhh | 1 UTF-16 code |
21-bit UTF-32 | \Uhhhhhhhh | 1 or 2 UTF-16 codes |
For octal and hexadecimal representations, up to 3 (octal) or 2 (hexadecimal) digits are decoded; the first non-octal or non-hexadecimal digit marks the end of the sequence. The null character can conveniently be encoded with its octal representation, \0, provided it is not followed by octal digits (it should be written \000 in that case). It is an error when another character is found after the backslash. Single ticks can be represented either by a backslash followed by a single tick, or by two single ticks.
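The digit-counting rule for numeric escapes can be modelled with a short Python sketch (not LME code). It assumes only what the paragraph above states: up to two hexadecimal digits after \x, up to three octal digits after a backslash, and the first other character ends the sequence. The function name decode_numeric_escapes is hypothetical, and the sketch deliberately ignores every other escape sequence of the table, including the doubled backslash.

```python
import re

# Backslash + x + 1 or 2 hex digits, or backslash + 1 to 3 octal digits.
_NUMERIC_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{1,2})|\\([0-7]{1,3})")


def decode_numeric_escapes(text):
    """Replace \\xhh and \\ooo sequences with the characters they encode."""
    def replace(match):
        hex_digits, octal_digits = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        return chr(int(octal_digits, 8))
    return _NUMERIC_ESCAPE.sub(replace, text)


print(repr(decode_numeric_escapes(r"\x41B")))   # 'AB'
print(repr(decode_numeric_escapes(r"\101")))    # 'A'
print(repr(decode_numeric_escapes(r"\07")))     # '\x07'
print(repr(decode_numeric_escapes(r"\0007")))   # '\x007'
```

The last two lines show why a null character followed by an octal digit should be written \000: \07 is read as the single character with code 7, while \0007 is a null character followed by the digit 7.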
Depending on the application and the operating system, strings can directly contain Unicode characters encoded as UTF-8, or MBCS (multibyte character sequences). 16-bit characters encoded with \uhhhh escape sequences are always accepted and handled correctly by all built-in LME functions (low-level input/output to files and devices which are byte-oriented is an exception; explicit UTF-8 conversion should be performed if necessary).
UTF-32 sequences \Uhhhhhhhh assume UTF-16 encoding. In sequences \uhhhh and \Uhhhhhhhh, up to 4 or 8 hexadecimal digits can be provided, respectively, but the first non-hexadecimal character marks the end of the sequence.
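Whether a \Uhhhhhhhh sequence produces one or two 16-bit codes follows from the UTF-16 encoding itself: code points up to U+FFFF fit in a single code unit, and higher ones are split into a surrogate pair. The Python sketch below (not LME code; utf16_code_units is a hypothetical helper name) lists the code units for a given code point.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units (as integers) that encode one code point."""
    data = chr(code_point).encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


print([hex(u) for u in utf16_code_units(0x00E9)])    # ['0xe9']
print([hex(u) for u in utf16_code_units(0x1F600)])   # ['0xd83d', '0xde00']
```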
For large amounts of text or binary data, the syntax described above is impractical. Inline data is a special syntax for storing strings as raw text or uint8 arrays as base64.
Strings (char arrays of dimension 1-by-n) can be defined in the source code as raw text without any escape sequence with the following syntax:
@/text marker
text
marker
where @/text is that literal sequence of six characters followed or not by spaces and tabs, marker is an arbitrary sequence of characters without spaces, tabs or end-of-lines which does not occur in the text, and text is the text itself. The spaces, tabs and first end-of-line which follow the first marker are ignored. The last marker must be at the beginning of a line; this means that the string always ends with an end-of-line. The whole text inline data is equivalent to a string with the corresponding characters and can be located in an assignment or any expression. End-of-line sequences (\n, \r or \r\n) are replaced by a single linefeed character.
Here is an example of a short fragment of C code, assigned to variable src. The sequence \n is not interpreted as an escape sequence by LME; it results in the two characters \ and n in src. The trailing semicolon suppresses the display of the assignment, like in any LME expression.
src = @/text"""
int main() {
  printf("Hello, data!\n");
}
""";
Arrays of uint8, of dimension n-by-1 (column vectors), can be defined in the source code in a compact way using the base64 encoding in inline data:
@/base64 data
where @/base64 is that literal sequence of eight characters, followed by spaces and/or line breaks, and the data encoded with base64 (see RFC 2045). The base64-encoded data can contain lowercase and uppercase letters a-z and A-Z, digits 0-9, and characters / (slash) and + (plus), and is followed by 0, 1 or 2 characters = (equal) for padding. Spaces, tabs and line breaks are ignored. Comments are not allowed.
The first character which is not a valid base64 character signals the end of the inline data and the beginning of the next token of source code. Inline data can be a part of any expression, assignment or function call, like any other literal value. In the case where the inline data is followed by a character which would erroneously be interpreted as more base64 codes (e.g. neither padding with = nor statement terminator and a keyword at the beginning of the following line), it should be enclosed in parentheses.
Inline data can be generated with the base64encode function. For example, to encode uint8(0:255).' as inline data, you can evaluate
base64encode(uint8(0:255))
Then copy and paste the result to the source code, for instance as follows to set a variable d (note how the semicolon will be interpreted as the delimiter following the inline data, not the data itself):
d = @/base64
AAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwdHh8gISIjJCUmJygpKiss
LS4vMDEyMzQ1Njc4OTo7PD0+P0BBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZ
WltcXV5fYGFiY2RlZmdoaWprbG1ub3BxcnN0dXZ3eHl6e3x9fn+AgYKDhIWG
h4iJiouMjY6PkJGSk5SVlpeYmZqbnJ2en6ChoqOkpaanqKmqq6ytrq+wsbKz
tLW2t7i5uru8vb6/wMHCw8TFxsfIycrLzM3Oz9DR0tPU1dbX2Nna29zd3t/g
4eLj5OXm5+jp6uvs7e7v8PHy8/T19vf4+fr7/P3+/w==
;
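Since the encoding is standard base64, the listing above can also be produced or checked outside LME. As an illustration, this Python sketch (not LME code) encodes the bytes 0 to 255 with the standard library and shows the length, the beginning and the end of the result, including the two padding characters.

```python
import base64

# Bytes 0..255, the same values as uint8(0:255) in the example above.
payload = bytes(range(256))
encoded = base64.b64encode(payload).decode("ascii")

print(len(encoded))      # 344
print(encoded[:16])      # AAECAwQFBgcICQoL
print(encoded[-8:])      # /P3+/w==

# Decoding gives the original bytes back.
assert base64.b64decode(encoded) == payload
```

256 bytes are not a multiple of 3, which is why the data ends with two = characters.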
Lists are ordered sets of other elements. They may be made of any type, including lists. Literal lists are enclosed in braces; elements are separated with commas.
{1,[3,6;2,9],'abc',{1,'xx'}}
Lists can be empty:
{}
The purpose of lists is to collect any kind of data which can be assigned to variables or passed as arguments to functions.
Cell arrays are arrays whose elements (or cells) contain data of any type. They differ from lists only by having more than one dimension. Most functions which expect lists also accept cell arrays; functions which expect cell arrays treat lists of n elements as 1-by-n cell arrays.
To create a cell array with 2 dimensions, cells are written between braces, where rows are separated with semicolons and row elements with commas:
{1, 'abc'; 27, true}
Since the use of braces without semicolon produces a list, there is no direct way to create a cell array with a single row, or an empty cell array. Most of the time, this is not a problem since lists are accepted where cell arrays are expected. To force the creation of a cell array, the reshape function can be used:
reshape({'ab', 'cde'}, 1, 2)
Like lists and cell arrays, structures are sets of data of any type. While list elements are ordered but unnamed, structure elements, called fields, have a name which is used to access them.
There are three ways to make structures: with field assignment syntax inside braces, with the struct function, or by setting each field in an assignment. s.f refers to the value of the field named f in the structure s. Usually, s is the name of a variable; but unless it is in the left part of an assignment, it can be any expression which evaluates to a structure.
a = {label = 'A', position = [2, 3]};
b = struct(name = 'Sysquake', os = {'Windows', 'macOS', 'Linux'});
c.x = 200;
c.y = 280;
c.radius = 90;
d.s = c;
With the assignments above, b.os{3} is 'Linux' and d.s.radius is 90.
While the primary way to access structure fields is by name, field order is still preserved, as can be seen by displaying the structure, getting the field names with fieldnames, or converting the structure to a cell array with struct2cell. The fields can be reordered with orderfields.
While structure fields can contain any type of array and cell arrays can have structures stored in their cells, structure arrays are arrays where each element has the same named fields. Plain structures are structure arrays of size [1,1], like scalar numbers are arrays of size [1,1].
Values are specified first by indices between parentheses, then by field name. Braces are invalid to access elements of structure arrays (they can be used to access elements of cell arrays stored in structure array fields).
Structure arrays are created from cell arrays with functions structarray or cell2struct, or by assigning values to fields.
A = structarray('name', {'dog','cat'}, 'weight', {[3,100],[3,18]});
B = cell2struct({'dog','cat';[3,100],[3,18]}, {'name','weight'});
C(1,1).name = 'dog';
C(1,1).weight = [3,100];
C(1,2).name = 'cat';
C(1,2).weight = [3,18];
Column struct arrays (1-dimension) can be defined with field assignments inside braces by separating array elements with semicolons. Missing fields are set to the empty array [].
D = {a = 1, b = 2; a = 5, b = 3; b = 8};
Value sequences are usually written as values separated with commas. They are used as function input arguments or row elements in arrays or lists.
When expressions involving lists, cell arrays or structure arrays evaluate to multiple values, these values are considered as a value sequence, or part of a value sequence, and used as such in context where value sequences are expected. The number of values can be known only at execution time, and may be zero.
L = {1, 2};
v = [L{:}]; // convert L to a row vector
c = complex(L{:}); // convert L to a complex number
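For readers more familiar with Python, the closest analogue of expanding L{:} into a value sequence is probably argument unpacking with the star operator. The sketch below is Python, not LME, and is only meant as a rough comparison: the list elements become the items of a new list in one case and two separate function arguments in the other.

```python
# Python analogue of the LME statements above (not LME code).
L = [1, 2]
v = [*L]           # like [L{:}]: the elements become the items of a new list
c = complex(*L)    # like complex(L{:}): the elements become two arguments

print(v)   # [1, 2]
print(c)   # (1+2j)
```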
Value sequences can arise from element access of lists or cell arrays with brace indexing, or from structure arrays with field access with or without parenthesis indexing.
Function references are equivalent to the name of a function together with the context in which they are created. Their main use is as argument to other functions. They are obtained with operator @.
Inline and anonymous functions encapsulate executable code. They differ only in the way they are created: inline functions are made with function inline, while anonymous functions have special syntax and semantics where the values of variables in the current context can be captured implicitly without being listed as argument. Their main use is as argument to other functions.
Sets are represented with numeric arrays of any type (integer, real or complex double or single, character, or logical), or lists or cell arrays of strings. Members correspond to an element of the array or list. All set-related functions accept sets with multiple values, which are always reduced to unique values with function unique. They implement membership test, union, intersection, difference, and exclusive or. Numerical sets can be mixed; the result has the same type as when mixing numeric types in array concatenation. Numerical sets and lists or cell arrays of strings cannot be mixed.
Null stands for the lack of data. It is both a data type and the only value it can represent. It can be assigned to a variable, be contained in a list or cell array element or a structure field, or passed as an input or output argument to/from a function.
Null is a recent addition to LME, where the lack of data is usually represented by the empty matrix []. It is especially useful when LME is interfaced with languages or libraries where the null value has a special meaning, such as SQL (Structured Query Language, used with relational databases) or the DOM (Document Object Model, used with XML).
Objects are the basis of Object-Oriented Programming (OOP), an approach to programming which puts the emphasis on encapsulated data with a known programmatic interface (the objects). Two OOP languages in common use today are C++ and Java.
The exact definition of OOP varies from person to person. Here is what it means when it relates to LME:
Here is an example of the use of polynom objects, which (as can be guessed from their name) contain polynomials. Statement use polynom imports the definitions of methods for class polynom and others.
use polynom;
p = polynom([1,5,0,1])
  p =
    x^3+5x^2+1
q = p^2 + 3 * p / polynom([1,0])
  q =
    x^6+10x^5+25x^4+2x^3+13x^2+15x+1
LME identifies channels for input and output with non-negative integer numbers called file descriptors. File descriptors correspond to files, devices such as serial ports, network connections, etc. They are used as input argument by most functions related to input and output, such as fprintf for formatted data output or fgets for reading a line of text.
Note that the description below applies to most LME applications. For some of them, files, command prompts, or standard input are irrelevant or disabled; and standard output does not always correspond to the screen.
At least four file descriptors are predefined:
Value | Input/Output | Purpose |
---|---|---|
0 | Input | Standard input from keyboard |
1 | Output | Standard output to screen |
2 | Output | Standard error to screen |
3 | Output | Prompt for commands |
You can use these file descriptors without calling any opening function first, and you cannot close them. For instance, to display the value of pi, you can write it to the standard output (file descriptor 1) with fprintf:
fprintf(1, 'pi = %.6f\n', pi);
pi = 3.141593
Some functions use implicitly one of these file descriptors. For instance disp displays a value to file descriptor 1, and warning displays a warning message to file descriptor 2.
File descriptors for files and devices are obtained with specific functions. For instance fopen is used for reading from or writing to a file. These functions have as input arguments values which specify what to open and how (file name, host name on a network, input or output mode, etc.), and as output argument a file descriptor. Such file descriptors are valid until a call to fclose, which closes the file or the connection.
When an error occurs, the execution is interrupted and an error message explaining what happened is displayed, unless the code is enclosed in a try/catch block. The whole error message can look like
> use stat
> iqr(123)
Index out of range for variable 'M' (stat/prctile;61)
-> stat/iqr;69
The first line contains an error message, the location in the source code where the error occurred, and the name of the function or operator involved. Here stat is the library name, prctile is the function name, and 61 is the line number in the file which contains the library. If the function where the error occurs is called itself by another function, the whole chain of calls is displayed; here, prctile was called by iqr at line 69 in library stat.
Here is the list of errors which can occur. For some of them, LME attempts to solve the problem itself, e.g. by allocating more memory for the task.