
Syntactic properties

SML has some syntactic properties that differ from XML. First, encoding a file in UTF-8 or UTF-16, as is common in XML, unfortunately may still cause problems when the file is read in many programming languages. At present these include, among others, Perl 5 and, regrettably, PHP 4. While this may be seen as a failure of the developers rather than a property of XML as such, it is still unacceptable when it comes to initializing an application. For the purpose of initializing the framework it is therefore advisable not to use these encodings (at least for the time being). Instead, ISO Latin-1 is used as the default charset. Characters that are not included in this charset should be encoded using the appropriate entities.

Unlike XML, the SML syntax does not require a document to contain exactly one tag as its root element. If a document contains a forest of multiple node trees, they are interpreted as child elements of a virtual, anonymous root element.

Furthermore, the SML format requires every opened tag to have a corresponding closing tag. Tag names are by definition not case-sensitive.

By convention, a line break has to follow each closing tag. Also by convention, only white-space characters are allowed before an opening tag. A tag may contain either character data (CDATA) or other tags, but not both. If the tag contains character data, the start and end tags must be on the same line. If the tag contains other tags, a line break must be inserted directly after the opening tag. Line breaks within character data may be inserted by using escape sequences like "\n".

All character data outside a tag and all sections that do not comply with this syntax are ignored and treated like comments. Unlike XML, no special handling of CDATA sections or comments is required.

The XML-like syntax for empty tags, "<br />", is not allowed (use "<br></br>" instead).

Finally, the SML format uses no attributes by definition.
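
Taken together, these rules allow a document to be read line by line with simple pattern matching. The following is only a minimal sketch of that idea, not the parser of the framework. The function names sml_decode and sml_parse_lines are hypothetical, PHP 7.1 or later is assumed, tag names are converted to lower case (an assumption, chosen because names are not case-sensitive), and all values are returned as strings.

```php
<?php
/**
 * Reads lines until the closing tag named $closing (or the end of input)
 * and returns the elements found on the way as an array.
 */
function sml_parse_lines(array $lines, int &$i, ?string $closing = null): array
{
    $node = [];
    $count = count($lines);
    while ($i < $count) {
        $line = trim($lines[$i]);
        $i++;
        if (preg_match('/^<([^<>\/]+)>(.*)<\/([^<>\/]+)>$/', $line, $m)
            && strcasecmp($m[1], $m[3]) === 0) {
            // Character data: start and end tag on one line.
            // The escape sequence \n becomes a real line break.
            $node[strtolower($m[1])] = str_replace('\n', "\n", $m[2]);
        } elseif (preg_match('/^<([^<>\/]+)>$/', $line, $m)) {
            // An opening tag alone on a line starts a nested element.
            $name = strtolower($m[1]);
            $node[$name] = sml_parse_lines($lines, $i, $name);
        } elseif ($closing !== null
            && preg_match('/^<\/([^<>\/]+)>$/', $line, $m)
            && strtolower($m[1]) === $closing) {
            // The closing tag of the current element ends this level.
            return $node;
        }
        // Every other line is ignored and treated like a comment.
    }
    return $node;
}

/**
 * Returns the children of the virtual anonymous root element.
 */
function sml_decode(string $input): array
{
    $lines = preg_split('/\r\n|\r|\n/', $input);
    $i = 0;
    return sml_parse_lines($lines, $i);
}

$sml = <<<'SML'
<SCREEN>
    <width>1024</width>
    <height>768</height>
</SCREEN>
this line is ignored
<title>Hello\nWorld</title>
SML;

$data = sml_decode($sml);
print_r($data);
```

Because a tag holds either character data on a single line or nested tags on separate lines, two patterns are enough to tell the cases apart, and no look-ahead across lines is needed.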

The reason for this rather restrictive syntax will become clearer with the following examples.

Example

The following examples describe how native data types in PHP scripts are represented in the SML format.

scalar variables

Presentation in PHP-code:
<?php 
$a = 1;
$b = 'string';
$c = 12.5;
$d = true;
?>
equivalent presentation in SML:
<a>1</a>
<b>string</b>
<c>12.5</c>
<d>true</d>

At this point, it is important to understand a constraint of PHP. A PHP function can have only a single return value. (The case where the function takes its parameters by reference is not considered here.) A PHP function that reads the contents of an SML document is thus forced to return the (multi-dimensional) content as an array. The contents of this array can then be copied to variables within the scope of the calling code, so that the original state is restored.
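
One way to copy the returned array back into variables is PHP's built-in extract() function. The following is a small sketch: the array $settings is a hypothetical stand-in for what a reader function might return for the scalar example above, with all values arriving as strings.

```php
<?php
// Hypothetical array, as a reader function might return it for the
// scalar example above (all values arrive as strings).
$settings = ['a' => '1', 'b' => 'string', 'c' => '12.5', 'd' => 'true'];

// extract() creates one variable per array key in the current scope.
// EXTR_SKIP keeps already existing variables from being overwritten.
$count = extract($settings, EXTR_SKIP);

echo "$count variables restored, \$b = $b\n";
```

Keys that are not valid variable names, such as the numeric indexes of the later examples, are skipped by extract(), so this approach only restores named entries.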

Documents in JSON notation are subject to the same restriction. Now to a second example:

numeric data fields
Presentation in PHP-code:
<?php 
$A = array();
$A[0] = 1;
$A[1] = 'string';
$A[2] = 12.5;
$A[1000] = true;
$B = array();
$B[0] = 2;
?>
equivalent presentation in SML:
<A>
    <0>1</0>
    <1>string</1>
    <2>12.5</2>
    <1000>true</1000>
</A>
<B>
    <0>2</0>
</B>

associative data fields
Presentation in PHP-code:
<?php 
$SCREEN = array();
$SCREEN["width"] = 1024;
$SCREEN["height"] = 768;
$SCREEN["depth"] = "32-Bit";
?>
equivalent presentation in SML:
<SCREEN>
    <width>1024</width>
    <height>768</height>
    <depth>32-Bit</depth>
</SCREEN>

multi-dimensional and mixed data fields
Presentation in PHP-code:
<?php 
$A = array();
$A["a"] = 1;
$A["b"] = 2;
$A[0] = "three";
$B = array();
$B[0] = true;
$B["a"] = 1000;
$B["b"] = array();
$B["b"][0] = 1;
$B["b"][1] = 2;
$B["b"]["a"] = 3;
?>
equivalent presentation in SML:
<A>
	<a>1</a>
	<b>2</b>
	<0>three</0>
</A>
<B>
	<0>true</0>
	<a>1000</a>
	<b>
		<0>1</0>
		<1>2</1>
		<a>3</a>
	</b>
</B>

As can be seen from the above examples, the format is well suited for storing the scalar variables and data fields that are used as parameters for the initialization of the framework. Specifying a data type is not necessary, because PHP is not strictly typed: the type of a variable is determined dynamically at runtime. The only essential distinction is between scalar values and data fields, and this distinction can easily be made based on the syntax.
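
The same distinction is all that is needed in the opposite direction, when a PHP array is written as SML. The following sketch illustrates this. The function name sml_encode is hypothetical and not taken from the framework, and the choice of four spaces per indentation level and of the words "true" and "false" for boolean values are assumptions based on the examples above.

```php
<?php
/**
 * Writes a PHP array as SML text (illustrative sketch only).
 */
function sml_encode(array $data, int $depth = 0): string
{
    $indent = str_repeat('    ', $depth);
    $out = '';
    foreach ($data as $name => $value) {
        if (is_array($value)) {
            // Data field: opening tag, children and closing tag on separate lines.
            $out .= "$indent<$name>\n" . sml_encode($value, $depth + 1) . "$indent</$name>\n";
        } else {
            // Scalar value: start and end tag on one line.
            if (is_bool($value)) {
                $text = $value ? 'true' : 'false';
            } else {
                $text = (string) $value;
            }
            // Real line breaks are replaced by the escape sequence \n.
            $text = str_replace(["\r\n", "\r", "\n"], '\n', $text);
            $out .= "$indent<$name>$text</$name>\n";
        }
    }
    return $out;
}

$B = [0 => true, 'a' => 1000, 'b' => [0 => 1, 1 => 2, 'a' => 3]];
echo sml_encode(['B' => $B]);
```

The only type check that affects the structure of the output is is_array(); the check for boolean values merely chooses how a scalar is spelled.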

Author: Thomas Meyer, www.yanaframework.net