Section 3

Examples Revisited

Tokenize Example

For this, http://cs-people.bu.edu/lapets/320/materials.php#2c7cc0e69813454eafc7128e5ac57055

[t for t in re.split(r"(\s+|true|false|and|or|not|,|\(|\))", s)]

Here, the r means raw string, please find "raw string" in this page for more information, http://docs.python.org/3/tutorial/introduction.html#strings.

The pattern \s+ matches all spaces. It is the regular expression (of pattern matching). You can try pattern matching here, http://regexpal.com/. Also, on that website, the right cornor has a Quick Reference, where you can find useful things.

Parsing Example 1

For this, http://cs-people.bu.edu/lapets/320/materials.php#2cd418f2876c42d59a57e34bd6288f22

Here,

(e1, tokens) = parse(tokens[2:])

It is an assignment statement. In Python, we have different forms of assignment statements.

x, y, z = 1, 2, 3      # assign multiple variables
(a, b, c) = (1, 2, 3)  # assign a tuple to another tuple

In our code, the parse return value is a tuple of two members, like this: ({'Or':[e1,e2]}, tokens[1:]). Therefore we can assign the return value to another tuple (e1, tokens).

Let's take a look at a very simple example.

not (and (ture, flase))
  1. After tokenize, we get ['not', '(', 'and', '(', 'ture', ',', 'false', ')', ')']
  2. tokens[0] is 'not', and tokens[1] is '(', we enter the recursive call

    1. Now, the new params for this recursive call is ['and', '(', 'ture', ',', 'false', ')', ')']
    2. Here, tokens[0] is 'and', and tokens[1] is '('. We enter the recursive call again.

      1. Now, the new params for this recursive call is ['ture', ',', 'false', ')', ')']
      2. Here, tokens[0] is 'true'. Return the tuple ('True', [',', 'false', ')', ')'])
    3. We do the assignment, e1 = 'True', tokens = [',', 'false', ')', ')']
    4. tokens[0] is ',', we enter recursive call again

      1. Now, the new params is ['false', ')', ')']
      2. tokens[0] is 'false', return ('False', [')', ')'])
    5. We do the assignment again, e2 = 'False', tokens = [')', ')'].
    6. tokens[0] is ')', return ({'And':['True', 'False']}, [')']).

  3. Now we do the assignment again, e1 = {'And':['True', 'False']}, tokens = [')'].
  4. tokens[0] is ')', return ({'Not':[{'And':['True', 'False']}]}, []).

Parsing Example 2

For this, http://cs-people.bu.edu/lapets/320/materials.php#5917f6f5de7c46079190f4b6c4961ca3

It is nearly the same as Example 1, but with backtracking.

Suppose we are parsing

(true and false)
  1. After tokenize, we have parse (['(', 'true', 'and', 'false', ')'], true).
  2. Copy tmp into tokens, and tokens[0] is '('.
  3. Now the assignment makes tokens = ['true', 'and', 'false', ')'].
  4. Enter the recursive call, with tokens, False.

    1. tokens[0] is 'true'
    2. tokens = ['and', 'false', ')'], remove the first one.
    3. Since it is not top level, as indicated by the second param, we return ('True', ['and', 'false', ')'])
  5. Then the assignment, r = ('True', ['and', 'false', ')'])
  6. r is not empty, do the assignment again, e1 = 'True', tokens = ['and', 'false', ')'].
  7. tokens[0] is not or, therefore we skip this branch, and move on to the next block of code, redo all the work above, until the tokens[0] matches 'and'.
  8. ...

So this is how backtrack works. I believe that you can do the rest interpretation.

Semantics by Example

Previously, what we do has nothing to do with meanings. We are just manipulating symbols/tokens. Althought we can have a program like

and (true, false)

which conforms to the grammar

We can also have an equivalent program and a grammar like this,

abc [x, y]

which conforms to

This is an example showing you that up to now, it is only a matter of symbols. They are equivalent, and they don't have meanings yet.

Meanings of Languages

Semantics are meanings of a language. Both syntax and semantics are part of the definition of a language.

Examples,

  • and (true, false) evaluates to false. We can say that the value false is the meaning of the program.
  • 1 + 2 * 3 evaluates to 7. We can say that the value 7 is the meaning of the program.
  • abc [x, y] evaluates to x, we can say x is the meaning of the program. This is somehow more abstract, and we actually need a whole bunch of rules to formalize the process of computing the meaning of the program. We indeed have these rules for the previous two examples, but they are not explicitly listed here. Instead, we have that in mind. We know that logically, true and false returns false, and in math, 1+2*3 is 7.
  • and (true, false) will finally return a boolen value. From the typechecker's view, the type boolean could also be a meaning of the program.
  • let x = 7 in x + 3 end, will result in a binding (bind 7 to variable x), an addition, and yields 10 in the end. Therefore, the binding process is the meaning of the subprogram x = 7, and 10 is first the meaning of that addition, and second the meaning of the whole let-in-end program.

Denotational Semantics by Example

Please refer to the definition given in the lectures. http://cs-people.bu.edu/lapets/320/materials.php#41aa3d27c2754fa880cb8c2280322f63

We can then assign meanings to programs like this:

  • [[2 + 3]] = 5
  • [[a + b]] = [[a]] + [[b]] where the meaning of + is the usual algebra addition operation.

These are concrete examples, which maps concrete syntax instances to symbols in algebra domain. In practice, we usually map abstract syntax to some chosen domain, by describing them in a recursive way using inference rules.

note

Again, for extermaly simple languages, we can manually assign meaning to each possible valid string. But in reality, we simply can't do that for a infinite set of strings of a language. Therefore, people define semantics using rules again. Just like what we did for grammars.

To put it in another way, denotational semantics use functions to describe the semantics of a programming language. The function describes semantics by associating semantic values to sytactically correct constructs. Therefore, the domain of the function is syntax, while the range of the function is semantics. We can say that the function maps a syntactical structure 2 + 3 to a semantic value 5.

You can find some informal description and examples here, http://en.wikibooks.org/wiki/Haskell/Denotational_semantics.

Operational Semantics by Example

The Denotational Semantics is a way to assign meanings to programs. Here the "meanings" is the usual understanding of the word "meanings". But in Operational Semantics, we describe how the program runs. And we treat that as the "meaning" of the program. Please refer to the definition given in the lectures. http://cs-people.bu.edu/lapets/320/materials.php#b16e887dc1bc430da525e124a5f603b1

Suppose we have such a program, containing a ite (if-else-then) expression. How should we assign meaning to it?

IF cond THEN expr_1 ELSE expr_2 

We can say that, if cond can be transformed to true, and expr_1 can be transformed to A, then the whole program can be transformed to A. If cond can be transformed to false, and expr_2 can be transformed to B, then the whole program can be transformed to B.

This can be formally described by semantic rules like this (environments are omitted here)


$$\frac{cond \Downarrow true\qquad expr_1 \Downarrow A}{\textbf{if }cond \textbf{ then }expr_1 \textbf{ else }expr_2 \Downarrow A} \qquad \frac{cond \Downarrow false\qquad expr_2 \Downarrow B}{\textbf{if }cond \textbf{ then }expr_1 \textbf{ else }expr_2 \Downarrow B}$$

note

Different notations exists, someone use , others may use . Just be aware of that. However, most of the time, the part above the line are assumptions, or premises. The part below the line are conclusions. And the rule should be read as "if the assumptions are ture, we then have the conclusions are ture, too". The also separates premises from conclusions, in a smaller range. We can read A ⇓ B as "under the premises that A is true, we have that B is also true", or we can also say "A infers B".

Here, we are describing how the program is going to run on a machine, but not necessarly a physical computing machine. Most of the case, we are running the program on a abstract machine. The machine usually consists of Program, Control, and Store. [PL]_

  • Program here, is simply the program we are analysing, IF-ELSE-THEN.
  • Control is how we are going to run the program. Say by applying the semantic rules, we reduce the program into either expr_1 or expr_2. That's like running a single step of the program. By running the program many steps, we reach the end where the program becomes a semantic value, which we can not further reduce.
  • Store, is usually represented as the environment which is a function mapping identifiers/variables to their semantic values.

So the operational semantics of a programming language usually define how this machine reacts to a program, and how the storage is changed during execution.

note

If you want to dig into formal semantics, please read the Formal Semantics chapter in this book: Programming Languages: Principles and Practices by Kenneth.

Summary

  • Grammars/Syntax define what is a valid string of a language
  • Semantics define what is the meaning of a valid string of a language
  • There are different ways of defining semantics of a language, including operational semantics and denotational semantics.
  • Denotational Semantics define semantics by using functions to map syntactical constructs to semantic values. By applying rules, we can finally compute the semantic value of a program.
  • Operational Semantics define semantics by defining how the program will run on a abstract machine. By running, we usually mean reducing the program into other forms, until we get to the semantic value.

Bibliography