4. Classes and Types

4.1 Overview

Perfect provides a number of fundamental data types, from which more complex types may be built.

4.2 Concepts of type

4.2.1 Classes

In Perfect the term class is used to mean a set of allowed values which is not a proper subset of any other Perfect class nor a union of types.

There are eight predefined non-template classes in Perfect (anything, int, real, bool, char, byte, void, and rank). Note that int is not considered to be a subset of real, and byte is not considered to be a subset of int.

Each instance in a program of the keyword enum or the keyword tag creates a new class which is distinct from all other classes (including classes created by identical declarations elsewhere in the program).

Each occurrence of an abstract class declaration also creates a new class that is distinct from all other classes (including classes created by identical class declarations elsewhere in the program).

4.2.2 Types

In Perfect a type defines a set of allowed values and may be a class, a class associated with a constraint, or a union of types.

A constraint is a predicate that takes a single parameter representing values of the class and returns true for values that are permissible values of the type.

When binding occurrences of functions and operators to their declarations, most constraints are ignored, however, constraints within template parameters must match exactly.

Declaring entities with constrained rather than unconstrained types has the following effects:

4.2.2.1 Constraints on variables and data members

When declaring a variable or a data member of a class, there is a proof obligation that the constraint is satisfied whenever the value of the variable or data member is changed, thereby increasing the degree of validation performed.

A second effect of constraining a variable or data member is that it permits the code generator to allocate a more efficient form of storage. For example, an integer variable with a constraint that the value lies in the range 0 to 100 permits the code generator to use a simple fixed-length binary representation instead of the more general format used to store unbounded integers.

4.2.2.2 Constraints on parameters

When declaring the type of a parameter in a function, operator or schema declaration, the constraint becomes a part of the precondition of that function, operator or schema. It also allows the code generator to optimize the mechanism used for parameter passing.

4.2.2.3 Constraints on results

When declaring the type of a function or operator result, or of a schema parameter which is modified by the schema, the constraint gives rise to an additional proof obligation (i.e. that the result or final value of the parameter belongs to the specified type). It may also allow the code generator to optimize the mechanism used to return the result.

4.2.2.4 Constraints on bound variables

When declaring the type of a bound variable (i.e. following one of the keywords exists, forall, that, any or for), the effect is that the bound variable ranges only over the permitted values, instead of over all values of the class.

4.3 Predefined classes

The predefined classes of Perfect are as follows:

anything	The base class for all other classes, containing member function "toString"
bool	The Boolean class, comprising the values true and false
byte	The class of eight bit bytes
char	The character set supported by the environment (including control characters)
int	The (unbounded) positive and negative integers (including zero)
real	The real numbers. These are represented using the double-precision (64-bit) format defined in standard IEC559:1989 (IEEE 754)
void	The void type, comprising the single value null
rank	The enumeration comprising the values "below", "same" and "above" (in that order)

The base class anything is deferred; all other predefined classes are final classes (see Chapter 7 for the definitions of these terms).

4.4 Predefined types

There are two predefined types that are not classes:

nat	Subset of int consisting of unbounded positive integers and zero
string	Equivalent to seq of char

4.5 Literals for predefined classes

The section on lexical form has already described character literals (which have type char), string literals (which have type seq of char), integer literals (which yield non-negative values of class int) and real literals (which yield non-negative values of class real). Negative literals of class real and int are not directly represented but suitable constants can be constructed from positive literals and the negation operator.

Literals of the bool and void classes are written directly (i.e. as true, false and null).

4.6 Class declarations

New classes may be added by three mechanisms: enumeration, tag generation, and abstract class declaration. Each such class must be named at the point of generation using the syntax:

ClassDeclaration:
        EnumerationDeclaration;
        TagDeclaration;
        AbstractClassDeclaration.

4.6.1 Enumeration generator

Classes may be generated by enumeration of values. The syntax for this is:

EnumerationDeclaration:
class Identifier "^=" enum Identifier *["," Identifier] end.

An enumeration is collection of values having an ordering relation. The lowest and highest operators (see section 5.4.1) may be applied to the enumeration class to obtain the first and last elements respectively. Predecessor and successor operators may be applied to values of enumerated classes (except for the first and last elements respectively). The comparison operator "<" is defined such that each value is considered to be less than all values declared later in the enumeration list.

Every enumeration in a program yields a new class. Two enumerations with identical lists of values yield distinct classes. The identifiers in the list are treated as non-member constants of the enumeration class, and thus must always be referred to by using the enumeration class name to resolve the scope of the value name. For example, given the following declaration:

class Color = enum red, green, blue end

then a value would be referred to as e.g. Color red. See section 5.4.14 for more on scope resolution.

4.6.2 Tag generator

The tag generator yields a class comprising a set of ordered un-named values. It may be thought of as an enumeration where we have asked the compiler to generate the names for us. The syntax is:

TagDeclaration:
class Identifier "^=" tag ["(" Expression ")"].

The optional expression (which must yield a positive integer) is the number of distinct values we require. If no expression is given, the set of distinct values provided is sufficiently large that it will not be exhausted at run-time before some other resource (e.g. memory) is exhausted.

The usual comparison operators are defined on the values. The lowest operator may be used to obtain the lowest tag available and the predecessor and successor operators may be used on tag values. If an expression is given, the highest operator may be used to obtain the highest member of the set.

Every tag constructor yields a (notionally) unique set of values.

[TBD: should we require the expression to be constant at run-time throughout the scope of the declaration? Or constant at compile-time?]

4.6.3 Abstract class declarations

These are covered fully in Chapter 7.

4.7 Type expressions

Types may be expressed in terms of classes and other types in a number of ways:

By using a non-template class name
By providing a class template name with type or class names to use as actual parameters, thereby instantiating it
By associating an existing type or class name with a constraint
By uniting two or more types
By forming the union of all abstract classes that inherit from a particular abstract class

The full syntax for type expressions is:

PossConstrainedTypeExpression:
        ConstrainedTypeExpression;
        TypeExpression.

ConstrainedTypeExpression:
        those Identifier ":" TypeExpr1 ":-" Predicate;
        TypeExpr3 ComparisonOperator Expr4;
        "(" ConstrainedTypeExpression ")".

TypeExpression:
        TypeExpression "||" TypeExpr1;
        TypeExpr1.

TypeExpr1:
        ref TypeExpr2 on Identifier;
        TypeExpr2.

TypeExpr2:
        from ClassName;
        TypeExpr3.

TypeExpr3:
        "(" TypeExpression")";
        ClassName.

ClassName:
        TemplateName of ActualTemplateParameters;
        TypeName.

TypeName:
        anything;
        bool;
        byte;
        char;
        int;
        rank;
        real;
        Identifier.

TemplateName:
        bag;
        map;
        pair;
        seq;
        set;
        triple;
        Identifier.

ActualTemplateParameters:
        TypeExpr1;
        class Identifier;
        "(" TypeExpression *[Separator TypeExpression] ")".

Separator:
        ",";
        "->";
        "<-";
        "<->".

ComparisonOperator:
        "~" CompareOp;
        CompareOp.

CompareOp:
        "=";
        ">";
        "<";
        ">=";
        "<=";
        ">>";
        "<<";
        ">>=";
        "<<=";
        in.

4.8 Instantiating class templates

A class template is instantiated by following its name with the keyword of and a list of class parameters, which are types. If there is only one parameter and it is a type name or template instantiation, it need not be enclosed in brackets, otherwise brackets are required. The separators in the parameter list must match the separators in the corresponding template class declaration.

4.9 Constrained types

A type may be formed by constraining an existing class or type. Two forms of syntax are available for expressing this.

The more general form is "those BoundVariableDeclaration :- Predicate" where Predicate is a function of the bound variable. Since the those construct is delimited by the end of the type expression, if this form is used within a larger type expression, either it must be the last element, or it must be enclosed in brackets. An example of this form would be "those x: nat :- x <= 100".

The abbreviated form is "TypeExpr4 CompareOp Expr4" which is equivalent to the full form "those xx: Type :- xx CompareOp Expr4". So the above example could be written "nat <= 100".

Another abbreviated form is possible in some contexts (see section 6.4).

4.10 United types

Sometimes the value of a variable, parameter or function return may belong to one of several types. In this case we can create a united type using the uniting operator "||"; for example "nat || void". The "||" symbol is pronounced "united with" or simply "or".

The uniting operator is commutative and associative, for example "nat || (void || char)" yields the same union as "char || nat || void".

[SC] The operands of "||" must be disjoint.

4.11 Union of all derived classes

Sometimes it is desirable to express the type comprising the union of the values of all classes derived from some base class (including the values of the base class itself). This is expressed using the notation "from ClassName", where ClassName is the name of a class or a class template instantiation.

4.12 Reference types

An entity of a reference type refers to some value created on a heap. Other entities of reference type may refer to the same value; any changes to the value are therefore visible to all reference entities which refer to it.

When declaring a reference entity, the type (or types) of the value to which it refers must be declared, together with the name of the heap on which the entity will be created.

Note that the use of reference types is the sole mechanism for creating aliases.

4.13 Type naming

It is possible to introduce new names for types, with optional generic parameters. The syntax for this is:

TypeDeclaration:
        class Identifier [of FormalTemplateParameters] "^=" PossConstrainedTypeExpression.

FormalTemplateParameters:
        Identifier;
        "(" Identifier *[Separator Identifier] ")".

For example, we could declare:

class Word ^= seq of char

or:

class ListOfLists of X ^= seq of seq of X

4.14 Predefined class templates

Predefined class templates are provided for six common ways of collecting values of similar types: sets, bags, sequences, pairs, triples and mappings.

4.14.1 Sets

A set comprises a collection which includes none, some or all the values of its base type. It is meaningless to speak of a value occurring in the set more than once. For example, if we consider the set of prime numbers, every positive integer either does or does not belong to the set.

To build a set in a non-ghost context (i.e. in a context where code must be generated) requires that the base type has a non-ghost equality operator. See section 7.1.7 for a full explanation of when the equality operator of a type is ghost.

The name of the class template is set so that, for example, "set of char" is the type expression for a set of characters.

4.14.2 Bags

A bag is a collection of values in which it does make sense to ask how many times a value occurs. For example, if we were interested in collecting statistics on examination results but wished to preserve the anonymity of the candidates concerned, we would be interested only in the marks obtained. Since multiple candidates might score the same marks and we wish to record this, the appropriate data type would be a bag of examination marks.

Like a set, a bag may only be constructed in a ghost context, or with a base type with non-ghost equality.

We construct a bag type using the template name bag, e.g. "bag of nat".

4.14.3 Sequences

The third type of collection is the sequence, or list of values. Sequences are used where the order of values is important. We construct a sequence type using the template name seq, e.g. "seq of char". The same value may occur more than once in a sequence. There is no requirement on the equality operator for constructing sequences.

4.14.4 Pairs

The class "pair of (X, Y)" is used to describe and construct mapping classes. The abstract data members of a pair are a variable of type X named "x" and a variable of type Y named "y".

4.14.5 Triples

The class "triple of (X, Y, Z)" is similar to "pair". The abstract data members of a triple are variables of types X, Y and Z named "x", "y" and "z" respectively.

4.14.6 Mappings

A mapping has a set of values (the domain) paired with a bag (the range) such that every member of the domain is associated with exactly one member of the range. Mappings are useful for describing lookup tables and relational databases. We express a mapping class using an arrow-symbol pointing from the type on the left (the domain type) to the type on the right (the range type) as the separator in the class parameter list, e.g. "map of (char -> nat)".

Like sets and bags, a mapping in a non-ghost context requires a non-ghost equality operator on both the domain and range types.

Note that a sequence can be regarded as a mapping from a range of integers to some other type. This is reflected in the fact that the indexing and domain member functions may be applied to both sequences and mappings. However, the class "seq of X" is considered distinct from "map of (nat -> X)" and all its subclasses.

The abstract data of "map of (X ->Y)" is "those v:set of pair of (X, Y) :- (for x::v.rep(1) yield x.dom).unique".

4.15 Type compatibility

4.15.1 Overview

The presence of unions and subtyping in the type system of Perfect gives rise to various ways in which two types can be related. All semantic rules governing the use of types can be expressed in terms of four fundamental binary relations on types: disjoint (have no common values), same (are equivalent), overlapping (have a common value) and nested (the first type is a subtype of the second). Three of the relations are symmetric; the only exception is 'nested'. Interdependencies between the four relations are shown on the following Venn diagram where each point represents an ordered pair of Perfect types:

Venn diagram

For example, if the type given by type expression X is nested in the type given by Y then types X and Y are also overlapping, not disjoint and can be (but do not have to be) equivalent.

In simple cases the way in which two given types are related trivially follows from their definitions. However, the high expressive power of Perfect means that it is possible to define types that are complex enough to make relations between them less than obvious. Therefore, each of the four fundamental relations is given an unambiguous formal definition. The role of these definitions is similar to that of the formal grammar of the language: both can be used to verify validity of arbitrarily complex language constructs and both are instrumental in the compilation process. The latter role of formal type relations has some far-reaching implications: for example, in Perfect it is possible to guarantee that any conceivable call to an overloaded function will be unambiguous, i.e. no call can possibly type-match more than one variant of the function declaration. This is not even a proof obligation - the "collective consistency" of overloaded declarations is checked by the type inference logic of the compiler based on the formally defined type relations, and any potential ambiguity is reported as an error.

4.15.2 Fundamental type relations

Each type relation is defined by means of a binary predicate on types: for example, the type given by expression X is regarded as a subtype of the type given by expression Y if and only if nested(X,Y) is true. Each of the four predicates is, in turn, defined by collections of clauses; the clause that applies to a particular case is chosen by means of pattern matching according to the syntactical structure of the operands (Perfect type constructors are shown in the definitions in bold face, as is the auxiliary tag 'strict' that has no counterpart in Perfect syntax). Upper-case letters in patterns denote arbitrary type expressions while lower-case Latin letters denote type names. Symbol x in rules (n1) and (n2) represents a polymorphic type name introduced in the declaration of a polymorphic function: in actual Perfect programs this position will be occupied by an identifier. It is possible for a pair of parameters to match more than one clause in the same predicate definition. Such ambiguities are resolved by textual precedence: for example, nesting of a union into another union is governed by the rule (n8) rather than (n9) or (n12). Definitions of all four fundamental type relations are complete, as each predicate definition contains a clause that will match any pair of parameters.

Type namings without constraints do not affect relations between types, and are therefore removed before any type relation is evaluated. In addition, constraints are removed except from within template parameters. For example, "seq of string" becomes "seq of seq of char", whereas "seq of nat" remains the same.

Where removing all constraints from within template parameters would change the result of evaluating a type relation we say that the relation is not defined. For example, "same(seq of nat, seq of int)" is undefined. When we say "X relation Y" we mean the relationship is true, not false or undefined.

Further to that, the following transformations are assumed to be performed before applying the type relation rules:

All unions are flattened: "(a || b) || (c || d)" is replaced by "a || b || c || d"
Instantiations of templates by a single class are rewritten in the bracketed form: "t of x" is replaced by "t of (x)"
All occurrences of polymorphic type names in declarations are prefixed by the keyword "class": "f(a:seq of class x, b:ref x)" is replaced by "f(a:seq of class x, b:ref class x)"

The purpose of these transformations is to get rid of syntactical sugar and to represent the type expressions in a uniform format; this reduces the number of clauses required to define the fundamental predicates without affecting their semantics. Actual implementations of Perfect do not have to transform the types and may use more complex systems of rules instead, as long as the final result stays the same in all cases.

Auxiliary binary predicate 'derived' is defined on type names and reflects the inheritance hierarchy that exists in the program. Namely, derived(a,b) yields true if and only if both 'a' and 'b' denote classes and class 'a' is a descendant of class 'b' or coincides with it.

Predicate disjoint(type1, type2): no value can belong to both type1 and type2

This predicate is simply a logical negation of overlapping:

(d1) disjoint(A, B) == ~ overlapping(A, B)

Predicate same(type1, type2): type1 and type2 are equivalent

Equivalence is defined as mutual nesting (see the diagram):

(s1) same(A, B) == nested(A, B) & nested(B, A)

Predicate overlapping(type1, type2): a value can belong to both type1 and type2

A type overlaps with a union if it overlaps with at least one of its members:

(o1) overlapping(A, B₁|| B₂|| ... || B_k) == exists i::1..k :-overlapping(A, B_i)

It does not matter whether a union occurs as the first or the second argument, as 'overlapping' is a symmetric relation:

(o2) overlapping(A₁|| A₂|| ... || A_k, B) == overlapping (B, A₁|| A₂|| ... || A_k)

There are no other non-trivial cases of overlapping; the rest can only be clean nesting. (Note that because of the absence of multiple inheritance, two 'from' families cannot overlap without being nested.)

(o3) overlapping(A, B) == nested(A, B) or nested(B, A)

Predicate nested(type1,type2): all values of type1 also belong to type2

The first four rules deal with polymorphic type names in function declarations. When a function call is checked for compliance to a function declaration (which is essentially parameter-wise type nesting plus fulfilling the constraints), polymorphic type names can occur on the declaration side only. However, because of the symmetric way in which equivalence is defined in (s1), rules (n2) and (n4) are also necessary. Note that rules (n1) and (n2) have side-effects: all further occurrences of the polymorphic type name in question are replaced by the opposite operand prefixed by a special tag "strict". Together with rules (n3) and (n4) this enforces the requirement that all types occurring in the positions of a function call that correspond to different occurrences of the same polymorphic type name must be equivalent.

(n1) nested(A, class X) == true, class X <- strict A
nested(class X, A) == true, class X <- strict A

(n3) nested(A, strict B) == same (A, B)

(n4) nested(strict A, B) == same (A, B)

Trivial cases with type names and derived families:

(n5) nested(a, b) == a = b

(n6) nested(a, from b) == derived(a, b)

(n7) nested(from a, from b) == derived(a, b)

A union can only be nested in other type if all its members are nested in that type:

(n8) nested(A₁|| A₂|| ... || A_k, B) == forall i::1..k :- nested(A_i, B)

The only way for a (non-union) type to be nested in a union is to be nested in one of its members:

(n9) nested(A, B₁|| B₂|| ... || B_k) == exists i::1..k :- nested(A, B_i)

References are not transparent for subtyping:

(n10) nested(ref A, ref B) == same(A, B)

Two template instantiations are either disjoint or equivalent:

(n11) nested(a of(A₁, ..., A_k), b of(B₁, ..., B_n)) == a = b & k = n & forall i::1..k :- same(A_i, B_i)

There are no other ways in which one type can be nested into another:

(n12) nested(A, B) == false

4.15.3 Examples

Example 1

Is "seq of b || seq of d" a disjoint union, provided class "d" is a descendant of class "b" ?

            disjoint(seq of (b), seq of (d))                                                                 (d1)
=          ~ overlapping(seq of (b), seq of (d))                                                       (o3)
=          ~ (nested(seq of (b), seq of (d)) | nested(seq of (d), seq of (b)))            (n11)
=          ~ (same(b,d) | same(d,b))                                                                        (s1)
=          ~ ((nested(b,d) & nested(d,b)) | (nested(d,b) & nested(b,d))                    (n5)
=          ~ ((false & false) | (false & false))
=          true

Therefore this is a valid union.

Example 2

Is "a || int" a subtype of "int || real || from a" ?

            nested(a || int, int || real || from a)                                                           (n8)
=          nested(a, int || real || from a) & nested(int, int || real || from a)               (n9)
=          (nested(a, int) | nested(a, real) | nested(a, from a)) &
                (nested(int, int) | nested(int, real) | nested(int, from a))                      (n5)
=          (false | false | nested(a, from a)) & (true | false | false)                        (n6)
=          (false | true) & true
=          true

Yes: the first of these types can be used where the second one is required.

Example 3

Is "ref (a || b)" equivalent to "ref a || ref b" ?

same(ref (a || b), ref a || ref b) (s1)
= nested(ref (a || b), ref a || ref b) & nested (ref a || ref b, ref (a || b))

Starting with the left operand of 'and',

            nested(ref (a || b), ref a || ref b)                                                              (n9)
=          nested(ref (a || b), ref a) | nested(ref (a || b), ref b)                                 (n10)
=          same(a || b, a) | same (a || b, b)                                                                (s1)
=          (nested(a || b, a) & nested(a, a || b)) | (nested(a || b, b) & nested(b, a || b))

Let us deal with the leftmost term first:

            nested(a || b, a)                                                                                        (n8)
=          nested(a, a) and nested(b, a)                                                                    (n5)
=          true and false
=          false

In the same way, nested(a || b, b) = false and therefore the answer to the original question is negative.