
1.2 A Brief History of SETL

SETL today is essentially the same language Jack Schwartz introduced 30 years ago in Set Theory as a Language for Program Specification and Programming [175]:

It may be remarked in favor of SETL that the mathematical experience of the past half-century, and especially that gathered by mathematical logicians pursuing foundational studies, reveals the theory of sets to incorporate a very powerful language in terms of which the whole structure of mathematics can rapidly be built up from elementary foundations. By applying SETL to the specification of a number of fairly complex algorithms taken from various parts of compiler theory, we shall see that it inherits these same advantages from the general set theory upon which it is modeled. It may also be noted that, perhaps partly because of its classical familiarity, the mathematical set-notion provides a comfortable framework, that is, requiring the imposition of relatively few artificial constructions upon the basic skeleton of an analysis. We shall see that SETL inherits this advantage also, so that it will allow us to describe algorithms precisely but with relatively few of those superimposed conventions which make programs artificial, lengthy, and hard to read.

The contrast between the expressive efficiency of mathematics and the obsessive parsimony of machine-oriented languages was highlighted in On Programming [177, p. vii]:

On the one hand, programming is concerned with the specification of algorithmic processes in a form ultimately machinable. On the other, mathematics describes some of these same processes, or in some cases merely their results, almost always in a much more succinct form, yet in a form whose precision all will admit. Comparing the two, one gets a very strong even if initially confused impression that programming is somehow more difficult than it should be. Why is this? That is, why must there be so large a gap between a logically precise specification of an object to be constructed and a programming language account of a method for its construction? The core of the answer may be given in a single word: efficiency. However, as we shall see, we will want to take this word in a rather different sense than that which ordinarily preoccupies programmers.

More specifically, the implicit dictions used in the language of mathematics, which dictions give this language much of its power, often imply searches over infinite or at any rate very large sets. Programming algorithms realizing these same constructions must of necessity be equivalent procedures devised so as to cut down on the ranges that will be searched to find the objects one is looking for. In this sense, one may say that programming is optimization and that mathematics is what programming becomes when we forget optimization and program in the manner appropriate for an infinitely fast machine with infinite amounts of memory. At the most fundamental level, it is the mass of optimizations with which it is burdened that makes programming so cumbersome a process, and it is the sluggishness of this process that is the principal obstacle to the development of the computer art.

This perspective, as hinted in the first quotation above, sprang from the strong perception in the late 1960s that there was a need for a set-oriented language capable of expressing concisely the kind of set-intensive algorithm that kept arising in studies of compiler optimization, such as those by Allen, Cocke, Kennedy, and Schwartz [5,43,6,41,7,10,130,11,8,9,131,178,179,180,12,132,42]. Programming Languages and their Compilers [44], published early in 1970, devoted more than 200 pages to optimization algorithms. It included many of the now familiar techniques such as redundant code elimination and strength reduction, dealt extensively with graphs of control flow and their partitioning into ``intervals'', and showed how to split nodes in an irreducible flow graph to obtain a reducible one. Many workers in the 1970s and 80s besides those just mentioned identified SETL, directly or indirectly, as a language whose implementation was greatly in need of solutions to difficult compiler optimization problems [80,84,85,143,82,86,123,83,148,149,174,208,3,209]. SETL, while still far from the celestial sphere of pure mathematics, was nonetheless seen as occupying a very high orbit relative to other languages. It was SETL's distance from pure machines that made optimizing its implementations so important and at the same time so difficult.

The synergy between the study of code optimization and the high-level set language used for expressing optimization algorithms led to the SETL compiler project [186,177], which was itself an abundant source of optimization problems. The SETL project produced, among other things, the SETL optimizer [178,88,77], a 24,000-line prototype written in SETL. Unfortunately, on the machines of the day, it was too large to apply to itself. This was a pity, because not only is SETL a language that could benefit greatly from a good optimizer, but it is also one whose semantic simplicity makes it particularly amenable to the flow-tracing techniques of machine-independent code optimization. The absence of pointers alone circumvents the issue of aliasing, a huge advantage in this kind of analysis.

The sort of data flow (definition-use) information obtainable from analysis of control flow graphs, and more generally from Schwartz's ``value flow'' tracing [177,178,144] that could follow objects when they were stored in aggregates and later extracted, was useful in all sorts of ways. It sustained copy optimization [175,178,179], where the redundant copying of an object could be suppressed when the only subsequent use of the object also modified it, perhaps incrementally. Value flow analysis provided a dependency framework wherein the types of many variables and expressions could be deduced by a transitive closure process starting from the manifest types of literals and other forms [196]. This typefinding process in turn enabled the discovery of relationships of set membership and inclusion [180], which was itself a prelude to automatic data structure choice, because the way an object is used has a profound influence on how it should be implemented. Weiss and Schonberg [208,209] later showed how to do type inference even in the presence of infinite sets of possible types arising from actions such as ``x := {x}''.
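
The flavor of this typefinding process can be suggested by a small schematic fragment in SETL itself. It is an illustration only: the toy program being analyzed, the map uses (recording which variables feed each assignment), and the seeding of typ from the manifest type of a literal are all inventions for this example, and the rule that simply unions operand types stands in for the operator-specific rules a real typefinder such as Tenenbaum's would apply. What it does show is how type information spreads along the dependency graph until a fixed point is reached:

    -- Toy program being analyzed:  i := 1;  j := i + 1;  k := i * j;  m := k - j;
    uses := {["i", {}], ["j", {"i"}], ["k", {"i", "j"}], ["m", {"k", "j"}]};
    typ  := {["i", {"INTEGER"}]};          -- manifest type of the literal 1

    change := true;
    while change loop
      change := false;
      for [v, deps] in uses loop
        new := {};
        for d in deps loop
          new +:= (typ(d) ? {});           -- union in whatever is known about d
        end loop;
        if new /= {} and new /= (typ(v) ? {}) then
          typ(v) := new;                   -- propagate, and note the change
          change := true;
        end if;
      end loop;
    end loop;

    print(typ);   -- every variable ends up typed {"INTEGER"}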

Data structure representations had their own sublanguage, the DSRL, which served to annotate, but not otherwise modify, SETL programs coded at an appropriately abstract level. The DSRL was designed to permit a smooth transition from Schwartz's two-level programming regime, in which programmers supplied representational details, to a more fully developed system in which a sophisticated optimizer made the selections [175,56,174,88,77]. An important concept in the DSRL was that of base sets, which were implicitly defined objects that could in principle allow much representational sharing among the objects conceived by the programmer.

Value flow analysis, type inference, copy optimization, and deeper determinations of relationships such as set membership or inclusion between variables preparatory to automatic data structure selection all embody an approach to program analysis described by Sintzoff [189] and called abstract interpretation by Cousot and Cousot [47] or symbolic execution in Muchnick and Jones [149, p. xv]. The essence of this model is that any program P with well-defined semantics can be projected onto a more abstract program A capturing salient properties of objects in P in a manner susceptible of analysis. For example, the sign of a product can be deduced from the signs of its multiplicands without knowing their specific values. Similarly, result types for known operators can usually be gleaned at compile time from operand types regardless of the actual run-time values of those operands. In abstract interpretation, the abstract program A is exercised at P's compile time to discover desired properties of objects in P. The symbols in A combine and recombine according to an algebra appropriate for their purpose. If that algebra has been designed with feasible goals in mind, the exercise will converge. It is typical to ensure this termination by taking advantage of the fact that any set generated by inductive definitions (such as data flow equations) can be defined as the lattice-theoretic least fixed point of a monotone function. This often allows global properties to be inferred from local ones by a straightforward process of transitive closure.
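
To make the sign example concrete, here is a similarly schematic SETL fragment; the abstract domain {"neg", "zero", "pos"}, the table abs_mul, and the handful of sample values are inventions for illustration and are not drawn from any of the analyzers cited above. The point is simply that the abstract operator manipulates abstract values only, yet always agrees with the sign of the concrete product:

    signs := {"neg", "zero", "pos"};

    -- Abstract multiplication: the sign of a product from the signs of its factors.
    abs_mul := {[["pos", "pos"], "pos"], [["neg", "neg"], "pos"],
                [["pos", "neg"], "neg"], [["neg", "pos"], "neg"]}
             + {[[s, "zero"], "zero"] : s in signs}
             + {[["zero", s], "zero"] : s in signs};

    -- Check agreement with concrete multiplication on a few sample values.
    for x in {-3, 0, 5}, y in {-2, 0, 7} loop
      if x > 0 then sx := "pos"; elseif x = 0 then sx := "zero"; else sx := "neg"; end if;
      if y > 0 then sy := "pos"; elseif y = 0 then sy := "zero"; else sy := "neg"; end if;
      p := x * y;
      if p > 0 then sp := "pos"; elseif p = 0 then sp := "zero"; else sp := "neg"; end if;
      print(x, y, abs_mul([sx, sy]), sp);  -- the last two values always agree
    end loop;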

The power and generality of abstract interpretation moved Paige and his colleagues to undertake an ambitious study of program transformations, which ultimately led to the APTS project [33,161,126,162]. The first of the three main transformations used in APTS is dominated convergence [32] for computing fixed points of Tarski [195,48] sequences (f^i(1) : i = 0, 1, 2, ..., for deflationary, monotone f) with reasonable efficiency. The second is finite differencing [157,164,158], a set-theoretic analogue of strength reduction that allows expensive set operations within loops to be reduced to incremental updates, and fixed points to be located more quickly, through the construction and maintenance of program invariants. The third transformation is real-time simulation [159,30,160,34,166] of an associative memory on an ordinary random-access memory (or, with slight additional restrictions, a mere pointer-access memory), which effectively automates the tedious programming activity of choosing efficient basings for sets.
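
The idea behind finite differencing can be conveyed by a deliberately simple example (mine, not output from RAPTS or APTS): instead of re-evaluating a set former inside a loop, the invariant it expresses is re-established by an incremental update at each point where its input changes.

    -- Naive form: the set former is recomputed on every iteration,
    -- costing O(#s) each time and O(n**2) over the whole loop.
    s := {};
    for k in [1..1000] loop
      s with:= k;
      evens := {x : x in s | x mod 2 = 0};
    end loop;
    print(#evens);                         -- 500

    -- Differenced form: the invariant  evens = {x : x in s | x mod 2 = 0}
    -- is re-established by a constant-time update wherever s changes.
    s := {};
    evens := {};
    for k in [1..1000] loop
      s with:= k;
      if k mod 2 = 0 then
        evens with:= k;
      end if;
    end loop;
    print(#evens);                         -- 500

The naive form does work proportional to the size of s on every pass; the differenced form does constant work per change to s, which is the set-theoretic counterpart of reducing a multiplication in a loop to a running addition.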

Chung Yung has recently used finite differencing in a technique he calls destructive effect analysis, which seeks to incrementalize the copying of aggregates in his purely functional programming language EAS, a packaging of the typed λ-calculus extended with homogeneous sets [210,211,212].

Transformational programming can be regarded as a formalization of Dijkstra's stepwise refinement [57,58]. As Bloom and Paige [28] point out, the transformational methodology is able to do much more than merely optimize code, or translate a SETL-like language into a C-like one. By helping the algorithm designer reason about time and space complexity in syntactic terms rather than only by means of low-level counting arguments, this technology has actually played a significant role in the invention of several new algorithms with greatly reduced asymptotic complexity compared to previous solutions [165,163,32,31,28,34,38,93], while it has rendered the algorithms themselves more perspicuous both to their inventors and to their students.

The next phase in the development of APTS will seek to improve both its reliability and its performance. Currently, all program transformations in APTS are proved correct (meaning-preserving) by hand, which is slow and error-prone. The hope is to integrate a meta-level proof verifier along the lines of Etna [36], an outgrowth of Cantone, Ferro, and Omodeo's work on fast decision procedures for fragments of finite set theory [37]. Alternatively, the model for an integrated verifier might be the SETL-like NAP [126] system, itself implemented in the SETL derivative Cantor [127,124]. Verification of assertions in, say, Hoare logic [113] would increase confidence in automatically applied transformations. Davis and Schwartz [50] showed how mechanical verification systems could extend themselves with new proof methods without violating soundness or changing the set of statements that could be proved.

The main existing impediment to the speed of APTS is the fact that its database of program property relationships, which is dynamically deduced using a static database of inference rules, must be recomputed after each application of a program transformation. What is being sought is an incremental rule database system that can be used to regenerate the relationship records efficiently after each rewriting operation [161]. Ultimately, it should be possible to apply APTS to itself for further large gains in speed [162], and to take advantage of the technique of partial evaluation [122] to realize a production-grade transformational system.

Recently, Goyal and Paige [94] have revisited the copy optimization problem for SETL and other high-level languages that exemplify Hoare's ideal of a pointer-free style of programming. By taking the well-known technique of dynamic reference counting to achieve ``lazy'' copying, and combining that with static liveness determination based on Schwartz's value flow analysis, they are able to optimize the placement of copy operations and of om assignments, the latter serving to decrement the reference counts of objects known to have no subsequent uses. They also prove the correctness of their alias propagation analysis and code transformations using formal semantics and abstract interpretation.
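
The effect being optimized can be seen in a small invented SETL fragment (a sketch of the phenomenon only, with the om assignment placed by hand where their analysis would place it automatically):

    s := {1..100000};
    t := s;          -- under value semantics t is logically a complete copy;
                     -- with lazy, reference-counted copying, s and t at first
                     -- merely share one underlying object
    print(#t);       -- last use of t
    t := om;         -- an om assignment just after t's last use drops t's
                     -- reference, so the shared object now has only one owner ...
    s with:= 0;      -- ... and this update can be made in place, with no
                     -- physical copy ever being performed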

Goyal [91] has obtained a dramatic improvement in the algorithmic complexity of computing intra-procedural may-alias relations, again using dominated convergence and finite differencing. In his dissertation [92], he develops set-theoretic languages that can express both abstract specifications and low-level implementations, using a data structure selection method based on a novel type system to preserve the computational transparency necessary for statements about program efficiency to be meaningful. Such transparency is a cornerstone of the general transformational methodology.

There have been a number of implementations of SETL and SETL-like languages over the years. The first was called SETLB [151] and was implemented using BALMSETL [101,187], an extension of Harrison's extensible LISP-like language BALM. SETLB was succeeded by SETLA [176], which was also implemented in BALMSETL and was an almost strict subset of SETL.

The full SETL language itself was implemented in LITTLE [207], syntactically a Fortran-like language supplemented with a notation for bit-field extraction. LITTLE had only two built-in data types, fixed-length bit strings and floating-point numbers, but was used to implement both the compiler and run-time system of the version of SETL maintained and distributed by the Courant Institute of Mathematical Sciences (CIMS) at New York University from the mid-1970s until the late 1980s.

The CIMS SETL system was quite slow and cumbersome, and LITTLE was not widely ported, so in the late 1970s Nigel Chapman, who was then a graduate student at the University of Leeds, designed and implemented a system called Setl-s [39]. It covers a substantial subset of SETL, leaving out only such ephemera as the macros, backtracking, the data representation sublanguage, support for separate compilation, and a few minor syntactic luxuries. The ``-s'' in the name can also stand for ``small'', because Setl-s was a very compact system based on Dewar's celebrated indirect threaded code [55] technique, and was written in MINIMAL, the portable assembly language in which the run-time system of MACRO SPITBOL [54] was implemented. Jay VandeKopple dropped the hyphen from Setl-s and worked on what was then called SETLS in the 1990s [202] while at NYU on sabbatical from Marymount College. He maintains the current version of the compiler and documentation [203].

The batch-oriented character of the full CIMS SETL implementation, its requirement for considerable computing resources, and to some extent its complexity and slowness, led Ed Dubinsky, who in the early 1980s was using SETL informally for discrete mathematics and abstract algebra courses at Clarkson University, to collaborate with Gary Levin in the mid-1980s on the creation of a new, small, interactive system close in syntax and spirit to SETL but without the overhead. Called ISETL [145], this system has now been stable at version 3.0 for many years, is freely and publicly available, comprises little more than 20,000 lines of portable C code, runs quite responsively on all common desktop computers, provides first-class functions (a feature Dubinsky has long valued), and has strong ongoing pedagogical support through several textbooks, annual workshops, and the enthusiasm of a sizeable community of mathematics teachers [22,150,78,183,23,79,76,81].

In the late 1980s at NYU, meanwhile, there was a project aimed at overhauling the CIMS SETL language and implementation. The new version of the language was tentatively named SETL2. Kirk Snyder, a graduate student at that time, became dissatisfied with what appeared to be ceaseless discussion supported by little action, and covertly designed and implemented his own system called SETL2 [190] in about a year. It simplified and modified various aspects of SETL syntax and semantics, removed the usual apocrypha (macros, backtracking, the data representation sublanguage, and SETL modules), and introduced Ada-based ``packages'' and a portable file format for separate compilation on DOS, Macintosh, Unix, and other platforms. Snyder subsequently added lambda expressions (first-class functions with closures) and support for object-oriented programming, including multiple inheritance [191], though these extensions were not entirely without semantic problems in the context of a nominally value-oriented language. Recently, Toto Paxia has made improvements to the interoperability of SETL2 through his ``native'' package declarations [168], which allow more direct calls out to routines written in C than were previously possible.

Perhaps the most famous of all SETL programs to date is Ada/Ed [2,135,173], the first validated translator and executable semantic model for the language now known as Ada 83 [198]. It established convincingly that SETL is well suited for the rapid prototyping of complex systems of significant size, and that when care is taken in the construction of such prototypes, they can serve as readable, definitive specifications to inform and guide the building of production systems.

SETL's success as a prototyping tool spawned the Esprit SED (SETL Experimentation and Demonstration) project [125] of the late 1980s, which was a sweeping effort to create a SETL-based prototyping environment complete with highly sophisticated language manipulation tools at the syntactic and semantic levels [73]. This included a SETL-to-Ada translator [72,69], an editor, a debugger, and a performance-profiling monitor [29]. The latter was rendered particularly accommodating and non-invasive by the use of coöperating processes sharing messages over TCP sockets. Abstract interpretation was the operative model in both the ambitious Tenenbaum-based [196] type inferencer and Paige's more general RAPTS [158] transformational system (the predecessor to APTS), which was used to prototype ``meta-SETL'' [4], an AST-traversing interpreter. SED employed a rich set of language processing tools such as Typol [51,52] for type checking and other semantic analysis via pattern-directed inference (abstract interpretation), and a Mentor-based [74] interface to the syntax-directed editing environment. Interoperability was addressed in the SETL-to-Ada translator and in the performance monitor by means of ISLE (the Interface Specification Language and Environment), which was important for the SED project's demonstration of rapid prototyping in SETL in a cartography application containing a package of computational geometry algorithms [27].

Jean-Pierre Keller, the leader of the SED project, went on to define a SETL descendant which he called Cantor [127,124]. Cantor is actually closer in syntax and semantics to ISETL [145] than to SETL, and is implemented in ISETL. It has first-class functions, some concurrency mechanisms, and a set of predefined objects for GUI construction.

Encouraged by SED's contributions to the art of programming in the large and by the project's inchoate plans for persistence in SETL, but disappointed by SED's failure to arrive at a coherent product [70], Ernst-Erich Doberkat proposed integrating persistent backing stores called P-files and their requisite namespace support into SETL/E [66,61], a revision of SETL that was extended with a process creation operator and renamed ProSet [71] to signify its role in prototyping. Doberkat's interest in SETL during the 1980s [59,65,63,72,68,69] grew in the 1990s into a more general interest in software engineering with set-oriented languages having intrinsic persistence features [60,64,67,70,71,62] that sought to spare the programmer the trouble of coding data movement operations explicitly. Willi Hasselbring also showed how to translate [103] a subset of SETL/E into Snyder's SETL2. ProSet tuples are natural candidates for inter-process communication via Linda tuple spaces [90], so Hasselbring has worked extensively throughout the 1990s on and with a methodology for prototyping concurrent applications using a hybrid system called ProSet-Linda [102,104,109,105,106,107,108,111], and compared this approach to several others [110,112].

An entirely different notion of persistence, pertaining not to a backing store but to the ability of a data structure to retain its update history in a way which preserves the time and space efficiency of access to current and past states [75], was used by Zhiqing Liu to create a SETL run-time system and graphical debugging interface which allows users to scroll backward and forward in a program execution history [138,140,139]. Liu has also tried to bring some of the convenience of SETL into the relatively low-level world of C++ with his LIBSETL [141] header files and run-time library.

Another relative of SETL is Slim [20], which its designer, Herman Venter, describes as ``more like a cousin to SETL than a child, since it shares a common heritage with SETL, but was independently designed'' [204]. It supports object-oriented programming and allows optional type declarations. Both the language and its implementation are relatively small but complete.

An experimental language which bears some kinship to SETL at the data structure level is the functional language SequenceL [45,46], formerly called BagL. Every data item in SequenceL is a sequence--even single elements are viewed as one-element sequences. This facilitates a highly orthogonal treatment of operations and particularly distributivity, rather in the spirit of APL. SequenceL also has maps for use in associative subscripting. This language provides few syntactic comforts, however, offering little more than a few denotational forms and a prefix function application notation that is used even for binary operators.

The Griffin [95] language, designed at New York University in the early 1990s, was intended to be a general-purpose successor to SETL. Its goals were very lofty, and included what is surely the most comprehensive type and type-inference system ever proposed. Griffin was supposed to give the programmer complete freedom of choice as to whether to code in a functional or an imperative style. It had language constructs for database-style transactions, namespaces, and persistence. Real-time facilities, Ada interoperability, exceptions, and broad support for concurrency were also built in at the language level. Much of Griffin has been implemented in a compiler, but a major obstacle to the completion of that enormous task has been the difficulty of fixing on a fully self-consistent language definition.

My own interest in SETL in the 1980s, and my dissatisfaction with the CIMS implementation and the terms under which it was marketed, led me in 1987 to prototype a compiler and run-time interpreter for SETL in SPITBOL [54]. Although this version was a reasonably complete realization of the core SETL language as described by Dewar [53], and appeared to function correctly in the limited tests to which it was put, it was entirely unsatisfactory in terms of speed, and had size limitations on strings, tuples, and sets imposed by their rather direct representation as SPITBOL strings, arrays, and tables respectively.

So it was that in late 1988, newly deprived of SPITBOL in the transition from a mainframe running MTS (the Michigan Terminal System [188]) to a workstation running Unix, I found myself encouraged by the emergence of ISETL but unable to adapt it easily to non-interactive data processing purposes. I thus began a part-time effort to implement a compiler and run-time system for SETL in C. This version was complete enough for daily use by mid-1990, and, though it is now rather like the car in which almost every part has been replaced, it is still my main vehicle for running SETL programs. It is also the one which supports the language extensions and code samples presented in this dissertation. Although I set much greater store by robustness and correctness than by speed in this implementation, its efficiency has almost always been satisfactory. In the rare instances where this has not been the case, it has been easy to make the SETL program interoperate with programs or subroutines written in lower-level languages, and in any case those situations usually seem to occur when there is a library of C functions already involved. Elementary computation-intensive image processing operations are a representative example.

Shortly before arriving at New York University in 1990, I became aware of Snyder's SETL2, which had just been unveiled that spring. In the fall of that year, there was some talk of a unified language definition and implementation, but this was prevented by the fact that Snyder did not want to permit access to his system's source code, and by the fact that we both considered SETL and SETL2 to be works in progress. There were also some differences between the two languages, and they were very differently packaged, reflecting their rather different goals. Since there was starting to emerge a body of SETL2 code, including a revision of Ada/Ed [21], I elected to extend the SETL grammar with as much of SETL2's generally simpler syntax as it could accommodate. It quickly became apparent that essentially nothing needed to be left out. The worst collision was that some ambiguities in the syntax of loop headers were a little awkward to resolve, but this only led to some minor patchwork in the compiler and scarcely incommoded the SETL programmer. The resulting augmented SETL is a less than ideal splice from the language design point of view, but perfectly comfortable from the standpoint of writing code except where it gratuitously requires the programmer to make occasional choices between almost equivalent alternatives. This slight surfeit of form does not appear to have any negative impact on the readability of SETL programs, however, and in fact if the programmer always chooses the simplest expression available, it arguably even confers a minor advantage. For example, most loops can begin with the comparatively simple SETL2 headers, but there are times when the greater generality of the full SETL loop construct is preferable, particularly when the exit test does not fall naturally at the very beginning or end of the loop.

Throughout most of the 1990s, I have continued to work on and with SETL. From 1991 on, I have deliberately been rather conservative about extending the syntax of a language I already consider to be among the world's finest, but have not been so hesitant about its run-time system. My goals are highly pragmatic, as I am a day-to-day user of SETL as well as a zealous apologist.

Set-theoretic languages have a small though active following, particularly in the logic and functional programming communities. For example, SPARCL is billed as ``a visual logic programming language based on sets'' [192], and is essentially a graphical shell for a version of Prolog [40] augmented with sets, where constraints on the partitioning of the sets give the language much of its expressive power. Escher [142], on the other hand, descends from the functional language Haskell [197], and extends the very general and useful mechanism of list pattern matching on function signatures to accommodate sets, which are themselves identified with predicates. Evaluation in Escher is based on pure rewriting, and although it has no unification built into its computational model, the pattern matching and lazy evaluation that are hallmarks of Haskell, together with an added ability to reduce expressions containing variables, combine to support logic programming in Escher without sacrificing the advantages of the functional style. Two Web sites [170,172] are devoted to various aspects of programming with sets, and twelve papers were presented at a recent workshop in Paris on declarative programming with sets [171]. Declaring goals in preference to specifying operational steps is again a subject that is close to the heart of logic programming.

