Programming Language Comparison

by Jason Voegele

What follows is my personal evaluation and comparison of many popular programming languages. It is intended to provide very high-level information about the respective languages to anyone who is trying to decide which language(s) to learn or to use for a particular project. You can find similar comparisons from Google.

Note: N/A indicates that a topic or feature is not applicable to the language.

| Feature | Eiffel | Smalltalk | Ruby | Java | C# | C++ | Python | Perl | Visual Basic |
|---|---|---|---|---|---|---|---|---|---|
| Object-Orientation | Pure | Pure | Pure | Hybrid | Hybrid | Hybrid / Multi-Paradigm | Hybrid | Add-On / Hybrid | Partial Support |
| Static / Dynamic Typing | Static | Dynamic | Dynamic | Static | Static | Static | Dynamic | Dynamic | Static |
| Generic Classes | Yes | N/A | N/A | No | No | Yes | N/A | N/A | No |
| Inheritance | Multiple | Single | Single class, multiple "mixins" | Single class, multiple interfaces | Single class, multiple interfaces | Multiple | Multiple | Multiple | None |
| Feature Renaming | Yes | No | Yes | No | No | No | No | No | No |
| Method Overloading | No | No | No | Yes | Yes | Yes | No | No | No |
| Operator Overloading | Yes | Yes? | Yes | No | Yes | Yes | Yes | Yes | No |
| Higher Order Functions | Agents (with version 5) | Blocks | Blocks | No | No | No | Lambda Expressions | Yes (???) | No |
| Lexical Closures | Yes (inline agents) | Yes (blocks) | Yes (blocks) | No | No | No | Yes (since 2.1) | Yes | No |
| Garbage Collection | Mark and Sweep or Generational | Mark and Sweep or Generational | Mark and Sweep | Mark and Sweep or Generational | Mark and Sweep or Generational | None | Reference Counting | Reference Counting | Reference Counting |
| Uniform Access | Yes | N/A | Yes | No | No | No | No | No | Yes |
| Class Variables / Methods | No | Yes | Yes | Yes | Yes | Yes | No | No | No |
| Reflection | Yes (as of version 5) | Yes | Yes | Yes | Yes | No | Yes | Yes? | No |
| Access Control | Selective Export | Protected Data, Public Methods | public, protected, private | public, protected, "package", private | public, protected, private, internal, protected internal | public, protected, private, "friends" | Name Mangling | None | public, private |
| Design by Contract | Yes | No | Add-on | No | No | No | No | No | No |
| Multithreading | Implementation-Dependent | Implementation-Dependent | Yes | Yes | Yes | Libraries | Yes | No | No |
| Regular Expressions | No | No | Built-in | Standard Library | Standard Library | No | Standard Library | Built-in | No |
| Pointer Arithmetic | No | No | No | No | Yes | Yes | No | No | No |
| Language Integration | C, C++, Java | C | C, C++, Java | C, some C++ | All .NET Languages | C, Assembler | C, C++, Java | C, C++ | C (via DCOM) |
| Built-In Security | No | No? | Yes | Yes | Yes | No | No? | Yes (perlsec) | No |
| Capers Jones Language Level* | 15 | 15 | N/A | 6 | N/A | 6 | N/A | 15 | 11 |
* Based on number of source code lines per function point.

Object-Orientation

Many languages claim to be Object-Oriented. While the exact definition of the term is highly variable depending upon who you ask, there are several qualities that most will agree an Object-Oriented language should have:

  1. Encapsulation/Information Hiding
  2. Inheritance
  3. Polymorphism/Dynamic Binding
  4. All pre-defined types are Objects
  5. All operations performed by sending messages to Objects
  6. All user-defined types are Objects

For the purposes of this discussion, a language is considered to be a "pure" Object-Oriented language if it satisfies all of these qualities. A "hybrid" language may support some of these qualities, but not all. In particular, many languages support the first three qualities, but not the final three.

So how do our languages stack up?

| Quality | Eiffel | Smalltalk | Ruby | Java | C# | C++ | Python | Perl | Visual Basic |
|---|---|---|---|---|---|---|---|---|---|
| Encapsulation / Information Hiding | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes? | Yes? |
| Inheritance | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes? | No |
| Polymorphism / Dynamic Binding | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes? | Yes (through delegation) |
| All pre-defined types are Objects | Yes | Yes | Yes | No | No | No | Yes | No | No |
| All operations are messages to Objects | Yes | Yes | Yes | No | No | No | No | No | No |
| All user-defined types are Objects | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No |

Eiffel, Smalltalk, and Ruby are all pure Object-Oriented languages, supporting all six qualities listed above. Java claims to be a pure Object-Oriented language, but by its inclusion of "basic" types that are not objects, it fails to meet our fourth quality. It also fails to meet quality five by implementing basic arithmetic as built-in operators, rather than messages to objects.

C++ is considered to be a multi-paradigm language, and Object-Orientation is only one of the paradigms it supports. Thus, C++ is not (nor does it claim to be) a pure Object-Oriented language.

Python is often heralded as an Object-Oriented language, but its support for Object-Orientation seems to have been tacked on. Some operations are implemented as methods, while others are implemented as global functions. Also, the need for an explicit "self" parameter for methods is awkward. Some complain about Python's lack of "private" or "hidden" attributes, which goes against the Encapsulation/Information Hiding principle, while others feel that Python's "privateness is by convention" approach offers all of the practical benefits of language-enforced encapsulation without the hassle. The Ruby language, on the other hand, was created in part as a reaction to Python. The designer of Ruby decided that he wanted something "more powerful than Perl, and more Object-Oriented than Python." You can see this comparison of Python and Ruby for more information.

Visual Basic and Perl are both procedural languages that have had some Object-Oriented support added on as the languages have matured.

Static vs. Dynamic Typing

The debate between static and dynamic typing has raged in Object-Oriented circles for many years with no clear conclusion. Proponents of dynamic typing contend that it is more flexible and allows for increased productivity. Those who prefer static typing argue that it enforces safer, more reliable code, and increases efficiency of the resulting product.

It is futile to attempt to settle this debate here except to say that a statically-typed language requires a very well-defined type system in order to remain as flexible as its dynamically-typed counterparts. Without the presence of genericity (templates, to use the C++ patois) and multiple type inheritance (not necessarily the same as multiple implementation inheritance), a static type system may severely inhibit the flexibility of a language. In addition, the presence of "casts" in a language can undermine the ability of the compiler to enforce type constraints.

A dynamic type system doesn't require variables to be declared as a specific type. Any variable can contain any value or object. Smalltalk and Ruby are two pure Object-Oriented languages that use dynamic typing. In many cases this can make the software more flexible and amenable to change. However, care must be taken that variables hold the expected kind of object. Typically, if a variable contains an object of a different type than a user of the object expects, some sort of "message not understood" error is raised at run-time. Users of dynamically-typed languages claim that this type of error is infrequent in practice.
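The run-time failure described above can be sketched in Python, where the analogue of a "message not understood" error is AttributeError. The classes Duck and Robot and the functions make_it_quack and try_quack are hypothetical names introduced only for this illustration.

```python
class Duck:
    def quack(self):
        return "Quack!"


class Robot:
    def beep(self):
        return "Beep!"


def make_it_quack(thing):
    # No declared parameter type: any object is accepted, and the
    # lookup of `quack` happens only when this line actually runs.
    return thing.quack()


def try_quack(thing):
    """Return the result, or a description of the run-time error."""
    try:
        return make_it_quack(thing)
    except AttributeError as error:
        return f"message not understood: {error}"


print(try_quack(Duck()))
print(try_quack(Robot()))
```

Nothing rejects the second call before the program runs; the mismatch surfaces only when the missing method is actually invoked.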

Statically-typed languages require that all variables are declared with a specific type. The compiler will then ensure that at any given time the variable contains only an object compatible with that type. (We say "compatible with that type" rather than "exactly that type" since the inheritance relationship enables subtyping, in which a class that inherits from another class is said to have an IS-A relationship with the class from which it inherits, meaning that instances of the inheriting class can be considered to be of a compatible type with instances of the inherited class.) By enforcing the type constraint on objects contained or referred to by the variable, the compiler can ensure a "message not understood" error can never occur at run-time. On the other hand, a static type system can hinder evolution of software in some circumstances. For example, if a method takes an object as a parameter, changing the type of the object requires changing the signature of the method so that it is compatible with the new type of the object being passed. If this same object is passed to many such methods, all of them must be updated accordingly, which could potentially be an arduous task. One must remember, though, that this ripple effect could occur even in a dynamically-typed language. If the type of the object is not what it was originally expected to be, it may not understand the messages being sent to it. Perhaps even worse is that it could understand the message but interpret it in a way not compatible with the semantics of the calling method. A statically-typed language can flag these errors at compilation-time, pointing out the precise locations of potential errors. A user of a dynamically-typed language must rely on extensive testing to ensure that all improper uses of the object are tracked down.

Eiffel is a statically-typed language that manages to remain nearly as flexible as its dynamic counterparts. Eiffel's generic classes and unprecedentedly flexible inheritance model allow it to achieve the safety and reliability of a static type system while still remaining nearly as flexible as a dynamic type system, all without requiring (or allowing) the use of type casts. C++ also offers generic classes (known as "templates" in the C++ parlance), as well as multiple inheritance. Unfortunately, the presence of type casts and implicit type conversions can sometimes undermine the work of the compiler by allowing type errors to go undetected until run-time. Java is seriously hindered by a lack of generic classes. This is alleviated to a degree by Java's singly-rooted type hierarchy (i.e. every class descends directly or indirectly from the class Object), but this scheme leaves much to be desired in terms of type-safety. Forthcoming versions of Java will address this shortcoming when generic classes are introduced in Java 1.5 or later. Java also allows type casting, but some rudimentary type checks can be made by the compiler, making casts in Java somewhat safer than in C++ and other languages.

Generic Classes

Generic classes, and more generally parametric type facilities, refer to the ability to parameterize a class with specific data types. A common example is a stack class that is parameterized by the type of elements it contains. This allows the stack to simultaneously be compile-time type safe and yet generic enough to handle any type of elements.
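As a rough sketch of the stack example, the following borrows Python's optional type annotations purely as notation. This is an assumption made for illustration: the Python interpreter does not enforce these annotations, so the compile-time check described above would have to come from a separate static checker. The Stack class and its method names are hypothetical.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class Stack(Generic[T]):
    """A stack parameterized by its element type T."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

    def is_empty(self) -> bool:
        return not self._items


int_stack: Stack[int] = Stack()
int_stack.push(1)
int_stack.push(2)
top = int_stack.pop()

# A static checker would be expected to reject the next line, because a
# str is not an int. The Python interpreter itself would not complain.
# int_stack.push("three")
```

The same Stack definition serves any element type, while each use site states which type it expects to find inside.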

The primary benefit of parameterized types is that they allow statically typed languages to retain their compile-time type safety yet remain nearly as flexible as dynamically typed languages. Eiffel in particular uses generics extensively as a mechanism for type safe generic containers and algorithms. C++ templates are even more flexible, having many uses apart from simple generic containers, but are also much more complex.

As already mentioned in the previous section, Java's lack of generic classes is a severe hole in the Java type system. When one considers that most living objects in a program are stored in container classes, and that containers in Java are untyped due to lack of generics, it is questionable whether Java's type system provides any benefit over the more flexible dynamic counterparts. See also this article by Dave Thomas for a discussion of Java's type system with regard to its lack of generics.

Dynamically typed languages do not need parameterized types in order to support generic programming. Types are checked at run-time and thus dynamically typed languages support generic programming inherently.

Inheritance

Inheritance is the ability for a class or object to be defined as an extension or specialization of another class or object. Most object-oriented languages support class-based inheritance, while others such as SELF and JavaScript support object-based inheritance. A few languages, notably Python and Ruby, support both class- and object-based inheritance, in which a class can inherit from another class and individual objects can be extended at run time with the capabilities of other objects. For the remainder of this discussion, we'll be dealing primarily with class-based inheritance since it is by far the most common model.

Although inheritance is commonly thought of as a simple subtyping mechanism, it actually has many different uses. In his landmark book Object-Oriented Software Construction, Bertrand Meyer identified and classified as many as 17 different forms of inheritance. Even so, most languages provide only a few syntactic constructs for inheritance which are general enough to allow inheritance to be used in many different ways.

The most important distinction that can be made between various languages' support for inheritance is whether they support single or multiple inheritance. Multiple inheritance is the ability for a class to inherit from more than one super (or base) class. For example, an application object called PersistentShape might inherit from both GraphicalObject and PersistentObject in order to be used as both a graphical object that can be displayed on the screen as well as a persistent object that can be stored in a database.
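The PersistentShape example might look like the following in Python, one of the languages in the table that allows multiple inheritance. The method names draw and save, the name attribute, and the dictionary standing in for a database are all assumptions made for the sketch.

```python
class GraphicalObject:
    def draw(self):
        return f"drawing {self.name}"


class PersistentObject:
    def save(self, store):
        # `store` is any dict-like object standing in for a database.
        store[self.name] = self


class PersistentShape(GraphicalObject, PersistentObject):
    def __init__(self, name):
        self.name = name


database = {}
shape = PersistentShape("circle-1")
shape.save(database)
print(shape.draw())
```

A single PersistentShape instance can be handed both to code that expects a GraphicalObject and to code that expects a PersistentObject.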

Multiple inheritance would appear to be an essential feature for a language to support for cases such as the above when two or more distinct hierarchies must be merged into one application domain. However, there are other issues to consider before making such an assertion.

First, we must consider that multiple inheritance introduces some complications into a programming language supporting it. Issues such as name clashes and ambiguities introduced in the object model must be resolved by the language in order for multiple inheritance to work, and this leads to additional complexity in the language. Eiffel is known for its carefully designed support for multiple inheritance, which includes feature renaming and fine-grained control over the manner in which multiply-inherited features are selected and applied to the inheriting class. The mechanisms C++ provides for multiple inheritance are more complicated and less flexible, leading many people to (mistakenly) believe that multiple inheritance is inherently ill-conceived and complex.

Next, we must distinguish between implementation inheritance and interface/subtype inheritance. Subtype inheritance (also known loosely as interface inheritance) is the most common form of inheritance, in which a subclass is considered to be a subtype of its super class, commonly referred to as an IS-A relationship. What this means is that the language considers an object to conform to the type of its class or any of its super classes. For example, a Circle IS-A Shape, so anywhere a Shape is used in a program, a Circle may be used as well. This conformance notion is only applicable to statically typed languages since it is a feature used by the compiler to determine type correctness.

Implementation inheritance is the ability for a class to inherit part or all of its implementation from another class. For example, a Stack class that is implemented using an array might inherit from an Array class in order to define the Stack in terms of the Array. In this way, the Stack class could use any features from the Array to support its own implementation. With pure implementation inheritance, the fact that the Stack inherits its implementation from Array would not be visible to code using the Stack; the inheritance would be strictly an implementation matter. C++ supports this notion directly with "private inheritance", in which methods from the base class are made private in the derived class. Recent versions of Eiffel also support this form of pure implementation inheritance using what is known as non-conforming inheritance. Most languages, on the other hand, do not support pure implementation inheritance so a class that inherits from another class is always considered to be a subtype of its super class(es).

Returning to the issue of multiple inheritance, we can see that a language's support for multiple inheritance is not a boolean condition; a language can support one or more different forms of multiple inheritance in the same way it can support different forms of single inheritance (e.g. implementation and subtype inheritance). We've already seen that C++ and Eiffel independently support pure implementation inheritance as well as subtype inheritance. Both of these languages also support multiple inheritance in both forms. Java, while it does not support pure implementation inheritance, provides two separate inheritance mechanisms. The extends keyword is used for a combination of implementation and subtype inheritance while the implements keyword is used for pure subtype (interface) inheritance.

Subtype inheritance is less important in dynamic languages since type conformance is not generally an issue, so multiple implementation inheritance is preferred over multiple subtype inheritance (although most languages still consider any class inheriting from another to be a subtype). Smalltalk supports only a single notion of inheritance: single inheritance of both interface and implementation. This means that a class may only inherit from one other class and it inherits both implementation and interface. Python similarly supports one form of inheritance (both implementation and subtype) but allows multiple inheritance and is thus more flexible in this regard than Smalltalk. Ruby lies somewhere in between the two approaches by allowing a class to inherit from only one class but also allowing a class to "mix in" the implementation of an arbitrary number of modules. This model is a slightly restricted version of the model provided by Python, but the restrictions can be overcome by Ruby's ability to support a prototype-based approach using object-based inheritance.

Visual Basic has no support for inheritance of any form, although support for single inheritance is slated for the VB .NET release.

Feature Renaming

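Feature renaming is the ability for a class or object to rename one of its features (a term we'll use to collectively refer to attributes and methods) that it inherited from a super class. There are two important ways in which this can be put to use:

  1. Provide a feature with a more natural name for its new context
  2. Resolve naming ambiguities when a name is inherited through multiple inheritance paths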

As an example of the first use, consider again a stack implemented by inheriting from an array. The array might provide an operation called remove_last to remove the last element of the array. In the stack, this operation is more appropriately named pop.
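The stack example can be approximated in Python, with the caveat that Python has no dedicated renaming construct: binding a second name to an inherited method leaves the original name available too, so this is only an approximation of true renaming. The Array class shown here is a hypothetical stand-in, not a library type.

```python
class Array:
    def __init__(self):
        self._items = []

    def append(self, item):
        self._items.append(item)

    def remove_last(self):
        return self._items.pop()


class Stack(Array):
    # Bind the inherited operations to names that suit a stack.
    # The inherited names remain available as well.
    push = Array.append
    pop = Array.remove_last


stack = Stack()
stack.push("a")
stack.push("b")
print(stack.pop())
```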

Eiffel and Ruby both provide support for feature renaming. Ruby provides an alias method that allows you to alias any arbitrary method. Eiffel also provides support for feature renaming, although it is slightly more limited than in Ruby because you can only rename a feature in an inheritance clause.

Method Overloading

Method overloading (a form of ad hoc polymorphism) is the ability for a class, module, or other scope to have two or more methods with the same name. Calls to these methods are disambiguated by the number and/or type of arguments passed to the method at the call site. For example, a class may have multiple print methods, one for each type of thing to be printed. The alternative to overloading in this scenario is to have a different name for each print method, such as print_string and print_integer.

Java and C++ both support method overloading in a similar fashion. Complexities in the mechanism to disambiguate calls to overloaded methods have led some language designers to avoid overloading in their languages. None of the other languages under consideration support method overloading. Default argument values provide a subset of the behavior for which method overloading is used, and some languages such as Ruby and Python have chosen this route instead.
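A small Python sketch of the default-argument route: one function covers call patterns that would otherwise need several overloads. The function name format_item and its parameters are hypothetical.

```python
def format_item(item, prefix="", width=0):
    """One function with defaults instead of several overloads."""
    text = prefix + str(item)
    return text.rjust(width)


print(format_item(42))
print(format_item("hello", prefix="> "))
print(format_item(3.5, width=8))
```

Because the argument is converted with str, the same function also handles the "one method per printable type" case without separate print_string and print_integer variants.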

Operator Overloading

Operator overloading (a hotly debated topic) is the ability for a programmer to define an operator (such as + or *) for user-defined types. This allows the operator to be used in infix, prefix, or postfix form, rather than the standard functional form. For example, a user-defined Matrix type might provide a * infix operator to perform matrix multiplication with the familiar notation: matrix1 * matrix2.

Some (correctly) consider operator overloading to be mere syntactic "sugar" rather than an essential feature, while others (also correctly) point to the need for such syntactic sugar in numerical and other applications. Both points are valid, but it is clear that, when used appropriately, operator overloading can lead to much more readable programs. When abused, it can lead to cryptic, obfuscated code. Consider that in the presence of operator overloading, it may not be clear whether a given operator is built in to the language or defined by the user. For any language that supports operator overloading, two things are necessary to alleviate such obfuscation:

  1. All operations must be messages to objects, and thus all operators are always method calls.
  2. Operators must have an equivalent functional form, so that using the operator as a method call will behave precisely the same as using it in infix, prefix, or postfix form.

This second point is subtle. It means that given any operator, it must be possible to invoke that operator in functional form. For example, the following two expressions should be equivalent: 1 + 2 and 1.+(2). This ensures that no implicit behavior is taking place that may not be immediately obvious from examining the source text.
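The "equivalent functional form" criterion can be illustrated in Python, where an infix operator on a user-defined type is dispatched to a specially named method. The Vector class is a hypothetical example type; operator.add comes from the standard library.

```python
import operator


class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called for the infix expression `self + other`.
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)


a = Vector(1, 2)
b = Vector(3, 4)

infix = a + b                        # operator form
method_form = a.__add__(b)           # explicit method call
function_form = operator.add(a, b)   # functional form

# The same equivalence for built-in integers.
integer_forms_agree = (1).__add__(2) == 1 + 2
```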

Of the languages under consideration, Eiffel, Ruby, C++, and Python support operator overloading. Eiffel and Ruby also support the two criteria listed above for safer use of operator overloading. Python supports the "equivalent functional form" criterion, but not the "all operations are messages to objects" criterion. C++ does not support either notion. Eiffel's mechanism is particularly flexible in that users may define arbitrary operators, rather than being limited to redefining a set of predefined operators.

Higher Order Functions & Lexical Closures

Higher order functions are, in the simplest sense, functions that can be treated as if they were data objects. In other words, they can be bound to variables (including the ability to be stored in collections), they can be passed to other functions as parameters, and they can be returned as the result of other functions. Due to this ability, higher order functions may be viewed as a form of deferred execution, wherein a function may be defined in one context, passed to another context, and then later invoked by the second context. This is different from standard functions in that higher order functions represent anonymous lambda functions, so that the invoking context need not know the name of the function being invoked.

Lexical closures (also known as static closures, or simply closures) take this one step further by bundling up the lexical (static) scope surrounding the function with the function itself, so that the function carries its surrounding environment around with it wherever it may be used. This means that the closure can access local variables or parameters, or attributes of the object in which it is defined, and will continue to have access to them even if it is passed to another module outside of its scope.
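Both ideas can be seen together in a short Python sketch. The names make_counter, increment, and apply_twice are hypothetical; the nonlocal statement assumes Python 3.

```python
def make_counter(start=0):
    count = start  # local variable captured by the closure below

    def increment(step=1):
        nonlocal count
        count += step
        return count

    return increment  # a function returned as a value


def apply_twice(function, argument):
    # A function received as a parameter and invoked later.
    return function(function(argument))


counter = make_counter(10)
first = counter()
second = counter()
doubled_twice = apply_twice(lambda n: n * 2, 5)
```

The variable count lives on after make_counter has returned, and each call to make_counter produces a closure with its own independent copy.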

Among the languages we're considering, Smalltalk and Ruby have supported both higher order functions and lexical closures from the beginning in the form of blocks. A block is an anonymous function that may be treated as any other data object, and is also a lexical closure. Eiffel has recently added support for higher order functions using the "agent" mechanism. The inline variant of Eiffel agents forms a lexical closure. Python, which has long supported higher order functions in the form of lambda expressions, has recently added support for closures using its improved support for nested static scopes.

While neither Java nor C++ supports higher order functions directly, both provide mechanisms for mimicking their behavior. Java's anonymous classes allow a function to be bundled with an object that can be treated much as a higher order function can. It can be bound to variables, passed to other functions as an argument, and can be returned as the result of a function. However, the function itself is named and thus cannot be treated in a generic fashion as true higher order functions can. C++ similarly provides partial support for higher order functions using function objects (or "functors"), and adds the further benefit that the function call operator may be overloaded so that functors may be treated generically. Neither C++ nor Java, however, provides any support for lexical closures.

Visual Basic provides no support for either higher order functions or lexical closures, nor is there any apparent mechanism for providing similar behavior.

Garbage Collection

Garbage collection is a mechanism allowing a language implementation to free memory of unused objects on behalf of the programmer, thus relieving the burden on the programmer to do so. The alternative is for the programmer to explicitly free any memory that is no longer needed. There are several strategies for garbage collection that exist in various language implementations.

Reference counting is the simplest scheme and involves the language keeping track of how many references there are to a particular object in memory, and deleting that object when that reference count becomes zero. This scheme, although it is simple and deterministic, is not without its drawbacks, the most important being its inability to handle cycles. Cycles occur when two objects reference each other, and thus their reference counts will never become zero even if neither object is referenced by any other part of the program. This is the scheme that is utilized by Python and Visual Basic, although in the case of Python an extra step is taken to ensure that cycles are handled appropriately.
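The cycle problem, and the extra step Python takes, can be observed with the standard gc and weakref modules. This sketch assumes the CPython implementation, where reference counting is the primary scheme and a separate cycle detector runs in addition; the Node class is hypothetical.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.partner = None


gc.disable()  # rely on reference counting alone for the moment

a = Node()
b = Node()
a.partner = b
b.partner = a           # a and b now form a cycle
probe = weakref.ref(a)  # lets us observe a without keeping it alive

del a, b
# The cycle keeps both reference counts above zero.
alive_after_del = probe() is not None

gc.collect()            # the cycle detector reclaims the pair
alive_after_collect = probe() is not None

gc.enable()
```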

"Mark and sweep" garbage collection is another scheme that overcomes this limitation. A mark and sweep garbage collector works in a two phase process, not surprisingly known as the mark phase and the sweep phase. The mark phase works by first starting at the "root" objects (objects on the stack, global objects, etc.), marking them as live, and recursively marking any objects referenced from them. These marked objects are the set of live objects in the program, and any objects that were not marked in this phase are unreferenced and therefore candidates for collection. In the sweep phase, any objects in memory that were not marked as live by the mark phase are deleted from memory. The primary drawback of mark and sweep collection is that it is non-deterministic, meaning that objects are deleted at an unspecified time during the execution of the program. This is the most common form of garbage collection, and the one that is supported by most implementations of Eiffel, Smalltalk, Ruby, and Java.

Generational garbage collection works in a similar fashion to mark and sweep garbage collection, except it capitalizes on the statistical probability that objects that have been alive the longest tend to stay alive longer than objects that were newly created. Thus a generational garbage collector will divide objects into "generations" based upon how long they've been alive. This division can be used to reduce the time spent in the mark and sweep phases because the oldest generation of objects will not need to be collected as frequently. Generational garbage collectors are not as common as the other forms but may be found in some implementations of Eiffel, Smalltalk, Ruby, and Java.
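The mark and sweep phases described earlier can be modeled with a toy heap in Python. This is a simplified sketch, not how any real collector is implemented: objects are represented by names, the references dictionary is a hypothetical stand-in for pointers, and the marking uses an explicit worklist rather than recursion.

```python
def mark(roots, references):
    """Return the set of objects reachable from the roots."""
    live = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in live:
            continue
        live.add(obj)
        pending.extend(references.get(obj, ()))
    return live


def sweep(heap, live):
    """Return the heap with every unmarked object removed."""
    return {obj for obj in heap if obj in live}


# `references` maps each object to the objects it points at. D and E
# reference each other but are unreachable from the root, so they are
# collected despite the cycle.
heap = {"A", "B", "C", "D", "E"}
references = {"A": ["B"], "B": ["C"], "D": ["E"], "E": ["D"]}
roots = ["A"]

live = mark(roots, references)
heap = sweep(heap, live)
```

A generational collector would run the same two phases, but over the youngest objects far more often than over the oldest ones.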

C++ does not provide any sort of garbage collection, the reasons for which are discussed at length in Bjarne Stroustrup's The Design and Evolution of C++. It is possible, however, with some effort to layer reference counting garbage collection onto C++ using smart pointers. In addition there exist garbage collectors that can be integrated into C++ programs, though their use has not caught on to any great degree within the C++ community.

Uniform Access

The Uniform Access Principle, as published in Bertrand Meyer's Object-Oriented Software Construction, states that "All services offered by a module should be available through a uniform notation, which does not betray whether they are implemented through storage or through computation." It is described further with "Although it may at first appear just to address a notational issue, the Uniform Access principle is in fact a design rule which influences many aspects of object-oriented design and the supporting notation. It follows from the Continuity criterion; you may also view it as a special case of Information Hiding."

Say that bar is a feature of a class named Foo. For languages that do not support the Uniform Access Principle, the notation used to access bar differs depending on whether it is an attribute (storage) or a function (computation). For example, in Java you would use foo.bar if it were an attribute, but you would use foo.bar() if it were a function. Having this notational difference means that users of Foo are exposed to unnecessary implementation details and are tightly coupled to Foo. If bar is changed from attribute to method (or vice versa), then any users of Foo must also be changed.
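The coupling described above can be shown with plain attributes and methods in Python, which are accessed with different notation just as in the Java example. FooV1, FooV2, and client are hypothetical names for two versions of the same class and a piece of client code written against the first.

```python
class FooV1:
    def __init__(self):
        self.bar = 42          # bar as an attribute (storage)


class FooV2:
    def bar(self):             # bar as a method (computation)
        return 42


def client(foo):
    # Written against FooV1: uses attribute notation.
    return foo.bar + 1


result_v1 = client(FooV1())

try:
    client(FooV2())            # foo.bar is now a method, not a number
    client_broke = False
except TypeError:
    client_broke = True
```

Changing bar from storage to computation forces every such client to be rewritten to use foo.bar() instead.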

The Uniform Access Principle seeks to eliminate this needless coupling. A language supporting the Uniform Access Principle does not exhibit any notational differences between accessing a feature regardless of whether it is an attribute or a function. Thus, in our earlier example, access to bar would always be in the form of foo.bar, regardless of how bar is implemented. This makes clients of Foo more resilient to change.

Among our languages, only Eiffel and Ruby directly support the Uniform Access Principle, although Smalltalk renders the distinction moot by not allowing any access to attributes from clients.

Class Variables/Methods

Class variables and methods are owned by a class, and not any particular instance of a class. This means that for however many instances of a class exist at any given point in time, only one copy of each class variable/method exists and is shared by every instance of the class.

Smalltalk and Ruby support the most advanced notion of class variables and methods, due to their use of meta-classes and the fact that even classes are objects in these languages. Java and C++ provide "static" members which are effectively the same thing, yet more limited since they cannot be inherited. Python, surprisingly, does not support class methods or variables, but its advanced notion of a module allows workarounds for this limitation. Eiffel also does not provide direct support for class variables or methods, but it does provide similar, but limited, functionality in the form of "once" functions. Once functions are evaluated once only, and subsequent uses use a cached result.
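The evaluate-once, cache-afterwards behavior of a once function can be loosely imitated in Python with functools.lru_cache from the standard library. This is only an analogy and not a description of Eiffel's semantics; the configuration function and the call_count variable are hypothetical.

```python
import functools

call_count = 0


@functools.lru_cache(maxsize=None)
def configuration():
    """Evaluated on first use; later calls return the cached result."""
    global call_count
    call_count += 1
    return {"greeting": "hello"}


first = configuration()
second = configuration()
```

Both calls return the very same object, and the function body runs only for the first one.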

See also this article for an in-depth discussion of the different languages' support for class variables and methods.

Reflection

Reflection is the ability for a program to determine various pieces of information about an object at run-time. This includes the ability to determine the type of the object, its inheritance structure, and the methods it contains, including the number and types of parameters and return types. It might also include the ability for determining the names and types of attributes of the object.
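Each of the abilities listed above has a direct counterpart in Python's built-ins and its standard inspect module. The Shape and Circle classes are hypothetical examples.

```python
import inspect


class Shape:
    def area(self):
        return 0.0


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3.14159 * self.radius ** 2

    def scale(self, factor, around_center=True):
        self.radius *= factor


circle = Circle(2.0)

# The type of the object and its inheritance structure.
type_name = type(circle).__name__
ancestors = [cls.__name__ for cls in type(circle).__mro__]

# The public methods it contains and the parameters of one of them.
method_names = sorted(
    name for name, _ in inspect.getmembers(circle, inspect.ismethod)
    if not name.startswith("_")
)
scale_parameters = list(inspect.signature(circle.scale).parameters)

# The names of its attributes.
attribute_names = sorted(vars(circle))
```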

Most object-oriented languages support some form of reflection. Smalltalk, Ruby, and Python in particular have very powerful and flexible reflection mechanisms. Java also supports reflection, but not in as flexible and dynamic a fashion as the others. C++ does not support reflection as we've defined it here, but it does support a limited form of run-time type information that allows a program to determine the type of an object at run-time. Eiffel also has support for a limited form of reflection, although it is much improved in the most recent versions of Eiffel, including the ability to determine the features contained in an object.

Access Control

Access control refers to the ability of a module's implementation to remain hidden behind its public interface. Access control is closely related to the encapsulation/information hiding principle of object-oriented languages. For example, a class Person may have methods such as name and email, which return the person's name and e-mail address respectively. How these methods work is an implementation detail that should not be available to users of the Person class. These methods may, for example, connect to a database to retrieve the values. The database connection code that is used to do this is not relevant to client code and should not be exposed. Language-enforced access control allows us to enforce this.

Most object-oriented languages provide at least two levels of access control: public and protected. Protected features are not available outside of the class in which they are contained, except for subclasses. This is the scheme supported by Smalltalk, in which all methods are public and all attributes are protected. There are no protected methods in Smalltalk, so Smalltalk programmers resort to the convention of placing methods that should be protected into a "private protocol" of the class. See this discussion for the benefits and drawbacks of this approach. Visual Basic also supports these two levels of access control, although since there is no inheritance in Visual Basic, protected features are effectively private to the class in which they are declared.

Some languages, notably Java and C++, provide a third level of access control known as "private". Private features are not available outside of the class in which they are declared, even to subclasses. Note, however, that objects of a particular class can still access the private features of other objects of that same class. Ruby also provides these three levels of access control, but they work slightly differently. Private in Ruby means that the feature cannot be called with an explicit receiver, so it is available to subclasses but not to other instances of the same class. Java provides a fourth level of access control, known as "package private", which allows other classes in the same package to access such features.

Eiffel provides the most powerful and flexible access control scheme of any of these languages with what is known as "selective export". All features of an Eiffel class are by default public. However, any feature in an Eiffel class may specify an export clause which lists explicitly what other classes may access that feature. The special class NONE may be used to indicate that no other class may access that feature. Export clauses apply to attributes as well, but even public attributes are read-only to clients, so an attribute can never be written to directly from outside its class in Eiffel. In order to better support the Open-Closed principle, all features of a class are always available to subclasses in Eiffel, so there is no notion of private as there is in Java and C++.

Python, curiously, does not provide any enforced access control. Instead, it provides a mechanism of name mangling: any feature of a class whose name begins with two underscores (and does not end with two) will have its name mangled by the Python interpreter, which prefixes it with an underscore and the class name. Although this does not prevent client code from using such features, it is a clear indicator that the feature is not intended for use outside the class, and convention dictates that these features are "use at your own risk".
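A short sketch makes the effect of name mangling concrete. The Person class below is hypothetical; the point is only that a double-underscore name is stored under a mangled name, which hides it from casual access without truly protecting it.

```python
class Person:
    def __init__(self, name):
        # Inside the class body, __name is rewritten to _Person__name.
        self.__name = name

    def name(self):
        return self.__name


p = Person("Ada")

via_method = p.name()                # the intended public route
has_plain = hasattr(p, "__name")     # no attribute is stored under the plain name
mangled_value = p._Person__name      # still reachable, at the caller's own risk
```

Client code that reaches for _Person__name is visibly bypassing the class's interface, which is as much enforcement as the mechanism offers.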

Design by Contract

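Design by Contract is another idea set forth by Bertrand Meyer and discussed at length in Object-Oriented Software Construction as well as on the Eiffel Home Page. In short, Design by Contract (DBC) is the ability to incorporate important aspects of a specification into the software that is implementing it. The most important features of DBC are pre-conditions, which must hold before a method is invoked; post-conditions, which are guaranteed to hold after the method returns; and invariants, which must hold at every stable point in the lifetime of an object.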

There is much more to DBC than these simple facilities, including the manner in which pre-conditions, post-conditions, and invariants are inherited in compliance with the Liskov Substitution Principle. However, at least these facilities must be present to support the central notions of DBC.
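To show roughly what these three kinds of assertion look like in practice, here is a hand-rolled Python sketch built on plain assert statements. It is not Eiffel's mechanism and says nothing about contract inheritance; the Account class and its helper _check_invariant are hypothetical names invented for the example.

```python
class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self._check_invariant()

    def _check_invariant(self):
        # Invariant: must hold at every stable point in the object's life.
        assert self.balance >= 0, "invariant: balance must never be negative"

    def withdraw(self, amount):
        # Pre-condition: the caller's obligation.
        assert 0 < amount <= self.balance, "pre-condition: 0 < amount <= balance"
        old_balance = self.balance

        self.balance -= amount

        # Post-condition: the method's guarantee to the caller.
        assert self.balance == old_balance - amount, "post-condition: balance reduced by amount"
        self._check_invariant()
        return self.balance


account = Account(100)
remaining = account.withdraw(30)

try:
    account.withdraw(1000)  # violates the pre-condition
    violation_caught = False
except AssertionError:
    violation_caught = True
```

Note that Python skips assert statements when run with the -O option, so a sketch like this is a development aid rather than the always-available language feature that Eiffel provides.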

As Bertrand Meyer, the original pioneer of DBC, is also the creator of Eiffel, it is no surprise that Eiffel has full support for DBC. Eiffel stands as the model for a robust DBC implementation in an object-oriented language. To this date it is still the only language with full support for DBC. Libraries exist, however, for several languages, including Ruby and Java, that provide the same basic facilities.

Multithreading

Multithreading is the ability of a single process to work on two or more tasks concurrently. (We say concurrently rather than simultaneously because, in the absence of multiple processors, the tasks cannot run simultaneously but rather are interleaved in very small time slices and thus exhibit the appearance and semantics of concurrent execution.) The use of multithreading is becoming increasingly common as operating system support for threads has become nearly ubiquitous.
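The idea of two tasks sharing one process can be sketched with Python's standard threading module. The worker function and the results list are hypothetical names; the lock is there because both threads write to the same list.

```python
import threading

results = []
results_lock = threading.Lock()


def worker(task_id, steps):
    """Record each step of this task in the shared results list."""
    for step in range(steps):
        with results_lock:  # only one thread may append at a time
            results.append((task_id, step))


threads = [
    threading.Thread(target=worker, args=(task_id, 1000))
    for task_id in ("a", "b")
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both tasks to finish
```

How the entries of the two tasks interleave in results is up to the scheduler and may differ from run to run, but each task's own steps always appear in order.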

Among the languages under discussion, nearly all support multithreading either directly within the language or through libraries. Ruby is unusual in that its threading capabilities are built into the interpreter itself, rather than being wrappers around the operating system's threading operations. This has the disadvantage that any blocking operating system call will block the entire interpreter, but has the advantage of being completely portable even to systems that do not support multithreading, such as MS-DOS.

Regular Expressions

Regular expressions are pattern-matching constructs capable of recognizing the class of languages known as regular languages. They are frequently used in text-processing systems as well as in general applications that must use pattern recognition for other purposes. Libraries with regular expression support exist for nearly every language, but ever since the advent of Perl it has become increasingly important for a language to support regular expressions natively. This allows tighter integration with the rest of the language and a more convenient syntax for using regular expressions. Perl was the model for this kind of built-in support and Ruby, a close descendant of Perl, continues the tradition. Python, and recently Java, have included regular expression libraries as part of the standard base library distributed with the language implementation.
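For comparison with the built-in syntax of Perl and Ruby, this is roughly what library-based support looks like, using Python's standard re module. The pattern is a deliberately simplified, hypothetical one for e-mail-like strings and is not a complete address validator.

```python
import re

# Simplified pattern: local part, "@", then a dotted domain name.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

text = "Contact alice@example.com or bob@example.org for details."

addresses = EMAIL_PATTERN.findall(text)
redacted = EMAIL_PATTERN.sub("<hidden>", text)
```

The pattern is an ordinary string handed to a library function, whereas in Perl and Ruby a regular expression literal is part of the language's own syntax.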

Pointer Arithmetic

Pointer arithmetic is the ability of a language to directly manipulate memory addresses and their contents. Because direct memory manipulation is inherently unsafe, this ability is not often considered appropriate for high-level languages, yet it is essential for low-level systems applications. Thus, while object-oriented languages strive to remain at a fairly high level of abstraction, to be suitable for systems programming a language must either provide such features or relegate such low-level tasks to a language with which it can interact. Most object-oriented languages have forgone support of pointer arithmetic in favor of providing integration with C. This allows low-level routines to be implemented in C while the majority of the application is written in the higher-level language. C++, on the other hand, provides direct support for pointer arithmetic, both for compatibility with C and to allow C++ to be used for systems programming without the need to drop down to a lower-level language. This is the source both of C++'s great flexibility and of much of its complexity.
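Python itself has no pointer arithmetic, but its standard ctypes module (a bridge to C data types) can be used to sketch the idea of computing an address and then reading or writing whatever lives there. The variable names below are hypothetical, and this is an illustration of the concept rather than a recommended practice.

```python
import ctypes

# A C-style array of four ints, laid out contiguously in memory.
values = (ctypes.c_int * 4)(10, 20, 30, 40)

base_address = ctypes.addressof(values)
element_size = ctypes.sizeof(ctypes.c_int)

# Address arithmetic: step two elements past the start of the array.
third_address = base_address + 2 * element_size
third = ctypes.c_int.from_address(third_address)

third_value = third.value  # reads the int stored at that address
third.value = 99           # writes straight into the array's memory
```

Nothing stops the computed address from pointing outside the array, which is exactly the kind of unsafety that leads most high-level languages to leave this ability out.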

Language Integration

For various reasons, including integration with existing systems, the need to interact with low-level modules, or sheer speed, it is important for high-level languages (particularly interpreted ones) to be able to integrate seamlessly with other languages. Nearly every language to come along since C was first introduced provides such integration with C. This allows high-level languages to remain free of the low-level constructs that make C great for systems programming but that add much complexity.

All of the languages under discussion integrate tightly with C, except for Visual Basic, which can only do so through DCOM. Eiffel and Ruby provide particularly easy-to-use interfaces to C as well as callbacks to the language runtime. Python, Perl, Java, and Smalltalk provide similar interfaces, though they aren't quite as easy to use. C++, naturally, integrates quite transparently with C.

Built-In Security

Built-in security refers to a language implementation's ability to determine whether or not a piece of code comes from a "trusted" source (such as the user's hard disk), limiting the permissions of the code if it does not. For example, Java applets are considered untrusted, and thus they are limited in the actions they can perform when executed from a user's browser. They may not, for example, read from or write to the user's hard disk, and they may not open a network connection to anywhere but the originating host.

Several languages, including Java, Ruby, and Perl, provide this ability "out of the box". Most languages defer this protection to the user's operating environment.

Capers Jones Language Level

The Capers Jones Language Level is a study that attempts to identify the number of source lines of code necessary in a given language to implement a single function point. The higher the language level, the fewer lines of code it takes to implement a function point, so the level is presumably an indicator of the productivity achievable using the language.

The study (which can be found at http://www.theadvisors.com/langcomparison.htm) is considered flawed by many since not every language was examined in detail. Some languages were assumed to be approximately equal to another language, and so the study at best represents an approximation. However, the study is thorough enough to determine ballpark estimates on the general productivity levels of the languages.

Of the languages we're considering that were included in the study, Smalltalk, Eiffel, and Perl were the highest with a language level of 15. Visual Basic was next highest on the list, at level 11. Java and C++ were the lowest at level 6.
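To give a feel for what these levels mean in lines of code, the sketch below applies the rule of thumb commonly quoted alongside Jones's table, in which a level 1 language (assembly) needs about 320 source statements per function point and a language of level N needs about 320 / N. That baseline is an assumption brought in for illustration and is not stated in this article; the levels themselves are the ones given above.

```python
# Assumption: rule-of-thumb baseline of 320 statements per function point at level 1.
ASSEMBLY_STATEMENTS_PER_FP = 320

# Language levels as reported in the paragraph above.
language_levels = {
    "Smalltalk": 15,
    "Eiffel": 15,
    "Perl": 15,
    "Visual Basic": 11,
    "Java": 6,
    "C++": 6,
}


def statements_per_function_point(level):
    """Approximate source statements needed per function point at a given level."""
    return ASSEMBLY_STATEMENTS_PER_FP / level


estimates = {
    language: round(statements_per_function_point(level))
    for language, level in language_levels.items()
}
```

Under that assumption a level 15 language needs roughly 21 statements per function point and a level 6 language roughly 53, which is the sense in which the higher level suggests higher productivity.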

Python and Ruby were not included in the study, though presumably both would be at least level 15, if not higher.


jason@jvoegele.com