Diatribe On Programming Language Design Features

Introduction

How This Diatribe Came To Be

I am currently on a moderately obsessive quest to develop a generalist's knowledge of programming languages and the way they function. I flatter myself that this is quite rare: most people follow the herd, use what they've always used, or find one novel language with cool features (usually something functional or homoiconic or scripting) and proselytize it for a very long time. Very, very few people go out and read the language reports for a lot of different languages to get an overview of what features they do or do not share, especially those that are unique to a particular language, and form an opinion on the features themselves, independent of the source language. This is what I've set out to do.

To start with, I'm reading Raphael Finkel's Advanced Programming Language Design (Addison-Wesley, 12/95), a book very much along the same lines as this document, although with a different approach and much longer. :) The author has most graciously made this book freely available (for electronic viewing only, not GPL free or anything).

There are two primary reasons for actually bothering to write this down and publish it. One is to focus my thoughts and to build on them later without having to go back and have the same discussions with myself more than once. The other is the vain hope that people will be interested in or incensed by these opinions enough to attempt to explain aspects of them I may not have thought of before.

I should note here that I do not understand the math behind the stuff I'm talking about. I could learn it if I tried, I'm sure, but I'm quite comfortable for the time being to do secondary research.

What This Diatribe Covers

This doc focuses solely on high level languages, for reasons that will be made clear in the next section. I subscribe to the terminology that says that there is only one low level language: assembly (for whatever the processor in question is; by extension certain dataflow languages and LISPs are 'assembly' because there were computers that ran them more-or-less directly as I understand it). As far as I'm concerned, there is, in any useful sense, only one middle-level language: C. The hard-core system programming language wars are over, folks. C won. Get over it.

This doc also does not cover scripting languages at all, because they are too specialized. In general, in fact, it basically doesn't touch aggregate languages. Note that I consider Perl, for example, to be a general language, although it is still rather specialized.

For the most part, in fact, this doc doesn't talk about languages at all; it talks about features common to several languages, although sometimes a feature is rare enough that it is useful to point to a language that has it (typed errors are more-or-less in this category).

Bases For My Opinions

While this is in the rant section of my vanity pages, I have tried hard to evaluate various features from a consistent point of view. Note that I am very much expressing opinions, often very strong ones. I doubt, for example, that everyone will agree with me that pointers in high level languages are the basest, blackest evil imaginable, with the possible exception of regular use of 'goto'.

So, my criteria:

Programmer time is expensive. Very expensive.

This one doesn't have much meaning except in the context of some of the other criteria, but suffice it to say that any feature that saves the programmer time in any way is prima facie a good thing. Making it impossible to introduce a class of bugs is perhaps the most profound example. Many of the standard criteria used to evaluate a language (e.g. simplicity and clarity) follow directly from this.

Disk is stupidly cheap.

This one doesn't much matter in the context of non-aggregate languages, but certainly if someone came up with an efficient, say, travelling salesman solution that required 100 megs of disk per node (n*100 megs for n nodes), I would say that was a great deal. More importantly, it means that the fact that a "Hello, World" executable in some languages is upwards of 10M (I'm not kidding) is really basically irrelevant. It's icky, but not important.

Disk is stupidly slow.

The converse of the above. While disk space is cheap, disk access is probably second only to CDs or, the gods forbid, tape drives in its expensiveness in terms of slowing computation time. How the above travelling salesman solution could possibly be efficient, for example, is beyond me, because of the time it takes to write 500M files. This means that while memory is cheap (see below), any setup that requires you to consume enough memory that you are forced to swap is bad, possibly even bad enough to justify extra programmer time depending on the application.

CPU cycles are very cheap.

Seriously, people. CPU cycles are much, much cheaper nowadays than programmer time. Any feature that saves programmer time is worth it, even if it is less efficient than a hand-coded version, as long as it doesn't actually change the big-O complexity, in all but the most critical real-time applications. And if you're writing time-critical applications, why the heck are you writing in a high-level language?? Buy a copy of QNX and program in C (which won the middle-level language wars, remember?) or assembly.

Obviously, increasing runtime by a factor of 1000 is going to be a problem, but I would say that doubling the runtime is easily worth saving a programmer from just one non-obvious bug. Remember that most of the real-world time a program takes is spent doing I/O. If your program can actually use up 100% of CPU for its lifetime, and it's taking way too long, and you're using a high-level language, you're doing something wrong. That's a time-critical, CPU-intensive application. Run it through a usage analyzer and code the procedures that are using all that freaking CPU in C.

Note: When I recommend writing something in C, I do NOT mean C++. That's a high-level language, and a bad one.

Memory is cheap.

Unless a program uses so much memory that it's swapping out all the time, the fact that a program is a memory hog is really not much of an issue. It is typical for modern server machines to have more than a gig of main memory. A memory inefficiency that saves the programmer having to track down a non-obvious bug is worth a good 10M of memory in the year 2000, easy. True memory leaks, on the other hand, are pure evil, and any language that is capable of producing them is a pile of steaming toss (and probably has pointers too).

Features I Dislike

I decided to start with these because they are more fun to rant about. :)

Mutable Variables

What It Is

If you need to ask this, you almost certainly shouldn't be reading this doc. Let me point out though, for those of you that haven't experienced functional programming, that variables != identifiers. Naming things is great, being able to alter named things is not.

Why I Like It

Uhhh... In a high level language? Uhhh... Well, they make a lot of semantic sense to people, so I can almost see them being useful in a teaching language, but then people will just get confused when they encounter a programming paradigm that's not so horribly outdated. Nope, I can't really think of any reason to like the things. Well, they may make some algorithms semantically clearer (Finkel's book has some examples at the end of the Functional Programming section), but that can either be fixed by adding language design features, or is too low-level anyways; go write it in C. Writing quicksort so that it updates in place is a good example of this last type.

Why I Don't Like It

Aieee, where to start. The main reason is that they introduce a whole class of bugs that are otherwise impossible. Take this fragment:

int i=0;
bar(foo(i));
baz();

Now, assume that scoping in this language is such that i is accessible to the procedures as other than an argument (this is likely to be true for most imperative languages). Let's say that foo has a for loop in it, using the very typical identifier i for loop control. Let's say that you forget to declare a locally scoped i. The program will run just fine, but foo's loop quietly overwrites the outer i, so bar, baz, and anything else that looks at i after the call will almost certainly be getting a different i than you expected.
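Here is roughly what that looks like in C, as a minimal sketch. The bodies of foo and foo_fixed are made up for illustration (the fragment above never shows them); the only thing that matters is which i the loop counter ends up being.

```c
int i = 0;  /* outer variable, visible inside every function below */

/* Hypothetical foo: returns x + 0 + 1 + 2. The local declaration of the
   loop counter was forgotten, so the loop uses (and overwrites) the outer i. */
int foo(int x) {
    int result = x;
    for (i = 0; i < 3; i++) {   /* BUG: meant "for (int i = 0; ...)" */
        result += i;
    }
    return result;
}

/* Same computation with a locally scoped counter; the outer i is untouched. */
int foo_fixed(int x) {
    int result = x;
    for (int i = 0; i < 3; i++) {
        result += i;
    }
    return result;
}
```

Both versions return the same value for the same argument, which is exactly why nothing looks wrong. After foo(i), though, the outer i is 3 instead of 0; after foo_fixed(i) it is still 0.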

Another major stupidity is how variables kill concurrency. The same code fragment without i:

bar(foo(0));
baz();

Does the same thing, right? One very key difference: the second form, in a language with no variables, can safely run the first line and the second line in parallel! Without any kind of checking. In the imperative form, baz could use i, since it's in a scope where i is visible, and your parallelism is gone.
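To make the dependency concrete, here is a small C sketch under made-up assumptions: foo_shared and baz_shared are hypothetical stand-ins that talk to each other through i, and foo_pure and baz_pure are the same idea with no shared state. It doesn't spawn any threads; it just runs the two calls in both orders, since "safe to run in parallel" requires at least "safe to run in either order".

```c
int i = 0;  /* shared mutable variable */

/* Imperative flavor: foo writes i, baz reads it. */
int foo_shared(void) { i = i + 1; return i * 10; }
int baz_shared(void) { return i + 100; }

/* Variable-free flavor: inputs are arguments, outputs are return values. */
int foo_pure(int x) { return (x + 1) * 10; }
int baz_pure(void)  { return 100; }

/* Runs foo and baz in the requested order and returns what baz produced. */
int baz_result_shared(int foo_first) {
    int r;
    i = 0;
    if (foo_first) {
        foo_shared();
        r = baz_shared();
    } else {
        r = baz_shared();
        foo_shared();
    }
    return r;
}

/* Same experiment with the variable-free pair. */
int baz_result_pure(int foo_first) {
    int r;
    if (foo_first) {
        foo_pure(0);
        r = baz_pure();
    } else {
        r = baz_pure();
        foo_pure(0);
    }
    return r;
}
```

With the shared i, baz's answer changes with the order (101 versus 100), so a compiler or scheduler has to prove what baz touches before it dares overlap the calls. The pure pair gives 100 either way, no analysis required.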

Memory leaks, the bane of (in particular) C++ programmers. Languages with variables often let you initialize variables from the heap yourself. You then get to forget to get rid of them, which is wonderful fun if they're being created in a loop. Alternatively, the language might have garbage collection, which will give the user a nice, visible slowdown every few minutes as the GC cleans up the toss. Feh.
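A minimal C sketch of the forget-to-free-in-a-loop case. counted_malloc and counted_free are hypothetical wrappers, added here only so the leak shows up as a number; they are not part of any standard library.

```c
#include <stdlib.h>

static long live_blocks = 0;  /* heap blocks handed out and not yet freed */

/* malloc wrapper that counts successful allocations (demo bookkeeping only). */
static void *counted_malloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) live_blocks++;
    return p;
}

/* free wrapper that keeps the count in step. */
static void counted_free(void *p) {
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Allocates a scratch buffer on every pass and never releases it.
   Returns how many blocks are still allocated when the loop is done. */
long leaky_loop(int iterations) {
    long before = live_blocks;
    for (int k = 0; k < iterations; k++) {
        int *scratch = counted_malloc(16 * sizeof *scratch);
        if (scratch == NULL) break;
        scratch[0] = k;
        /* BUG: no counted_free(scratch); the only pointer to the block
           goes out of scope here, so the block can never be freed. */
    }
    return live_blocks - before;
}

/* Same loop with the matching free. */
long tidy_loop(int iterations) {
    long before = live_blocks;
    for (int k = 0; k < iterations; k++) {
        int *scratch = counted_malloc(16 * sizeof *scratch);
        if (scratch == NULL) break;
        scratch[0] = k;
        counted_free(scratch);
    }
    return live_blocks - before;
}
```

Assuming the allocations succeed, leaky_loop(n) leaves n blocks behind and tidy_loop(n) leaves none. The two loop bodies differ by one line, and nothing in the language complains about the missing one.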

Conclusions

Figuring out how to programmatically parallelize imperative languages is a major area of study. WHY? You don't need the things! Explaining why and how variables are truly not necessary is well beyond the scope of this article, but trust me, it's true.

If you need really tight concurrency and really tight algorithms, use a C-based concurrency library. Otherwise, there is no plausible reason I am aware of for using languages with variables. They are a pile of steaming toss.

Pointers

What It Is

Wow. If you need to ask this, you are a very, very lucky programmer. Or I have no idea why you're reading this. Pointers are variables (bad start) that hold as a value the number of a memory location. You 'dereference' the pointer to affect whatever value is at the memory location.
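For the lucky ones, a minimal sketch in C (set_to_42 and demo are made-up names for illustration):

```c
/* Stores 42 in whatever int lives at the address held in p. */
void set_to_42(int *p) {
    *p = 42;            /* '*p' dereferences: the int at the address in p */
}

/* Returns the value of x after writing to it only through a pointer. */
int demo(void) {
    int x = 7;
    int *p = &x;        /* p now holds the memory location of x */
    set_to_42(p);       /* x is never named in the callee, yet it changes */
    return x;
}
```

demo() hands back 42, not 7: the callee reached into the caller's variable through the address it was given.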

Why I Like It

Ha ha! Ha ha ha ha ha! You're funny.

Why I Don't Like It

Pointers make memory leaks much, much easier. Pointer arithmetic compounds the problem. They are semantically obscure. They are usually syntactically obscure. They allow insane things in some languages like treating one record type as a completely different one. They make it much, much, much harder to implement concurrency without having the programmer point out the concurrency explicitly (which is another way to introduce bugs).
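One concrete way pointers get in the way of automatic concurrency is aliasing: two pointers that may or may not name the same memory. A small C sketch, with add_twice as a made-up example function:

```c
/* Adds *src to *dst twice. That reads like "dst += 2 * src", and it is,
   unless dst and src happen to point at the same int. */
void add_twice(int *dst, const int *src) {
    *dst += *src;
    *dst += *src;   /* if dst == src, *src was just changed by the line above */
}
```

With distinct targets (say a = 1 and b = 10), add_twice(&a, &b) leaves a at 21, and the compiler could happily read *src once and reuse it. Pass the same address twice, as in add_twice(&x, &x) with x = 10, and the first addition changes what the second one reads: x ends up 40, not 30. A compiler that cannot rule that out has to run the statements strictly in order.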

Conclusions

Pointers are that which makest whole the glorious goal of Satan's unborn soul. I can't possibly describe how bad they really are.

For the record, they are hella useful in C, for that limited set of things (e.g. operating systems) C is good for. But for a high level language? Get real. Here's a quarter, kid, buy yourself a manual for a real language.