Pages

A blog about teaching Programming to non-CompSci students by Tim Love (Cambridge University Engineering Department). I do not speak on behalf of the university, the department, or even the IT group I belong to.

Tuesday, 15 September 2015

"Hackers and Painters" by Paul Graham (O'Reilly, 2004)

Subtitled "Big ideas from the computer age", the book considers the social and psychological factors that encourage start-up companies, then provides opinions on computing languages. The author's start-up company produced what he considers the first web-based app. It was written mostly in Lisp. He sold out to Yahoo.

The nature of programming

  • When I finished grad school in computer science I went to art school to study painting ... Hacking and painting have a lot in common. In fact, of all the different types of people I've known, hackers and painters are among the most alike (p.18)
  • Because hackers are makers rather than scientists, the right place to look for metaphors is not in the sciences, but among other kinds of makers (p.25)
  • hackers start original, and get good, and scientists start good, and get original (p.26)
  • Computer science is a grab bag of tenuously related areas thrown together by an accident of history, like Yugoslavia ... It's as if mathematicians, physicists, and architects all had to be in the same department (p.18)

Languages

  • A language can be very abstract, but offer the wrong abstracts. I think this happens in Prolog, for example (p.150)
  • Inspired largely by the example of Larry Wall, the designer of Perl, lots of hackers are thinking, why can't I design my own language? ... The result is ... a language whose inner core is not very well designed, but which has enormously powerful libraries of code for solving specific problems (p.153)
  • Cobol, for all its sometime popularity, does not seem to have any intellectual descendants ... I predict a similar fate for Java (p.155)
  • I have a hunch that the main branches of the evolutionary tree pass through the languages that have the smallest, cleanest cores. The more of a language you can write in itself, the better (p.157)
  • Semantically, strings are more or less a subset of lists in which the elements are characters ... Having strings in a language seems to be a case of premature optimization .... Instead ... have just lists, with some way to give the compiler optimization advice that will allow it to lay out strings as contiguous bytes if necessary (p.160)
  • Somehow the idea of reusability got attached to object-oriented programming in the 1980s, and no mount of evidence to the contrary seems to be able to shake it free. But although some object-oriented software is reusable, what makes it reusable is its bottom-upness (p.163)
  • There seem to be a huge number of new programming languages lately. Part of the reason is that faster hardware has allowed programmers to make different tradeoffs between speed and convenience (p.164)
  • The trend is not merely toward languages being developed as open source projects rather than "research," but toward languages being designed by the application programmers who need to use them, rather than by compiler writers (p.166)
  • Lisp was a piece of theory that unexpectedly got turned into a programming language (p.186)
  • Lisp started out powerful, and over the next twenty years got fast. So-called mainstream languages started out fast, and over the next forty years gradually got more powerful, until now the most advanced of them are fairly close to Lisp. Close, but they are still missing a few things (p.186)
  • Perl ... was not only designed for writing throwaway programs, but was pretty much a throwaway program itself (p.206)
  • in practice a good profiler may do more to improve the speed of actual programs written in the language than a compiler that generates fast code (p.209)

Misc

  • This is the Computer Age. It was supposed to be the Space Age, or the Atomic Age (p.ix)
  • research must be original - and as anyone who has written a PhD dissertation knows, the way to be sure you're exploring virgin territory is to stake out a piece of ground that no one wants (p.20)
  • two guys who thought Multics excessively complex went off and wrote their own. They gave it a name that was a joking reference to Multics: Unix (p.52)
  • If you want to keep your money safe, do you keep it under your mattress at home, or put it in a bank? This argument applies to every aspect of server administration: not just security, but uptime, bandwidth, load management, backups, etc. (p.75)
  • The average end user may not need the source code of their word processor, but when you really need reliability, there are solid engineering reasons for insisting on open source (p.149)
  • As technologies improve, each generation can do things that the previous generation would have considered wasteful (p.159)

Monday, 30 March 2015

Using Software Metrics in Automated Assessment

In an attempt to make assessment of code less subjective I've tried using cccc to gather software metrics for a year's worth (about 40 programs) of interdisciplinary projects, hoping that some of the values will correspond to human-assessed Software quality or the general performance. The task is quite tightly constrained, so I was hoping that differences between metrics might be significant, though the task involved electronics and mechanics too, and students could choose whether to use O-O or not.

The cccc program measures simple features like lines of code, lines of comments, etc., but also it measures "information flow between modules" and "decision complexity". It's easy to use. The only problem in practice is that it needs to be given all and only the code used to produce the students' program. Some students' folders were very messy, and their version control was little more than commenting/uncommenting blocks of code, or saving old files with names like "nearlyready.cc". Some teams had main programs that incorporated calibration and testing code whilst others wrote separate programs to perform these tasks. I tried to data-cleanse before running cccc, but the project was too open-ended to make comparisons fair.

I can't see any correlations useful to us, though there's a great variety of code statistics (the code/comment ratio ranging from 1.3 to 59 for example). As a rule of thumb (and unsurprisingly) it seems that teams who use more than 1 source file tend to get decent marks. Those with a few physical source files but only one logical file (e.g. a main.cc with

#include "linefollow.cc"
#include "actuator.cc"

etc.) tended to fare poorly. Here's some sample cccc output along with the human marks.

Feature
Num of modules (NOM) 114143051
Lines of Code (LOC) 5656229173017948498291291
McCabe's Cyclomatic Number (MVG) /NOM8078597331120132212
Lines of Comment (COM) 13222316583130814320
LOC/COM 4.282.7810.42.152.7594.03
MVG/COM 0.60.353.60.390.390.66
Information Flow measure/NOM 00002200
Software Mark (human)5653 7167756662
General Performance (human)6550 8065756060

What CCCC measures

CCCC creates web-page reports. It measures

  • Number of modules NOM
  • Lines of Code LOC
  • McCabe's Cyclomatic Number MVG
  • Lines of Comment COM
  • LOC/COM L_C
  • MVG/COM M_C
  • Information Flow measure ( inclusive ) IF4
  • Information Flow measure ( visible ) IF4v
  • Information Flow measure ( concrete ) IF4c
  • Lines of Code rejected by parser
  • Weighted Methods per Class ( weighting = unity ) WMC1
  • Weighted Methods per Class ( weighting = visible ) WMCv
  • Depth of Inheritance Tree DIT
  • Number of Children NOC (Moderate values of this measure indicate scope for reuse, however high values may indicate an inappropriate abstraction in the design)
  • Coupling between objects CBO (The number of other modules which are coupled to the current module either as a client or a supplier. Excessive coupling indicates weakness of module encapsulation and may inhibit reuse)
  • Information Flow measure ( inclusive )
  • Fan-in FI (The number of other modules which pass information into the current module)
  • Fan-out FO (The number of other modules into which the current module passes information)
  • Information Flow measure IF4 (A composite measure of structural complexity, calculated as the square of the product of the fan-in and fan-out of a single module)

McCabe's Cyclomatic Complexity is a measure of the decision complexity of the functions. Information Flow measure is a measure of information flow between modules

See Also