A blog about teaching Programming to non-CompSci students by Tim Love (Cambridge University Engineering Department). I do not speak on behalf of the university, the department, or even the IT group I belong to.

Tuesday, 5 January 2021

Automated Python Marking

Each year at our department 300+ new students learn Python using a tutorial (Jupyter notebooks). They need to complete exercises (also Jupyter notebooks) using a cloud facility like Google Colaboratory. In October 2020 we attempted to mark these exercises using automation, replacing 75 expensive and tiring person-hours of face-to-face interaction. As an extra challenge we didn't change the exercises (many of which used asserts anyway). The first batch of 10,000+ programs were non-graphical, which helped. This document reports on the results. Marking needs to be part of a wider framework, so help provision is considered here too.

Providing support

In the past, support was an online Forum and a daily face-to-face drop-in help desk, neither of which was used much (especially given the size of the cohort). Marking was done face-to-face. Covid forced us to reconsider these procedures. We decided to advertise to students a range of methods of getting help -

  • You may well need to re-read the provided notebooks several times. Everything you need to know is there. If you rush too quickly to the Exercises you might be able to muddle through the early ones, but sooner or later you'll get stuck.
  • The "Ask a Question" Forum on the Moodle page is open all the time. Anybody can answer. Posts can lead to a Zoom dialog if you want.
  • The "Support Chatrooms" on the Moodle page are open 2-3pm. One helper per chatroom. Posts can lead to a Zoom dialog if you want.
  • Other tutorials - one reason we teach Python is that there are many online tutorials to suit all tastes. The main Python site has a list for beginners on https://wiki.python.org/moin/BeginnersGuide/Programmers
  • The University provides general and specialist courses - see https://training.cam.ac.uk/ucs/search?type=events&query=python
  • Online help - If the local Forum doesn't suit you, try https://stackoverflow.com/questions/tagged/python-3.x. If you don't know what an error message (e.g. "'NoneType' object is not subscriptable") means, just copy/paste it into a search engine. Read How do I ask a good question?

Virtual support seemed to work ok (help-support costs were 30% of last year's - even so, the helpers during the 2-3pm sessions often had nothing to do for whole sessions). 85 questions were asked on the Forum in 6 weeks during Mich 2020. Students liked sending screendumps rather than code. See online helpdesks for information about how we used Moodle's facilities.

Marking was a trickier issue.

simplemarker

Many automarkers (some commercial) are available - see the list below. Many are language neutral. Most compare the program output with expected output (sometimes demanding an exact match). Some integrate with databases.

Merely comparing program output with expected output restricts the type of questions that can be asked, so we wrote our own simplemarker program. Given a folder of students' replies to a programming exercise, and a list of tests, simplemarker

  • checks to see if the programs ask for user input
  • checks for infinite loops
  • checks to see if the programs pass a set of tests
  • optionally checks for duplicate replies
  • returns lists of files that pass, etc.
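
Two of these checks need little more than the subprocess module: run each submission with stdin closed and a timeout, so that a program blocking on input() dies with an EOFError and a non-terminating one hits the timeout. A minimal sketch of the idea (not the actual simplemarker internals - the function name and timeout are illustrative):

import subprocess
import sys

def quick_checks(filename, timeout=5):
    """Crude approximations of the user-input and infinite-loop checks."""
    try:
        result = subprocess.run([sys.executable, filename],
                                stdin=subprocess.DEVNULL,  # input() now raises EOFError
                                capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return "INFINITELOOP"
    if "EOFError" in result.stderr:
        return "ASKS FOR USER INPUT"
    return "OK"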

Here's the code fragment for marking the solutions to the exercise about writing an "is_odd" function

testisodd1 = {"FunctionExists": {"FunctionName": "is_odd"}}
testisodd2 = {"CheckReturnValue": {"FunctionName": "is_odd", "Input": 3, "Returns": True}}
testisodd3 = {"CheckReturnValue": {"FunctionName": "is_odd", "Input": 4, "Returns": False}}
d = marking.simplemarker("testfolder", [testisodd1, testisodd2, testisodd3], label="Mark is_odd", hashcheck=True)
printanswers(d)

and here's the output

*** Mark is_odd
PASS -  ['abc123.py', 'tl136.py']
FAIL -  ['sms67.py']
ASKS FOR USER INPUT -  ['xxx.py']
INFINITELOOP -  ['yyy.py']
HASH -
b5db5badc627e448e07616fcd606064a 2

(the last line means that 2 solutions have the same checksum, which would be suspicious with a large number of programs)
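
The duplicate check can be done with hashlib: hash each file's contents (here with whitespace stripped, so trivial reformatting still matches) and count repeats. A minimal sketch of the idea, not necessarily what simplemarker does:

import hashlib
from collections import Counter
from pathlib import Path

def duplicate_report(folder):
    """Return {checksum: count} for checksums shared by two or more submissions."""
    checksums = [hashlib.md5(b"".join(f.read_bytes().split())).hexdigest()
                 for f in Path(folder).glob("*.py")]
    return {h: n for h, n in Counter(checksums).items() if n > 1}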

The tests available are

  • Does a named function exist?
  • Does a named function (or the whole file) contain a particular string or regex?
  • Does a named function (or the whole file) have fewer than (or more than) a specified number of lines?
  • Is the named function recursive?
  • Given the specified input, does a named function return the specified value?
  • Does the file print the specified output?
  • Does a new file which contains a named function and some specified extra lines produce the specified output when run?
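
For illustration, a CheckReturnValue-style test can be implemented by exec-ing the student's file and calling the named function. A simplified sketch (the real marker also needs the timeout and input checks above):

def check_return_value(filename, function_name, arg, expected):
    """Run a submission, find the named function, and compare one return value."""
    namespace = {}
    try:
        with open(filename) as f:
            exec(f.read(), namespace)   # run the whole file
    except Exception:
        return False                    # the file crashes before any test starts
    func = namespace.get(function_name)
    if not callable(func):
        return False                    # the FunctionExists test would fail too
    try:
        return func(arg) == expected
    except Exception:
        return False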

The aim wasn't complete automation in the first year, but to reduce the need for human interaction, using the real-life sample set to refine the simplemarker program.

The process - benefits and problems

Students submit a zip file of 6 Jupyter notebooks via Moodle (our CMS). 100 lines of bash/Python code extract the Python programs and generated images from these, putting them into an appropriate folder (there's one for each exercise) and reporting on missing programs. 1000 more lines of Python code (still under development) do the marking, producing a "Right" or "Wrong" result for each file, and sometimes diagnosing the problem. At the end there's a list of students/programs that have failed.
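
Notebooks are just JSON, so the extraction step needs very little code. A minimal sketch (the real scripts also pull out the generated images and sort files into per-exercise folders):

import json

def extract_code(notebook_file):
    """Concatenate the source of all code cells in a Jupyter notebook."""
    with open(notebook_file, encoding="utf-8") as f:
        nb = json.load(f)
    return "\n\n".join("".join(cell["source"])
                       for cell in nb["cells"]
                       if cell["cell_type"] == "code")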

The process for milestone 2, Mich 2020 was

  • In Moodle, choose the "Download all submissions" "Grading action"
  • Unzip these into an empty folder - I did
    mkdir /tmp/marking
    cd /tmp/marking
    unzip ~/Downloads/*notebooks*7-12*.zip
    mv */*zip .
    rm -rf *_file_
    
  • Create a folder for each exercise (actually, each cell of an exercise). Extract the programs and png files (the png files are the program's graphical output) -
    bash ~/python-ia-marking/extractallmilestone2
    
    In each folder there's a CRSID.py file for each student, and a CRSID.png file if graphics were produced
  • Mark -
    python ~/python-ia-marking/do_the_markingMilestone2.py
    

This outputs lists of CRSIDs of passes and failures for each question. For milestone 2 on an old laptop it takes about 20 minutes.

Benefits

  • More rigorous, consistent checking of straightforward issues.
  • The ability to check for cheating
  • Gathering statistics to identify what students struggle with -
    • Well over 10% of students fail to extract the 3rd column of a matrix using slicing - they extract the wrong column, a row, or list each element (see the example after this list). Anything not tested by asserts is likely to have a rather high fail rate.
    • Over 20% of students fail to use recursion when asked to write a recursive factorial function - they iterate (thinking this is recursion) or call math.factorial.
    • For milestone 1 in Mich 2020, 294 corrections were requested from 199 students. Many re-corrections were requested too.
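
The column-slicing mistake is worth spelling out. Assuming a numpy array (the names here are illustrative), the tempting A[2] gives a row, not a column:

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

col = A[:, 2]   # 3rd column: [3 6 9] - what was asked for
row = A[2]      # 3rd row:    [7 8 9] - a common wrong answer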

Problems

  • Submission trouble - The instructions were "The single zip file you submit should have your ID, a full-stop, and a "zip" suffix in its name and nothing else (e.g. tl136.zip). It should contain only the 6 notebook files, the files having the suffix ipynb". About 5% of students submit wrongly - multiple files instead of a single file, a wrongly named single file, pdf files archived, suffixes removed from all the archived files, oriental characters added to the suffixes of archived files, zip files archived, etc.
  • Remoteness - When the course was C++ and lab-based (timetabled to run twice a week), it was possible to check that students were keeping up (marking was done in each session). Struggling students could be given extra help, often benefitting from long 1-to-1 discussions. When the course became Python and self-taught in 2016, students who self-identified as weak could (and did) regularly visit the daily helpdesks and get individual tuition. Of course, unconfident students could still hide away, but opportunities were there. Covid has increased the difficulties that staff have in identifying and contacting the students who most need help.
  • Pedantic automation - I soon realised during the trial run that automation would reject many programs that human markers would accept. When asked to write a function called area, many students wrote Area, triangle_area, etc. When asked to print a value out they only calculated it. Pedantry is part of computing, but it can detract from the learning experience.
  • Expert programmers need more feedback - Having spent time doing the optional sections and finding ingeniously short answers, students want more than a "No news is good news" or "Congratulations! You've passed" response.
  • Poor programmers need more feedback - Telling students that they need to re-do a question is frequently unproductive. As is telling them to read the question more carefully (see the next section).
  • Students finish the exercises without trying to understand the material - This common problem is made worse by the remote learning context. It's not unusual for students to bypass much of the teaching material, starting by looking at the exercises (which are in separate documents from the tutorials). Only if they get stuck will they refer back (very selectively) to the tutorial. Consequently when they are told why their program fails they can't correct it, because they don't understand the helper's description of the mistake. They don't know what recursion or dictionaries are (one student asked me if a vertical dictionary was ok). When asked to create a list, they use the common meaning of "list" rather than the Python concept. They don't understand the difference between a function printing the answer and returning the answer - after all, both appear on screen (see the example below). I tell them to read the teaching material and ask questions about that before attempting the exercises. Face-to-face interaction is needed at this point.
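
The print-versus-return confusion is easy to demonstrate, if hard to teach remotely. A minimal example:

def double_print(x):
    print(2 * x)         # displays the answer; the function returns None

def double_return(x):
    return 2 * x         # hands the answer back to the caller

y = double_print(3)      # prints 6, but y is None
z = double_return(3)     # prints nothing, but z is 6
print(z + 1)             # 7 - a returned value can be used later
print(y + 1)             # TypeError - None isn't a number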

In Mich 2020 milestone 1 I used automation to filter out the correct entries, then looked at the files flagged as wrong. When I mailed students for corrections I always explained briefly what was wrong. All the students who failed to use recursion when asked to do so received the same bulk-mailed message. In most other cases I needed to send different messages to subsets of students whose solutions failed.

I invited students to mail me or come to the daily helpdesk if they wanted feedback.

In Mich 2020 milestone 2, things were smoother. And I realised that graphical output could be extracted from notebooks.

Example of student errors

Getting students to correct their code isn't a trivial matter. Students are asked to "Write a function that uses list indexing to add two vectors of arbitrary length, and returns the new vector. Include a check that the vector sizes match, and print a warning message if there is a size mismatch." They're given some code to start -

def sum_vector(x, y):
    "Return sum of two vectors"

a = [0, 4.3, -5, 7]
b = [-2, 7, -15, 1]
c = sum_vector(a, b)

Over 20% of students failed this exercise. It (and a question about dictionary manipulation) was the most re-corrected exercise. Several things (many not tested by asserts) can go wrong -

  • They forget entirely to check vector lengths
  • They check vector lengths, but outside of the function (they check the lengths of a and b before calling sum_vector). Explaining why this is bad can take a while
  • They overwrite the given x and y values in the function - e.g. x = [0, 4.3, -5, 7], etc. (perhaps because they have to use variables called a and b when calling the function, but the function wants variables called x and y).
  • They check vector lengths in the function, in the loop they use to sum the vectors
  • They check vector lengths in the function, but only after they've tried to sum the vectors
  • They check vector lengths in the function before they sum the vectors, and they print a message if there's a mismatch, but they don't return, so the program crashes anyway
  • They check vector lengths in the function at the right time, bypassing the maths if there's a mismatch but forgetting to print a message
  • If there's a size mismatch they return a string containing the warning message
  • If there's a size mismatch they return print('Size mismatch') without appreciating what they're returning.

I've had situations where, when told about the first error listed here, they make the second error. When told about the second error, they make another one. And so on! They're blindly jumping through hoops.
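
For reference, here is one shape of answer that avoids all of the errors listed above (an illustrative version - the exercise permits variations):

def sum_vector(x, y):
    "Return sum of two vectors"
    if len(x) != len(y):        # check before doing any arithmetic
        print('Size mismatch')  # warn, as the question asks
        return None             # and bail out so nothing crashes
    return [x[i] + y[i] for i in range(len(x))]

a = [0, 4.3, -5, 7]
b = [-2, 7, -15, 1]
c = sum_vector(a, b)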

Examples of students' inefficiencies

Some long-winded ways of solving problems aren't bugs, but a human, face-to-face marker would fix them on the spot. Here are some examples -

  • String repetition - a question begins by giving them a passage from Shakespeare, asking them to produce a string that repeats it 100 times. A few students use copy/paste.
  • Dice outcomes - a question asks them to repeatedly roll a simulated die and collect results. Instead of using "frequency[outcome] += 1" several students try something like
      if outcome == 1:
        no1 += 1
      elif outcome == 2:
        no2 += 1
      elif outcome == 3:
        no3 += 1
      elif outcome == 4:
    
    Fortunately it's only a 6-sided die.
  • String comparison - a question asks them to write a __lt__ method. This involved string comparison. Instead of using self.surname < other.surname, dozens of students write something like the following (idiomatic versions of these last two examples are sketched after this list)
       if len(self.surname) < len(other.surname):
          for i in range(len(self.surname)):
             if alphabet.index(self.surname.upper()[i]) < alphabet.index(other.surname.upper()[i]):
                return True
             elif alphabet.index(self.surname.upper()[i]) > alphabet.index(other.surname.upper()[i]):
                return False
     
    etc., etc.
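
For contrast, idiomatic versions of those two examples are short. A sketch (the class and variable names are illustrative):

import random

# Dice outcomes: index a dict (or list) by the outcome instead of an elif ladder
frequency = {i: 0 for i in range(1, 7)}
for _ in range(1000):
    outcome = random.randint(1, 6)
    frequency[outcome] += 1

# String comparison: Python already compares strings lexicographically,
# so __lt__ can simply delegate to <
class Person:
    def __init__(self, surname):
        self.surname = surname
    def __lt__(self, other):
        return self.surname.upper() < other.surname.upper()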

I don't think a programming course should let these pass. We could identify at least some of these by checking on the program/function length, but human follow-up is required.

The future

  • automarker changes -
    • Diagnosing common bugs so that students can be told more precisely what's wrong with their code. By anticipating and testing for common bugs (rather in the way that multiple choice options anticipate common mistakes) perhaps better diagnosis is possible. I started doing this for milestone 2 when I realised that students were often summing the 3rd row of a matrix rather than the 3rd column.
    • Diagnosing common style issues.
    • Add mail=yes/no from=emailaddress message="" fields for each test to facilitate automation of mailed feedback
    • Mailing students about all their mistakes in one go, rather than mailing students question by question.
    • Listing students who did the optional questions, so they can be praised and encouraged.
  • Some changes in the questions' wording and greater ruthlessness in rejecting invalid submissions would save a lot of time.
    • When a question asks them to produce and print a list, they're likely to print a sequence of items (i.e. a list in the everyday sense rather than a Python list). Maybe we could tell them the name that the list should be given.
    • Some exercises look optional to some students even though they're compulsory. Exercise 06.3 only has an editable cell under the optional part, so students didn't do the first part (or they made a new cell for it). But all the questions have this layout with cells at the end, so I don't know why 06.3 in particular is a problem.
    • Some exercises (e.g. 05.2) encourage the use of multiple cells to write the code in. Students used anywhere from one to four of the cells. A single cell would be easier for the automarker.
    • Maybe more of the open-ended parts of questions could become optional - in 05.1 for example.
  • An interactive system? - students could drop files into a webpage and get instant feedback.
  • Automarking needs to be part of an integrated support environment. Perhaps automarking should be a first phase dealing with the bulk of submissions, then humans should deal with the more problematic cases. Demonstrators could deal with students who did the optional extras, or who wanted more feedback on style, etc.

Available products for marking

  • https://github.com/marovira/marking - (Developed for first year courses at UVic). Any deviation from the expected output (whitespace, spelling, capitalization, new lines, etc.) will be flagged as an error.
  • http://web-cat.org/ - "It is highly customizable and extensible, and supports virtually any model of program grading, assessment, and feedback generation. Web-CAT is implemented as a web application with a plug-in-style architecture so that it also can serve as a platform for providing additional student support services". Free.
  • https://github.com/autolab - "Web service for managing and auto-grading programming assignments". Deals with course administration too - see https://autolab.github.io/docs/. Tango does the marking. "Tango runs jobs in VMs using a high level Virtual Memory Management System (VMMS) API. Tango currently has support for running jobs in Docker containers (recommended), Tashi VMs, or Amazon EC2." But it seems to be more of a framework for automarking, each course-leader having to produce assessment code
  • https://pypi.org/project/markingpy/ - same philosophy as my attempt, though more sophisticated. It can do timing tests. Alas the link to the docs is broken - https://markingpy.readthedocs.io.
  • https://www.autogradr.com/ - Not free.
  • https://github.com/GatorEducator/gatorgrader - same philosophy as my attempt. Has tests like "Does source code contain the designated number of language-specific comments?", "Does source code contain a required fragment or match a specified regular expression?", "Does a command execute and produce output containing a fragment or matching a regular expression?"
  • https://github.com/apanangadan/autograde-github-classroom - "Currently, grade-assignments.py will get stuck if one of the student repos goes into an infinite loop."
  • https://github.com/cs50/check50 - using YAML files it's easy to check if programs produce the right output for specified input. But I don't think it can do other checks.
  • https://classroom.github.com/assistant - "Track and manage assignments in your dashboard, grade work automatically, and help students when they get stuck"
  • https://github.com/jupyter/nbgrader - an extension for notebooks that uses asserts. I think we use it.
  • https://github.com/kevinwortman/nerfherder - "a collection of Python hacks for grading GitHub Classroom repositories."
  • https://github.com/Submitty/Submitty - "Customizable automated grading with immediate feedback to students. Advanced grading tools: static analysis, JUnit, code coverage, memory debuggers, etc."
  • https://www.codio.com/features/auto-grading - Not free.

Thursday, 20 April 2017

"The Clean Coder" by Robert C. Martin (Pearson Education, 2011)

The full title is "The Clean Coder: a code of conduct for professional programmers". "In the pages of this book I will try to define what it means to be a professional programmer. I will describe the attitudes, disciplines, and actions that I consider to be essentially professional" (p.2)

The book contains many opinions, some repeated. I'll organise the quotes into sections.

Management

  • "It is not your employer's responsibility to train you, or to send you to conferences, or to buy you books" (p.16)
  • "Good managers crave someone who has the guts to say no. It's the only way you can really get anything done" (p.26)
  • "Here are some examples of words and phrases to look for that are telltale signs of noncommitment: ... "We need to get this done" ... "I hope we can meet again some day" ... "Let's finish this thing" (p.48)
  • "when professionals say yes, they use the language of commitment so that there is no doubt about what they've promised" (p.56)
  • "Professional development organisations allocate projects to existing gelled teams, they don't form teams around projects" (p.170)

Working practices

  • "The only way to prove that your software is easy to change is to make easy changes to do. And when you find that the changes aren't as easy as you thought, you refine the design so that the next change is easier" (p.15)
  • "Why do most developers fear to make continuous changes to their code? They are afraid they'll break it! Why are they afraid they'll break it? Because they don't have tests!" (p.15)
  • "When you have a suite of tests that you trust, then you lose all fear of making changes. When you see bad code, you simply clean it up on the spot ... the code base steadily improves instead of the normal rotting that our industry has become used to" (p.81)
  • "Here is a minimal list of the things that every software professional should be conversant with:
    • Design patterns. You ought to be able to describe all 24 patterns in the GOF book and have a working knowledge of many of the patterns in the POSA books.
    • Design principles. You should know the SOLID principles and have a good understanding of the component principles
    • Methods. You should understand XP, Scrum, Lean, Kanban, Waterfall, Structured Analysis, and Structured Design
    • Disciplines. You should practice TDD, Object-Oriented design, Structured Programming, Continuous Integration, and Pair Programming
    • Artifacts: You should know how to use: UML, DFDs, Structured Charts, Petri Nets, State Transition Diagrams and Tables, flow charts, and decision tables
    " (p.18)
  • "If you are tired or distracted, do not code" (p.59)
  • "Nowadays when I feel myself slipping into the Zone, I walk away for a few minutes. I clear my head" (p.62)
  • "Choose disciplines that you feel comfortable following in a crisis. Then follow them all the time. Following these disciplines is the best way to avoid getting into a crisis" (p.153)
  • "programmers do not tend to be collaborators. And yet collaboration is critical to effective programming. Therefore, since for many of us collaboration is not an instinct, we require disciplines that drive us to collaborate" (p.75)
  • "The three laws of TDD [Test Driven Development]
    1. You are not allowed to write any production code until you have first written a failing unit test
    2. You are not allowed to write more of a unit test than is sufficient to fail - and not compiling is failing
    3. You are not allowed to write more production code than is sufficient to pass the currently failing unit test
    " (p.80)
  • "[Model Driven Architecture] assumes that the problem is the code. But code is not the problem. It has never been the problem. The problem is detail" (p.201)

Testing

  • "QA Should Find Nothing" (p.12)
  • "For some reason software developers don't think of debugging time as coding time" (p.69)
  • "Writing these tests is simply the work of specifying the system. Specifying at this level of detail is the only way we, as programmers, can know what 'done' is" (p.105)
  • "Make sure that all your unit tests and acceptance tests are run several times per day in a continuous integration system" (p.110)
  • "It should be QA's role to work with business to create the automated acceptance tests that become the true specification and requirements document for the system" (p.114)
  • "
    • Unit tests - written by the programmers, for programmers ... before the production code is written
    • Component tests - The components of the system encapsulate the business rules, so the tests for those components are the acceptance tests for those business rules ... written by QA and Business with help from development
    • Integration tests - They do not test business rules ... they ... make sure that the components are properly connected and can clearly communicate with each other ... typically written by the system architects ... typically not executed as part of the Continuous Integration suite, because they often have longer runtimes
    • System tests - They are the ultimate integration tests. ... We would expect to see throughput and performance tests in this suite
    • Manual exploratory tests - These tests are not automated, nor are they scripted
    " (p.116)

Misc

  • "FITNESSE is 64,000 lines of code, of which 28,000 are contained in just over 2,200 individual unit tests. These tests cover at least 90% of the production code and take about 90 seconds to run" (p.80)

Monday, 18 January 2016

The psychology of programmers

Psychology is of interest to programmers when designing GUIs, etc., and of use to educationalists when designing programming courses. Here I'm more concerned with identifying the psychological traits that are more common (or useful) in programmers. When Gerald M. Weinberg wrote "The Psychology of Computer Programming" (1971) he said "What traits, then, would give an indication of potential failure in a programmer? We can speak on this subject only from anecdotal material, for there have as yet been no formal studies of this question". Since then more studies have been done, but I've had trouble finding material. Here I'll provisionally assemble information about aptitude tests, anecdotal material and academic findings.

A programmer is often part of a team, so individuals can afford to have a narrow range of skills. A lone programmer may need to have the skills of a team, so the literature about teamwork is relevant too.

Team requirements

If you can employ several people your team might well comprise: Project Manager, Product Manager, Architect, User-interface designer, End-user liaison, Developer, QA/Testers, Toolsmith, Build Coordinator, Risk Officer, End-user documentation specialist (Survival Guide p.105). These don't always get on with each other. If they're all one person, beware! It's unlikely that one person will be strong in all phases of development.

Aptitude tests

The material at Kent University Careers is typical. It says that the non-programming tests deal with "logical reasoning, numerical problem solving, pattern recognition, ability to follow complex procedures and attention to detail".

They suggest some other attributes that are also required by programmers and other computing professionals

  • Time management
  • Creativity
  • Teamwork
  • Determination
  • Clear, concise documentation
  • Ability to quickly learn new skills and update existing ones by teaching yourself.
  • Receptivity to new ideas
  • Reasonably quick coding, although accuracy is more important than speed

Computational Thinking

Many of the above skills have been clumped together into the concept of Computational Thinking. According to its advocates, Computational Thinking "involves a set of problem-solving skills and techniques that software engineers use to write programs ... However, computational thinking is applicable to nearly any subject. ... Specific computational thinking techniques include: problem decomposition, pattern recognition, pattern generalization to define abstractions or models, algorithm design, and data analysis and visualization".

Psychology research

According to Psychology of programming: Looking into Programmers' heads by Jorma Sajaniemi (2008) "Psychology of programming (PoP) is an interdisciplinary area that covers research into computer programmers’ cognition; tools and methods for programming related activities; and programming education. ... During the past two decades, two important workshop series have been fully devoted to PoP: the Workshop on Empirical Studies of Programmers (ESP), based primarily in the USA, and the Psychology of Programming Interest Group Workshop (PPIG), having a European character".

The paper lists work that has looked at correlations between programming success and some other property - "field dependence (e.g., Mancy & Reid, 2004), inclination to systematic behavior (e.g., Dehnadi, 2006), or self-efficacy (e.g., Wiedenbeck, LaBelle, & Kain, 2004). Jones and Burnett study spatial ability and find a positive correlation between mental rotation ability and programming success in their paper “Spatial Ability and Learning to Program.”"

In Five Big Personality Traits of a Programmer. Do They Matter? it's suggested that "Some traits are more beneficial for the software development: Explorer [high Openness], Focused [high Conscientiousness], Extravert. Other could also nicely compliment each other like Focused Preserver or Open-Minded Challenger. Some traits could be dangerous for the project and team like Nervous Preserver or Agressive Challenger."

According to Renata McGee, "Computer programmers are very detailed thinkers and are able to excel in their positions due to the various traits and skills they possess. Professionals with ISTJ (Introverted, Sensing, Thinking, Judging) personality types have natural skills that are beneficial to this line of work, according to the Myers-Briggs Type Indicators (MBTI) assessment personality test."

According to the more recent "What makes a computer wiz? Linking personality traits and programming aptitude" by Timo Gnambs in the Journal of Research in Personality, Volume 58 (October 2015), programming ability is positively correlated with introversion, though it's more strongly correlated with "openness to experience". It's not correlated with agreeableness or neuroticism.

Anecdotal evidence

  • Adam Sinicki thinks that
    • "The seasoned coder is someone who looks for shortcut ways to achieve tasks and who is resourceful enough to find unconventional solutions to problems. This requires you to first learn the system or the context you’re working in and then to find exploits within it. Sometimes this is referred to as ‘systems thinking’.
    • We use our working memory to store information, so when you're imagining what a line of code does, you have to store the variables and the ideas you're testing out, there. So when you're thinking of a sequence of events, you need to keep the line of logical reasoning held in your working memory – which is where the similarity to math comes in. This is crucial for ‘abstraction’.
    • the brain areas involved with abstract thought are actually the same as those associated with verbal semantic processing
    • So the coder’s brain is good at abstraction, language and short term memory – at least in theory. It should be creative at problem solving and resourceful. How about attention? Actually, this is something I have very little problem with when programming and I’d say that most coders I’ve met feel the same way. Concentrating on coding isn’t the problem: it’s stopping that’s hard."
  • Paul Graham in "Hackers and Painters"(O'Reilly, 2004) writes that
    • Hacking and painting have a lot in common. In fact, of all the different types of people I've known, hackers and painters are among the most alike (p.18)
    • Because hackers are makers rather than scientists, the right place to look for metaphors is not in the sciences, but among other kinds of makers (p.25)
    • Computer science is a grab bag of tenuously related areas thrown together by an accident of history, like Yugoslavia ... It's as if mathematicians, physicists, and architects all had to be in the same department (p.18)
  • LarryWall in "Programming Perl" (1st edition), O'Reilly and Associates writes
    • "We will encourage you to develop the three great virtues of a programmer: laziness, impatience, and hubris."
  • Rob Walling on Personality traits of the best software developers writes
    • "the more I looked at what makes [phenomenal developers] so good, the more I realized they all share a handful of personality traits ... Pessimistic ... Angered By Sloppy Code ... Long Term Life Planners ... Attention to Detail"
  • Me -
    • I think programmers can go up and down layers of complexity quickly, collapsing and expanding concepts as they go.
    • They don't need 1-1 modelling - i.e. can cope with distance between the target and the representation (Recipes vs meals; Music scores vs performance).
    • I think they're likely to be chess players (Matt Sadler is a particular example).
    • I'd imagine that they tend to be field-independent, and can assess their own work in a more egoless fashion than others often manage.

Selection by psychological traits

In New Scientist it was reported that SAP was recruiting autistic people. The move was sparked by successful results from employing a small group of people with autism in India as software testers; the company is now expanding its autistic workforce in Ireland, Germany and the US.

Tuesday, 15 September 2015

"Hackers and Painters" by Paul Graham (O'Reilly, 2004)

Subtitled "Big ideas from the computer age", the book considers the social and psychological factors that encourage start-up companies, then provides opinions on computing languages. The author's start-up company produced what he considers the first web-based app. It was written mostly in Lisp. He sold out to Yahoo.

The nature of programming

  • When I finished grad school in computer science I went to art school to study painting ... Hacking and painting have a lot in common. In fact, of all the different types of people I've known, hackers and painters are among the most alike (p.18)
  • Because hackers are makers rather than scientists, the right place to look for metaphors is not in the sciences, but among other kinds of makers (p.25)
  • hackers start original, and get good, and scientists start good, and get original (p.26)
  • Computer science is a grab bag of tenuously related areas thrown together by an accident of history, like Yugoslavia ... It's as if mathematicians, physicists, and architects all had to be in the same department (p.18)

Languages

  • A language can be very abstract, but offer the wrong abstractions. I think this happens in Prolog, for example (p.150)
  • Inspired largely by the example of Larry Wall, the designer of Perl, lots of hackers are thinking, why can't I design my own language? ... The result is ... a language whose inner core is not very well designed, but which has enormously powerful libraries of code for solving specific problems (p.153)
  • Cobol, for all its sometime popularity, does not seem to have any intellectual descendants ... I predict a similar fate for Java (p.155)
  • I have a hunch that the main branches of the evolutionary tree pass through the languages that have the smallest, cleanest cores. The more of a language you can write in itself, the better (p.157)
  • Semantically, strings are more or less a subset of lists in which the elements are characters ... Having strings in a language seems to be a case of premature optimization .... Instead ... have just lists, with some way to give the compiler optimization advice that will allow it to lay out strings as contiguous bytes if necessary (p.160)
  • Somehow the idea of reusability got attached to object-oriented programming in the 1980s, and no amount of evidence to the contrary seems to be able to shake it free. But although some object-oriented software is reusable, what makes it reusable is its bottom-upness (p.163)
  • There seem to be a huge number of new programming languages lately. Part of the reason is that faster hardware has allowed programmers to make different tradeoffs between speed and convenience (p.164)
  • The trend is not merely toward languages being developed as open source projects rather than "research," but toward languages being designed by the application programmers who need to use them, rather than by compiler writers (p.166)
  • Lisp was a piece of theory that unexpectedly got turned into a programming language (p.186)
  • Lisp started out powerful, and over the next twenty years got fast. So-called mainstream languages started out fast, and over the next forty years gradually got more powerful, until now the most advanced of them are fairly close to Lisp. Close, but they are still missing a few things (p.186)
  • Perl ... was not only designed for writing throwaway programs, but was pretty much a throwaway program itself (p.206)
  • in practice a good profiler may do more to improve the speed of actual programs written in the language than a compiler that generates fast code (p.209)

Misc

  • This is the Computer Age. It was supposed to be the Space Age, or the Atomic Age (p.ix)
  • research must be original - and as anyone who has written a PhD dissertation knows, the way to be sure you're exploring virgin territory is to stake out a piece of ground that no one wants (p.20)
  • two guys who thought Multics excessively complex went off and wrote their own. They gave it a name that was a joking reference to Multics: Unix (p.52)
  • If you want to keep your money safe, do you keep it under your mattress at home, or put it in a bank? This argument applies to every aspect of server administration: not just security, but uptime, bandwidth, load management, backups, etc. (p.75)
  • The average end user may not need the source code of their word processor, but when you really need reliability, there are solid engineering reasons for insisting on open source (p.149)
  • As technologies improve, each generation can do things that the previous generation would have considered wasteful (p.159)

Monday, 30 March 2015

Using Software Metrics in Automated Assessment

In an attempt to make assessment of code less subjective I've tried using cccc to gather software metrics for a year's worth (about 40 programs) of interdisciplinary projects, hoping that some of the values would correspond to the human-assessed software quality or general performance marks. The task is quite tightly constrained, so I was hoping that differences between metrics might be significant, though the task involved electronics and mechanics too, and students could choose whether to use O-O or not.

The cccc program measures simple features like lines of code and lines of comments, but it also measures "information flow between modules" and "decision complexity". It's easy to use. The only problem in practice is that it needs to be given all (and only) the code used to produce the students' program. Some students' folders were very messy, and their version control was little more than commenting/uncommenting blocks of code, or saving old files with names like "nearlyready.cc". Some teams had main programs that incorporated calibration and testing code whilst others wrote separate programs to perform these tasks. I tried to data-cleanse before running cccc, but the project was too open-ended to make comparisons fair.
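
Running cccc over each team's cleaned-up sources is easily scripted. A sketch, assuming cccc is on the PATH and accepts its usual --outdir option (check your version's flags; the paths are hypothetical):

import subprocess
from pathlib import Path

def run_cccc(team_dir, report_dir):
    """Run cccc over one team's C++ sources, writing its reports to report_dir."""
    sources = [str(p) for p in Path(team_dir).rglob("*.cc")]
    subprocess.run(["cccc", f"--outdir={report_dir}"] + sources, check=True)

# e.g. run_cccc("projects/team07", "reports/team07")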

I can't see any correlations useful to us, though there's a great variety of code statistics (the code/comment ratio ranging from 1.3 to 59 for example). As a rule of thumb (and unsurprisingly) it seems that teams who use more than 1 source file tend to get decent marks. Those with a few physical source files but only one logical file (e.g. a main.cc with

#include "linefollow.cc"
#include "actuator.cc"

etc.) tended to fare poorly. Here's some sample cccc output along with the human marks.

[Table, garbled in transcription: for each team it listed Number of modules (NOM), Lines of Code (LOC), McCabe's Cyclomatic Number per module (MVG/NOM), Lines of Comment (COM), LOC/COM, MVG/COM and Information Flow measure/NOM, alongside the human-assessed Software Mark and General Performance.]

What CCCC measures

CCCC creates web-page reports. It measures

  • Number of modules NOM
  • Lines of Code LOC
  • McCabe's Cyclomatic Number MVG
  • Lines of Comment COM
  • LOC/COM L_C
  • MVG/COM M_C
  • Information Flow measure ( inclusive ) IF4
  • Information Flow measure ( visible ) IF4v
  • Information Flow measure ( concrete ) IF4c
  • Lines of Code rejected by parser
  • Weighted Methods per Class ( weighting = unity ) WMC1
  • Weighted Methods per Class ( weighting = visible ) WMCv
  • Depth of Inheritance Tree DIT
  • Number of Children NOC (Moderate values of this measure indicate scope for reuse, however high values may indicate an inappropriate abstraction in the design)
  • Coupling between objects CBO (The number of other modules which are coupled to the current module either as a client or a supplier. Excessive coupling indicates weakness of module encapsulation and may inhibit reuse)
  • Information Flow measure ( inclusive )
  • Fan-in FI (The number of other modules which pass information into the current module)
  • Fan-out FO (The number of other modules into which the current module passes information)
  • Information Flow measure IF4 (A composite measure of structural complexity, calculated as the square of the product of the fan-in and fan-out of a single module)

McCabe's Cyclomatic Complexity is a measure of the decision complexity of the functions. The Information Flow measure is a measure of information flow between modules.
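
As a reminder of what the MVG figures mean: cyclomatic complexity is 1 plus the number of decision points, so this toy function scores 4.

def classify(n):
    if n < 0:                  # decision 1
        return "negative"
    for d in range(2, n):      # decision 2
        if n % d == 0:         # decision 3
            return "composite"
    return "no divisor found"  # the straight-line path contributes the base 1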

Wednesday, 24 December 2014

"Geek Sublime" by Vikram Chandra (Faber and Faber, 2014)

The author's written some very respectable novels and stories, but he was also a programmer and a computer consultant - a self-confessed geek. The blurb says "What is the relationship between the two? Is there such a thing as the sublime in code? Can we ascribe beauty to the craft of coding?" but only a small proportion of the book directly deals with that. He begins by pointing out some people's attempts to relate programs and arts

  • According to Graham, the iterative processes of programming - write, debug (discover and remove bugs, which are coding errors, mistakes), rewrite, experiment, debug, rewrite - exactly duplicate the methods of artists (p.2)
  • 'Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do' (Donald Knuth - literate programming)
  • "Of all the different types of people I've known, hackers and painters are amongst the most like" (Paul Graham) (p.2)

He points out that US and Indian computer cultures are different

  • In a 2013 interview, the executive chairman of Google, Eric Schmidt, said, "Forty per cent of the startups in Silicon Valley are headed by India-based entrepreneurs" (p.75)
  • This [Indian] educational process, with its obsessive emphasis on examinations and rankings, produces legions of rote learners, mark grubbers, and cheaters. It causes casualties - 7379 students committed suicide in 2010, an increase of 26 per cent over 2005 (p.77)
  • the proportion of undergraduate computer-science degrees awarded to women in the US has declined from 37 per cent in 1984 to 18 per cent in 2010 ... Meanwhile, in India, the trend has gone in the other direction ... in 2003, 32 percent of the Bachelor of Engineering degrees in computer science and 55 per cent of the Bachelor of Science degrees in computer science were awarded to women (p.80)
  • research in countries as varied as Iran, Hong Kong, Mauritius, Taiwan, and Malaysia has yielded results consistent with those found in studies in India, showing that there is nothing about the field of computing that makes it inherently male. Varma's conclusion is blunt: 'The gender imbalance in the United States seems to be specific to the country; it is not a universal phenomenon' (p.83)

He quotes some interesting facts about computer languages

  • COBOL ... still processes 90 per cent of the planet's financial transactions, and 75 per cent of all business data (p.128)
  • Malbolge ... is so impenetrable that it took two years after the language was first released for the first working program to appear, and that program was not written by a human, but generated by a computerized search program that examined the space of all possible Malbolge programs and winnowed out one possibility (p.130)
  • The open-source database SQLite, at the time of this writing, has 1177 times the amount of test code as it does program code (p.155)

He deals with Indian theories of aesthetics

  • It is the very artificiality and conventionality of the aesthetic experience, therefore, that makes the unique experience of rasa possible (p.150)
  • The speech of the poet can be effective even when it doesn't obey the rules of everyday language. According to Abhinavagupta, even the denotative and connotative meanings are only aids to the production of rasa, unessential props which can sometimes be discarded (p.155)
  • Indian movies mix emotions and formal devices in a manner quite foreign to Western filmgoers; Indian tragedies accommodate comedic scenes, and soldiers in gritty war movies can break into song ... This is why the Aristotelian unities of British and American films seemed so alien to me (p.161-2)
  • Mary Douglas writes ... 'ring composition is extremely difficult for Westerners to recognise.' ... ... When I was writing my first book, I had never heard the phrase 'ring composition,' but the method and its specific implications and techniques came readily to hand because - of course - I had seen and heard it everywhere. What I wanted within the nested circles or chakras of my novel was a mutual interaction between various elements in the structure (p.165)
  • Shulman writes that in India, reiterations and ring compositions 'speak to a notion of reality, in varying intensities and degrees of integrity, as resonance, reflection, or modular repetition understood as eruption or manifestation (avirbhava) from a deeper reservoir of existence (p.167)

Then he gets into Indian metaphysics and Tantric practices, the role of women in Indian culture, Sanskrit and the consequences of the Empire. Finally he returns to the literature/programming issue -

  • programmers ... often seem convinced that they already know everything worth knowing about art ... to make art, you don't have to become an artist - that anyhow, is only a pose - you just analyse how art is produced, you understand its domain, and then you code art (p.209)
  • For my own part, as a fiction writer who has programmed, thinking and feeling as an artist is a state of being utterly unlike that which arises when one is coding (p.210)
  • To compare code to works of literature may point the programmer towards legibility and elegance, but it says nothing about the ability of code to materialize logic ... Most discussions of the beauty of code I have encountered emphasize formal qualities of language - simplicity, elegance, structure, flexibility ... But programs are not just algorithms as concepts or applied ideas; they are algorithms in motion. Code is uniquely kinetic. It acts and interacts with itself, with the world. In code, the mental and the material are one. Code moves. It changes the world (p.221)

Tuesday, 5 August 2014

"The Art of UNIX Programming"

This book's by Eric S. Raymond, published by Pearson Education. Though it dates from 2004 it's still interesting. It filled some gaps in my knowledge.

Misc

  • "There is evidence [Hatton97] that when one plots defect density versus module size, the curve is U-shaped and concave upwards" (p.86)
  • "second-system effect" - "the urge to add everything that was left out the first time around" (p.29). "third-system effect" - "after the second system has collapsed of its own weight, there is a chance to go back to simplicity and get it really right. The original Unix was a third system" [after CTSS and Multics] (p.29)
  • "Holding down the shift key required actual effort: thus the preference for lower case, and the use of "-" (rather than the perhaps more logical "+") to enable options" (p.242)
  • "marshalling" means "serialising"
  • "A program is transparent when it is possible to form a simple mental model of its behaviour that is actually predicive for all or most cases" (p.133)
  • "Software systems are discoverable when they include features that are designed to help you build in your mind a correct mental model of what they do and how they work. (p.133)
  • "Threading is a performance hack ... they do not reduce global complexity but rather increase it" (p.159)
  • "Despite occasional exceptions such as NFS and the GNOME project, attempts to import CORBA, ASN.1, and other forms of remote-procedure-call interface have largely failed - these technologies have not been naturalized into the Unix culture" (p.178)
  • "Today, RPC and the Unix attachment to text streams are converging in an interesting way, through protocols like XML-RPC and SOAP" (p.179)
  • "Unix was the first production operating system to be ported between differing processor familes" (p.393)

Programs

  • "Emacs stands for Editing MACroS" (p.351)
  • "yacc has a rather ugly interface, through exported global variables with the name prefix yy_. This is because it predates structs in C; in fact, yacc predates C itself; the first implementation was written in C's predecessor B" (p.353)
  • scp calls ssh "as a slave process, intercepting enough information from ssh's standard output to reformat the reports as an ASCII animation of a progress bar" (p.169)
  • "No discussion of make(1) would be complete without an acknowledgement that it includes one of the worst design botches in the history of Unix. The use of tab" . The author said that he "had a user population of about a dozen, most of them friends, and I didn't want to screw up my embedded base. The rest, sadly, is history"

Languages

  • "Outside of Fortran's dwindling niche in scientific and engineering computing, and excluding the vast invisible dark mass of COBOL financial applications at bank and insurance companies, C and its offspring C++ have now (in 2003) dominated applications programming almost completely for more than a decade. It may therefore seem perverse to assert that C and C++ are nowadays almost always the wrong vehicle for beginning new applications development. But it's true; C and C++ optimize for machine efficiency at the expense of increased implementation and (especially) debugging time" (p.323)
  • "C++ is anti-compact - the language's designer has admitted that he doesn't expect any one programmer to ever understand it all" (p.89)
  • "Python is "generally thought to be the least efficient and slowest of the major scripting languages, a price it pays for runtime type polymorphism" (p.337) "it encourages clean, readable code and combines accessibility with scaling up well to large projects" (p.338)

O-O

  • "The OO design concept initially proved valuable in the design of graphics systems, graphical user interfaces, and certain kinds of simulation. To the surprise and gradual disillusionment of many, it has proven difficult to demonstrate the benefits of OO outside those areas" (p.101-2)
  • "Unix programmers have always tended to be a bit more skeptical about OO than their counterparts elsewhere" (p.102)
  • "a lot of programming courses teach thick layering as a way to satisfy the Rule of Representation. In this view, having lots of classes is equated with embedding knowledge in your data. The problem with this is that too often, the 'smart data' in the glue layers is not actually about any natural entity in whatever the program is manipulating - it's just about being glue" (p.102)
  • "One reason that OO has succeeded most where it has (GUIs, simulations, graphics) may be because it's relatively difficult to get the ontology of types wrong in those domains" (p.103)

Quotes by others

  • "This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface" (Doug McIlroy)
  • "La perfection est atteinte non quand il ne reste rein à ajouter, mais quand il ne reste rien à enlever" (Antoine de Saint-Exupéry)
  • "C++: an octopus made by nailing extra legs onto a dog" (anon)
  • "Beauty is more important in computing than anywhere else in technology because software is so complicated. Beauty is the ultimate defense against complexity" (David Gelernter,"Machine Beauty: Elegance and the Heart of Technology")
  • "When in doubt, use brute force" (Ken Thompson)
  • "If you know what you're doing, three layers is enough; if you don't, even seventeen levels won't help" (Padlipsky)