Human: A Manifesto for a Programming Language for Humans

Abstract

Over the last 50 years, programming languages have become increasingly human. It is far easier to write programs in the languages of today than it was in FORTRAN. I argue that we are currently at the beginning of a wave of an entirely new class of programming languages, in which productivity will be orders of magnitude higher and quality will be orders of magnitude better. These programming languages will be very close to natural languages, and their compilers will be artificially intelligent. I discuss various characteristics of this yet-to-be-designed language, which I christen in advance: Human.

A lot of what follows is a rosy picture of the future with lots of hand waving. However, I believe there is a kernel of realism in the ideas proposed here.

Programming languages are approaching natural languages

Programming languages serve as the bridge between humans and computers. While humans think (mostly) in natural languages, computers understand machine language. Programming languages say to the user: "Hey! We need to cover the distance between your thoughts, which are in natural language, and the machine language, which the computer understands. Your natural language is too ambiguous and inconsistent, so why don't you use a programming language instead? It is very close to natural language, except that it is unambiguous and consistent. The compiler will then take care of converting the programming language to machine language."

So, the deal with computer systems is this: humans cover part of the distance, translating from natural language to a programming language, while computers (in the form of compilers) do the rest, translating from the programming language to machine language.

With the advancement of programming languages, the distance humans must travel has gradually shrunk. Consider assembly programming, where humans cover most of the distance and the assembler only needs to do a nearly one-to-one translation between assembly instructions and machine instructions.

Programming languages, being the bridge to machines, are inspired by them. They retain constructs borrowed from machines, and shed these constructs only gradually. Let's look at a few of these constructs and where they stand today.

Imperativeness

Imperative programming is the paradigm of programming where statements are used to mutate the state of the program.

The fact that a microprocessor executes instructions one after the other, changing the values of various registers, makes imperativeness inherent in the computation stack. This is why the earliest languages were imperative, and why imperativeness still permeates the art of programming today. The problem is that humans do not think imperatively, and forcing them to write imperative code reduces their productivity and code quality.

There have been a few paradigms other than imperative programming, each achieving a fair degree of success. Declarative programming languages (like SQL) have succeeded in some domains, but there are no popular general-purpose declarative programming languages. Functional programming is relatively more widespread, but statelessness is its Achilles heel. With mainstream languages like Python and Java adopting functional features, some of the imperativeness is being replaced by functional programming.
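To make the contrast concrete, here is a minimal sketch in Python of the same computation written in both styles. The data and the function names are invented purely for illustration.

```python
# Hypothetical data, invented for this illustration.
orders = [120, 45, 300, 80, 210]


def total_large_imperative(amounts, threshold):
    # Imperative style: spell out each step and mutate a running total.
    total = 0
    for amount in amounts:
        if amount > threshold:
            total += amount
    return total


def total_large_functional(amounts, threshold):
    # Functional style: describe the result, with no explicit mutation.
    return sum(amount for amount in amounts if amount > threshold)


print(total_large_imperative(orders, 100))  # 630
print(total_large_functional(orders, 100))  # 630
```

The second version reads closer to the English sentence "the sum of all amounts above the threshold", which is the direction this essay argues languages are moving in.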

If you can't beat them, join them. That is the mantra of object-oriented programming. If state is inevitable, let's promote it from low-level atomic values (like integers, floating point numbers and characters) to objects. The style is still procedural: you still execute instructions one after the other, mutating the state, but the state stands for something more meaningful to humans.

Goto

Earlier languages had a "goto" instruction, which was inspired by the jump instruction in microprocessors. Goto instructions led to very confusing code: because you can jump to arbitrary points in the program, it becomes hard to reason about. While there may be some valid uses of goto, it has been almost completely eradicated from modern programming practice, replaced by structured control flow (loops and conditionals) and exceptions.

Typing

Since the instruction set cares about types (integer addition is a different instruction from floating point addition), most early programming languages were statically typed. However, since humans do not think in types, many modern languages are dynamically typed. There is a performance penalty for dynamic typing, but for most use cases, faster development trumps the poorer performance.
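As a small illustration of dynamic typing, the sketch below defines one function in Python, with no type declarations, and applies it to three kinds of values. The function name `add` is just an example.

```python
def add(a, b):
    # No type declarations: which "addition" runs is decided at run time
    # from the types of the actual arguments.
    return a + b


print(add(2, 3))        # integer addition: 5
print(add(2.5, 0.25))   # floating point addition: 2.75
print(add("ab", "cd"))  # string concatenation: abcd
```

The run-time lookup of the right operation on every call is where the performance penalty mentioned above comes from.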

Function call

Function calls are very interesting. The idea of a function call is very natural to humans: we often delegate work to other people. Initially, there was no hardware support for function calls; they had to be implemented by the programmer. However, the idea was so popular that microprocessors started to provide "call" and "ret" instructions to make function calls faster. This is one instance where the human style of thinking has had an effect all the way down to the microprocessor.

Thus we see that over the decades programming languages have got increasingly humanized.

Artificial intelligence is advancing

In the past decade or so, artificial intelligence has made major advances. This is primarily driven by two factors. Firstly, we have an enormous amount of data available to train AI systems. Secondly, we have an enormous amount of compute power available to perform this training. As a result of these two factors, machines can now perform tasks which they could not do earlier.

Natural language processing has advanced enough that we have tools like Siri and Amazon Alexa which can do simple tasks seamlessly.

Programming in natural language would be significantly harder than that. However, it should be possible to create an intelligent, learning compiler which understands a subset of English and translates it to an existing programming language.

Efforts in natural language programming

There have been efforts, for a long time, in the area of natural language programming.

One active area of research is "Programming by example" [1], where you give the computer sample inputs and outputs and leave it to generate the program. For example, you may provide:

Input                         Output
Missing page numbers, 1993    1993
64-67, 1995                   1995
1992 (1-27)                   1992

The system would learn from these inputs and outputs and produce a program that can perform the translation on any input.
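The idea can be sketched in a few lines of Python. This is only a toy, not the method of the cited paper: it assumes a tiny, hand-picked set of candidate programs (the names in `CANDIDATES` are invented here) and simply returns the first one that agrees with every example. Real systems search far larger program spaces and rank the candidates.

```python
import re

# The three input/output pairs from the example above.
EXAMPLES = [
    ("Missing page numbers, 1993", "1993"),
    ("64-67, 1995", "1995"),
    ("1992 (1-27)", "1992"),
]


def first_match(pattern, text):
    """Return the first regex match in text, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# A tiny, hypothetical space of candidate programs to search through.
CANDIDATES = {
    "first number": lambda s: first_match(r"\d+", s),
    "text after the last comma": lambda s: s.rsplit(",", 1)[-1].strip(),
    "first four-digit number": lambda s: first_match(r"\b\d{4}\b", s),
}


def synthesize(examples, candidates):
    """Return (name, program) for the first candidate consistent with all examples."""
    for name, program in candidates.items():
        if all(program(inp) == out for inp, out in examples):
            return name, program
    raise ValueError("no candidate program is consistent with the examples")


name, program = synthesize(EXAMPLES, CANDIDATES)
print(name)                        # first four-digit number
print(program("pp. 12-30, 2001"))  # 2001
```

The first two candidates are each rejected by one of the examples ("64-67, 1995" and "1992 (1-27)" respectively), which is how even three examples can narrow down the intended program.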

Mathematica provides a natural language interface to generate Mathematica programs [2].

Edsger Dijkstra considered natural language programming to be a foolish exercise [3]. However, his essay is decades old, and computation has advanced far beyond those times. His chief concern is that natural language is imprecise and thus unsuitable for programming. However, given far greater computational resources and the maturation of deep learning technologies, a compiler should be able to compile natural language programs correctly in most cases. As for the other cases, well, even humans don't write bug-free code, and test cases should be written anyway to test the software thoroughly.

A paper by Mihalcea et al. [4] outlines some of the challenges involved in building such a system.

A Few Examples of what Human should look like

I recently wrote a program whose natural description would be the following:

"File a.txt contains three comma-separated fields: username, first name and last name. File b.txt also contains three comma-separated fields: email, first name and last name. For each email in b.txt, if the corresponding first name and last name are present in a.txt, I want to know its username."

The corresponding code which I wrote in Python was this:

# Map (first name, last name) -> list of usernames found in a.txt
name_2_uids = {}
with open('a.txt', encoding='utf-8') as f:
    for line in f:
        uid, first_name, last_name = [field.strip() for field in line.split(',')]
        name_2_uids.setdefault((first_name, last_name), []).append(uid)

# For each email in b.txt whose name appears in a.txt,
# print the email, the name and the matching usernames
with open('b.txt', encoding='utf-8') as f:
    for line in f:
        email, first_name, last_name = [field.strip() for field in line.split(',')]
        if (first_name, last_name) in name_2_uids:
            uids = ','.join(name_2_uids[(first_name, last_name)])
            print(email + '\t' + first_name + '\t' + last_name + '\t' + uids)

Here is another example:
"I have a file called user.txt, which contains user IDs. From the events table in database 1772e3a9bb92fdc2810a_web of MongoDB (hostname: 10.0.0.1, port: 27010), find all the rows such that userId is contained in user.txt, eventName is page_viewed and _id is greater than ObjectId("595698a80000000000000000")."
And here is the Python program I wrote for it:

from bson.objectid import ObjectId
from pymongo import MongoClient


def get_user_ids(filename):
    """Read one integer user ID per line from the given file."""
    with open(filename) as f:
        return [int(line.strip()) for line in f]


if __name__ == '__main__':
    client = MongoClient("10.0.0.1", 27010)
    db = client['1772e3a9bb92fdc2810a_web']
    collection = db['events']
    user_ids = get_user_ids('user.txt')

    # page_viewed events of the listed users with _id greater than the given ObjectId
    query = {'eventName': 'page_viewed', '_id': {'$gt': ObjectId("595698a80000000000000000")}, 'userId': {'$in': user_ids}}
    events = collection.find(query)
    for event in events:
        print(event)

How should we make Human?

How should we go about making Human? There are two steps:

  1. First, we create a basic version of Human which can solve some specific use cases. Perhaps it should be able to do text file manipulation, akin to awk. In this stage, we may use some hard-coded rules to translate from natural language to the target language, and we can impose some constraints on the natural language too. We choose Python as the target language, simply because it is the language I am most comfortable with.
  2. Once some level of maturity is reached, we write a learning compiler: the users of Human correct the incorrectly compiled programs, and each triple <source code, incorrectly compiled code, corrected compiled code> is uploaded to the server. These samples are used by the compiler to learn and become better.
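The two steps above can be sketched roughly as follows. This is only a hand-wavy illustration under my own assumptions: the single sentence form understood by `translate`, and the `Correction` record with its field names, are invented here and are not a design for Human.

```python
import re
from dataclasses import dataclass


def translate(sentence):
    """Step 1: translate one constrained English sentence to Python source.

    Returns None when no hard-coded rule matches the sentence.
    """
    m = re.fullmatch(
        r"print field (\d+) of each line of (\S+) split on '(.+)'",
        sentence.strip(),
        flags=re.IGNORECASE,
    )
    if m is None:
        return None
    index = int(m.group(1)) - 1  # people count fields from 1, Python from 0
    path, sep = m.group(2), m.group(3)
    return (
        f"with open({path!r}) as f:\n"
        f"    for line in f:\n"
        f"        print(line.rstrip('\\n').split({sep!r})[{index}])\n"
    )


@dataclass
class Correction:
    """Step 2: one training sample uploaded when a user fixes the output."""
    source: str     # the Human program as the user wrote it
    compiled: str   # what the compiler produced (incorrect)
    corrected: str  # the user's corrected version of the compiled code


print(translate("Print field 2 of each line of a.txt split on ','"))
```

For that sentence the sketch emits a three-line Python program that opens a.txt and prints the second comma-separated field of every line. Anything outside the rule yields None, which is exactly the gap the learning compiler of step 2 is meant to close.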

A prediction

I predict that a Human (there can be multiple versions of Human, by different people) will find a mention in the TIOBE Index [5] by December 2022.

Related Links

  1. Learning to Learn Programs from Examples: Going beyond Program Structures. https://www.microsoft.com/en-us/research/wp-content/uploads/2017/04/ranking-ijcai17.pdf
  2. Programming with Natural Language Is Actually Going to Work. http://blog.wolfram.com/2010/11/16/programming-with-natural-language-is-actually-going-to-work/
  3. On the foolishness of "natural language programming". https://www.cs.utexas.edu/users/EWD/transcriptions/EWD06xx/EWD667.html
  4. NLP (Natural Language Processing) for NLP (Natural Language Programming). http://alumni.media.mit.edu/~hugo/publications/papers/CICLING2006-nlp4nlp.pdf
  5. TIOBE Index. https://www.tiobe.com/tiobe-index/

Note: getting the data needed to train such a compiler will be hard.