
How Claude Got Its Name

The intellectual machinery of artificial intelligence was assembled two decades before the field was named — and the man who measured information lent his name to a machine that now writes.

By Antony Coppellotti, founder and fractional CTO, Gordion Solutions

Artificial intelligence is usually told as a story that begins in 1956, in a meeting room at Dartmouth College. That meeting gave the field its name and its founding ambition. But the intellectual machinery was already two decades old by then, and it had been assembled by people who were not trying to build minds at all. They were trying to settle a question about the limits of mathematics. The line that runs from that question to the modern field is unbroken, and following it makes the eventual claim of 1956 look less like a leap of imagination and more like the obvious next move.

A question about proof

In 1928 David Hilbert and Wilhelm Ackermann set out a problem that crystallised the formalist programme then dominating the foundations of mathematics. They called it the Entscheidungsproblem, the decision problem. Was there a definite procedure, a mechanical recipe requiring no insight, that could take any statement in first-order logic and decide whether it was provable? Hilbert believed mathematics was, at bottom, a closed and decidable system. The Entscheidungsproblem asked whether that belief could be made precise and then satisfied.

Two results dismantled the optimism behind it. Kurt Gödel’s incompleteness theorems of 1931 showed that any consistent formal system powerful enough to express arithmetic contains true statements it cannot prove. Then, in 1936, a 24-year-old fellow of King’s College, Cambridge, supplied the decisive answer to the decision problem itself. Alan Turing’s paper, On Computable Numbers, with an Application to the Entscheidungsproblem, proved that no such universal procedure could exist.

What made the paper foundational was not only the negative result but the apparatus Turing invented to reach it. To reason rigorously about what a “definite mechanical procedure” even was, he had to define one. His answer was an abstract device, now called a Turing machine: a head that reads and writes symbols on an unbounded tape, moving left or right according to a finite table of rules and a current internal state. Anything that could be computed by following explicit instructions, he argued, could be computed by such a machine. Alonzo Church reached an equivalent conclusion the same year through his lambda calculus, and the two formulations together became the Church-Turing thesis, the working definition of computation that the field still uses.
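The device is small enough to sketch in a few lines of code. What follows is a minimal illustration in Python, not Turing’s own notation: the function name run_turing_machine, the rule-table format, and the binary-increment example are all assumptions made for this sketch, and the step limit is a practical guard rather than part of the abstract machine.

```python
from collections import defaultdict

BLANK = "_"

def run_turing_machine(rules, tape, state="start", halt_state="halt", max_steps=10_000):
    """Run a one-tape machine and return the non-blank tape contents.

    rules maps (state, symbol) -> (new_state, symbol_to_write, move),
    where move is -1 (left) or +1 (right). This format is hypothetical.
    """
    # The tape is unbounded in both directions; unvisited cells read as blank.
    cells = defaultdict(lambda: BLANK, enumerate(tape))
    head = 0
    steps = 0
    while state != halt_state:
        if steps >= max_steps:
            raise RuntimeError("step limit reached before halting")
        symbol = cells[head]
        state, write, move = rules[(state, symbol)]
        cells[head] = write
        head += move
        steps += 1
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip(BLANK)

# Example rule table: add one to a binary number.
# Scan right to the end of the input, then carry leftwards.
INCREMENT = {
    ("start", "0"): ("start", "0", +1),
    ("start", "1"): ("start", "1", +1),
    ("start", BLANK): ("carry", BLANK, -1),
    ("carry", "1"): ("carry", "0", -1),
    ("carry", "0"): ("halt", "1", -1),
    ("carry", BLANK): ("halt", "1", -1),
}

if __name__ == "__main__":
    print(run_turing_machine(INCREMENT, "1011"))
```

The machine itself is nothing but the table: the interpreter function never changes, and a different table gives a different computation. That separation is what the universal machine in the next step exploits.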

Turing then proved something more. A single machine, supplied on its tape with a description of any other machine, could carry out that machine’s computation. This universal machine is the theoretical ancestor of every stored-program computer, the recognition that the program and the data are the same kind of thing and can live in the same memory. It is worth pausing on the strangeness of the route taken here. The general-purpose computer was not designed to compute anything in particular. It fell out of an attempt to prove that some things cannot be computed at all.

This is the pre-war work, and it is purely mathematical. Turing’s wartime cryptanalysis at Bletchley Park and his post-war design for the ACE belong to a later chapter, as does his 1950 paper Computing Machinery and Intelligence, which proposed the imitation game and asked directly whether machines could think. But the conceptual gift was already given by 1936: a precise, mechanical model of reasoning, and the demonstration that one machine could stand in for all the others.

Control, brains, and the steersman

The decade after 1936 turned the abstraction towards biology and engineering. The pivotal move came in 1943, when Warren McCulloch, a neurophysiologist, and Walter Pitts, a largely self-taught logician, published A Logical Calculus of the Ideas Immanent in Nervous Activity. They modelled the neuron as a simple threshold device, firing or not firing, and showed that networks of these idealised neurons could compute logical functions. The brain, on this account, was performing logic, and logic could be realised in circuits. Here the Turing lineage meets neuroscience for the first time: a network of McCulloch-Pitts neurons, given an external tape for memory, can in principle compute anything a Turing machine can. The two great metaphors of the field, the mind as logic engine and the mind as neural net, were already entangled at birth.
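The threshold neuron can be illustrated in a few lines. This is a simplified sketch in Python rather than the 1943 formalism: it assumes integer weights, with a negative weight standing in for inhibition, and the function names mp_neuron, AND, OR, NOT, and XOR are chosen here for illustration.

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (1) if the weighted sum of binary inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return int(total >= threshold)

def AND(a, b):
    # Both inputs must be active to reach a threshold of 2.
    return mp_neuron((a, b), (1, 1), 2)

def OR(a, b):
    # Either input alone reaches a threshold of 1.
    return mp_neuron((a, b), (1, 1), 1)

def NOT(a):
    # An inhibitory input: the unit fires only when the input is silent.
    return mp_neuron((a,), (-1,), 0)

def XOR(a, b):
    # No single threshold unit computes XOR; a small network of units does.
    return AND(OR(a, b), NOT(AND(a, b)))

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, AND(a, b), OR(a, b), XOR(a, b))
```

Each unit is trivial on its own. The logical power comes from wiring units together, which is the sense in which the networks, not the neurons, compute.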

The synthesis acquired a name in 1948, when Norbert Wiener published Cybernetics: Or Control and Communication in the Animal and the Machine. Wiener took the word from the Greek for steersman, and his subject was feedback: the way a system, biological or mechanical, uses information about its own output to correct its behaviour and hold itself stable. A thermostat, a gun-laying servomechanism, a hand reaching for a cup, and an animal maintaining its body temperature were, to Wiener, instances of one principle. Cybernetics proposed that purpose and self-regulation were not mysterious properties of living things but engineering quantities that could be described and reproduced. W. Ross Ashby pursued this directly in Britain, building the Homeostat and arguing in Design for a Brain (1952) that adaptive behaviour could emerge from the dynamics of a sufficiently rich feedback system.

Much of this came together at the Macy Conferences, a series of interdisciplinary meetings held in New York between 1946 and 1953. Wiener, McCulloch, Pitts, John von Neumann, Margaret Mead, and others met to ask whether the same mathematics could describe nervous systems, social systems, and machines. The conferences gave the period its characteristic confidence, the sense that mind and mechanism were converging.

Donald Hebb supplied the other half of the picture in 1949. In The Organization of Behavior he proposed that learning is physical: when one neuron repeatedly helps to fire another, the connection between them strengthens. The slogan, that cells which fire together wire together, became the seed of many later learning rules for artificial networks. Where McCulloch and Pitts had shown what a network could compute, Hebb suggested how a network might change itself.

Measuring information

Running alongside cybernetics, and often confused with it, was a separate and more durable achievement. In 1948 Claude Shannon, working at Bell Labs, published A Mathematical Theory of Communication. Shannon was not interested in meaning. He was interested in the engineering problem of sending messages reliably down a noisy wire, and he solved it by treating information as a measurable quantity. His first name was Claude, and it is the one a machine built three quarters of a century later would come to carry.

His key abstraction was to define information in terms of uncertainty removed. The fundamental unit was the binary digit, the bit, the answer to a single yes-or-no question. He defined the entropy of a source as the average information its messages carry, established the maximum rate at which a given channel can carry information, its capacity, and proved that communication with an arbitrarily small probability of error is possible at any rate below that limit and impossible above it. The noisy-channel coding theorem remains one of the most consequential results in engineering.
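Both quantities are easy to compute for small cases. The sketch below, in Python, estimates the entropy of a message from its symbol frequencies and gives the capacity of a binary symmetric channel, the textbook case in which each transmitted bit is flipped with a fixed probability. The function names entropy_bits and bsc_capacity are assumptions made for this illustration, and treating observed frequencies as the source’s true probabilities is a simplification.

```python
import math
from collections import Counter

def entropy_bits(message):
    """Average information per symbol, in bits, using observed symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    # H = sum over symbols of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def bsc_capacity(flip_probability):
    """Capacity, in bits per channel use, of a binary symmetric channel: C = 1 - H(p)."""
    p = flip_probability
    if p in (0.0, 1.0):
        return 1.0  # a channel that never flips, or always flips, loses nothing
    noise_entropy = p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))
    return 1.0 - noise_entropy

if __name__ == "__main__":
    print(entropy_bits("aabb"))   # two equally likely symbols
    print(entropy_bits("aaaa"))   # no uncertainty at all
    print(bsc_capacity(0.5))      # pure noise
```

A fair coin carries one bit per toss and a certain outcome carries none. A channel that flips half its bits at random has zero capacity, because its output says nothing about its input.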

Shannon’s relevance to the coming field runs deeper than the obvious fact that digital computers traffic in bits. He had already shown, in his 1937 master’s thesis, that the operations of Boolean algebra could be carried out by arrangements of electrical relays, welding symbolic logic to physical switching circuits. He turned next to games, publishing in 1950 a scheme for Programming a Computer for Playing Chess that introduced the idea of searching a tree of possible moves and evaluating positions, a framework that organised game-playing AI for the next half century. Information theory gave the field its currency, and Shannon gave it an early demonstration that hard, apparently intellectual problems could be reduced to search and evaluation.
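The search-and-evaluate idea can be sketched with a toy game tree. This is an illustration in Python of plain minimax, not a reconstruction of the 1950 paper: the nested-list tree, the leaf scores, and the names minimax and best_move are all hypothetical. A leaf holds the evaluation of a position from the first player’s point of view, and a list holds the positions reachable in one move.

```python
def minimax(node, maximizing=True):
    """Return the value of a position, assuming both sides play their best move."""
    if not isinstance(node, list):
        return node  # a leaf: the evaluation function's score for this position
    scores = [minimax(child, not maximizing) for child in node]
    # The player to move picks the best score for themselves.
    return max(scores) if maximizing else min(scores)

def best_move(node):
    """Index of the first player's best move at the root of the tree."""
    scores = [minimax(child, maximizing=False) for child in node]
    return scores.index(max(scores))

if __name__ == "__main__":
    # Three candidate moves, each answered by one of two replies.
    tree = [[3, 5], [2, 9], [0, 7]]
    print(minimax(tree), best_move(tree))
```

In this toy tree the second move offers the highest single score, 9, but the opponent would never allow it, so the first move, whose worst case is 3, is the sound choice. A real chess program cannot reach the end of the game, so it cuts the search off at some depth and relies on the evaluation of the positions it stops at.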

Von Neumann sat across all of these threads. His 1945 report on the EDVAC set out the stored-program architecture that bears his name, the practical realisation of Turing’s universal machine. He worked on self-reproducing automata, and his late, unfinished lectures, published in 1958 as The Computer and the Brain, weighed the computer against the nervous system as two species of information-processing machine.

Naming the field

By the mid-1950s the pieces were on the table. There was a mathematical theory of computation, a model of the neuron, a theory of feedback and control, a theory of information, working stored-program computers, and a first programme for getting a machine to play a serious game. What was missing was a discipline, a community that would treat the simulation of intelligence as a goal in its own right rather than a side effect of cybernetics or a curiosity of computing.

That community was convened in the summer of 1956 at Dartmouth College in New Hampshire. The proposal had been written the previous year by John McCarthy, then at Dartmouth, together with Marvin Minsky, Nathaniel Rochester of IBM, and Claude Shannon. It is worth quoting their central conjecture closely, because it set the agenda for everything that followed: the work would proceed on the basis that every aspect of learning, or any other feature of intelligence, could in principle be described so precisely that a machine could be made to simulate it.

The proposal also gave the field its name. McCarthy chose “artificial intelligence” deliberately, in part to mark out territory distinct from cybernetics and, by some accounts, to avoid working under Wiener’s considerable shadow. The choice was consequential. It tilted the new discipline towards the symbolic, logical tradition descended from Turing and away from the feedback-and-control framing of Wiener, a divergence whose consequences played out for decades.

The Dartmouth meeting is remembered less for what was agreed there, which was little, than for who attended and what one pair of attendees brought with them. Allen Newell and Herbert Simon arrived from RAND and Carnegie with a working program, written with the programmer Cliff Shaw, called the Logic Theorist. It is often described as the first true artificial intelligence program, and for a reader with any taste for formal methods it is the most satisfying object in the whole story. The Logic Theorist proved theorems in propositional logic, working through the early results of Whitehead and Russell’s Principia Mathematica. It did not grind mechanically through every possibility. It searched selectively, using heuristics to pursue promising lines and abandon poor ones, which is to say it reasoned in a recognisably human style. It proved 38 of the first 52 theorems in chapter 2, and for one of them it found a proof shorter and more elegant than Russell’s own.

Here the circle closes. The discipline of artificial intelligence announced itself by having a machine do the very thing Hilbert had asked about in 1928, the thing Turing had reasoned about in 1936: produce proofs in a formal system. The Entscheidungsproblem had asked whether a machine could decide provability in general, and the answer was no. But the Logic Theorist showed that a machine could nonetheless find particular proofs, intelligently, and sometimes better than a Fellow of the Royal Society.

The inheritance

The years immediately after Dartmouth filled in the practical foundations. Newell and Simon generalised their approach into the General Problem Solver in 1957. McCarthy designed LISP in 1958, the language that gave symbolic AI its native medium and remained its workhorse for a generation. Arthur Samuel’s checkers program, which improved its own play through experience, prompted him to coin the term “machine learning” around 1959. Frank Rosenblatt built the Perceptron in 1957 and 1958, a trainable network of artificial neurons that carried the McCulloch-Pitts and Hebb lineage forward and represented the connectionist alternative to McCarthy’s logic. The laboratories that would dominate the field for decades, at MIT, Stanford, and Carnegie Mellon, were established in this same window.

The tension visible at the founding never fully resolved, and it is useful to see it clearly because the field keeps re-enacting it. On one side stood the symbolic tradition: intelligence as the manipulation of logical expressions, descended from Hilbert, Turing, and Church, and realised in the Logic Theorist and LISP. On the other stood the connectionist tradition: intelligence as the emergent behaviour of networks that adjust themselves, descended from McCulloch, Pitts, Hebb, and Rosenblatt. The two have traded dominance several times. The systems that command attention today sit firmly in the second camp, vast networks trained on data, and yet they are run on the universal machine that fell out of a 1936 paper about the limits of proof, and they are pressed into service proving theorems and writing code, the symbolic tasks the first camp prized.

The lesson of the early history is that artificial intelligence did not begin with a desire to imitate human beings. It began with a precise question about what could be mechanically decided, and the tools built to answer that question turned out to be general enough to build almost anything. The name was settled in 1956. The substance was settled in 1936.

Which leaves the name on this page. The man who taught machines to measure information was Claude Shannon; the assistant now called Claude takes its first name from him. The line from a 1936 question about proof to a machine that writes runs, unbroken, through him.

Gordion Solutions builds AI-native software for organisations whose critical systems have outgrown what they were built to do.