Minimum Intelligent Signal Test:
An Alternative Turing Test
To appear in issue #41 of Canadian Artificial Intelligence
Not for Citation
A Minimum Intelligent Signal Test (MIST) is a form of Turing Test designed to detect and provide a measure of intelligence for synthetic systems, while only requiring those systems to respond in a binary fashion. It may be used to provide automated feedback to genetic, neural or other statistical correlation systems operating in the human knowledge domain.
In 1950 Alan Turing proposed a test of machine intelligence now commonly referred to as the Turing Test [1]. The premise of this test is rooted in the philosophical problem of other minds, which holds that a thinker can observe its own thinking directly but can only infer the existence of other thought from the actions of other thinkers. In other words, the appearance of thinking is thinking. Turing predicted that by the year 2000 there would exist machines with a storage capacity of about 10^9 bytes [2] that could leave the average questioner with no more than a 70% chance of correctly identifying the machine after 5 minutes of questioning.
In 1991 Dr. Hugh Loebner, the National Science Foundation, and the Sloan Foundation started the annual Loebner Prize to offer $100,000 US to the creator of the first computer program to pass an unrestricted Turing Test (Epstein, 1992). At the end of 1995, hard disk drives with 10^9 bytes could be purchased for under $400 almost anywhere. Despite the prize and the availability of storage space, no computer program has yet been able to pass Turing’s test, and the Loebner Prize remains unclaimed.
Minimum Intelligent Signal Test:
Given a series of stimuli (items), a system being tested generates a binary response for each stimulus. A Minimum Intelligent Signal may then be detected in the cumulative binary output of that system. A system whose MIST score is statistically different from that of a random system is said to be intelligent. A system whose MIST score does not differ statistically from the MIST score of an average human is said to have the intelligence of an average human.
1) N items are generated:
Systems judged to have human intelligence (i.e. people) [3] must be able to respond to every item in a binary fashion. The response must be statistically stable, both when a human tries to give an intelligent response and when a human tries to evade (give a non-intelligent response).
2) Items are presented, and responses recorded:
Items are presented in the highest common mode [4] between human and synthetic subjects. On subsequent re-trials, item order is re-randomized [5].
3) Double-blind experimenter grades item/response pairs:
For each item, judge the item/response pair as either consistent or inconsistent with human intelligence. This grading procedure is easily automated, which reduces the chance of grading error or unforeseen bias.
4) Generate Score:
Let I (Intelligent) be the number of items judged consistent and E (Evasive) the number judged inconsistent. The probability that the system under consideration is intelligent and cooperative is p(I) = I/N. The probability that the system is intelligent and evasive is p(E) = E/N. Since I + E = N, the two probabilities sum to 1.0.
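Steps 2 through 4 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the author's implementation: the function name administer_mist, the respond callback, and the sample items are all hypothetical, and the answer key stands in for responses that would have been validated against human subjects beforehand.

```python
import random
from typing import Callable, Dict, Optional, Tuple

# Hypothetical answer key: item text -> response judged consistent
# with human intelligence. Real items would be validated in advance.
SAMPLE_KEY: Dict[str, bool] = {
    "Is water wet?": True,
    "Is the sun cold?": False,
    "Can a person eat an apple?": True,
    "Is a mouse larger than a house?": False,
}

def administer_mist(
    key: Dict[str, bool],
    respond: Callable[[str], bool],
    seed: Optional[int] = None,
) -> Tuple[float, float]:
    """Run one trial and return (p_I, p_E)."""
    if not key:
        raise ValueError("at least one item is required")
    order = list(key)
    random.Random(seed).shuffle(order)  # step 2: new item order on each trial
    # Step 3: automated grading against the validated key.
    consistent = sum(1 for item in order if respond(item) == key[item])
    n = len(order)
    inconsistent = n - consistent
    # Step 4: p(I) = I/N and p(E) = E/N, which always sum to 1.0.
    return consistent / n, inconsistent / n

if __name__ == "__main__":
    cooperative = lambda item: SAMPLE_KEY[item]
    evasive = lambda item: not SAMPLE_KEY[item]
    print(administer_mist(SAMPLE_KEY, cooperative))  # (1.0, 0.0)
    print(administer_mist(SAMPLE_KEY, evasive))      # (0.0, 1.0)
```

Because each item is graded independently of the others, shuffling the order changes nothing about the score, which is the point of requiring items to stand on their own.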
MISTs yield standardized probability scores. The judgment of whether an individual response is intelligent is made and validated before the test is administered, and is defined as being stable for normal human intelligence. Random systems, which exhibit no intelligence, will have MIST scores of p(I) = p(E) = 0.5, while intelligent systems (natural and synthetic) will have a p(I) or p(E) that is statistically different from that of a random system.
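The text does not say which statistical test should decide whether a score differs from that of a random system. One plausible choice, assumed here purely for illustration, is an exact two-sided binomial test against p = 0.5. The function name binomial_two_sided_p is hypothetical.

```python
from math import comb

def binomial_two_sided_p(consistent: int, n: int) -> float:
    """Exact two-sided p-value for the null hypothesis that each
    response is a fair coin flip (a random system, p = 0.5)."""
    if n <= 0 or not 0 <= consistent <= n:
        raise ValueError("need n > 0 and 0 <= consistent <= n")
    # With p = 0.5 the distribution is symmetric about n/2, so the
    # two-sided tail is every count at least as far from n/2 as the
    # observed count.
    deviation = abs(consistent - n / 2)
    tail = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= deviation)
    return tail / 2 ** n

if __name__ == "__main__":
    for i in (50, 60, 90, 10):
        print(i, binomial_two_sided_p(i, 100))
```

Under this test a system that answers 10 of 100 items consistently is exactly as far from chance as one that answers 90 consistently, which matches the idea that either p(I) or p(E) can carry the signal.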
Using a very large corpus of MIST item/response pairs, self-organizing systems may develop human-like intelligence with no programmer intervention. For example, a system could be developed that allows infinite input configurations and binary output, with some complex connectionist layer in between. Using simple feedback, this type of system should automatically discover the correlation between input and output that created the corpus, the correlation we call human intelligence or common sense. This process is analogous to building a medical CAT scan image, where many independent measurements from many points of view are statistically combined to form an image of the common cause of all the measurements.
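As a toy illustration of learning from item/response pairs by simple feedback, the sketch below trains a single-layer perceptron over word features. This is a deliberately minimal stand-in, far simpler than the complex connectionist layer envisioned above, and everything in it (the class name, the word-level features, the four-item corpus) is an assumption made for the example.

```python
from typing import Dict, List, Tuple

class WordPerceptron:
    """Binary-output learner: one weight per word plus a bias."""

    def __init__(self) -> None:
        self.weights: Dict[str, float] = {}
        self.bias = 0.0

    def predict(self, item: str) -> bool:
        words = item.lower().split()
        score = self.bias + sum(self.weights.get(w, 0.0) for w in words)
        return score > 0

    def train(self, corpus: List[Tuple[str, bool]], epochs: int = 10) -> None:
        for _ in range(epochs):
            for item, response in corpus:
                if self.predict(item) == response:
                    continue  # no feedback needed when the answer is right
                # Error feedback: nudge weights toward the corpus response.
                step = 1.0 if response else -1.0
                for w in item.lower().split():
                    self.weights[w] = self.weights.get(w, 0.0) + step
                self.bias += step

# Hypothetical corpus of validated item/response pairs.
TOY_CORPUS: List[Tuple[str, bool]] = [
    ("is fire hot", True),
    ("is ice hot", False),
    ("is water wet", True),
    ("is sand wet", False),
]

if __name__ == "__main__":
    learner = WordPerceptron()
    learner.train(TOY_CORPUS)
    for item, response in TOY_CORPUS:
        print(item, learner.predict(item), response)
```

A learner this small only memorizes which words co-occur with which response; the claim in the text is that a much larger corpus and a richer model would be needed before anything resembling common sense could emerge.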
Independent of its application as a feedback function for creating intelligent systems, MIST remains a powerful scientific method for describing emerging synthetic systems. The future may see MIST scores correlated with scores from more complex intelligence tests that yield a standardized IQ. This would allow synthetic intelligence to be expressed in the same manner as human intelligence, while only requiring the synthetic system to respond in binary, greatly simplifying development of such synthetic systems.
Currently, I am in the process of organizing an annual MIST contest modeled after the Loebner Prize, as well as creating a central MIST item repository (with a goal of 1 million validated items by January 1, 2000) for educational use. If you are interested in any aspect of the MIST contest or the repository project, please contact me or check http://www.clickable.com/mist.html for current information.
(Turing, 1950) A. M. Turing, "Computing Machinery and Intelligence," Mind, Vol. 59, No. 236, October 1950, pp. 433-460.
(Searle, 1980) J. R. Searle, "Minds, Brains, and Programs," Behavioral and Brain Sciences, Vol. 3, 1980, pp. 417-424.
(Epstein, 1992) R. Epstein, "The Quest for the Thinking Computer," AI Magazine, Vol. 13, No. 2, Summer 1992, pp. 80-95.
(Shieber, 1992) S. Shieber, "Lessons from a Restricted Turing Test," Technical Report TR-19-92, Harvard University, Sept. 1992, Revision 4.
Chris McKinstry is an Information Technology consultant with LGS Group, Inc., one of Canada’s largest information technology consulting firms. He has over ten years of software development experience, primarily in large database environments. His clients have included Litton Systems Canada, Atomic Energy of Canada, Manitoba Public Insurance, and CP Rail. He may be reached via email at email@example.com
[1] There have been numerous criticisms of Turing’s test (Searle, 1980; Shieber, 1992; et al.).
[2] Turing actually used ‘numbers’ and not the current term, ‘bytes’.
[3] In practice each item is defined as being stable for the subgroup of the population against which it was validated (e.g. undergraduate psychology students).
[4] The Loebner Prize has recently been modified to require that all systems attempting to win the $100,000 prize accept audio/visual input in real time, as a human would. The author feels this is arbitrary and unfair to synthetic systems. Intelligence is not dependent on sight or sound, but rather on intelligent responses to various stimuli. With current systems, text would be considered the highest common mode.
[5] This precludes between-item dependence; all items must be able to stand on their own.