The main topic of this book is conformal prediction, a method of prediction recently developed in machine learning. Conformal predictors are among the most accurate methods of machine learning, and unlike other state-of-the-art methods, they provide information about their own accuracy and reliability.
The book integrates mathematical theory and revealing experimental work. It demonstrates mathematically the validity of the reliability claimed by conformal predictors when they are applied to independent and identically distributed data, and it confirms experimentally that the accuracy is sufficient for many practical problems. Later chapters generalize these results to models called repetitive structures, which originate in the algorithmic theory of randomness and statistical physics. The approach is flexible enough to incorporate most existing methods of machine learning, including newer methods such as boosting and support vector machines and older methods such as nearest neighbors and the bootstrap.
Topics and Features:
Researchers in computer science, statistics, and artificial intelligence will find the book an authoritative and rigorous treatment of some of the most promising new developments in machine learning. Practitioners and students in all areas of research that use quantitative prediction or machine learning will learn about important new methods.
The book may be purchased directly from Springer and from many on-line booksellers, including amazon.com.
Vladimir Vovk and Alex Gammerman are Professors of Computer Science at Royal Holloway, University of London. Glenn Shafer is Professor in the Rutgers School of Business - Newark and New Brunswick. All three authors are affiliated with the Computer Learning Research Centre at Royal Holloway, University of London.
The following articles further develop and review the theory presented in the book:
This article, published in the Annals of Statistics, volume 37, pages 1566 - 1590 (2009), is written for statisticians. We consider the on-line predictive version of the standard statistical problem of linear regression; the goal is to predict each consecutive response given the corresponding explanatory variables and all the previous observations. We are mainly interested in prediction intervals rather than point predictions. The standard treatment of prediction intervals in linear regression analysis has two drawbacks: (1) the usual prediction intervals guarantee that the probability of error is equal to the nominal significance level epsilon, but this property per se does not imply that the long-run frequency of error is close to epsilon; (2) it is not suitable for prediction of complex systems as it assumes that the number of observations exceeds the number of parameters. We state the book's result showing that in the on-line protocol the frequency of error does equal the nominal significance level, up to statistical fluctuations, and we describe alternative regression models in which informative prediction intervals can be found before the number of observations exceeds the number of parameters. One of these models, which only assumes that the observations are independent and identically distributed, is greatly underused in the statistical theory of regression.
This is a written version of the second Computer Journal lecture, presented at the British Computer Society London Office on 12 June 2006. The lecture was followed by a discussion by some of the leading experts in machine learning, which is also included in the article. Its definitive version is published in the Computer Journal, volume 50, pages 151 - 177 (2007). This is the description of conformal prediction as given in the article's abstract:
Recent advances in machine learning make it possible to design efficient prediction algorithms for data sets with huge numbers of parameters. This article describes a new technique for "hedging" the predictions output by many such algorithms, including support vector machines, kernel ridge regression, kernel nearest neighbours, and by many other state-of-the-art methods. The hedged predictions for the labels of new objects include quantitative measures of their own accuracy and reliability. These measures are provably valid under the assumption of randomness, traditional in machine learning: the objects and their labels are assumed to be generated independently from the same probability distribution. In particular, it becomes possible to control (up to statistical fluctuations) the number of erroneous predictions by selecting a suitable confidence level. Validity being achieved automatically, the remaining goal of hedged prediction is efficiency: taking full account of the new objects' features and other available information to produce as accurate predictions as possible. This can be done successfully using the powerful machinery of modern machine learning.
This tutorial, published in the Journal of Machine Learning Research, volume 9, pages 371 - 421 (2008), presents a self-contained account of the theory of conformal prediction and works through several numerical examples. Its abstract describes the method of conformal prediction as follows:
Conformal prediction uses past experience to determine precise levels of confidence in new predictions. Given an error probability epsilon, together with a method that makes a point prediction of a label y, it produces a set of labels, typically containing the point prediction, that also contains y with probability 1-epsilon. Conformal prediction can be applied to any method for producing point predictions: a nearest-neighbor method, a support-vector machine, ridge regression, etc.
Conformal prediction is designed for an on-line setting in which labels are predicted successively, each one being revealed before the next is predicted. The most novel and valuable feature of conformal prediction is that if the successive examples are sampled independently from the same distribution, then the successive predictions will be right 1-epsilon of the time, even though they are based on an accumulating dataset rather than on independent datasets.
In addition to the model under which successive examples are sampled independently, other on-line compression models can also use conformal prediction. The widely used Gaussian linear model is one of these.
This article appears in the Proceedings of ICML 2012. A standard assumption in machine learning is the exchangeability of data, which is equivalent to assuming that the examples are generated from the same probability distribution independently. This paper is devoted to testing the assumption of exchangeability on-line: the examples arrive one by one, and after receiving each example we would like to have a valid measure of the degree to which the assumption of exchangeability has been falsified. Such measures are provided by exchangeability martingales. We extend known techniques for constructing exchangeability martingales and show that our new method is competitive with the martingales introduced before. Finally we investigate the performance of our testing method on two benchmark datasets, USPS and Statlog Satellite data; for the former, the known techniques give satisfactory results, but for the latter our new more flexible method becomes necessary.
The conference version of this article is published in the Proceedings of ACML 2012 (JMLR: Workshop and Conference Proceedings, volume 25, pages 475 - 490, 2012). The journal version is to appear in Machine Learning (ACML 2012 Special Issue). Conformal predictors are automatically valid in the sense of having coverage probability equal to or exceeding a given confidence level. Inductive conformal predictors are a computationally efficient version of conformal predictors satisfying the same property of validity. However, inductive conformal predictors have been only known to control unconditional coverage probability. This paper explores various versions of conditional validity and various ways to achieve them using inductive conformal predictors and their modifications. These are the data set (the Spambase data set contributed by George Forman to the UCI Machine Learning Repository with column names) and R programs used in the experimental section of the conference version of the paper.
This note introduces the method of cross-conformal prediction, which is a hybrid of the methods of inductive conformal prediction and cross-validation, and studies its validity and predictive efficiency empirically.
This note introduces Venn-Abers predictors, a new class of Venn predictors based on the idea of isotonic regression. As all Venn predictors, Venn-Abers predictors are well calibrated under the exchangeability assumption.
This paper discusses a transductive version of conformal predictors. This version is computationally inefficient for big test sets, but it turns out that apparently crude "Bonferroni predictors" are about as good in their information efficiency and vastly superior in computational efficiency.
For other research on conformal prediction, see the on-line prediction wiki. See also the events page.