PhD thesis "Regression analysis with imprecise data" by Andrea Wiencierz

My PhD thesis deals with the statistical problem of analyzing the
relationship between a response variable and one or more explanatory
variables when these quantities are only imprecisely observed.

Regression methods are some of the most popular and commonly
employed methods of statistical data analysis. Like most statistical
tools, regression methods are usually based on the assumption that
the analyzed data are precise and correct observations of the
variables of interest. In statistical practice, however, often only
incomplete or uncertain information about the data values is
available. In many cases, the incomplete or uncertain information
about the precise values of interest can be expressed by subsets of
the observation space. For example, interval-censored and rounded
data can be represented by intervals. As the representation by
subsets allows considering many different forms of uncertainty about
data within the same framework, this representation is adopted in my
thesis and set-valued observations of real-valued variables are
simply called imprecise data.

The aim of my PhD research was to find a regression method that provides reliable insights about the analyzed relationship, even if the variables are only imprecisely observed.

After a review of different approaches proposed in the literature, in my thesis, I present the likelihood-based approach to regression with imprecisely observed variables that we developed in Cattaneo and Wiencierz (2012) and that is named Likelihood-based Imprecise Regression (LIR). In the LIR framework, the regression problem is formalized as a decision problem whose actions are the possible regression functions, whose states are the considered probability distributions, and whose loss function is usually a characteristic of the residuals’ distribution. The LIR methodology consists in
determining likelihood-based confidence regions for the loss of the regression problem on the basis of imprecise data and in regarding the set of all regression functions that are not strictly dominated as the imprecise result of the regression analysis. The confidence
regions consist of the loss values associated with all probability measures that are (to a chosen degree) plausible in the light of the imprecise data, where the relative plausibility of the considered probability measures is measured by the likelihood function induced by the observations. Given the imprecise decision criteria, the Interval Dominance principle is applied to identify the set-valued result. Hence, a LIR analysis usually yields an imprecise result, which can be interpreted as a confidence set for the unknown regression function.
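The interplay of interval-valued residuals, quantile loss, and Interval Dominance can be illustrated with a small sketch. This is my own toy example (invented interval data and a crude quantile rule), not the authors' exact algorithm:

```python
# Toy illustration of the LIR idea: with interval data, each candidate line
# only bounds the quantile loss, and Interval Dominance keeps every line
# whose lower loss bound does not exceed the smallest upper loss bound.

# Hypothetical interval observations ((x_lo, x_hi), (y_lo, y_hi)):
data = [((0.9, 1.1), (1.8, 2.3)), ((1.9, 2.1), (3.7, 4.4)),
        ((2.8, 3.2), (5.6, 6.5)), ((3.9, 4.1), (7.9, 8.6))]

def residual_bounds(a, b, x, y):
    """Bounds of |y - (a + b*x)| over one rectangular observation."""
    f_lo, f_hi = sorted((a + b * x[0], a + b * x[1]))
    lo = max(0.0, f_lo - y[1], y[0] - f_hi)   # 0 if the line crosses the box
    hi = max(abs(v - w) for v in (f_lo, f_hi) for w in y)
    return lo, hi

def loss_bounds(a, b, p=0.5):
    """Bounds on the p-quantile of the absolute residuals."""
    bounds = [residual_bounds(a, b, x, y) for x, y in data]
    k = int(p * len(bounds))
    return (sorted(lo for lo, _ in bounds)[k],
            sorted(hi for _, hi in bounds)[k])

candidates = [(0.0, b) for b in (1.0, 1.5, 2.0, 2.5, 3.0)]
losses = {c: loss_bounds(*c) for c in candidates}
threshold = min(hi for _, hi in losses.values())   # best upper loss bound
undominated = [c for c, (lo, _) in losses.items() if lo <= threshold]
```

Here several candidate lines survive, which is exactly the set-valued result described above: lines that fit clearly worse than the best one (e.g. a slope of 1 or 3 for data that roughly follow y = 2x) are strictly dominated and discarded.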

From the general LIR methodology, a robust regression method was derived in Cattaneo and Wiencierz (2012), where quantiles of the residuals’ distribution are considered as loss. Furthermore, an exact algorithm to implement this regression method for the special case of simple linear regression with interval data was developed in Cattaneo and Wiencierz (2013) and implemented in an R package (Wiencierz, 2012). In my thesis, I present and discuss these in detail. Moreover, I study several statistical properties of the robust LIR method. It turns out that this LIR method is robust in terms of a high breakdown point and that it yields highly reliable results in the sense that the coverage probability of the resulting set of regression functions seems to be generally rather high.

In addition to the robust LIR method, I also investigate an alternative approach that was proposed in Utkin and Coolen (2011). This approach generalizes Support Vector Regression (SVR) to situations where the response variable is imprecisely observed. It
consists in applying a Minimin or a Minimax rule to the imprecise maximum likelihood estimates of the loss values associated with the regression functions, and in both cases yields a precise regression estimate. The set-valued decision criteria here consist of the loss values associated with all marginal probability distributions of the unobserved precise data that are compatible with the empirical distribution of the imprecise data. After discussing this approach in detail, I develop an alternative adaptation of SVR to this
situation by following the LIR approach, which further generalizes these methods. In contrast to the Minimin and Minimax methods, the LIR method for SVR usually yields a set-valued result, which appears to be more appropriate when dealing with imprecise data. Moreover, the LIR framework has the advantage that it allows reflecting the data imprecision and accounting for statistical uncertainty at the same time, since both types of uncertainty are expressed by the extent of the set-valued result of the regression analysis.
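The contrast between the precise Minimin/Minimax rules and the set-valued Interval Dominance rule can be made concrete in a few lines. The loss intervals below are invented purely for illustration:

```python
# Hypothetical interval-valued losses for three candidate regression functions:
losses = {"f1": (0.2, 0.9), "f2": (0.4, 0.6), "f3": (0.5, 1.2)}

minimin = min(losses, key=lambda f: losses[f][0])   # optimistic rule: precise
minimax = min(losses, key=lambda f: losses[f][1])   # pessimistic rule: precise

# Interval Dominance instead keeps every function whose lower loss bound
# does not exceed the smallest upper loss bound -- a set-valued answer:
threshold = min(hi for _, hi in losses.values())
undominated = {f for f, (lo, _) in losses.items() if lo <= threshold}
```

Minimin picks "f1" and Minimax picks "f2", each a single function, while Interval Dominance retains all three candidates: none can be excluded with certainty given the imprecision of the losses.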

Finally, I apply the different regression methods to two practical data sets from the contexts of social sciences and winemaking, respectively. In both cases, the LIR analyses provide very cautious inferences.

References

• M. Cattaneo and A. Wiencierz (2012). Likelihood-based Imprecise Regression. International Journal of Approximate Reasoning 53, 1137-1154.
• M. Cattaneo and A. Wiencierz (2013). On the implementation of LIR: the case of simple linear regression with interval data. Computational Statistics, in press.
• L. V. Utkin and F. P. A. Coolen (2011). Interval-valued Regression and Classification Models in the Framework of Machine Learning. Proceedings of the 7th International Symposium on Imprecise Probability: Theories and Applications. SIPTA. pp. 371-380.
• A. Wiencierz (2012). linLIR: linear Likelihood-based Imprecise Regression. R-package.

Andrea Wiencierz works in the Working Group Methodological Foundations of Statistics and their Applications at the Department of Statistics of the LMU Munich, Germany, where she defended her PhD thesis on December 13, 2013.

Software for credal classification

A classifier is a statistical model of the relationship between the attributes (features) of an object and its category (class). Classifiers are learned from a training set and then used to predict the class of new objects given their features. Credal classifiers extend traditional classifiers by allowing for set-valued (or indeterminate) predictions of classes. The output set is typically larger when the data set is small or contains many missing values. Credal classifiers aim at producing reliable classifications even in conditions of poor information.
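The mechanism behind set-valued predictions can be sketched in a few lines. This is my own toy illustration based on the imprecise Dirichlet model with parameter s (not a description of any particular package): each class gets an interval probability, and every class whose upper probability reaches the best lower probability stays in the prediction.

```python
# Toy credal prediction from class counts via the imprecise Dirichlet model:
# interval probability of class c is [n_c / (N + s), (n_c + s) / (N + s)].
def credal_predict(counts, s=2.0):
    n = sum(counts.values())
    intervals = {c: (k / (n + s), (k + s) / (n + s)) for c, k in counts.items()}
    best_lower = max(lo for lo, _ in intervals.values())
    # Keep every class not interval-dominated by the best one:
    return {c for c, (_, hi) in intervals.items() if hi >= best_lower}

credal_predict({"A": 80, "B": 10})  # ample data: determinate, {"A"}
credal_predict({"A": 2, "B": 1})    # scarce data: indeterminate, {"A", "B"}
```

With ample data the intervals are narrow and the prediction is a single class; with scarce data the intervals overlap and the classifier cautiously returns both classes.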

I am aware of only two software packages suitable for credal classification.

JNCC2

The first is JNCC2, authored by myself together with M. Zaffalon. It runs from the command line and is implemented in Java. The code is open source. It reads the data from an ARFF file, the open format developed for WEKA (one of the most widely used open-source software packages for data mining).

The downloadable zip file contains the source code and the user manual, with worked examples. Once a data set is provided, the software performs cross-validation, comparing Naive Bayes and the Naive Credal Classifier. Continuous features are discretized using the algorithm of Fayyad and Irani (1993).

JNCC2 enables the conservative treatment of missing data: one can declare whether each attribute of the classification problem is subject to a MAR (missing at random) or to a non-MAR missingness process. Introductory discussions of MAR and non-MAR missing data, a practical example of non-MAR data, and a theoretical treatment of conservative inference in the presence of non-MAR missing data are available online.

JNCC2 reports the traditional indicators of performance for credal classification; for instance, it measures the accuracy of Naive Bayes on the instances on which the Naive Credal Classifier is determinate or indeterminate. It also computes other typical indicators of performance, such as the percentage of indeterminate classifications, the average number of classes returned when indeterminate, and so on. The results are written to a text file. The software has been published in the JMLR special track on open source software.

Weka-IP

The Weka-IP plugin was developed while preparing the classification chapter of the book Introduction to Imprecise Probabilities. I had the opportunity to spend some time in Granada with J. Abellan, A. Masegosa, and S. Moral. Their research group has extensive experience in credal classification based on decision trees.

We thus decided to make a number of credal classifiers available under the WEKA interface. Andres linked the WEKA code with that of credal decision trees and JNCC2. He is the maintainer of the package.

Weka-IP contains the following credal classifiers:

• credal decision trees;
• naïve credal classifier;
• lazy naïve credal classifier;
• credal model averaging.

I had developed the last two algorithms by extending the JNCC2 code base, but had not previously released the code.

The credal classifiers are available in a separate folder of classifiers, which is not present in the standard WEKA interface.

Weka-IP also computes the utility-based metrics for scoring credal classifiers; in this respect, it is more up to date than JNCC2.
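As a sketch of what such a utility-based metric looks like, here is the u65 score of Zaffalon, Corani and Mauá as I recall it (the quadratic coefficients should be checked against the original paper): a correct set-valued prediction earns the discounted accuracy 1/|set|, rescaled so that a correct binary prediction is worth 0.65 instead of 0.5.

```python
# u65 utility of a set-valued prediction against the true class:
# 0 if wrong; otherwise a quadratic rescaling of the discounted accuracy.
def u65(prediction, truth):
    if truth not in prediction:
        return 0.0
    d = 1.0 / len(prediction)        # discounted accuracy
    return -0.6 * d * d + 1.6 * d    # u65(1) = 1 and u65(0.5) = 0.65

u65({"A"}, "A")       # determinate and correct: utility 1
u65({"A", "B"}, "A")  # indeterminate but correct: utility 0.65
u65({"B"}, "A")       # wrong: utility 0
```

The rescaling rewards a cautious classifier that returns two classes containing the truth more than a coin flip between them would, which is the rationale for preferring these metrics over plain discounted accuracy.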

I recommend using the Experimenter interface of Weka-IP (see its manual). The Experimenter allows comparing, via statistical tests, the performance of different credal classifiers, based on the results of cross-validation. Moreover, it allows running several credal classifiers on multiple data sets.

Installation and usage details are discussed in the user manual. Some details require attention. Credal decision trees require removing missing data, which is done by the filter E_FilteredClassifier. Credal classifiers derived from JNCC2 require the features to be discrete, which is done by the filter E_Discretize.

For interface reasons, it is not possible to deal with non-MAR missing data.

Another advantage of Weka-IP is that one can exploit the powerful Weka functionalities for data pre-processing, such as feature selection.

Conclusions

Weka-IP makes it easy to get acquainted with different credal classifiers, to run them from a graphical user interface, and to statistically compare their performances. The code might be regarded as preliminary in several respects (see my previous comment on the need for filters), but its interface is generally very easy to use. If you plan to develop your own new algorithm for credal classification, it might be a good idea to add it to the Weka-IP package. In this way, you could quickly compare your new algorithm with existing ones.

G. Corani is a senior researcher at the Imprecise Probability Group of IDSIA (Lugano, Switzerland). He obtained his PhD in Information Technology from the Politecnico di Milano in 2005. His research interests are mainly data mining and applied statistics. Most of his work with imprecise probability concerns credal classification. He is the author of about 50 international publications.

Denis D. Mauá’s PhD Thesis on Algorithms and Complexity Results for Discrete Probabilistic Reasoning Tasks

The Three Dots

My PhD thesis is about connecting three hard computational problems that arise in tasks involving graph-based probabilistic reasoning, namely, the problems of

• maximum a posteriori (MAP) inference in Bayesian networks;
• planning with influence diagrams;
• belief updating in (strong) credal networks.

Roughly speaking, in the MAP inference problem we seek the most probable explanation of a complex phenomenon represented as a Bayesian
network, a graph-based description of a multivariate joint probability
distribution where nodes are identified with random variables and local
conditional probability distributions. By extending a Bayesian network
with actions and utilities, we get an influence diagram. The problem of
planning with influence diagrams is to select a set of actions that
maximizes expected utility. A (strong) credal network is a Bayesian
network whose numerical parameters (the local conditional probability
distributions) have been imprecisely specified as convex sets (instead of point estimates). Belief updating in (strong) credal networks is the
task of computing tight upper and lower bounds on the (conditional)
probability of a certain event, and is largely equivalent to the problem
of assessing the sensitivity of a probabilistic inference in a Bayesian
network to global changes in the model parameters. The similarities and
differences among these three problems are more easily captured by the
following simple example in fault analysis.

Probabilistic Fault Analysis

Consider the simple electric circuit below, in which a light bulb is connected to a battery through four switches.

Suppose we are interested in estimating the probable causes of the
light being off. By inspecting the circuit, we see that such an event
could have been caused by a burned light bulb, a depleted battery, or
simply because there is no power at the light bulb socket (among other
causes that we ignore). The last event could have been caused by an
interruption of the energy flow on the left or right trails; the trail
on the left is interrupted only if both switches 1 and 2 are off. The
right trail is interrupted if either of switches 3 and 4 is off. Such a
collection of intercausal reasonings can be graphically represented by
the acyclic directed graph below.

If we assign to each node in that graph a conditional probability
distribution of the corresponding event given its parent events (i.e.,
its immediate predecessors in the graph), the result is a fully
specified Bayesian network. If instead, we associate a closed and convex
set of probability distributions to each node and each configuration of
its parents, we obtain a (separately specified) credal network. To get
an influence diagram, we extend the Bayesian network with actions (for
example, possible fixes or information gathering routines) and utilities
(e.g., the costs of replacing the light bulb or the battery, or the loss
caused by having no light), as in the figure below.
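To make the first of the three problems concrete on this example, here is a toy MAP computation by brute-force enumeration. The component reliabilities are invented for illustration, and the deterministic gates follow the circuit description above; real MAP algorithms avoid enumerating all configurations, of course.

```python
# Toy MAP inference for the fault example: find the most probable
# configuration of the causes given the evidence that the light is off.
from itertools import product

# Hypothetical prior probabilities that each component is in its "good" state:
priors = {"bulb_ok": 0.9, "battery_ok": 0.95,
          "s1_on": 0.5, "s2_on": 0.5, "s3_on": 0.9, "s4_on": 0.9}

def light_on(cfg):
    left = cfg["s1_on"] or cfg["s2_on"]     # interrupted only if both are off
    right = cfg["s3_on"] and cfg["s4_on"]   # interrupted if either is off
    return cfg["bulb_ok"] and cfg["battery_ok"] and left and right

names = list(priors)
best_cfg, best_p = None, 0.0
for values in product((True, False), repeat=len(names)):
    cfg = dict(zip(names, values))
    if light_on(cfg):                       # inconsistent with the evidence
        continue
    p = 1.0
    for name in names:
        p *= priors[name] if cfg[name] else 1.0 - priors[name]
    if p > best_p:
        best_cfg, best_p = cfg, p
```

With these numbers, the MAP explanation turns out to be that both switches on the left trail are off, while bulb and battery are fine. Replacing each prior by an interval and computing bounds over it would turn this into the credal-network version of the problem; adding repair actions and costs would turn it into an influence diagram.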

Connecting the dots

At first sight, these three problems seem connected only by means of
their combinatorial or optimization nature, and by the use of graphs as
a concise yet intuitive representation of complex phenomena. Nevertheless, correspondences between instances of these problems have long been noticed in the literature. For instance, it was previously known that belief updating in strong credal networks can be reduced to MAP inference in Bayesian networks [Cano, Cano & Moral, 1994] and vice versa [De Campos & Cozman, 2005]. De Campos & Ji [2008] showed that planning with influence diagrams can be reduced to belief updating in credal networks, while the converse was proved by Antonucci & Zaffalon [2008].

In my thesis, and jointly with Cassio de Campos and Marco Zaffalon,
we proved the last two missing correspondences, namely, that MAP
inference and planning with influence diagrams can be reduced to one
another. We now know that these three problems are strongly tied by
their computational equivalences. These equivalences increase the
algorithmic toolset available to each problem with the algorithms
developed for the other problems, and allow us to derive bounds on the
computational hardness of each problem. Moreover, they provide an
interesting view of each problem instance from the perspective of the
other corresponding problem instances. For example, Cano, Cano &
Moral [1994] reduced belief updating in strong credal networks to MAP
problem instances in order to use available algorithms for the
latter. In a similar fashion, De Campos & Ji [2008] reduced planning
in influence diagrams to belief updating in credal networks so that the
former problem could be solved using algorithms designed for the
latter. Antonucci & Zaffalon [2008] reduced belief updating in
credal networks to planning in influence diagrams in order to provide a
decision-theoretic view of credal networks. De Campos & Cozman
[2005] showed NP-hardness of belief updating in strong credal networks by a reduction from MAP inference. More recently, we were able to show that a
certain class of planning problems can be solved in polynomial time by
reducing them to instances of credal belief updating that are known to
be polynomial-time computable. Using the converse reduction, we were
able to prove the NP-hardness of structurally very simple instances of
credal belief updating. Notably, these correspondences allowed us to
extend De Campos’ proof of polynomial-time approximability of MAP
inference in Bayesian networks of bounded treewidth and variable
cardinality to planning in influence diagrams and belief updating in
strong credal networks. These are the first results concerning the
approximability of these problems. On the more practical side, we were
able to develop a unified algorithmic framework that approximately
solves any of the problems in polynomial time when both treewidth and
variable cardinality are small.

Conclusion

In summary, the main contributions of my work are:

• a state-of-the-art anytime algorithm for MAP inference;
• a state-of-the-art exact algorithm for strategy selection in influence diagrams;
• a proof of NP-hardness of strategy selection in polytree-shaped influence diagrams of bounded treewidth, even in the approximate case;
• an FPTAS for the strategy selection problem on diagrams of bounded
treewidth and bounded cardinality;
• a proof of NP-hardness of strategy selection even in polytree-shaped diagrams with only binary variables, and a result of tractability of the case in which there is a single value node;
• a proof of NP-hardness of approximate belief updating in credal networks of bounded treewidth (and unbounded variable cardinality);
• an FPTAS for belief updating in strong credal networks of bounded treewidth and variable cardinality.

Denis D. Mauá is currently a Post-Doctoral Researcher at the Decision Making Lab of the University of São Paulo. He obtained his PhD in Informatics from the University of Lugano (Università della Svizzera italiana) in September 2013. From September 2009 to December 2013 he was a Research Fellow at IDSIA’s Imprecise Probability Group.

Alessandro Antonucci reports on WPMSIIP’2013

The sixth edition of WPMSIIP, the Workshop on Principles and Methods of Statistical Inference with Interval Probability, was held in Lugano (Switzerland) between the first and the second week of September 2013. The workshop was a follow-up to the previous editions, held in Durham (2008, 2010), Munich (2009, 2012), and Ljubljana (2011).

The 2013 edition was organized by the Imprecise Probability Group of IDSIA. About 25 participants from ten different countries attended the workshop. Since its first edition, WPMSIIP has been intended as an open forum for researchers interested in interval (and, more generally, imprecise) probability, and is thus of great interest for the SIPTA community. Almost all the participants took an active part in the workshop by presenting ongoing research topics and/or open challenges in their talks. Each talk was followed by an open discussion, with no strict time constraints. Yet, despite some very long and intense discussions, the original program was (of course, in an imprecise way!) met.

During the first day, the discussion focused on classification and regression. Giorgio Corani chaired the classification part: it clearly emerged that the focus of research on this topic is gradually moving from traditional credal classification to more general (and challenging) data-mining problems like preference learning and multilabel classification. The regression part was chaired by Andrea Wiencierz, and the importance of new regression tools to cope with interval data was one of the main outcomes of the discussion. The second day, chaired by Alessio Benavoli and Marco Cattaneo, covered different topics related to learning. An increasing interest in filtering based on interval/imprecise methods was observed, together with the need for novel learning tools for non-parametric imprecise models. Inference was the topic of the third day. Cassio Polpo de Campos chaired the discussion, which was mostly devoted to imprecise probabilistic graphical models. Two major topics were discussed: credal networks with epistemic irrelevance and the application of imprecise models to logic. Decision making and evaluation problems were discussed on the fourth day, chaired by Denis Mauá. The utility-based approach seems a satisfactory answer to the evaluation of imprecise classifiers; the situation is definitely more open for decision making. Finally, on the last day, open problems ranging from very theoretical to very applied topics were discussed.

Despite such a packed program, it was possible to find time for a hiking excursion in the beautiful Swiss Alps.

In summary, the WPMSIIP meetings should be regarded as an important resource for the SIPTA community. The open format of the workshop allows for exhaustive (and sometimes exhausting!) discussions, something that could hardly be reproduced in other forums. Most of the slides of the talks are available on the workshop website. We look forward to seeing you at WPMSIIP 2014!