The Society for Imprecise Probabilities:
Theories and Applications

Software for credal classification

Posted on February 6, 2014 by Giorgio Corani
[ go back to blog ]

A classifier is a statistical model of the relationship between the attributes (features) of an object and its category (class). Classifiers are learned from a training set and later are used on the test set to predict the class of a new object given its features. Credal classifiers extend traditional classifiers allowing for set-valued (or indeterminate) predictions of classes. The output set is typically larger when the data set is small or it contains many missing values. Credal classifiers aim at producing reliable classifications also in conditions of poor information. I am aware of only two software suitable for credal classification.

JNCC2

The first is the JNCC2, authored by myself together with M. Zaffalon. It runs from the command line and it is implemented in Java. The code is open source. It reads the data from an ARFF file; this is the open format developed for WEKA (one of the most important open source software for data mining). The downloadable zip file contains the source code and the user manual, with worked examples. Once a data set is provided, the software performs cross-validation, comparing Naive Bayes and Naive Credal Classifier. Continuous features are discretized using (Fayyad and Irani, 1993) algorithm. JNCC2 enables the conservative treatment of missing data. One can declare whether each attribute of the classification problem is subject to a MAR (missing at random) or to a non-MAR missingness process. See here for an introductory discussion of MAR and non-MAR missing data; here for a practical example of non-MAR data; here for a theoretical discussion on how to perform conservative inference in presence of non-MAR missing data. JNCC2 reports the most traditional indicators of performance for credal classification; for instance it measures the accuracy of Naive Bayes on the instances on which Naive Credal Classifier is determinate or indeterminate. It also computes other typical indicators of performance, such as the percentage of indeterminate classifications, the number of classes returned on average when indeterminate and so on. The results are written to a text file. The software has been published on the JMLR special track on open source software.

Weka-IP

The weka-IP plugin has been developed when preparing the classification chapter of the book Introduction to Imprecise Probabilities. I had the opportunity to spend some time in Granada with J. Abellan, A. Masegosa and S. Moral. Their research group has a large experience in credal classification based on decision trees. We thus decided to make available a number of credal classifiers under the WEKA interface. Andres linked the code of WEKA with that of credal decision trees and JNCC2. He is the maintainer of the package. Weka-IP contains the following credal classifiers:

Conclusions

The Weka-IP allows to easily get in touch with different credal classifiers, to run them from a graphical user interface and to statistically compare their performances. The code might be regarded as preliminary in different respects (see my previous comment on the need for filters), but its interface is generally very easy to use. If you plan to develop your own new algorithm for credal classification, it might be a good idea to try to add it to the Weka-IP package. In this way, you could quickly compare your new algorithm with previously existing ones.

About the author

G. Corani is senior researcher at the Imprecise Probability Group of IDSIA (Lugano, Switzerland). He obtained his PhD in Information Technology from the Politecnico di Milano in 2005. His research interests are mainly data mining and applied statistic. Most of his work done with imprecise probability regards credal classification. He is author of about 50 international publications.

Note from the editor: the SIPTA software page provides a list of software for imprecise probabilities, including JNCC2 and Weka-IP.