-*- mode: text -*-

+----------------------------------------------------------------------+
| This archive contains a simple implementation of the Conditional     |
| Mutual Information Maximization for feature selection.               |
+----------------------------------------------------------------------+
| Written by François Fleuret                                          |
| Contact <francois.fleuret@epfl.ch> for comments & bug reports        |
| Copyright (C) 2004 EPFL                                              |
+----------------------------------------------------------------------+

$Id: README,v 1.3 2007-08-23 08:36:50 fleuret Exp $

0/ INTRODUCTION

  The CMIM feature selection scheme is designed to select a small
  number of binary features among a very large set, in a two-class
  classification context. It picks features one after another: at
  each step, it selects the candidate feature which maximizes the
  minimum, over the features already picked, of the conditional
  mutual information between that candidate and the class to
  predict, given the already-picked feature. Such a criterion picks
  features which are both individually informative and pairwise
  weakly dependent. CMIM stands for Conditional Mutual Information
  Maximization. See

    Fast Binary Feature Selection with Conditional Mutual Information
    François Fleuret
    JMLR 5 (Nov): 1531--1555, 2004
    http://www.jmlr.org/papers/volume5/fleuret04a/fleuret04a.pdf

1/ INSTALLATION

  To compile and test, just type 'make test'.

  This small test generates a sample set for a toy problem and
  compares CMIM, MIM and random feature selection, combined with the
  naive Bayesian learner. The two populations of the toy problem
  live in the [0, 1]^2 square: the positive population is the set of
  points with x^2 + y^2 < 1/4, and the negative population is
  everything else. Look at create_samples.cc for more details. The
  features are the responses of linear classifiers generated at
  random.

2/ DATA FILE FORMAT

  Each data file, either for training or testing, starts with the
  number of samples and the number of features. Then, for every
  single sample, follow two lines: one with the values of the
  features (0/1) and one with the value of the class to predict
  (0/1). Check the train.dat and test.dat generated by
  create_samples for an example.

  The test file has the same format, and the real class is used to
  estimate the error rates. During testing, the response of the
  naive Bayesian classifier before thresholding is saved in a result
  file (3rd parameter of the --test option).

3/ OPTIONS

  --silent

    Switches off all output to stdout

  --feature-selection <random|mim|cmim>

    Selects the feature selection method

  --classifier <bayesian|perceptron>

    Selects the classifier type

  --error <standard|ber>

    Chooses which error to minimize during bias estimation for the
    CMIM + naive Bayesian.

      standard = P(f(X) = 0, Y = 1) + P(f(X) = 1, Y = 0)

      ber = (P(f(X) = 0 | Y = 1) + P(f(X) = 1 | Y = 0))/2

  --nb-features <int: nb of features>

    Sets the number of features to select

  --cross-validation <file: data set> <int: nb test samples> <int: nb loops>

    Performs cross-validation

  --train <file: data set> <file: classifier>

    Builds a classifier and saves it to disk

  --test <file: classifier> <file: data set> <file: result>

    Loads a classifier and tests it on a data set

4/ LICENCE

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License version 3 as
  published by the Free Software Foundation.

  This program is distributed in the hope that it will be useful, but
  WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
  General Public License for more details.