There exist several alternatives to decision trees for data exploration, such as neural networks, nearest neighbor methods and regression analysis. Several researchers have compared trees to these other methods on specific problems.

An early study comparing machine learning methods for learning from
examples can be found in [77]. Comparisons
of symbolic and connectionist methods can be found in
[373,327]. Quinlan empirically
compared decision trees to genetic classifiers [294] and
to neural networks [298]. Thrun et al.
[349] compared several learning algorithms on the simulated
MONK's problems. Palvia and Gordon [278] compared
decision tables, decision trees and decision rules to determine which
formalism is best suited for decision analysis.

Multilayer perceptrons and CART (with and without linear combinations)
[29] are compared in [9], with the finding that there is little
difference in accuracy. Similar conclusions
were reached in [103] when ID3 [292]
and backpropagation were compared. Talmon et al.
[345] compared classification trees and neural
networks for analyzing electrocardiograms (ECG) and concluded that
neither technique is superior to the other. In contrast, ID3 is judged
to be slightly better than connectionist and Bayesian methods in
[340]. Brown et al. [33] compared
backpropagation neural networks with decision trees on three problems
that are known to be multimodal. Their analysis indicated that there
was little difference between the two methods, and that neither method
performed very well in its "vanilla" state. The performance of
decision trees improved in [33] when multivariate
splits were used, and backpropagation networks did better with feature
selection.

Gilpin et al. [123] compared stepwise linear
discriminant analysis, stepwise logistic regression and CART
[29] with three senior cardiologists on the problem of
predicting whether a patient would die within a year of being
discharged after an acute myocardial infarction. Their results showed
no difference between the physicians and the computer methods in terms
of prediction accuracy. Kors and Van Bemmel
[191] compared statistical multivariate methods
with heuristic decision tree methods in the domain of
electrocardiogram (ECG) analysis. Their comparisons show that decision
tree classifiers are more comprehensible and more flexible when
categories must be added or changed. Pizzi and Jackson
[288] compared an expert system developed using
traditional knowledge engineering methods to Quinlan's ID3
[292] in the domain of tonsillectomy. Comparisons of CART
with multiple linear regression and discriminant analysis can be found
in [43], where it is argued that CART is more
suitable than the other methods for very noisy domains with many
missing values.

Comparisons between decision trees and statistical methods like linear
discriminant function analysis and automatic interaction detection
(AID) are given in
[232], where it is argued that machine learning
methods sometimes outperform the statistical methods and so should not
be ignored. Feng et al. [99] present a
comparison of several machine learning methods (including decision
trees, neural networks and statistical classifiers) as a part of the
European Statlog
project. Their main conclusions are that (1) no method seems uniformly
superior to others, (2) machine learning methods seem to be superior
for multimodal distributions, and (3) statistical methods are
computationally the most efficient.
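Most of the studies above share the same experimental shape: induce a tree and fit a statistical baseline on the same training data, then compare held-out accuracy. The following is a minimal, hypothetical sketch of such a comparison, not taken from any study cited here; the synthetic concept and all names are illustrative. It pits a greedy CART-style tree (axis-aligned splits chosen by Gini impurity) against a nearest-class-mean linear baseline, on a target whose boundary is axis-aligned and therefore tree-friendly, illustrating the Statlog-style observation that which family wins depends on the target distribution.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(data):
    """Exhaustive search for the axis-aligned split minimizing weighted
    Gini impurity (the criterion used by CART-style inducers)."""
    best = None  # (weighted_impurity, feature, threshold)
    for f in (0, 1):
        for point, _ in data:
            t = point[f]
            left = [l for p, l in data if p[f] <= t]
            right = [l for p, l in data if p[f] > t]
            if not left or not right:
                continue
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
            if best is None or imp < best[0]:
                best = (imp, f, t)
    return best

def grow(data, depth):
    """Recursively grow a tree; leaves are majority labels."""
    labels = [l for _, l in data]
    majority = int(2 * sum(labels) >= len(labels))
    if depth == 0 or gini(labels) == 0.0:
        return majority
    split = best_split(data)
    if split is None:
        return majority
    _, f, t = split
    left = [(p, l) for p, l in data if p[f] <= t]
    right = [(p, l) for p, l in data if p[f] > t]
    return (f, t, grow(left, depth - 1), grow(right, depth - 1))

def tree_predict(node, p):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if p[f] <= t else right
    return node

def nearest_mean(train):
    """Linear-style statistical baseline: nearest class centroid."""
    means = {}
    for c in (0, 1):
        pts = [p for p, l in train if l == c]
        means[c] = tuple(sum(v) / len(pts) for v in zip(*pts))
    def predict(p):
        return min(means, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, means[c])))
    return predict

def accuracy(predict, data):
    return sum(predict(p) == l for p, l in data) / len(data)

# Illustrative synthetic concept with an axis-aligned boundary:
# class 1 iff x > 0.3 and y > -0.2. A depth-2 tree can represent it
# exactly; a single linear boundary cannot.
random.seed(0)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(400)]
data = [(p, int(p[0] > 0.3 and p[1] > -0.2)) for p in points]
train, test = data[:300], data[300:]

tree = grow(train, depth=2)
tree_acc = accuracy(lambda p: tree_predict(tree, p), test)
mean_acc = accuracy(nearest_mean(train), test)
print("tree: %.2f  nearest-mean: %.2f" % (tree_acc, mean_acc))
```

The exhaustive threshold search is quadratic in the training-set size, which is fine at this scale; production inducers such as CART sort each feature once instead. Swapping the target for a linearly separable one would reverse the ranking, which is the sense in which no method is uniformly superior.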

Long et al. [217] compared Quinlan's C4
[297] to logistic regression on the problem of diagnosing
acute cardiac ischemia, and concluded that both methods came fairly
close to the expertise of the physicians. In their experiments,
logistic regression outperformed C4. Curram and Mingers
[67] compare decision trees, neural networks and
discriminant analysis on several real world data sets. Their
comparisons reveal that linear discriminant analysis is the fastest of
the methods, when the underlying assumptions are met, and that
decision tree methods overfit in the presence of noise. Dietterich
et al. [75] argue that the inadequacy of
trees for certain domains may be due to the fact that trees are unable
to take into account some statistical information that is available to
other methods like neural networks. They show that decision trees
perform significantly better on the text-to-speech conversion problem
when extra statistical knowledge is provided.
