Abstract

An important but insufficiently addressed issue for machine learning in engineering applications is the task of model selection for new problems. Existing approaches to model selection generally focus on optimizing the learning algorithm and its associated hyperparameters. In real-world engineering applications, however, parameters external to the learning algorithm, such as feature engineering choices, can also have a significant impact on model performance. These external parameters do not fit into most existing model-selection approaches and are therefore often studied ad hoc or not at all. In this article, we develop a statistical design of experiments (DOE) approach to model selection based on the Taguchi method. The key idea is to use orthogonal arrays to plan a set of build-and-test experiments that study the external parameters in combination with the learning algorithm. Orthogonal arrays maximize the information learned from each experiment and therefore allow the experimental space to be explored far more efficiently than grid or random search. We demonstrate the application of the statistical DOE approach to a real-world model selection problem involving the prediction of service request escalation. Statistical DOE significantly reduced the number of experiments needed to fully explore the external parameters for this problem and successfully optimized the model with respect to the objective of minimizing total cost, in addition to standard evaluation metrics such as accuracy, F-measure, and G-mean.
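To illustrate the key idea, the following is a minimal sketch of how a Taguchi orthogonal array can plan build-and-test experiments over external model-selection parameters. The L8(2^7) array is the standard Taguchi design for up to seven two-level factors; the factor names below are hypothetical stand-ins for external parameters, not the ones studied in the article.

```python
# Minimal sketch: planning model-selection experiments with a Taguchi
# L8(2^7) orthogonal array. Eight runs cover seven two-level factors,
# versus 2**7 = 128 runs for an exhaustive grid over the same factors.

# Standard L8 orthogonal array: 8 rows (runs), 7 two-level columns.
L8 = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

# Hypothetical two-level external parameters (illustrative names only).
factors = {
    "feature_set":    ["basic", "extended"],
    "text_features":  ["bow", "tfidf"],
    "class_balance":  ["none", "oversample"],
    "discretize":     ["no", "yes"],
    "feature_select": ["no", "yes"],
    "learner":        ["tree", "svm"],
    "cost_weighting": ["off", "on"],
}

def plan_experiments(array, factors):
    """Map each orthogonal-array row to one concrete experiment config."""
    names = list(factors)
    return [
        {name: factors[name][row[i]] for i, name in enumerate(names)}
        for row in array
    ]

experiments = plan_experiments(L8, factors)
print(len(experiments))  # 8 planned experiments instead of 128
```

Each row of the array is one build-and-test experiment; the balance property of the array (each level appears equally often in every column) is what lets the effect of each factor be estimated from so few runs.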
