A New Method for Vertical Parallelisation of TAN Learning Based on Balanced Incomplete Block Designs
Author/s: Madsen, Anders L.; Jensen, Frank; Salmerón Cerdán, Antonio; Karlsen, Martin; Langseth, Helge; [et al.]
The framework of Bayesian networks is a widely popular formalism for performing belief update under uncertainty. Structure-restricted Bayesian network models such as the Naive Bayes Model and Tree-Augmented Naive Bayes (TAN) Model have shown impressive performance for solving classification tasks. However, if the number of variables or the amount of data is large, then learning a TAN model from data can be a time-consuming task. In this paper, we introduce a new method for parallel learning of a TAN model from large data sets. The method is based on computing the mutual information scores between pairs of variables given the class variable in parallel. The computations are organised in parallel using balanced incomplete block designs. The results of a preliminary empirical evaluation of the proposed method on large data sets show that a significant performance improvement is possible through parallelisation using the method presented in this paper.
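The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of the (7, 3, 1) design (the Fano plane), and the idea of one worker per block are assumptions made here for illustration only. The first function estimates the conditional mutual information I(X; Y | C) from counts; the rest shows how a balanced incomplete block design assigns every pair of variables to exactly one block, so that blocks could be scored independently.

```python
from collections import Counter
from itertools import combinations
from math import log


def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X; Y | C) in nats from equal-length sequences of discrete values."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), count in n_xyc.items():
        # p(x,y,c) * log( p(x,y,c) p(c) / (p(x,c) p(y,c)) );
        # the 1/n factors cancel inside the logarithm.
        total += (count / n) * log(count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return total


def cyclic_blocks(v, base_block):
    """Develop a base block modulo v (assumes base_block is a perfect difference set)."""
    return [sorted((b + shift) % v for b in base_block) for shift in range(v)]


# Illustrative design: the (7, 3, 1) BIBD from the difference set {0, 1, 3} mod 7.
FANO_BLOCKS = cyclic_blocks(7, (0, 1, 3))


def pairs_per_block(blocks):
    """Map each block index (one hypothetical worker) to the variable pairs it scores."""
    return {i: list(combinations(block, 2)) for i, block in enumerate(blocks)}


def score_block(columns, class_column, block):
    """Score all pairs inside one block; needs only that block's columns and the class."""
    return {
        (i, j): conditional_mutual_information(columns[i], columns[j], class_column)
        for i, j in combinations(block, 2)
    }
```

In this sketch each of the 7 blocks holds 3 variables and yields 3 pairs, which together cover all 21 pairs of 7 variables without repetition. A worker calling `score_block` would need only the columns in its own block plus the class column, which is one plausible reading of "vertical" parallelisation; the scores from all blocks would then be merged before building the tree.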