Abstract—String matching algorithms plays an important
role in many applications of computer science: in particular
searching, retrieving and processing of data. Various fields that
rely on computer science for computing and data processing
such as science, informatics (e.g. biology, medical, and
healthcare), statistics, image, video/signal processing and
computational aspect of business (e.g. finance, accounting, and
computer security) would benefit greatly from efficient data
search algorithm, in particular string matching. Any
applications involving the use of database would use string
matching algorithm. Many string matching algorithms such as
TBM (Turbo Boyer Moore), BMH (Boyer-Moore-Horspool),
BMHS (Boyer Moore Horspool Sundays, and BMHS2 (Boyer
Moore Horspool Sundays 2) were introduced based on the
celebrated BM (Boyer-Moore) algorithm considered to be one
of the early efficient string searching algorithms. Although
these algorithm offers significant performance improvement
over the BM algorithm, they were designed with the assumption
of single core computer architecture which executes the
algorithm in a serialized manner. Today,
multiple-core-processor computers are very common, and
applications are designed to process big data thanks to the
advanced in computing technology of various fields. High
performance computing system utilizing parallel and
distributed computing has started to become popular. This
work evaluates and compares the performance of the
aforementioned string matching algorithms in parallel and
distributed environment for high performance computing with
respect to that of the serialized single-core computing platform.
In this work, the variants of BM algorithms are implemented
and evaluated on Apache Spark, a popular distributed
computing platform, by executing a set of queries of different
search pattern lengths.
Index Terms—Apache spark, Boyer Moore, distributed
computing, string matching.
The authors are with Faculty of Information and Communication
Technology and Integrative Computational Bioscience Center, Mahidol
University, Thailand (e-mail: kunaphas.kon@gmail.com,
boonsit.yim@mahidol.ac.th).
[PDF]
Cite: Kunaphas Kongkitimanon and Boonsit Yimwadsana, "On Performance Evaluation of BM-Based String Matching Algorithms in Distributed Computing Environment," International Journal of Future Computer and Communication vol. 6, no. 1, pp. 1-5, 2017.