Abstract—Keyword spotting is the challenging task of detecting
predefined keywords in utterances. In low-resource environments,
where few keyword templates are available and linguistic
information is lacking, detection performance is often
unsatisfactory. In this paper, we focus on a low-resource
situation in which each keyword has only about 40 templates and
the linguistic information is unknown. We explore
using deep neural networks for acoustic modeling. In addition,
we investigate several techniques, including transfer learning,
multilingual bottleneck features, balancing keyword and filler
data, and data augmentation, to address the low-resource problem
and improve system performance. Compared with a
query-by-example baseline system, our proposed keyword
spotting system with deep neural network (KWS-DNN) framework
obtains a substantial performance improvement.
Index Terms—Keyword spotting, DNN, acoustic model.
The authors are with the Department of Electronic Engineering, Tsinghua
University, China (e-mail: skx13@mails.tsinghua.edu.cn,
cai-m10@mails.tsinghua.edu.cn, wqzhang@tsinghua.edu.cn,
chinaty188@163.com, ljia@tsinghua.edu.cn).
Cite: Kaixiang Shen, Meng Cai, Wei-Qiang Zhang, Yao Tian, and Jia Liu, "Investigation of DNN-Based Keyword Spotting in Low Resource Environments," International Journal of Future Computer and Communication, vol. 5, no. 2, pp. 125-129, 2016.