

WALAILAK JOURNAL OF SCIENCE AND TECHNOLOGY


Volume 16, No. 02, February 2019, pages 121-131


Hierarchical text categorization using level based neural networks of word embedding sequences with sharing layer information

Mongkud KLUNGPORNKUN, Peerapon VATEEKUL


Abstract

In text corpora, it is common to assign each document to classes in a predefined hierarchy, which is usually a tree. One of the most widely used approaches is a level-based strategy that induces an independent multiclass classifier for each class level. However, prior attempts did not use information from the parent level, and they represented documents as a bag of words rather than as a sequence of words. In this paper, we present a novel level-based hierarchical text categorization method with a strategy called “sharing layer information.” For each class level, a neural network is constructed whose input is a sequence of word embedding vectors generated from Convolutional Neural Networks (CNNs). A training strategy called “balanced resampling with mini-batch training” is also proposed to avoid class imbalance issues. Furthermore, a label correction strategy is proposed to make the predictions of the networks at different class levels consistent with one another. Experiments were conducted on two standard benchmarks, WIPO and Wiki, comparing against a top-down SVM framework with TF-IDF inputs called “HR-SVM.” The results show that the proposed model achieves the highest accuracy in terms of micro F1 and outperforms the baseline at the top levels in terms of macro F1.
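The abstract does not spell out how the label correction strategy works. As a hedged illustration only, the sketch below shows one simple way to make independent per-level predictions consistent with a tree hierarchy: a top-down rule that takes the best top-level class and then, at each deeper level, only considers children of the class chosen one level up. The hierarchy, class names, probabilities, and the function `correct_labels` are all hypothetical, the rule is not necessarily the one the authors propose, and it assumes a single root-to-leaf path per document even though the task is multi-label.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy: each class maps to its parent (None = top level).
PARENT: Dict[str, Optional[str]] = {
    "A": None, "B": None,
    "A1": "A", "A2": "A", "B1": "B",
}


def correct_labels(level_probs: List[Dict[str, float]],
                   parent: Dict[str, Optional[str]]) -> List[str]:
    """Return one label per level that forms a consistent path in the tree.

    Hypothetical top-down rule: argmax among top-level classes, then at each
    deeper level argmax only among children of the label chosen above.
    """
    path: List[str] = []
    for depth, probs in enumerate(level_probs):
        if depth == 0:
            # Keep only classes known to be at the top of the hierarchy.
            candidates = {c: p for c, p in probs.items()
                          if c in parent and parent[c] is None}
        else:
            # Keep only children of the class chosen at the previous level.
            candidates = {c: p for c, p in probs.items()
                          if parent.get(c) == path[-1]}
        if not candidates:
            break  # no child of the chosen parent was scored; stop here
        path.append(max(candidates, key=candidates.get))
    return path


if __name__ == "__main__":
    # Hypothetical outputs of two independently trained level classifiers.
    level_probs = [
        {"A": 0.6, "B": 0.4},
        {"A1": 0.2, "A2": 0.3, "B1": 0.5},
    ]
    # Independent argmax would give ["A", "B1"], which violates the tree.
    print(correct_labels(level_probs, PARENT))  # ['A', 'A2']
```

The top-down rule trusts the upper levels most, which matches the intuition that coarse classes are easier to predict; a path-scoring rule (for example, maximizing the product of probabilities along each root-to-leaf path) is a common alternative when deeper levels are more reliable.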


Keywords

Text categorization, hierarchical multi-label classification, deep learning





Published by: Walailak University
Contributions welcome at: http://wjst.wu.ac.th/index.php/wjst