ThaiScience  


ECTI TRANSACTIONS ON COMPUTER INFORMATION TECHNOLOGY


Volume 16, No. 2, June 2022, Pages 125-134


Machine reading comprehension using multi-passage BERT with dice loss on Thai corpus

Theerit Lapchaicharoenkit, Peerapon Vateekul


Abstract

Machine reading comprehension (MRC) has advanced considerably with the invention of large-scale pre-trained language models such as BERT. However, performance remains limited when the context is long and contains many passages. BERT can embed only as much of a passage as fits in its input size; a sliding window must therefore be used, which fragments the information when the passage is long. In this paper, we propose a BERT-based MRC framework tailored to long passage contexts in a Thai corpus. Our framework employs multi-passage BERT along with a self-adjusting dice loss, which helps the model focus on the answer region of the context passage. We also show that performance improves when an auxiliary task is used. The experiments were conducted on the Thai Question Answering (QA) dataset used in the Thailand National Software Competition. The results show that our method improves the model's performance over a traditional BERT framework.
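
The sliding-window step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sliding_windows`, the parameters `window_size` and `stride`, and the toy token list are all assumptions made for illustration. It shows only how a long passage is cut into overlapping chunks that each fit a fixed input size, which is why an answer or its supporting context can end up split across chunks.

```python
from typing import List


def sliding_windows(tokens: List[str], window_size: int, stride: int) -> List[List[str]]:
    """Split a token list into overlapping windows (hypothetical helper).

    window_size: maximum number of tokens per window (stands in for the
                 model's input size; the real limit is not given in the text).
    stride:      how far the window start advances each step; a stride
                 smaller than window_size makes consecutive windows overlap.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    if stride > window_size:
        raise ValueError("stride larger than window_size would skip tokens")
    if not tokens:
        return []

    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + window_size])
        # Stop once the current window already reaches the end of the passage.
        if start + window_size >= len(tokens):
            break
        start += stride
    return windows


if __name__ == "__main__":
    # Toy passage of ten placeholder tokens, assumed for illustration.
    passage = [f"tok{i}" for i in range(10)]
    for chunk in sliding_windows(passage, window_size=4, stride=2):
        print(chunk)
```

Each window is encoded independently, so a smaller stride gives more overlap between neighbouring chunks at the cost of more forward passes.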


Keywords

Machine Reading Comprehension, Natural Language Processing, Deep Learning, BERT




Published by : ECTI Association
Contributions welcome at : http://www.ecti-thailand.org/paper/journal/ECTI-CIT