ECTI TRANSACTIONS ON COMPUTER INFORMATION TECHNOLOGY, Volume 16, No. 02, June 2022, Pages 125-134
Machine Reading Comprehension Using Multi-Passage BERT with Dice Loss on Thai Corpus
Theerit Lapchaicharoenkit, Peerapon Vateekeul
Abstract
Machine reading comprehension (MRC) has advanced recently thanks to large-scale pre-trained language models such as BERT. However, performance remains limited when the context is long and contains many passages. BERT can embed only as much of a passage as fits its input size; longer passages must be split with sliding windows, which breaks the continuity of information across windows. In this paper, we propose a BERT-based MRC framework tailored to long passage contexts in the Thai corpus. Our framework employs multi-passage BERT together with self-adjusting dice loss, which helps the model focus more on the answer region of the context passage. We also show that performance improves when an auxiliary task is used. The experiments were conducted on the Thai Question Answering (QA) dataset used in the Thailand National Software Competition. The results show that our method improves the model's performance over a traditional BERT framework.
Machine Reading Comprehension, Natural Language Processing, Deep Learning, BERT
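
The two mechanisms named in the abstract, sliding windows over a long passage and self-adjusting dice loss, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `max_len` and `stride` parameters, and the `alpha` and `gamma` defaults are all assumptions chosen for illustration, and the loss follows the commonly cited self-adjusting dice formulation, which may differ in detail from what the paper uses.

```python
from typing import List, Sequence


def sliding_windows(tokens: Sequence[str], max_len: int, stride: int) -> List[List[str]]:
    """Split tokens into overlapping windows of at most max_len tokens.

    Hypothetical helper: consecutive windows start `stride` tokens apart,
    so any answer span may still be cut at a window boundary.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # the last window reaches the end of the passage
        start += stride
    return windows


def self_adjusting_dice_loss(
    probs: Sequence[float],
    labels: Sequence[int],
    alpha: float = 1.0,   # assumed default; down-weights confident predictions
    gamma: float = 1.0,   # assumed default; smoothing term
) -> float:
    """Mean of 1 - DSC over tokens, with the (1 - p)^alpha self-adjusting factor."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal in length")
    total = 0.0
    for p, y in zip(probs, labels):
        weighted = ((1.0 - p) ** alpha) * p
        dsc = (2.0 * weighted * y + gamma) / (weighted + y + gamma)
        total += 1.0 - dsc
    return total / len(probs)
```

In this sketch, a 10-token passage with `max_len=4` and `stride=2` yields four windows, each sharing two tokens with its neighbour. The `(1 - p)^alpha` factor shrinks the contribution of tokens the model already predicts confidently, which is how such a loss can shift attention toward the comparatively rare answer tokens.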