Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space.


Abstract

In this paper, we propose a novel data augmentation method, referred to as Controllable Rewriting based Question Data Augmentation (CRQDA), for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks. We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples. CRQDA utilizes a Transformer autoencoder to map the original discrete question into a continuous embedding space. It then uses a pre-trained MRC model to revise the question representation iteratively with gradient-based optimization. Finally, the revised question representations are mapped back into the discrete space, where they serve as additional question data. Comprehensive experiments on SQuAD 2.0, SQuAD 1.1 question generation, and QNLI tasks demonstrate the effectiveness of CRQDA.
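
The three steps described in the abstract (map to continuous space, revise with gradients from a frozen model, map back to discrete space) can be illustrated with a deliberately tiny sketch. This is not the authors' implementation: the 2-D embedding table, the linear "MRC model" weights `W` and `B`, and the helper names `answerable_prob`, `revise`, and `nearest_token` are all hypothetical stand-ins chosen for illustration, and a nearest-neighbour lookup replaces the Transformer autoencoder's decoder.

```python
import math

# Hypothetical toy vocabulary with 2-D embeddings (stand-in for a learned
# continuous space; the paper uses a Transformer autoencoder instead).
EMBEDDINGS = {
    "when": (1.0, 0.0),
    "where": (0.0, 1.0),
    "who": (-1.0, 0.0),
    "why": (0.0, -1.0),
}

# Hypothetical frozen "MRC model": p(answerable | e) = sigmoid(W . e + B).
W = (2.0, -1.0)
B = 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def answerable_prob(e):
    """Probability that the toy model labels embedding e as answerable."""
    return sigmoid(sum(wi * ei for wi, ei in zip(W, e)) + B)


def revise(e, target, step=0.5, max_iters=50, threshold=0.9):
    """Iteratively move e so the frozen model predicts `target` (0 or 1).

    Only the embedding is updated; the model weights stay fixed.
    """
    e = list(e)
    for _ in range(max_iters):
        p = answerable_prob(e)
        p_target = p if target == 1 else 1.0 - p
        if p_target >= threshold:
            break
        # Gradient of the cross-entropy loss w.r.t. e for a logistic model.
        grad = [(p - target) * wi for wi in W]
        e = [ei - step * gi for ei, gi in zip(e, grad)]
    return tuple(e)


def nearest_token(e):
    """Map a continuous vector back to the closest discrete token."""
    return min(
        EMBEDDINGS,
        key=lambda t: sum((a - b) ** 2 for a, b in zip(EMBEDDINGS[t], e)),
    )


original = EMBEDDINGS["when"]
revised = revise(original, target=0)  # push toward "unanswerable"
print(nearest_token(original), "->", nearest_token(revised))
print(round(answerable_prob(original), 3), "->", round(answerable_prob(revised), 3))
```

In this sketch the stopping threshold plays the role of a control knob: a lower threshold keeps the revised vector closer to the original, while a higher one forces a larger change before decoding.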

Dayiheng Liu
Ph.D. Student

My name is Dayiheng Liu (刘大一恒).

Jiancheng Lv
Dean and Professor of Computer Science at Sichuan University

My research interests include natural language processing, computer vision, industrial intelligence, smart medicine and smart cultural creation.
