조재민 (Jaemin Cho)
Allen Institute for AI
Generating diverse sequences is important in many NLP applications, such as question generation or summarization, that exhibit semantically one-to-many relationships between the source and target sequences. We present a method that explicitly separates diversification from generation using a general plug-and-play module (called Selector) that wraps around and guides an existing encoder-decoder model. The diversification stage uses a mixture of experts to sample different binary masks on the source sequence for diverse content selection. The generation stage uses a standard encoder-decoder model conditioned on each selected content from the source sequence. Because discrete sampling is non-differentiable and there are no ground-truth labels for the binary masks, we leverage a proxy for the ground-truth mask and adopt stochastic hard-EM for training. On question generation (SQuAD) and abstractive summarization (CNN-DM), our method demonstrates significant improvements in accuracy, diversity, and training efficiency.
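To make the two-stage idea concrete, the sketch below implements a toy mixture-of-experts selector in PyTorch: each expert assigns every source token a Bernoulli keep-probability, and a hard-EM step trains only the expert whose predictions best match a proxy mask. This is a minimal sketch under simplifying assumptions, not the paper's implementation: the `Selector` class, `hard_em_loss`, the single-linear-layer experts, and the random proxy mask are all illustrative, and the stochastic mask-sampling step is scored on probabilities directly for brevity.

```python
import torch
import torch.nn as nn

class Selector(nn.Module):
    """Mixture-of-experts content selector: each expert predicts a
    Bernoulli keep-probability for every source token."""

    def __init__(self, hidden_size: int, num_experts: int = 3):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(hidden_size, 1) for _ in range(num_experts)]
        )

    def forward(self, enc_states: torch.Tensor) -> torch.Tensor:
        # enc_states: (batch, src_len, hidden) -> probs: (experts, batch, src_len)
        return torch.stack(
            [torch.sigmoid(e(enc_states)).squeeze(-1) for e in self.experts]
        )

def hard_em_loss(probs: torch.Tensor, proxy_mask: torch.Tensor) -> torch.Tensor:
    """Hard-EM: score each expert against the proxy mask, then
    backpropagate only through the best expert per example."""
    # probs: (experts, batch, src_len); proxy_mask: (batch, src_len) in {0, 1}
    target = proxy_mask.unsqueeze(0).expand_as(probs).float()
    bce = nn.functional.binary_cross_entropy(probs, target, reduction="none")
    per_expert = bce.mean(dim=-1)           # (experts, batch)
    best = per_expert.argmin(dim=0)         # E-step: pick the best expert
    # M-step: loss from the chosen expert only; gradients reach it alone
    return per_expert.gather(0, best.unsqueeze(0)).mean()

# Toy usage: 2 examples, 6 source tokens, hidden size 8.
enc_states = torch.randn(2, 6, 8)
proxy_mask = (torch.rand(2, 6) > 0.5).long()  # stand-in for the real proxy
selector = Selector(hidden_size=8, num_experts=3)
loss = hard_em_loss(selector(enc_states), proxy_mask)
loss.backward()

# At inference time, each expert's hard mask (probs > 0.5) selects a
# different subset of source tokens to condition the encoder-decoder on.
```

In the paper the proxy mask is derived from the source-target relationship rather than sampled at random; it is stubbed out here only to keep the example self-contained and runnable.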
Jaemin Cho is a predoctoral young investigator at Allen Institute for AI (AI2). His interests broadly lie in sequence modeling, generative models and machine learning. Before joining AI2, he worked on machine learning and natural language processing at Naver Clova and Seoul National University.
• Homepage: https://j-min.io/