Andreas Kerren (Linnaeus University, Sweden)
Title: Visual Text Analytics: Overview, State-of-the-Art, and Challenges
Interest in text visualization and visual text analytics has grown strongly over the last ten years. The reasons for this development are manifold, but the availability of large amounts of heterogeneous text data (driven by the popularity of online social media) and the adoption of text processing algorithms by the visualization community are likely explanations. This invited talk at the EMNLP 2017 Workshop on New Frontiers in Summarization will primarily give an overview of visualization research with a focus on text visualization. In order to classify traditional and state-of-the-art visualization methods, I will present and use an interactive visual survey of text visualization techniques, called TextVis Browser, which is freely available online (textvis.lnu.se). My talk will conclude with a discussion of the most important challenges and open problems in text visualization. Altogether, I hope that this talk can serve as a starting point for further discussions and the identification of synergies between the fields of visualization and summarization.
Andreas Kerren received his B.S., M.S., and PhD degrees in Computer Science from Saarland University, Saarbrücken (Germany). In 2008, he obtained his habilitation (docent competence) from Växjö University (Sweden). Dr. Kerren is currently a Full Professor in Computer Science at the Department of Computer Science, Linnaeus University (Sweden), where he heads the research group for Information and Software Visualization, called ISOVIS. His main research interests include Information Visualization, Visual Analytics, and Human-Computer Interaction. Among other roles, he is an editorial board member of the Information Visualization journal, has served as organizer/program chair at various conferences, such as IEEE VISSOFT 2013/2018 and IVAPP 2013-15/2018, and has edited a number of successful books on human-centered visualization.
Katja Filippova (Google Research, Switzerland)
Title: Sentence and Passage Summarization for Question Answering
Question answering (QA) requires NLP to understand the user request and serve the right information. The rise of mobile, where voice queries are typically answered with speech, brings additional NLP challenges because reading answers aloud is not always possible (e.g., for images) or may provide a poor user experience (e.g., when the answer is very long). In this talk I will first give an overview of how sentence-level summarization, also known as sentence compression, has been approached by our team over the past years, and will describe an evolution from a syntax-based optimization algorithm to a syntax-free deep neural network. I will show how the sentence compression models are applied to provide a better TTS (text-to-speech) experience for QA. Finally, I will talk about experiments on further improving TTS quality with query-focused answer snippet summarization models.
Katja Filippova is a research scientist at Google. She holds a Ph.D. from the Technical University of Darmstadt (2009) and an MA from the University of Tübingen (2005). During her Ph.D. she was supported by the Klaus Tschira Foundation and was affiliated with EML Research in Heidelberg (now the Heidelberg Institute for Theoretical Studies). She has worked on applying statistical methods to text understanding and generation.
Ani Nenkova (University of Pennsylvania, USA)
Title: New Frontiers and Paths Well-travelled in Summarization Research
Exciting new summarization work is pushing the frontiers of the field, exploring new domains, datasets, and tasks and developing modern solutions for each. In this talk I look back at the vast body of research in news summarization to review the main results that may inform future research.
In multi-document summarization of news, optimization approaches to content selection have worked best and systems with supervised components have outperformed fully unsupervised ones. In single-document summarization of news, the beginning-of-document baseline has been very strong, outperforming many of the most competitive systems. Most systems generate summaries of fixed length, but differences between systems emerge more clearly in longer summaries. Moreover, the type of the user's information need influences the ideal summary length.
New tasks need new evaluation methods that are informative, reliable, and efficient. To develop these, we establish protocols for human evaluation that capture key aspects of system characteristics and then ideally develop an automated replacement for that human evaluation. I will review how such methods have been developed for news summarization, what we know about the shortcomings of automatic evaluation, and which aspects of evaluation are desirable for several emerging summarization tasks outside the news domain or those requiring text generation.
Ani Nenkova is an associate professor of computer and information science at the University of Pennsylvania. Her main areas of research are computational linguistics and artificial intelligence, with emphasis on developing computational methods for analysis of text quality and style, discourse, affect recognition and summarization. She obtained her PhD degree in computer science from Columbia University and was a postdoctoral fellow at Stanford University before joining Penn.
Ani and her collaborators are recipients of the best student paper award at SIGDial in 2010 and the best paper award at EMNLP-CoNLL in 2012. The Penn team co-led by Ani won the audio-visual emotion recognition challenge (AVEC) for word-level prediction in 2012.
Ani was a program co-chair for SIGDial 2014 and NAACL-HLT 2016.