
NLP Highlights

Allen Institute for Artificial Intelligence

Welcome to the NLP Highlights podcast, where we invite researchers to talk about their work in various areas of natural language processing. The hosts are members of the AllenNLP team at the Allen Institute for AI. All views expressed belong to the hosts and guests and do not represent their employers.

  • 127 - Masakhane and Participatory Research for African Languages, with Tosin Adewumi and Perez Ogayo

    We invited members of Masakhane, Tosin Adewumi and Perez Ogayo, to talk about their EMNLP Findings paper, which discusses why typical research practices fall short for low-resourced NLP and how participatory research can help. Participatory research has given Masakhane many success stories: the first datasets and benchmarks in several African languages, the first research on human evaluation of machine translation specifically for low-resource languages, and more. In this episode we discussed one of them, MasakhaNER, in more detail. The hosts for this episode are Pradeep Dasigi and Ana Marasović.
    Tosin Adewumi is a PhD student at the Luleå University of Technology in Sweden. His Twitter handle: @tosintwit
    Perez Ogayo is an undergraduate student at the African Leadership University in Rwanda. Her Twitter handle: @a_ogayo
    Masakhane is a grassroots organization whose mission is to strengthen and spur NLP research in African languages, for Africans, by Africans: https://www.masakhane.io/
    Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages (Findings of EMNLP 2020): https://arxiv.org/abs/2010.02353
    MasakhaNER: Named Entity Recognition for African languages (AfricaNLP Workshop @ EACL 2021): https://arxiv.org/abs/2103.11811

    Tue, 08 Jun 2021 - 47min
  • 126 - Optimizing Continuous Prompts for Generation, with Lisa Li

    We invited Lisa Li to talk about her recent work, Prefix-Tuning: Optimizing Continuous Prompts for Generation. Prefix tuning is a lightweight alternative to fine-tuning: the idea is to tune only a fixed-length, task-specific continuous vector while keeping the pretrained transformer parameters frozen (a minimal code sketch of the idea appears after the episode list). We discussed how prefix tuning compares with fine-tuning and other efficient alternatives on two tasks across various experimental settings, and in which scenarios prefix tuning is preferable. Lisa is a PhD student at Stanford University. Lisa's webpage: https://xiangli1999.github.io/
    The hosts for this episode are Pradeep Dasigi and Ana Marasović.

    Mon, 24 May 2021 - 47min
  • 125 - VQA for Real Users, with Danna Gurari

    How can we build Visual Question Answering systems for real users? For this episode, we chatted with Danna Gurari about her work on building datasets and models for VQA for people who are blind. We talked about the differences between existing datasets and VizWiz, a dataset built by Gurari et al., and the algorithmic changes those differences call for. We also discussed the unsolved challenges in this field and the new tasks they give rise to. Danna Gurari is an Assistant Professor and Founding Director of the Image and Video Computing group in the School of Information at the University of Texas at Austin (UT Austin). VizWiz project page: https://vizwiz.org/
    The hosts for this episode are Ana Marasović and Pradeep Dasigi.

    Tue, 04 May 2021 - 42min
  • 124 - Semantic Machines and Task-Oriented Dialog, with Jayant Krishnamurthy and Hao Fang

    We invited Jayant Krishnamurthy and Hao Fang, researchers at Microsoft Semantic Machines, to discuss their platform for building task-oriented dialog systems and their recent TACL paper on the topic. The paper introduces a new formalism for task-oriented dialog that effectively handles references and revisions in complex dialogs, along with a large, realistic dataset that uses this formalism.
    Leaderboard associated with the dataset: https://microsoft.github.io/task_oriented_dialogue_as_dataflow_synthesis/
    Jayant's Twitter handle: https://twitter.com/jayantkrish
    Hao's Twitter handle: https://twitter.com/hfang90

    Wed, 14 Apr 2021 - 45min
  • 123 - Robust NLP, with Robin Jia

    In this episode, Robin Jia talks about how to build robust NLP systems. We discuss the different senses in which a system can be robust, reasons to care about system robustness, and the challenges involved in evaluating the robustness of NLP models. We talk about how to build certifiably robust models through interval bound propagation and discrete encoding functions (a toy interval-propagation example appears after the episode list), as well as how to modify data collection procedures through active learning for more robust model development. Robin Jia is currently a visiting researcher at Facebook AI Research, and will be an assistant professor in the Department of Computer Science at the University of Southern California starting Fall 2021.

    Mon, 05 Apr 2021 - 47min
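
Code sketch for episode 126 (prefix tuning). This is a minimal, simplified sketch of the idea as described in the episode notes, not Li's implementation: the paper tunes layer-wise prefix activations, whereas this sketch prepends trainable vectors at the embedding layer only, which is closer to embedding-level prompt tuning. The model choice (gpt2), prefix length, and learning rate are illustrative assumptions.

    # Minimal sketch: learn a fixed-length, task-specific continuous prefix
    # while the pretrained transformer stays frozen. Simplified relative to
    # the paper, which tunes prefix activations at every layer.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    model = GPT2LMHeadModel.from_pretrained("gpt2")
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

    for p in model.parameters():  # freeze all pretrained weights
        p.requires_grad = False

    prefix_len = 10                # assumed prefix length
    embed_dim = model.config.n_embd
    # The only trainable parameters: the continuous prefix vectors.
    prefix = torch.nn.Parameter(torch.randn(prefix_len, embed_dim) * 0.02)

    def forward_with_prefix(input_ids):
        token_embeds = model.transformer.wte(input_ids)        # (B, T, D)
        batch_size = input_ids.size(0)
        prefix_embeds = prefix.unsqueeze(0).expand(batch_size, -1, -1)
        inputs_embeds = torch.cat([prefix_embeds, token_embeds], dim=1)
        return model(inputs_embeds=inputs_embeds)

    optimizer = torch.optim.AdamW([prefix], lr=5e-4)  # updates only the prefix
    input_ids = tokenizer("Harry Potter is", return_tensors="pt").input_ids
    outputs = forward_with_prefix(input_ids)  # logits with the prefix prepended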
Show More Episodes
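
Episode 123 mentions certified robustness via interval bound propagation (IBP). As a toy illustration (not Jia et al.'s code), here is how interval bounds propagate through a single linear layer; the weights, input, and perturbation budget are made-up values.

    # Toy interval bound propagation through y = W x + b: given elementwise
    # input bounds [lower, upper], compute sound bounds on the output.
    import torch

    def linear_interval_bounds(W, b, lower, upper):
        center = (lower + upper) / 2      # midpoint of the input box
        radius = (upper - lower) / 2      # half-width of the input box
        out_center = center @ W.T + b
        out_radius = radius @ W.abs().T   # |W| maps input radii to output radii
        return out_center - out_radius, out_center + out_radius

    W = torch.randn(2, 3)                 # hypothetical layer weights
    b = torch.zeros(2)
    x = torch.randn(3)                    # nominal input
    eps = 0.1                             # assumed perturbation budget
    lo, hi = linear_interval_bounds(W, b, x - eps, x + eps)
    assert (lo <= hi).all()               # lower bounds never exceed upper bounds

Composing such bounds layer by layer (applying monotone activations like ReLU to both endpoints) yields a certificate: if the worst-case logit gap stays positive over the whole input box, no perturbation within the budget can flip the prediction.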