Holistic Intelligence for Global Good

Privilege comes with the responsibility of extending it to those who don’t have it.

I tackle complex, socio-technical problems with long-term impact, in collaboration with high-performing, world-class R&D teams. My north star is leveraging multi-disciplinary science to safeguard vulnerable populations, regardless of what language they speak. If your work is aligned, do not hesitate to drop me a line at /waleed.ammar@gmail/.

Intellectual interests

Does AI-mediation diminish trust in human-human communication? Medium, May 2026
How do human and machine agents co-evolve? Medium, May 2026
A free agent. Medium, May 2026
Stop hate crime. Medium, May 2026
Why I work. Medium, April 2026
[Human vs. AI] Can you put time in a bottle? Medium, April 2026
American Gen alpha’s interpretation of Greek mythology. Medium, November 2025
TyDi QA-WANA: A Benchmark for Information-Seeking Question Answering in Languages of West Asia and North Africa. arXiv, 2025
Podcast interview: “Can AI help scientists solve problems?” September 2024
Invited talk, British Columbia NLP: “Can science help us reduce inequality gaps?” 2024
PRESTO: A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs. EMNLP, 2023
AI Safety in 2023. Medium, November 2023
How to belong without losing yourself? Medium, September 2023
Hallucination as a competitive advantage. Medium, August 2023
Podcast interview: “AI and its Impact on Society” (in Arabic). July 2023
DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer. Nature Biotechnology, 2022
DeepConsensus: Gap-Aware Sequence Transformers for Sequence Correction. BioRxiv, 2021
How do we build genomic LLMs? 2019–2021
Keynote, AKBC 2019: “Taming the scientific literature: progress and challenges.” 2019
Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction. JAMA Network Open, 2019
Extracting Evidence of Supplement-Drug Interactions from Literature (supp.ai). arXiv, 2019
GrapAL: Querying Semantic Scholar’s Literature Graph. ACL, 2019 (demo track)
ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing. BioNLP Workshop at ACL, 2019
Structural Scaffolds for Citation Intent Classification in Scientific Publications. NAACL, 2019
Combining Distant and Direct Supervision for Neural Relation Extraction. NAACL, 2019
NLP Highlights, ep. 96: “Question Answering as an Annotation Format,” with Luke Zettlemoyer. 2019
NLP Highlights, ep. 88: “A Structural Probe for Finding Syntax in Word Representations,” with John Hewitt. 2019
NLP Highlights, ep. 87: “Pathologies of Neural Models Make Interpretation Difficult,” with Shi Feng. 2019
NLP Highlights, ep. 81: “BlackboxNLP,” with Afra Alishahi and Tal Linzen. February 2019
NLP Highlights, ep. 80: “Leaderboards and Science,” with Siva Reddy. January 2019
Adjunct faculty, University of Washington. 2018–2024
Ontology Alignment in the Biomedical Domain Using Entity Definitions and Context. BioNLP Workshop at ACL, 2018
Citation Count Analysis for Papers with Preprints. arXiv, 2018
Construction of the Literature Graph in Semantic Scholar. NAACL, 2018 (industry track)
A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications. NAACL, 2018
Content-Based Citation Recommendation. NAACL, 2018
Extracting Scientific Figures with Distantly Supervised Neural Networks. JCDL, 2018
NLP Highlights, ep. 77: “On Writing Quality Peer Reviews,” with Noah A. Smith. 2018
NLP Highlights, ep. 70: “Measuring the Evolution of a Scientific Field through Citation Frames,” with David Jurgens. September 2018
NLP Highlights, ep. 62: “Sounding Board: A User-Centric and Content-Driven Social Chatbot,” with Hao Fang. 2018
NLP Highlights, ep. 60: “FEVER: a large-scale dataset for Fact Extraction and VERification,” with James Thorne. June 2018
NLP Highlights, ep. 56: “Deep contextualized word representations” (ELMo), with Matthew Peters. 2018
NLP Highlights, ep. 52: “Sequence-to-Sequence Learning as Beam-Search Optimization,” with Sam Wiseman. 2018
NLP Highlights, ep. 50: “Cardinal Virtues: Extracting Relation Cardinalities from Text,” with Paramita Mirza. 2018
NLP Highlights, podcast co-host with Matt Gardner. 2017–
The AI2 system at SemEval-2017 Task 10: Semi-supervised End-to-end Entity and Relation Extraction. SemEval, 2017
Semi-supervised Sequence Tagging with Bidirectional Language Models. ACL, 2017
Ontology-Aware Token Embeddings for Prepositional Phrase Attachment. ACL, 2017
DyNet: The Dynamic Neural Network Toolkit. arXiv, 2017
Keynote, 6th International Workshop on Mining Scientific Publications (JCDL). 2017
Invited talk, Institute for Health Metrics and Evaluation, Seattle. 2017
NLP Highlights, ep. 36: “Attention Is All You Need,” with Ashish Vaswani and Jakob Uszkoreit. 2017
NLP Highlights, ep. 31: “Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling.” 2017
NLP Highlights, ep. 28: “Data Programming: Creating Large Training Sets, Quickly.” 2017
NLP Highlights, ep. 20: “A simple neural network module for relational reasoning.” 2017
NLP Highlights, ep. 17: “pix2code: Generating Code from a Graphical User Interface Screenshot.” 2017
NLP Highlights, ep. 12: “Supervised Learning of Universal Sentence Representations from Natural Language Inference Data.” 2017
How do we model the scientific literature at scale? (as R&D Lead of Semantic Scholar, Allen Institute for Artificial Intelligence). 2016–2019
Many Languages, One Parser. TACL, 2016
Massively Multilingual Word Embeddings. arXiv, 2016
PhD in artificial intelligence, Carnegie Mellon University (dissertation). 2011–2016
Unsupervised POS Induction with Word Embeddings. NAACL, 2015
Model Selection for Type-Supervised Learning with Application to POS Tagging. CoNLL, 2015
Constraint-Based Models of Lexical Borrowing. NAACL, 2015
Conditional Random Field Autoencoders for Unsupervised Structured Prediction. NIPS, 2014
The CMU Submission for the Shared Task on Language Identification in Code Switched Data. Code Switching Workshop at EMNLP, 2014
The CMU Machine Translation Systems at WMT 2014. WMT Workshop at ACL, 2014
The CMU Machine Translation Systems at WMT 2013: Syntax, Synthetic Translation Options, and Pseudo-References. WMT Workshop at ACL, 2013
Transliteration by Sequence Labeling with Lattice Encoding and Reranking. NEWS Workshop at ACL, 2012
Automatic Categorization of Privacy Policies. Tech Report, 2012
Improved Transliteration Mining Using Graph Reinforcement. EMNLP, 2011
ICE-TEA: In-Context Expansion and Translation of English Abbreviations. CICLing, 2011
Secure Localization in Wireless Sensor Networks: A Survey. arXiv, 2010
Automatic Scoring of Online Discussion Posts. WICOW Workshop at CIKM, 2008