Senior Research Scientist,
Allen Institute for Artificial Intelligence (AI2),
Seattle, WA, USA.
Waleed Ammar

I am a senior research scientist at the Allen Institute for Artificial Intelligence and an affiliate instructor at UW (University of Washington, Seattle). I develop models for converting natural language text into structured representations. In 2016, I received a Ph.D. degree in artificial intelligence from Carnegie Mellon University. Before pursuing the Ph.D., I was an SDE2 at Microsoft Research, a web developer at eSpace Technologies, and a teaching assistant at Alexandria University. I was awarded the Google PhD Fellowship and two Microsoft Research Tech Transfer awards. In my ample free time, I record for the NLP Highlights podcast, run, juggle and play volleyball.

Publications [semantic scholar, gscholar]
GrapAL: Querying Semantic Scholar's Literature Graph [pdf]
Christine Betts, Joanna Power, Waleed Ammar.
ACL 2019 (demo track).

ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing [pdf]
Mark Neumann, Daniel King, Iz Beltagy, Waleed Ammar.
BioNLP Workshop at ACL 2019.

Structural Scaffolds for Citation Intent Classification in Scientific Publications [pdf]
Arman Cohan, Madeleine van Zuylen, Field Cady, Waleed Ammar.
NAACL 2019.

Combining Distant and Direct Supervision for Neural Relation Extraction [pdf]
Iz Beltagy, Kyle Lo, Waleed Ammar.
NAACL 2019.

Ontology Alignment in the Biomedical Domain Using Entity Definitions and Context [pdf]
Lucy Lu Wang, Chandra Bhagavatula, Mark Neumann, Kyle Lo, Chris Wilhelm, Waleed Ammar.
BioNLP Workshop at ACL 2018.

Citation Count Analysis for Papers with Preprints [pdf]
Sergey Feldman, Kyle Lo, Waleed Ammar.
arXiv 2018.

Construction of the Literature Graph in Semantic Scholar [pdf]
Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Lu Wang, Chris Wilhelm, Zheng Yuan, Madeleine van Zuylen, Oren Etzioni.
NAACL 2018 (industry track).

A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications [pdf]
Dongyeop Kang, Waleed Ammar, Bhavana Dalvi, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, Roy Schwartz.
NAACL 2018.

Content-Based Citation Recommendation [pdf]
Chandra Bhagavatula, Sergey Feldman, Russell Power, Waleed Ammar.
NAACL 2018.

Extracting Scientific Figures with Distantly Supervised Neural Networks [pdf]
Noah Siegel, Nicholas Lourie, Russell Power, Waleed Ammar.
JCDL 2018.

The AI2 system at SemEval-2017 Task 10: Semi-supervised End-to-end Entity and Relation Extraction [pdf]
Waleed Ammar, Matthew E. Peters, Chandra Bhagavatula, Russell Power.
SemEval 2017 (Task 10 on entity and relation extraction in scientific documents, best end-to-end submission).

Semi-supervised Sequence Tagging with Bidirectional Language Models [pdf]
Matthew E. Peters, Waleed Ammar, Chandra Bhagavatula, Russell Power.
ACL 2017.

Ontology-Aware Token Embeddings for Prepositional Phrase Attachment [pdf]
Pradeep Dasigi, Waleed Ammar, Chris Dyer, Eduard Hovy.
ACL 2017.

DyNet: The Dynamic Neural Network Toolkit [pdf]
Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin.
arXiv 2017.

Many Languages, One Parser [pdf]
Waleed Ammar, Phoebe Mulcaire, Miguel Ballesteros, Chris Dyer, Noah A. Smith.
TACL 2016.

Massively Multilingual Word Embeddings [pdf]
Waleed Ammar, Phoebe Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, Noah A. Smith.
arXiv 2016.

Unsupervised POS Induction with Word Embeddings [pdf]
Chu-Cheng Lin, Waleed Ammar, Lori Levin, Chris Dyer.
NAACL 2015.

Model Selection for Type-Supervised Learning with Application to POS Tagging [pdf]
Kristina Toutanova, Waleed Ammar, Pallavi Choudhury, Hoifung Poon.
CoNLL 2015.

Constraint-Based Models of Lexical Borrowing [pdf]
Yulia Tsvetkov, Waleed Ammar, Chris Dyer.
NAACL 2015.

Conditional Random Field Autoencoders for Unsupervised Structured Prediction [pdf, talk]
Waleed Ammar, Chris Dyer, Noah Smith.
NIPS 2014.

The CMU Submission for the Shared Task on Language Identification in Code Switched Data [pdf]
Chu-Cheng Lin, Waleed Ammar, Chris Dyer and Lori Levin.
Code Switching Workshop at EMNLP 2014.

The CMU Machine Translation Systems at WMT 2014 [pdf]
Austin Matthews, Waleed Ammar, Archna Bhatia, Weston Feely, Greg Hanneman, Eva Schlinger, Swabha Swayamdipta, Yulia Tsvetkov, Alon Lavie, Chris Dyer.
WMT workshop at ACL 2014.

The CMU Machine Translation Systems at WMT 2013: Syntax, Synthetic Translation Options, and Pseudo-References [pdf]
Waleed Ammar, Victor Chahuneau, Michael Denkowski, Greg Hanneman, Wang Ling, Austin Matthews, Kenton Murray, Nicola Segall, Yulia Tsvetkov, Alon Lavie, Chris Dyer.
WMT workshop at ACL 2013.

Transliteration by Sequence Labeling with Lattice Encoding and Reranking [pdf]
Waleed Ammar, Chris Dyer, Noah Smith.
NEWS workshop at ACL 2012.

Automatic Categorization of Privacy Policies
Waleed Ammar, Shomir Wilson, Norman Sadeh, Noah Smith.
Tech Report 2012.

Improved Transliteration Mining Using Graph Reinforcement [pdf]
Ali El Kahki, Kareem Darwish, Ahmed Saad El Din, Mohamed Abd El-Wahab, Ahmed Hefny and Waleed Ammar.
EMNLP 2011.

ICE-TEA: In-Context Expansion and Translation of English Abbreviations [pdf]
Waleed Ammar, Kareem Darwish, Ali ElKahki and Khaled Hafez.

Secure localization in wireless sensor networks: a survey [pdf]
Waleed Ammar, Ahmed ElDawy and Moustafa Youssef.
arXiv 2010.

Automatic scoring of online discussion posts [pdf]
Nayer Wanas, Motaz El Saban, Heba Ashour and Waleed Ammar.
WICOW workshop at CIKM 2008.


Syntax-based Augmentation of Statistical Machine Translation Phrase Tables
Achraf Chalabi, Waleed Ammar, Mostafa Ashour.
US Patent, Publication No. US 2012/0296633.

User evaluation in a collaborative online forum
Nayer Wanas, Heba Ashour, Moustafa El-Baradei, Ahmed Morsy, Motaz El Saban and Waleed Ammar.
US Patent, Publication No. US 2010/0162135 A1.

Professional Experience

Allen Institute for Artificial Intelligence
Senior Research Scientist (Jun 2018 – Now)
Research Scientist (Aug 2016 – Jun 2018)
Led AI2's research efforts to analyze the scientific literature at scale in the context of semanticscholar.org. We develop models to extract meaningful structures (e.g., entities, relationships, figures) and establish connections between different artifacts in the literature (e.g., ontology alignment). We also use macro analysis to address controversial questions in the literature (e.g., the association between pre-publishing on arXiv and citation counts).

Software Engineering Intern (Sep 2014 – Dec 2014)
Explored novel methods for large-scale online training of decision forests.

Microsoft Research
Research Intern (May 2013 – Aug 2013)
Explored novel methods for optimization and model selection of unsupervised and semi-supervised learning with lexical constraints.

Microsoft Research
Software Development Engineer II (Dec 2010 – Aug 2011)
Identified deficiencies of machine translated text and worked with researchers of the NLP group to find solutions. I was also responsible for integration of such solutions into the production system.

Microsoft Research
Research Software Development Engineer (Nov 2007 – Nov 2010)
As one of the early employees of the Cairo Microsoft Innovation Center, I collaborated with researchers in MSR to push the state of the art in the fields of Data Mining and Natural Language Processing by engineering prototype technologies, writing papers and formulating patents. I was also responsible for the transfer of research prototypes into Microsoft products.

Alexandria University
Teaching Assistant (Aug 2007 – Nov 2007)
Tutored students, held office hours, graded homework and mid-term exams, administered tests and exams, and assisted professors with laboratory sessions.
Courses: Probability and Statistics I, Technical Writing I, and Introduction to Computers.

eSpace Technologies
Part-Time Software Developer (Jul 2007 – Nov 2007)
My role encompassed design and development of features in web portals as well as identification and resolution of deficiencies in web applications. I also took part in collecting customer requirements.

Intern at the Human Language Technologies Group (Jul 2006 – Aug 2006)
Participated in TREC 2006 genomics track competition. We developed an information retrieval (IR) system capable of answering specific types of questions from within biological documents.

Procter & Gamble (P&G)
Intern on Project Management (Jun 2005 – Aug 2005)
Managed a real-world automation project at the P&G powder factory in Egypt. The project scope included automatic identification of objects, semi-automatic acquisition of product type information, and a rich web reporting system.

Academic Services
  • Served as co-chair of the demo track at NAACL 2019.
  • Reviewed for the Synthesis Lectures in Human Language Technologies.
  • Serving on the reviewing committee for TACL since 2016.
  • Reviewed for JAIR in 2016 and 2018.
  • Served on the program committee of NAACL 2016, 2018, 2019.
  • Served on the program committee of ACL 2016, 2017, 2019.
  • Served on the program committee of EACL 2017.
  • Served on the program committee of CoNLL 2017.
  • Served on the program committee of EMNLP 2015, 2016, 2018.
  • Served on the program committee of IJCAI 2015.
  • Served on the program committee of the First Women and Underrepresented Minorities in NLP Workshop (WiNLP) at ACL 2017.
  • Served on the program committee of the 2nd Workshop on Representation Learning for NLP (RepL4NLP) at ACL 2017.
  • Served on the program committee of the NAACL-HLT 2016 workshop on multilingual and crosslingual methods in NLP.
  • Served on the program committee of the NAACL 2015 workshop on vector space modeling for NLP.
  • Helped write a proposal for a multi-million-dollar, multi-university NSF project on making privacy policies more usable.
  • Was the PhD student body representative of LTI-CMU in 2013.
  • Co-founded the ACM chapter at Alexandria University in 2005.


Activities Log

  • Oren Etzioni --Allen Institute for Artificial Intelligence (ex-manager)
  • Noah A. Smith --University of Washington (PhD advisor)
  • Chris Dyer --Carnegie Mellon University (PhD advisor)
  • Tom Mitchell --Carnegie Mellon University (PhD thesis committee)
  • Kuzman Ganchev --Google Research (PhD thesis committee)
  • Miguel Ballesteros --Carnegie Mellon University (co-author, collaborator on cross-lingual parsing)
  • D. Sculley --Google Research (internship host)
  • Kristina Toutanova --Microsoft Research (co-author, internship host)
  • Kareem Darwish --Qatar Computing Research Institute (co-author, ex-manager)
  • Ayman Kaheel --Yahoo Inc. (ex-manager)
  • Tarek Elabbady --Microsoft Research (ex-manager)
  • Mei-Yuh Hwang --Microsoft Research (ex-manager)
  • Yulia Tsvetkov --Carnegie Mellon University (co-author, colleague)
  • Ahmed Hefny --Carnegie Mellon University (co-author, colleague)
  • Ali ElKahki --Google Research (ex-colleague at MSR Cairo)
  • Chu-Cheng Lin --Carnegie Mellon University (co-author, collaborator on modeling code switching with CRF autoencoders)
  • Phoebe Mulcaire --University of Washington (collaborator on estimating multilingual word embeddings)
  • Pradeep Dasigi --Carnegie Mellon University (collaborator on modeling selectional preferences with CRF autoencoders)
  • Moustafa Youssef --Egypt-Japan University of Science and Technology (co-author, M.Sc. ex-advisor)
  • Jeffrey Micher --US Army Research Lab (collaborator on the low-density MT project)
  • Norman Sadeh --Carnegie Mellon University (lead principal investigator of the usable privacy policy project)
  • George Foster --National Research Council Canada (Google fellowship research mentor)
  • Lori Levin --Carnegie Mellon University (co-author)
  • Jaime Carbonell --Carnegie Mellon University (department head, lead principal investigator of the low-density MT project)

Recent Projects
  • Language-universal dependency parsing* (code).
  • CRF autoencoder models for scalable and feature-rich unsupervised learning* (code).
  • Multilingual word embeddings (unification-based*).
  • A universal dependency treebanks analyzer* (code).
  • Large-scale online training of random forests.*
  • Bayesian models for record linkage* (code).
  • CRF model for transliteration* (code).
  • Dual decomposition of a CFG parser and a POS tagger* (code).
  • A bunch of handy C, C++ and python utilities* (code).
  • Privacy policy crawler* (code).
  • C++ library for training recurrent neural networks (code).
  • A neural network model which generalizes CRF autoencoders, for modeling selectional preferences (code).
  • A computational model for linguistic borrowing (code).
  • Semi-supervised learning for token-level language identification. (task, Twitter results, surprise genre results)
  • Improved training and model selection of unsupervised sequence-labeling models with lexical constraints.
  • Yet another implementation of dependency parsing with DMV* (code).
  • Yet another implementation of logistic regression* (code).
  • Yet another implementation of word-alignment induced preordering for machine translation* (code).
Projects led by me are marked with *.