Research Scientist,
Allen Institute for Artificial Intelligence (AI2),
Seattle, WA, USA.
Waleed Ammar

I am a research scientist at the Allen Institute for Artificial Intelligence. I develop models for converting natural language text into structured representations. In 2016, I received a Ph.D. in artificial intelligence from Carnegie Mellon University. Before pursuing the Ph.D., I was an SDE2 at Microsoft Research, a web developer at eSpace Technologies, and a teaching assistant at Alexandria University. I was awarded a Google PhD Fellowship and two Microsoft Research Tech Transfer awards. In my ample free time, I play with my daughter Salma, record for the NLP Highlights podcast, run, swim, play tennis, and juggle.

Publications [semantic scholar, gscholar]
The AI2 system at SemEval-2017 Task 10: Semi-supervised End-to-end Entity and Relation Extraction [pdf]
Waleed Ammar, Matthew E. Peters, Chandra Bhagavatula, Russell Power.
SemEval 2017 (Task 10 on entity and relation extraction in scientific documents, best end-to-end submission).

Semi-supervised Sequence Tagging with Bidirectional Language Models [pdf]
Matthew E. Peters, Waleed Ammar, Chandra Bhagavatula, Russell Power.
ACL 2017.

Ontology-Aware Token Embeddings for Prepositional Phrase Attachment [pdf]
Pradeep Dasigi, Waleed Ammar, Chris Dyer, Eduard Hovy.
ACL 2017.

DyNet: The Dynamic Neural Network Toolkit [pdf]
Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin.
arXiv 2017.

Many Languages, One Parser [pdf]
Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, Noah A. Smith.
TACL 2016.

Massively Multilingual Word Embeddings [pdf]
Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, Noah A. Smith.
arXiv 2016.

Unsupervised POS Induction with Word Embeddings [pdf]
Chu-Cheng Lin, Waleed Ammar, Lori Levin, Chris Dyer.
NAACL 2015.

Model Selection for Type-Supervised Learning with Application to POS Tagging [pdf]
Kristina Toutanova, Waleed Ammar, Pallavi Choudhury, Hoifung Poon.
CoNLL 2015.

Constraint-Based Models of Lexical Borrowing [pdf]
Yulia Tsvetkov, Waleed Ammar, Chris Dyer.
NAACL 2015.

Conditional Random Field Autoencoders for Unsupervised Structured Prediction [pdf, talk]
Waleed Ammar, Chris Dyer, Noah Smith.
NIPS 2014.

The CMU Submission for the Shared Task on Language Identification in Code Switched Data [pdf]
Chu-Cheng Lin, Waleed Ammar, Chris Dyer and Lori Levin.
Code Switching Workshop at EMNLP 2014.

The CMU Machine Translation Systems at WMT 2014 [pdf]
Austin Matthews, Waleed Ammar, Archna Bhatia, Weston Feely, Greg Hanneman, Eva Schlinger, Swabha Swayamdipta, Yulia Tsvetkov, Alon Lavie, Chris Dyer.
WMT workshop at ACL 2014.

The CMU Machine Translation Systems at WMT 2013: Syntax, Synthetic Translation Options, and Pseudo-References [pdf]
Waleed Ammar, Victor Chahuneau, Michael Denkowski, Greg Hanneman, Wang Ling, Austin Matthews, Kenton Murray, Nicola Segall, Yulia Tsvetkov, Alon Lavie, Chris Dyer.
WMT workshop at ACL 2013.

Transliteration by Sequence Labeling with Lattice Encoding and Reranking [pdf]
Waleed Ammar, Chris Dyer, Noah Smith.
NEWS workshop at ACL 2012.

Automatic Categorization of Privacy Policies
Waleed Ammar, Shomir Wilson, Norman Sadeh, Noah Smith.
Tech Report 2012.

Improved Transliteration Mining Using Graph Reinforcement [pdf]
Ali El Kahki, Kareem Darwish, Ahmed Saad El Din, Mohamed Abd El-Wahab, Ahmed Hefny and Waleed Ammar.
EMNLP 2011.

ICE-TEA: In-Context Expansion and Translation of English Abbreviations [pdf]
Waleed Ammar, Kareem Darwish, Ali ElKahki and Khaled Hafez.

Secure localization in wireless sensor networks: a survey [pdf]
Waleed Ammar, Ahmed ElDawy and Moustafa Youssef.
arXiv 2010.

Automatic scoring of online discussion posts [pdf]
Nayer Wanas, Motaz El Saban, Heba Ashour and Waleed Ammar.
WICOW workshop at CIKM 2008.


Patents
Syntax-based Augmentation of Statistical Machine Translation Phrase Tables
Achraf Chalabi, Waleed Ammar, Mostafa Ashour.
US Patent, Publication No. US 2012/0296633.

User evaluation in a collaborative online forum
Nayer Wanas, Heba Ashour, Moustafa El-Baradei, Ahmed Morsy, Motaz El Saban and Waleed Ammar.
US patent, Publication No. US 2010/0162135 A1.

Professional Experience

Allen Institute for Artificial Intelligence – Seattle
Research Scientist (Aug 2016 – Present)
Exploring NLP problems in low-resource settings, focusing on the scientific domain.

Google – Pittsburgh
Software Engineering Intern (Sep 2014 – Dec 2014)
Explored novel methods for large-scale online training of decision forests.

Microsoft Research – Redmond
Research Intern (May 2013 – Aug 2013)
Explored novel methods for optimization and model selection of unsupervised and semi-supervised learning with lexical constraints.

Microsoft Research – Redmond
Software Development Engineer II (Dec 2010 – Aug 2011)
Identified deficiencies in machine-translated text and worked with researchers in the NLP group to find solutions. I was also responsible for integrating these solutions into the production system.

Microsoft Research – Microsoft Innovation Laboratory in Cairo
Research Software Development Engineer (Nov 2007 – Nov 2010)
Collaborated with researchers at MSR to push the state of the art in data mining and natural language processing by engineering prototype technologies, writing papers, and formulating patents. I was also responsible for the transfer of research prototypes into Microsoft products.

Alexandria University
Teaching Assistant (Aug 2007 – Nov 2007)
Tutored students, held office hours, graded homework and mid-term exams, administered tests and exams, and assisted professors with laboratory sessions.
Courses: Probability and Statistics I, Technical Writing I, and Introduction to Computers.

eSpace Technologies
Part-Time Software Developer (Jul 2007 – Nov 2007)
My role encompassed design and development of features in web portals as well as identification and resolution of deficiencies in web applications. I also took part in collecting customer requirements.

IBM Egypt – Cairo Technology Development Center
Intern at Human Language Technologies Group (Jul 2006 – Aug 2006)
Participated in the TREC 2006 Genomics Track competition. We developed an information retrieval (IR) system capable of answering specific types of questions from biological documents.

Procter & Gamble (P&G)
Intern on Project Management (Jun 2005 – Aug 2005)
Managed a real-world automation project at the P&G powder factory in Egypt. The project scope included automatic identification of objects, semi-automatic acquisition of product type information, and a rich web reporting system.

Academic Services
  • Served on the program committee of ACL 2017.
  • Served on the program committee of EACL 2017.
  • Served on the program committee of CoNLL 2017.
  • Served on the program committee of the First Women and Underrepresented Minorities in NLP Workshop (WiNLP) at ACL 2017.
  • Served on the program committee of the 2nd Workshop on Representation Learning for NLP (RepL4NLP) at ACL 2017.
  • Serving on the reviewing committee for TACL 2016-2018.
  • Served on the program committee of ACL 2016.
  • Served on the program committee of NAACL-HLT 2016.
  • Served on the program committee of the NAACL-HLT 2016 workshop on multilingual and crosslingual methods in NLP.
  • Reviewed for Journal of Artificial Intelligence Research.
  • Served on the program committee of EMNLP 2015.
  • Served on the program committee of IJCAI 2015.
  • Served on the program committee of the NAACL 2015 workshop on vector space modeling for NLP.
  • Helped write a proposal for a multi-million-dollar, multi-university NSF project on making privacy policies more usable.
  • Served as the PhD student body representative of LTI-CMU in 2013.

Awards
  • Google PhD Fellowship in Natural Language Processing
  • Two Technology Transfer awards at Microsoft Research
  • Best Poster at the LTI Student Research Symposium


Activities Log

  • Oren Etzioni --Allen Institute for Artificial Intelligence (manager)
  • Noah A. Smith --University of Washington (PhD advisor)
  • Chris Dyer --Carnegie Mellon University (PhD advisor)
  • Tom Mitchell --Carnegie Mellon University (PhD thesis committee)
  • Kuzman Ganchev --Google Research (PhD thesis committee)
  • Miguel Ballesteros --Carnegie Mellon University (co-author, collaborator on cross-lingual parsing)
  • D. Sculley --Google Research (internship host)
  • Kristina Toutanova --Microsoft Research (co-author, internship host)
  • Kareem Darwish --Qatar Computing Research Institute (co-author, ex-manager)
  • Ayman Kaheel --Yahoo Inc. (ex-manager)
  • Tarek Elabbady --Microsoft Research (ex-manager)
  • Mei-Yuh Hwang --Microsoft Research (ex-manager)
  • Yulia Tsvetkov --Carnegie Mellon University (co-author, colleague)
  • Ahmed Hefny --Carnegie Mellon University (co-author, colleague)
  • Ali ElKahki --Google Research (ex-colleague at MSR Cairo)
  • Chu-Cheng Lin --Carnegie Mellon University (co-author, collaborator on modeling code switching with CRF autoencoders)
  • George Mulcaire --University of Washington (collaborator on estimating multilingual word embeddings)
  • Pradeep Dasigi --Carnegie Mellon University (collaborator on modeling selectional preferences with CRF autoencoders)
  • Moustafa Youssef --Egypt-Japan University of Science and Technology (co-author, M.Sc. ex-advisor)
  • Jeffrey Micher --US Army Research Lab (collaborator on the low-density MT project)
  • Norman Sadeh --Carnegie Mellon University (lead principal investigator of the usable privacy policy project)
  • George Foster --National Research Council Canada (Google fellowship research mentor)
  • Lori Levin --Carnegie Mellon University (co-author)
  • Jaime Carbonell --Carnegie Mellon University (department head, lead principal investigator of the low-density MT project)

Recent Projects
  • Language-universal dependency parsing* (code).
  • CRF autoencoder models for scalable and feature-rich unsupervised learning* (code).
  • Multilingual word embeddings (unification-based*).
  • A universal dependency treebanks analyzer* (code).
  • Large-scale online training of random forests.*
  • Bayesian models for record linkage* (code).
  • CRF model for transliteration* (code).
  • Dual decomposition of a CFG parser and a POS tagger* (code).
  • A bunch of handy C, C++ and python utilities* (code).
  • Privacy policy crawler* (code).
  • C++ library for training recurrent neural networks (code).
  • A neural network model which generalizes CRF autoencoders, for modeling selectional preferences (code).
  • A computational model for linguistic borrowing (code).
  • Semi-supervised learning for token-level language identification. (task, Twitter results, surprise genre results)
  • Improved training and model selection of unsupervised sequence-labeling models with lexical constraints.
  • Yet another implementation of dependency parsing with DMV* (code).
  • Yet another implementation of logistic regression* (code).
  • Yet another implementation of word-alignment induced preordering for machine translation* (code).
Projects led by me are marked with *.