Patrick Littell
Research Officer - Multilingual Text Processing
Digital Technologies Research Centre
National Research Council of Canada
firstname.lastname@nrc-cnrc.gc.ca
Recent Projects
Indigenous Language Technologies at the National Research Council
I am currently a Research Officer at the National Research Council of Canada, working on the development of practical language technologies for Indigenous languages spoken in Canada.
LORELEI
My previous position at the CMU Language Technologies Institute involved coordinating human and linguistic input, and generating linguistically-aware word representations, for the ARIEL-CMU team in the DARPA LORELEI project. In LORELEI, teams participate in a surprise language evaluation, developing machine translation, entity detection and linking, and information extraction systems within a constrained timespan (e.g. 17-28 days) in unfamiliar languages.
Gitxsan/English Online Dictionary
I was the lead designer and programmer for the Gitxsan/English Online Dictionary, which uses advanced search and a modern user interface to solve some of the perennial problems that face users of low-resource language dictionaries.
Totem Field Storyboards
I was the site manager for the Totem Field Storyboards project, a collection of language-neutral comic-style stories intended for linguistic fieldwork and language education. Each storyboard is carefully constructed to draw out a particular semantic phenomenon. I am also one of the editors of the forthcoming open-access, peer-reviewed journal Storyboards for Linguistic Fieldwork.
North American Computational Linguistics Olympiad
I help run the North American Computational Linguistics Olympiad, a contest that introduces U.S. and Canadian high-school students to linguistics and computational linguistics. As the lead curriculum designer, I've published over 35 fun, age-appropriate linguistics problem sets and served as an editor for many more.
International Linguistics Olympiad
I am the archivist and webmaster for the International Linguistics Olympiad, the international final round for high-school linguistics competitions.
Past Projects
LLabSWA (download .zip)
A morphological parser for Swahili, written in XFST+LEXC for the CMU Language Technologies Institute. Using crowdsourced lexical resources from Kamusi.org, this parser breaks words into features in preparation for statistical machine translation.
"What's the Difference?" Activities for Linguistic Fieldwork
This site generates "What's the difference between these two pictures?" activities in order to elicit contrastive focus constructions. The user can choose which sentences they want illustrated and how those sentences will differ, and the site engine will render the appropriate images.
Spikipedia
Spikipedia is a custom wiki engine focused on multilanguage, multi-orthography documents with multiple data types. Rather than make the user declare each type of data manually, the engine analyzes each line of data to determine its language, its orthography, and what sort of data it is. The engine then lays out the data in the orthography and format chosen by the user. (NB: This site has recently ceased to function due to a change in the way the Google Apps API handles databases. If you would like to use Spikipedia in another project, contact me and I'll see if I can get it working with the new API.)
Canadian Language Museum/Musée canadien des langues
From 2011 to 2014, I was the webmaster and site designer for the Canadian Language Museum/Musée canadien des langues.
Refereed Papers
The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software.
Roland Kuhn, Fineen Davis, Alain Désilets, Eric Joanis, Anna Kazantseva, Rebecca Knowles, Patrick Littell, Delaney Lothian, Aidan Pine, Caroline Running Wolf, Eddie Santos, Darlene Stewart, Gilles Boulianne, Vishwa Gupta, Brian Maracle Owennatékha, Christopher Cox, Marie-Odile Junker, Olivia Sammons, Delasie Torkornoo, Nathan Thanyehténhas Brinklow, Sara Child, Benoît Farley, David Huggins-Daines, Daisy Rosenblum, Heather Souter. Proceedings of the 28th International Conference on Computational Linguistics.
NRC systems for low resource German-Upper Sorbian machine translation 2020: Transfer learning with lexical modifications.
R Knowles, S Larkin, D Stewart, P Littell. Proceedings of the Fifth Conference on Machine Translation (WMT 2020), pp. 1112-1122.
NRC Systems for the 2020 Inuktitut-English News Translation Task.
R Knowles, D Stewart, S Larkin, P Littell. Proceedings of the Fifth Conference on Machine Translation (WMT 2020), pp. 156-170.
Universal phone recognition with a multilingual allophone system.
Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R Mortensen, Graham Neubig, Alan W Black, Florian Metze. In Proceedings of ICASSP 2020.
The Nunavut Hansard Inuktitut–English Parallel Corpus 3.0 with Preliminary Machine Translation Results
Eric Joanis, Rebecca Knowles, Roland Kuhn, Samuel Larkin, Patrick Littell, Chi-kiu Lo, Darlene Stewart, Jeffrey Micher. In Proceedings of LREC 2020.
AlloVera: A Multilingual Allophone Database
David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig. In Proceedings of LREC 2020.
A Summary of the First Workshop on Language Technology for Language Documentation and Revitalization.
Graham Neubig, Shruti Rijhwani, Alexis Palmer, Jordan MacKenzie, Hilaria Cruz, Xinjian Li, Matthew Lee, Aditi Chaudhary, Luke Gessler, Steven Abney, Shirley Anugrah Hayati, Antonios Anastasopoulos, Olga Zamaraeva, Emily Prud’hommeaux, Jennette Child, Sara Child, Rebecca Knowles, Sarah Moeller, Jeffrey Micher, Yiyuan Li, Sydney Zink, Mengzhou Xia, Roshan Sharma, Patrick Littell. In Proceedings of SLTU-CCURL 2020.
Multi-Source Transformer for Kazakh-Russian-English Neural Machine Translation.
Patrick Littell, Chi-kiu Lo, Samuel Larkin, and Darlene Stewart. In Proceedings of the Fourth Conference on Machine Translation (WMT 2019).
Choosing Transfer Languages for Cross-Lingual Learning.
Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, and Graham Neubig. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.
Identifying Misaligned Spans in Parallel Corpora Using Change Point Detection.
Andrea Pagotto, Patrick Littell, Yunli Wang, and Cyril Goutte. In Advances in Artificial Intelligence, pp. 200-211.
Towards a General-Purpose Linguistic Annotation Backend
Graham Neubig, Patrick Littell, Chian-Yu Chen, Jean Lee, Zirui Li, Yu-Hsiang Lin and Yuyan Zhang. In Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-3).
Cleaning a parallel corpus without parallel corpora: The NRC unsupervised submissions to the WMT18 Parallel Corpus Filtering shared task.
Littell, P., Larkin, S., Stewart, D., Simard, M., Goutte, C. and Lo, C.-K. In Proceedings of the 3rd Conference on Machine Translation (WMT 2018).
Accurate semantic textual similarity for web crawled parallel corpora using semantic machine translation evaluation metric: The NRC supervised submissions to the Parallel Corpus Filtering Task.
Lo, C.-K., Simard, M., Stewart, D., Larkin, S., Goutte, C. and Littell, P. In Proceedings of the 3rd Conference on Machine Translation (WMT 2018).
Indigenous language technologies in Canada: Assessment, challenges, and successes.
Littell, P., Kazantseva, A., Kuhn, R., Pine, A., Arppe, A., Cox, C., and Junker, M.-O. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018).
Finite-state morphology for Kwak’wala: A phonological approach.
Littell, P. In All Together Now: Computational Modeling of Polysynthetic Languages.
Parser combinators for Tigrinya and Oromo morphology.
Littell, P., McCoy, T., Han, N.-R., Rijhwani, S., Sheikh, Z., Mortensen, D., Mitamura, T., and Levin, L. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC’18).
Epitran: Precision G2P for Many Languages.
Mortensen, D., Dalmia, S., and Littell, P. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC’18).
The ARIEL-CMU Situation Frame detection pipeline for LoReHLT16: A model translation approach.
Littell, P., Tian, T., Xu, R., Sheikh, Z., Mortensen, D., Levin, L., Tyers, F., Hayashi, H., Horwood, G., Sloto, S., Tagtow, E., Black, A., Yang, Y., Mitamura, T., and Hovy, E. To appear in Machine Translation.
Learning language representations for typology prediction
Malaviya, C., Neubig, G. and Littell, P. 2017. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017).
URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors
Littell, P., Mortensen, D., Turner, C., Lin, K., Kairis, K., and Levin, L. 2017. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers (EACL 2017).
Waldayu and Waldayu Mobile: Modern digital dictionary interfaces for endangered languages
Littell, P., Pine, A., and Davis, H. 2017. In Proceedings of the 2017 Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-2).
STREAMLInED Challenges: Aligning research interests with shared tasks
Levow, G.-A., Bender, E., Littell, P., Howell, K., Chelliah, S., Crowgey, J., Garrette, D., Good, J., Hargus, S., Inman, D., Maxwell, M., Tjalve, M., and Xia, F. 2017. In Proceedings of the 2017 Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-2).
Named entity recognition for linguistic rapid response in low-resource languages: Sorani Kurdish and Tajik
Littell, P., Goyal, K., Mortensen, D., Little, A., Dyer, C., and Levin, L. 2016. In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016).
The role of context in neural morphological disambiguation
Shen, Q., Clothiaux, D., Tagtow, E., Littell, P. and Dyer, C. 2016. In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016).
PanPhon: A resource for mapping IPA segments to articulatory feature vectors
Mortensen, D., Littell, P., Bharadwaj, A., Goyal, K., Dyer, C., and Levin, L. 2016. In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016).
Bridge-language capitalization inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik
Littell, P., Mortensen, D., Goyal, K., Dyer, C., and Levin, L. 2016. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC’16).
Polyglot neural language models: Case study in cross-lingual phonetic representation learning
Tsvetkov, Y., Sitaram, S., Faruqui, M., Lample, G., Littell, P., Mortensen, D., Black, A., Levin, L. and Dyer, C. 2016. In Proceedings of NAACL 2016.
Morphological parsing of Swahili using crowdsourced lexical resources
Littell, P., Price, K. and Levin, L. 2014. In Proceedings of LREC 2014.
Introducing computational concepts in a linguistic olympiad
Littell, P., Levin, L., Eisner, J. and Radev, D. 2013. In Derzhanski, I. and Radev, D. (eds.), Proceedings of the fourth workshop on teaching NLP, 51st annual meeting of the ACL.
Other Papers
NACLO Style Guide (v6)
"Linguistics Olympiad" problems -- age-appropriate problem sets in linguistics for middle- and secondary-school students -- are a distinctive genre with their own rules of composition. This document is a normative "style guide" for the North American Computational Linguistics Olympiad, detailing the criteria that NACLO editors use when choosing a problem for publication.
Further dimensions of evidential variation: Evidence from Nɬeʔkepmxcín
Littell, P. 2014. To appear in H. Greene (ed.), Proceedings of SULA 7.
The content of copulas in Kwak'wala
Littell, P. 2012. In E. Bogal-Allbritten (ed.), Proceedings of the sixth conference on the semantics of under-represented languages in the Americas and SULA-Bar.
Kwak'wala "agreement" as partial subject copy
Littell, P. 2012. In LSA Annual Meeting Extended Abstracts 2012.
Mistaken identity: Boas's dilemma and the missing Kwak'wala copula
Littell, P. 2012. In Gutiérrez, A. and Stelle, E. (eds.), UBC Linguistics Qualifying Papers 1 (2010-11).
Reconsidering sensory evidence in Nɬeʔkepmxcín
Littell, P. and Mackie, S. 2011. In J. Lyon and J. Dunham (eds.), Papers for ICSNL XLVI: The forty-sixth international conference on Salish and neighboring languages. Vancouver: University of British Columbia.
On the semantics of conjectural questions
Littell, P., Matthewson, L. and Peterson, T. 2009. In M. Schenner, R.-M. Déchaine, T. Peterson and U. Sauerland (eds.), Evidence from evidentials. Vancouver: University of British Columbia Working Papers in Linguistics.
Just Fun
Lexicopolis: A-B-City
A simple but addictive word game in which you build a city by spelling out the names of buildings. Made for the April/May 2013 CONSTRUCT competition at the Experimental Gameplay Project, and featured in the Boston Globe, PC Gamer, and IndieStatik.
Les blocs de l'est
A cooperative multiplayer arcade game that gives a backstage look at the reality behind a classic game. Made for the RETRO NO FUTURE contest (June-July 2013) at oujevipo.fr, and featured in an arcade cabinet at the VISAGES DU MONDE media festival in Cergy, France (October 5-6, 2013).