Patrick Littell

Research Officer - Multilingual Text Processing
Digital Technologies Research Centre
National Research Council of Canada
firstname.lastname@nrc-cnrc.gc.ca

Recent Projects

Indigenous Language Technologies at the National Research Council
I am currently a Research Officer at the National Research Council of Canada, working on the development of practical language technologies for Indigenous languages spoken in Canada.

LORELEI
In my previous position at the CMU Language Technologies Institute, I coordinated human and linguistic input and generated linguistically-aware word representations for the ARIEL-CMU team in the DARPA LORELEI project. In LORELEI, teams participate in a surprise language evaluation, developing machine translation, entity detection and linking, and information extraction systems for unfamiliar languages within a constrained timespan (e.g. 17-28 days).

Gitxsan/English Online Dictionary
I was the lead designer and programmer for the Gitxsan/English Online Dictionary, which uses advanced search and a modern user interface to solve some of the perennial problems facing users of low-resource language dictionaries.

Totem Field Storyboards
I was the site manager for the Totem Field Storyboards project, a collection of language-neutral, comic-style stories intended for linguistic fieldwork and language education. Each storyboard is carefully constructed to draw out a particular semantic phenomenon. I am also one of the editors of the forthcoming open-access, peer-reviewed journal Storyboards for Linguistic Fieldwork.

North American Computational Linguistics Olympiad
I help run the North American Computational Linguistics Olympiad (NACLO), a contest that introduces U.S. and Canadian high-school students to linguistics and computational linguistics. As the lead curriculum designer, I have published over 35 fun, age-appropriate linguistics problem sets and served as an editor for many more.

International Linguistics Olympiad
I am the archivist and webmaster for the International Linguistics Olympiad, the international finals round for high-school linguistics competitions.

Refereed Papers
The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software. Roland Kuhn, Fineen Davis, Alain Désilets, Eric Joanis, Anna Kazantseva, Rebecca Knowles, Patrick Littell, Delaney Lothian, Aidan Pine, Caroline Running Wolf, Eddie Santos, Darlene Stewart, Gilles Boulianne, Vishwa Gupta, Brian Maracle Owennatékha, Christopher Cox, Marie-Odile Junker, Olivia Sammons, Delasie Torkornoo, Nathan Thanyehténhas Brinklow, Sara Child, Benoît Farley, David Huggins-Daines, Daisy Rosenblum, Heather Souter. In Proceedings of the 28th International Conference on Computational Linguistics.
NRC systems for low resource German-Upper Sorbian machine translation 2020: Transfer learning with lexical modifications. R. Knowles, S. Larkin, D. Stewart, P. Littell. In Proceedings of the Fifth Conference on Machine Translation (WMT 2020), pp. 1112-1122.
NRC Systems for the 2020 Inuktitut-English News Translation Task. R. Knowles, D. Stewart, S. Larkin, P. Littell. In Proceedings of the Fifth Conference on Machine Translation (WMT 2020), pp. 156-170.
Universal phone recognition with a multilingual allophone system. Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze. In Proceedings of ICASSP 2020.
The Nunavut Hansard Inuktitut-English Parallel Corpus 3.0 with Preliminary Machine Translation Results. Eric Joanis, Rebecca Knowles, Roland Kuhn, Samuel Larkin, Patrick Littell, Chi-kiu Lo, Darlene Stewart, Jeffrey Micher. In Proceedings of LREC 2020.
AlloVera: A Multilingual Allophone Database. David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig. In Proceedings of LREC 2020.
A Summary of the First Workshop on Language Technology for Language Documentation and Revitalization. Graham Neubig, Shruti Rijhwani, Alexis Palmer, Jordan MacKenzie, Hilaria Cruz, Xinjian Li, Matthew Lee, Aditi Chaudhary, Luke Gessler, Steven Abney, Shirley Anugrah Hayati, Antonios Anastasopoulos, Olga Zamaraeva, Emily Prud’hommeaux, Jennette Child, Sara Child, Rebecca Knowles, Sarah Moeller, Jeffrey Micher, Yiyuan Li, Sydney Zink, Mengzhou Xia, Roshan Sharma, Patrick Littell. In Proceedings of SLTU-CCURL 2020.
Multi-Source Transformer for Kazakh-Russian-English Neural Machine Translation. Patrick Littell, Chi-kiu Lo, Samuel Larkin, and Darlene Stewart. In Proceedings of the Fourth Conference on Machine Translation (WMT 2019).
Choosing Transfer Languages for Cross-Lingual Learning. Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, and Graham Neubig. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.
Identifying Misaligned Spans in Parallel Corpora Using Change Point Detection. Andrea Pagotto, Patrick Littell, Yunli Wang, and Cyril Goutte. In Advances in Artificial Intelligence, pp. 200-211.
Towards a General-Purpose Linguistic Annotation Backend. Graham Neubig, Patrick Littell, Chian-Yu Chen, Jean Lee, Zirui Li, Yu-Hsiang Lin, and Yuyan Zhang. In Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-3).
Cleaning a parallel corpus without parallel corpora: The NRC unsupervised submissions to the WMT18 Parallel Corpus Filtering shared task. Littell, P., Larkin, S., Stewart, D., Simard, M., Goutte, C. and Lo, C.-K. In Proceedings of the 3rd Conference on Machine Translation (WMT 2018).
Accurate semantic textual similarity for web crawled parallel corpora using semantic machine translation evaluation metric: The NRC supervised submissions to the Parallel Corpus Filtering Task. Lo, C.-K., Simard, M., Stewart, D., Larkin, S., Goutte, C. and Littell, P. In Proceedings of the 3rd Conference on Machine Translation (WMT 2018).
Indigenous language technologies in Canada: Assessment, challenges, and successes. Littell, P., Kazantseva, A., Kuhn, R., Pine, A., Arppe, A., Cox, C., and Junker, M.-O. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018).
Finite-state morphology for Kwak’wala: A phonological approach. Littell, P. In All Together Now: Computational Modeling of Polysynthetic Languages.
Parser combinators for Tigrinya and Oromo morphology. Littell, P., McCoy, T., Han, N.-R., Rijhwani, S., Sheikh, Z., Mortensen, D., Mitamura, T., and Levin, L. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC’18).
Epitran: Precision G2P for Many Languages. Mortensen, D., Dalmia, S., and Littell, P. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC’18).
The ARIEL-CMU Situation Frame detection pipeline for LoReHLT16: A model translation approach. Littell, P., Tian, T., Xu, R., Sheikh, Z., Mortensen, D., Levin, L., Tyers, F., Hayashi, H., Horwood, G., Sloto, S., Tagtow, E., Black, A., Yang, Y., Mitamura, T., and Hovy, E. To appear in Machine Translation.
Learning language representations for typology prediction. Malaviya, C., Neubig, G. and Littell, P. 2017. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017).
URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors. Littell, P., Mortensen, D., Turner, C., Lin, K., Kairis, K., and Levin, L. 2017. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers (EACL 2017).
Waldayu and Waldayu Mobile: Modern digital dictionary interfaces for endangered languages. Littell, P., Pine, A., and Davis, H. 2017. In Proceedings of the 2017 Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-2).
STREAMLInED Challenges: Aligning research interests with shared tasks. Levow, G.-A., Bender, E., Littell, P., Howell, K., Chelliah, S., Crowgey, J., Garrette, D., Good, J., Hargus, S., Inman, D., Maxwell, M., Tjalve, M., and Xia, F. 2017. In Proceedings of the 2017 Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-2).
Named entity recognition for linguistic rapid response in low-resource languages: Sorani Kurdish and Tajik. Littell, P., Goyal, K., Mortensen, D., Little, A., Dyer, C., and Levin, L. 2016. In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016).
The role of context in neural morphological disambiguation. Shen, Q., Clothiaux, D., Tagtow, E., Littell, P. and Dyer, C. 2016. In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016).
PanPhon: A resource for mapping IPA segments to articulatory feature vectors. Mortensen, D., Littell, P., Bharadwaj, A., Goyal, K., Dyer, C., and Levin, L. 2016. In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016).
Bridge-language capitalization inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik. Littell, P., Mortensen, D., Goyal, K., Dyer, C., and Levin, L. 2016. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16).
Polyglot neural language models: Case study in cross-lingual phonetic representation learning. Tsvetkov, Y., Sitaram, S., Faruqui, M., Lample, G., Littell, P., Mortensen, D., Black, A., Levin, L. and Dyer, C. 2016. In Proceedings of NAACL 2016.
Morphological parsing of Swahili using crowdsourced lexical resources. Littell, P., Price, K. and Levin, L. 2014. In Proceedings of LREC 2014.
Introducing computational concepts in a linguistic olympiad. Littell, P., Levin, L., Eisner, J. and Radev, D. 2013. In Derzhanski, I. and Radev, D. (eds.), Proceedings of the Fourth Workshop on Teaching NLP, 51st Annual Meeting of the ACL.

Other Papers
NACLO Style Guide (v6). "Linguistics Olympiad" problems (age-appropriate problem sets in linguistics for middle- and secondary-school students) are a distinctive genre with their own rules of composition. This document is a normative "style guide" for the North American Computational Linguistics Olympiad, detailing the criteria that NACLO editors use when choosing a problem for publication.
Further dimensions of evidential variation: Evidence from Nɬeʔkepmxcín. Littell, P. 2014. To appear in H. Greene (ed.), Proceedings of SULA 7.
The content of copulas in Kwak'wala. Littell, P. 2012. In E. Bogal-Allbritten (ed.), Proceedings of the sixth conference on the semantics of under-represented languages in the Americas and SULA-Bar.
Kwak'wala "agreement" as partial subject copy. Littell, P. 2012. In LSA Annual Meeting Extended Abstracts 2012.
Mistaken identity: Boas's dilemma and the missing Kwak'wala copula. Littell, P. 2012. In Gutiérrez, A. and Stelle, E. (eds.), UBC Linguistics Qualifying Papers 1 (2010-11).
Reconsidering sensory evidence in Nɬeʔkepmxcín. Littell, P. and Mackie, S. 2011. In J. Lyon and J. Dunham (eds.), Papers for ICSNL XLVI: The forty-sixth international conference on Salish and neighboring languages. Vancouver: University of British Columbia.
On the semantics of conjectural questions. Littell, P., Matthewson, L. and Peterson, T. 2009. In M. Schenner, R.-M. Déchaine, T. Peterson and U. Sauerland (eds.), Evidence from evidentials. Vancouver: University of British Columbia Working Papers in Linguistics.