Dr. Narayan Kumar Choudhary
Lecturer cum Junior Research Officer, Central Institute of Indian Languages, Ministry of Human Resource Development, Department of Higher Education, Government of India, Mysuru – 570006
Email: nchoudhary.ciil AT gmail {dot} com
Phone (O): 0821-2345092
Personal Weblog: Narayan Choudhary's Blog

Research Interests

Computational Linguistics, Natural Language Processing (NLP) and Language Technology in general.
Corpus Linguistics, Data Science, Part of Speech Annotation, Syntactic/Dependency Parsing, and related areas.

Other interests

Language Typology, Language Documentation, General Linguistics.



1. Proceedings of the Third Students’ Conference of Linguistics in India (SCONLI-3). 2011. ed. with Gibu Sabu M., Parimal Publishers, New Delhi.
2. Indian Language Part-of-Speech Tagset: Hindi. 2010. Co-authored by Kalika Bali, Monojit Choudhury, Priyanka Biswas, Girish Nath Jha, Maansi Sharma. Linguistic Data Consortium, Philadelphia. (This is actually a PoS Annotated corpus of Hindi general domain text)

Research Papers in Refereed Journals/Conferences

1. Parth Pathak, Pinal Patel, Vishal Panchal, Sagar Soni, Kinjal Dani, Narayan Choudhary, Amrish Patel. 2015. ezDI: A Supervised NLP System for Clinical Narrative Analysis. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). Denver, Colorado. (Accorded first rank in the shared task)
2. Neha Dixit and Narayan Choudhary. 2014. Evaluating Two Annotated Corpora of Hindi Using a Verb Class Identifier. In Proceedings of ICON 2014. Goa University, Goa (To appear in ACL Anthology).
3. Neha Dixit and Narayan Choudhary. 2014. Automatic Classification of Hindi Verbs in Syntactic Perspective. International Journal of Emerging Technology and Advanced Engineering, Volume 4, Issue 8. (ISSN 2250-2459 (Online))
4. Parth Pathak, Pinal Patel, Vishal Panchal, Narayan Choudhary, Amrish Patel, Gautam Joshi. 2014. ezDI: A Hybrid CRF and SVM based Model for Detecting and Encoding Disorder Mentions in Clinical Notes. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014 Shared Task, awarded third best result). Dublin, Ireland. ISBN 978-1-941643-29-7
5. Narayan Choudhary, Parth Pathak, Pinal Patel, Vishal Panchal. 2014. Annotating a Large Representative Corpus of Clinical Notes for Parts of Speech. In: Proceedings of the 8th Linguistic Annotation Workshop, Dublin, Ireland. ISBN 978-1-941643-29-7
6. Narayan Choudhary, Girish Nath Jha. 2011. Creating Multilingual Parallel Corpora in Indian Languages. In Proceedings of the 5th Language & Technology Conference, Poznan, Poland. (Awarded the Best Student Paper)
7. Narayan Choudhary, Girish Nath Jha and Pramod Pandey. 2011. A Rule based Method for the Identification of TAM features in a PoS Tagged Corpus. In Proceedings of the 5th Language & Technology Conference, Poznan, Poland.
8. Narayan Choudhary. 2011. Web-drawn corpus for Indian Languages: A Case of Hindi. In Proceedings of Information Systems for Indian Languages. Volume 139, Part 2, 218-223. Springer Verlag.
9. Narayan Choudhary. 2008. बोधात्मक भाषाविज्ञान, in Gaveshanaa, April-June, 2008 vol.:90/2008 Central Institute of Hindi, Agra. pp.:11-18 (This is a Hindi translation of the article “Cognitive Linguistics” from Encyclopedia of Linguistics by Gilles Falkner, 2006)
10. Narayan Choudhary. 2007. Syllable Structure of Great Andamanese, November, 2006. In Proceedings of National Seminar on Perspectives in Linguistics, Kashmir University, Srinagar, Kashmir, India. pp. 141-146
11. Narayan Choudhary, Anvita Abbi, Girish Nath Jha. 2007. Morphological Analyzer for Great Andamanese Verbs: Implementing a Concatenative Template. In Vishwabharat (April 2007 - January 2008 Journal), TDIL, New Delhi, pp. 113-118 http://tdil.mit.gov.in/april-jan-2008/8.8_Morphological_analyzer.pdf


1. “NLP and Information Extraction”, SCONLI-07, Aligarh Muslim University, Aligarh, 8-10 February, 2013
2. Orientation Course in Computational Linguistics, Tezpur University, Assam. 21-23 December, 2014


1. Ranked 1st in “SemEval-2015 Task 14: Analysis of Clinical Text”, held at NAACL-2015, Denver, Colorado.
2. Ranked 3rd in “SEMEVAL 2014: Shared Task 7: Analysis of Clinical Text”, Dublin, Ireland
3. Best Student Paper Award for the paper titled “Creating Multilingual Parallel Corpora in Indian Languages” at LTC’11, Poznan, Poland
4. UGC-NET Lectureship Award, 2003 and 2004

Past Research & Work Experiences

1. NLP Research Engineer, ezDI, LLC.: July, 2012 – February, 2016
2. Senior Linguist, Shallow Parser Tools for Indian Languages, JNU, New Delhi: May, 2012 – June, 2012
3. Senior Linguist, Indian Languages Corpora Initiative (ILCI), JNU, New Delhi: March, 2009 – May, 2010
4. Teaching Assistant, Centre for Linguistics and Special Centre for Sanskrit Studies, JNU, New Delhi: August, 2007 – July, 2009
5. Research Assistant, Centre for Linguistics, JNU, New Delhi: August, 2005 – July, 2007
6. Project Associate, CSE, IIT Kanpur: May, 2005 – June, 2005

Educational Qualifications

1. Ph. D. Thesis Title: Automatic Identification and Analysis of Verb Groups in Hindi. JNU. 2006-2011.
2. M. Phil. Dissertation Title: Developing a Computational Framework for the Verb Morphology of Great Andamanese. JNU, New Delhi, 2006.
3. National Eligibility Test for Lectureship (NET: December, 2003; June, 2004) of the UGC
4. Masters (Linguistics), JNU, New Delhi, 2004. MA Thesis: Word Order in Pnar (Jaintia)
5. Bachelor of Arts (English Hons., Economics, History, Hindi), LNMU, Darbhanga, 2001.


Linguistic Society of India, Life Member

Association for Computing Machinery, Student Member, 2010-2013

Other Info

Computing Skills

Platforms: Well versed in Windows and Linux (Ubuntu/RHEL)

Development Environment: MySQL 5; PHP, Java, C++, Perl, Prolog, LISP, CSS, HTML


Well Versed: English, Hindi, Maithili
Academic Knowledge: Sanskrit, Pnar (Jaintia), Great Andamanese, Gujarati


Reading, Writing, Music, Yoga, Swimming, Mountaineering.