Projects
Machine Transliteration
Structured-perceptron-based transliteration system supporting cross-transliteration among 16 Indian languages, including English and Urdu.
Hindi-English UD Treebank
Universal Dependencies treebank for Hindi-English code-switching data.
Language Identification and Normalization in Code Switching
Language identification and normalization for code-switching data, using a three-step decoding process.
Dependency Parser for Code Switching Texts
Neural stacking dependency parsers for monolingual, multilingual, and code-switching texts.
WX Converter
Python library for converting Indian-language text from UTF to WX notation and vice versa.
Polyglot Tokenizer
Tokenizer for the world’s most widely spoken languages and for social media text from platforms such as Facebook and Twitter.