Projects

Machine Transliteration

Structured-perceptron-based transliteration system for cross-transliteration among 16 languages: Indian languages as well as English and Urdu.
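The entry names the learning algorithm but not the model details. As a rough, hypothetical illustration of the technique only, the sketch below treats transliteration as character-level sequence labeling. It assumes each source character is already aligned to exactly one target unit (possibly empty, which lets a romanized schwa disappear), decodes a first-order model with Viterbi, and applies a plain (non-averaged) structured perceptron update on mistakes. The feature templates, the `train`/`viterbi`/`transliterate` names and the toy word pairs are assumptions, not the project's actual design.

```python
"""Hypothetical sketch: structured perceptron for character-level transliteration.

Assumes each source character is pre-aligned to exactly one target unit
(an empty string when it produces no output). Not the project's code.
"""


def features(src, i, prev, cur):
    """Feature strings for labeling src[i] as `cur` after label `prev`."""
    left = src[i - 1] if i > 0 else "<s>"
    right = src[i + 1] if i + 1 < len(src) else "</s>"
    return [
        f"e:{src[i]}>{cur}",        # emission: source char -> target unit
        f"l:{left}{src[i]}>{cur}",  # emission with left context
        f"r:{src[i]}{right}>{cur}", # emission with right context
        f"t:{prev}>{cur}",          # transition between adjacent labels
    ]


def score(w, feats):
    return sum(w.get(f, 0.0) for f in feats)


def viterbi(src, labels, w):
    """Return the highest-scoring label sequence for a non-empty string."""
    best = [{} for _ in src]  # best[i][y] = (score, backpointer)
    for y in labels:
        best[0][y] = (score(w, features(src, 0, "<s>", y)), None)
    for i in range(1, len(src)):
        for y in labels:
            best[i][y] = max(
                (best[i - 1][p][0] + score(w, features(src, i, p, y)), p)
                for p in labels
            )
    y = max(labels, key=lambda lab: best[-1][lab][0])
    path = [y]
    for i in range(len(src) - 1, 0, -1):  # follow backpointers
        y = best[i][y][1]
        path.append(y)
    return path[::-1]


def train(pairs, max_epochs=50):
    """Plain structured perceptron: on a mistake, reward gold features
    and penalize the predicted ones."""
    labels = sorted({t for _, gold in pairs for t in gold})
    w = {}
    for _ in range(max_epochs):
        mistakes = 0
        for src, gold in pairs:
            pred = viterbi(src, labels, w)
            if pred == gold:
                continue
            mistakes += 1
            for i in range(len(src)):
                g_prev = gold[i - 1] if i else "<s>"
                p_prev = pred[i - 1] if i else "<s>"
                for f in features(src, i, g_prev, gold[i]):
                    w[f] = w.get(f, 0.0) + 1.0
                for f in features(src, i, p_prev, pred[i]):
                    w[f] = w.get(f, 0.0) - 1.0
        if mistakes == 0:  # training data fully separated; stop early
            break
    return w, labels


def transliterate(word, w, labels):
    return "".join(viterbi(word, labels, w))


# Toy romanized-Hindi -> Devanagari pairs (illustrative only).
PAIRS = [
    ("kamal", ["क", "", "म", "", "ल"]),
    ("kal", ["क", "", "ल"]),
    ("mal", ["म", "", "ल"]),
]

if __name__ == "__main__":
    weights, labels = train(PAIRS)
    for src, _ in PAIRS:
        print(src, "->", transliterate(src, weights, labels))
```

A practical system would also need to learn the character alignments and would usually average the weights across updates; both are left out here to keep the core update rule visible.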

Hindi-English UD Treebank

Universal Dependencies treebank for Hindi-English code-switched text.

Language Identification and Normalization in Code Switching

Language identification and normalization for code-switched data, using a three-step decoding process.

Dependency Parser for Code Switching Texts

Neural stacking dependency parsers for monolingual, multilingual and code-switched text.

WX Converter

Python library for converting Indian-language text from Unicode (UTF) to WX notation and vice versa.
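WX notation writes Devanagari mostly with single ASCII characters (for example त as `w`, द as `x`, ष as `R`) and spells out the inherent vowel `a`. The sketch below is a hypothetical, minimal converter for a subset of Hindi characters, meant only to show the mapping and how the inherent vowel and the virama are handled in each direction. It ignores nukta forms, conjunct edge cases and other scripts, and the `utf2wx`/`wx2utf` functions are illustrative, not the library's API.

```python
"""Hypothetical minimal Devanagari <-> WX converter for a subset of Hindi.
Not the WX converter library's API."""

VOWELS = {"अ": "a", "आ": "A", "इ": "i", "ई": "I", "उ": "u", "ऊ": "U",
          "ऋ": "q", "ए": "e", "ऐ": "E", "ओ": "o", "औ": "O"}
MATRAS = {"ा": "A", "ि": "i", "ी": "I", "ु": "u", "ू": "U",
          "ृ": "q", "े": "e", "ै": "E", "ो": "o", "ौ": "O"}
CONSONANTS = {"क": "k", "ख": "K", "ग": "g", "घ": "G", "ङ": "f",
              "च": "c", "छ": "C", "ज": "j", "झ": "J", "ञ": "F",
              "ट": "t", "ठ": "T", "ड": "d", "ढ": "D", "ण": "N",
              "त": "w", "थ": "W", "द": "x", "ध": "X", "न": "n",
              "प": "p", "फ": "P", "ब": "b", "भ": "B", "म": "m",
              "य": "y", "र": "r", "ल": "l", "व": "v",
              "श": "S", "ष": "R", "स": "s", "ह": "h"}
SIGNS = {"ं": "M", "ँ": "z", "ः": "H"}  # anusvara, chandrabindu, visarga
VIRAMA = "्"

# Reverse tables for WX -> Devanagari.
WX_CONS = {v: k for k, v in CONSONANTS.items()}
WX_VOWELS = {v: k for k, v in VOWELS.items()}
WX_MATRAS = {v: k for k, v in MATRAS.items()}
WX_SIGNS = {v: k for k, v in SIGNS.items()}


def utf2wx(text):
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt in MATRAS:       # explicit vowel sign
                out.append(MATRAS[nxt])
                i += 2
                continue
            if nxt == VIRAMA:       # dead consonant: no vowel
                i += 2
                continue
            out.append("a")         # otherwise write the inherent vowel
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch in SIGNS:
            out.append(SIGNS[ch])
        else:
            out.append(ch)          # pass through anything unmapped
        i += 1
    return "".join(out)


def wx2utf(text):
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in WX_CONS:
            out.append(WX_CONS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt == "a":          # inherent vowel: nothing to write
                i += 2
                continue
            if nxt in WX_MATRAS:    # other vowels become vowel signs
                out.append(WX_MATRAS[nxt])
                i += 2
                continue
            out.append(VIRAMA)      # cluster or word-final consonant
        elif ch in WX_VOWELS:
            out.append(WX_VOWELS[ch])
        elif ch in WX_SIGNS:
            out.append(WX_SIGNS[ch])
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

For instance, `utf2wx("नमस्ते")` gives `namaswe`, and `wx2utf("namaswe")` restores the original string.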

Polyglot Tokenizer

Tokenizer for the world's most widely spoken languages and for social media text such as Facebook posts and tweets.
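One common way to tokenize social media text is to let patterns for URLs, @mentions, #hashtags and emoticons take priority over ordinary word and punctuation rules. The regex sketch below illustrates that idea; the pattern, the `tokenize` name and the small emoticon set are hypothetical and not taken from Polyglot Tokenizer. Words are matched as runs of non-space, non-punctuation characters rather than with `\w+`, because Python's `re` does not treat combining Devanagari vowel signs as word characters.

```python
"""Hypothetical regex tokenizer sketch for social media text
(not Polyglot Tokenizer's implementation)."""
import re
import string

# Punctuation split off as separate tokens: ASCII punctuation except "_"
# (kept inside handles such as @user_1), plus the Devanagari danda marks.
_PUNCT = re.escape(string.punctuation.replace("_", "") + "।॥")

TOKEN_RE = re.compile(
    r"https?://\S+"            # URLs (trailing punctuation stays attached)
    r"|[:;]-?[()DPp]"          # a few ASCII emoticons such as :) or ;-P
    rf"|[@#][^\s{_PUNCT}]+"    # @mentions and #hashtags
    rf"|[^\s{_PUNCT}]+"        # words: runs of non-space, non-punctuation
    rf"|[{_PUNCT}]"            # any remaining punctuation, one mark per token
)


def tokenize(text):
    """Split text into tokens, keeping social-media entities intact."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("Loved it!! @user_1 #NLP http://example.com :)"))
    print(tokenize("मैं ठीक हूँ।"))
```

Alternation order matters here: because the URL and emoticon branches come first, `http://example.com` and `:)` are not broken apart by the punctuation branch.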