Hinglish, a mix of Hindi and English, is one if the most spoken code-switched language. Code-switching occurs when speakers use grammatical structures from multiple languages in their speech. This can be challenging for NLP algorithms, as they are typically not designed for code-switched data. As a result, the performance of these algorithms may be degraded when applied to code-switched data. To improve their performance, additional processing steps such as language identification, normalization, and back-transliteration may be necessary.
I feel privileged to have been given the opportunity to work for Libindic organization under the Google Summer of Code 2016. Libindic is an open source library that supports many utilities for text processing of Indian languages. I would be contributing towards the automatic script transliteration between scheduled languages of India including English. For the project, I would be mentored by Riyaz Ahmad Bhat and Santhosh Thottingal.