Hinglish, a hybrid of Hindi and English, is considered one of the most commonly spoken code-switched languages. Code-switching, also known as code-mixing, is the use of grammatical units from two or more languages within a single utterance. It poses a challenge for Natural Language Processing (NLP) algorithms, which are typically designed for monolingual text, so their performance may degrade when they are applied to code-switched data. To improve their effectiveness, additional processing steps such as language identification, normalization, and back-transliteration may be required.
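To make these three steps concrete, here is a minimal sketch in Python. It is only an illustration under strong assumptions: the lexicon `ROMAN_HINDI_TO_DEVANAGARI` and the functions `normalize`, `identify_language`, and `process` are hypothetical names invented for this example, and a real system would use trained models rather than a hand-written word list.

```python
import re

# Hypothetical toy lexicon mapping romanized Hindi words to Devanagari.
# A real system would use a trained transliteration model instead.
ROMAN_HINDI_TO_DEVANAGARI = {
    "mujhe": "मुझे",
    "yeh": "यह",
    "bahut": "बहुत",
    "pasand": "पसंद",
    "hai": "है",
}


def normalize(token: str) -> str:
    """Lowercase the token and collapse any character repeated three or
    more times into one (a crude heuristic for elongated spellings)."""
    return re.sub(r"(.)\1{2,}", r"\1", token.lower())


def identify_language(token: str) -> str:
    """Toy language identification: a normalized token found in the
    lexicon is tagged 'hi'; everything else is assumed to be 'en'."""
    return "hi" if token in ROMAN_HINDI_TO_DEVANAGARI else "en"


def process(sentence: str) -> list[tuple[str, str]]:
    """Normalize, tag, and back-transliterate each whitespace-separated token."""
    result = []
    for raw in sentence.split():
        token = normalize(raw)
        lang = identify_language(token)
        # Back-transliterate Hindi tokens to Devanagari; keep English as-is.
        text = ROMAN_HINDI_TO_DEVANAGARI[token] if lang == "hi" else token
        result.append((text, lang))
    return result


if __name__ == "__main__":
    for text, lang in process("Mujhe yeh movie bahuuut pasand hai"):
        print(lang, text)
```

The dictionary lookup stands in for both language identification and back-transliteration, which keeps the example short but means any Hindi word missing from the lexicon is mislabeled as English.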
I feel privileged to have been given the opportunity to work with the Libindic organization under Google Summer of Code 2016. Libindic is an open source library that provides many utilities for text processing of Indian languages. I will be contributing towards automatic script transliteration between the scheduled languages of India, as well as English. My mentors for the project are Riyaz Ahmad Bhat and Santhosh Thottingal.