Evaluating the Amarkosha to Generate Computational Model for Sanskrit Vocabulary and Sanskrit Word Bank
Keywords:
Amarkosha, Natural Language Processing, Word Bank, Clustering, K-Means Clustering, Louvain Community Detection, Sanskrit.
Abstract
The Amarkosha is considered one of the most complete word banks ever compiled for the Sanskrit language. It lists almost 10,000 words along with their morphological construction, their paryayavachi words (synonyms), and their gender study, or linganushasanam. The scripture is divided into three sections that together list 27 clusters of words. The last cluster of the last section is dedicated entirely to defining the genders of the words. The scripture is composed in such a way that a computational model of the Sanskrit vocabulary can readily be generated from it. Because natural language processing (NLP) for any language needs a good word bank that captures the characteristics and behavioral aspects of its words, in this paper we cluster the Sanskrit vocabulary and construct a computational model for a Sanskrit word bank. The words are clustered using two standard methods: k-means clustering and the Louvain community detection method. In a comparative study of the two methods, we observed that the Louvain method is more effective at clustering the Sanskrit vocabulary, as its output aligns more closely with the original structure and clusters of the Amarkosha itself. The Louvain method yields 24 distinct communities of words, whereas k-means clustering yields 36 clusters. This gives 88% accuracy for the Louvain community detection method and 67% accuracy for k-means clustering.
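The abstract does not state how the two accuracy figures are computed. One reading that is consistent with both reported numbers, offered here only as an assumption, is that accuracy measures how closely the number of clusters found matches the 27 clusters of the Amarkosha, i.e. 1 - |found - 27| / 27. The function name `cluster_count_agreement` is hypothetical and is used only for this illustration.

```python
# Assumption (not stated in the abstract): accuracy is the agreement between
# the number of clusters found and the 27 reference clusters of the Amarkosha,
# computed as 1 - |found - reference| / reference.
REFERENCE_CLUSTERS = 27  # clusters listed in the Amarkosha, per the abstract


def cluster_count_agreement(found: int, reference: int = REFERENCE_CLUSTERS) -> float:
    """Return the relative agreement between a found and a reference cluster count."""
    return 1.0 - abs(found - reference) / reference


louvain_score = cluster_count_agreement(24)  # Louvain: 24 communities
kmeans_score = cluster_count_agreement(36)   # k-means: 36 clusters

print(f"Louvain: {louvain_score:.1%}")  # prints "Louvain: 88.9%"
print(f"k-means: {kmeans_score:.1%}")   # prints "k-means: 66.7%"
```

Under this assumed definition, the Louvain value of 88.9% matches the reported 88% if the figure is truncated, and the k-means value of 66.7% matches the reported 67% when rounded. A measure of this kind compares cluster counts only and says nothing about whether individual words fall into the correct cluster.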