Language is a medium through which we express our thoughts while literature is a mirror that reflects ideas and philosophies which govern our society. Hence, to know any particular culture and its tradition it is very important that we understand the evolution of its language and the various forms of literature like poetry, drama and religious and non-religious writings.


Language consists of the development, acquisition, maintenance and use of complex systems of communication, particularly the human ability to do so; and a language is any specific example of such a system. Human language has the properties of productivity and displacement, and relies entirely on social convention and learning. Its complex structure affords a much wider range of expressions than any known system of animal communication. Languages evolve and diversify over time, and the history of their evolution can be reconstructed by comparing modern languages to determine which traits their ancestral languages must have had in order for the later developmental stages to occur. A group of languages that descend from a common ancestor is known as a language family. Academic consensus holds that between 50% and 90% of languages spoken at the beginning of the 21st century will probably have become extinct by the year 2100.


  • Language: the method of human communication, either spoken or written, consisting of the use of words in a structured and conventional way.
  • Language family: A language family is a group of languages related through descent from a common ancestral language or parental language, called the proto-language of that family.
  • Language isolates: The languages that have no known relatives are called language isolates, essentially language families consisting of a single language.
  • Proto-language: A proto-language in the tree model of historical linguistics is a language, usually hypothetical or reconstructed, and unattested, from which a number of attested, or documented, known languages are believed to have descended by evolution, or slow modification of the proto-language into languages that form a language family.
  • Mother-tongue: The mother tongue is the language spoken in childhood by the person’s mother to the person. If the mother died during the person’s infancy, the language mainly spoken in the person’s home in childhood is the mother tongue.

Indian Scenario

Indian language regions

First language in each state of India

Language is an important attribute of a population, and has great relevance and significance in a pluri-lingual and pluri-ethnic land like India. Languages act as bridges because they enable us to know about others. Indian multilingualism is multi-layered and complex. Every language has many variations based on caste, region, gender, occupation, age, etc. For instance, Hindi has as many as forty-nine varieties based on region.

India (780) has the world's second highest number of languages, after Papua New Guinea (839). Indian multilingualism cannot be understood under the single heading of language families. Its real essence is best described in terms of variation: language families, tribal languages, races, scripts, regional languages, dialectal variation, idiolectal variation, registral variation, stylistic variation, etc.

According to Census of India of 2001, India has 122 major languages and 1599 other languages. However, figures from other sources vary, primarily due to differences in definition of the terms "language" and "dialect". Two contact languages have played an important role in the history of India: Persian and English. Persian was the court language during the Mughal period in India. It reigned as an administrative language for several centuries until the era of British colonisation. English continues to be an important language in India. Hindi, the most widely spoken language in India today, serves as the lingua franca across much of North and Central India.

Article 343 of the Indian Constitution states that the official language of the Union shall be Hindi in Devanagari script. Clause 3 of the same article allows Parliament to provide by law for the continued use of English, and the Official Languages Act, 1963 gives effect to this, so English continues to be used for official purposes alongside Hindi. Despite a common misconception, Hindi is not the national language of India. The Constitution of India does not give any language the status of national language.

Language Families (Genetic Variation) of India

Indian language families

The languages of India belong to several language families, the most important of which are:

  1. Indo-Aryan language family
  2. Dravidian language family
  3. Austroasiatic language family
  4. Sino-Tibetan language family
  5. Great Andamanese languages

Apart from the above five, the Tai–Kadai language family is also present in north-eastern India.

Indo-Aryan Language Family

Indo-Aryan Languages

Indo-Aryan languages of Indian subcontinent

The Indo-Aryan or Indic languages are the dominant language family of the Indian subcontinent. They constitute a branch of the Indo-Iranian languages, itself a branch of the Indo-European language family.

In India, it is the largest language family both in geographical spread and numerical strength. Its speakers constitute 76.86% of the total Indian population, and the family comprises 21 languages.
These 21 languages are: Assamese, Bengali, Bhili/Bhilodi, Bishnupuriya, Dogri, Gujarati, Halabi, Hindi, Kashmiri, Khandeshi, Konkani, Lahnda, Maithili, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Shina, Sindhi, Urdu.

Out of these, 15 have been recognized by the Constitution as Scheduled languages. They are Assamese, Bengali, Dogri, Gujarati, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Sindhi, Urdu.

The Indo-Aryan languages are spoken mainly across the northern and central regions of India.

Development of Indo-Aryan Language

1. Proto-Indo-Aryan: It is a proto-language hypothesized to have been the direct ancestor of all Indo-Aryan languages. It would have had similarities to Proto-Indo-Iranian, but would ultimately have used Sanskritized phonemes and morphology.

2. Old Indo-Aryan (ca. 1500–300 BCE)

Old Indo-Aryan language is categorized as Vedic Sanskrit and Classical Sanskrit. Sanskrit literally means "put together", meaning perfected or elaborated. It was developed as the prestige language of culture, science and religion, as well as the court, theatre, etc.

  • Vedic Sanskrit (early Old Indo-Aryan) (1500 to 500 BCE): Vedic Sanskrit is the language of the Vedas. The hymns of the Rigveda were preserved by oral tradition alone over several centuries before the introduction of writing, the oldest among them predating the introduction of Brahmi by as much as a millennium. The end of the Vedic period is marked by the composition of the Upanishads, after which Sanskrit began the transition from a first language to a second language of religion and learning, marking the beginning of the Classical period.
  • Epic Sanskrit or Classical Sanskrit or Paninian Sanskrit (late Old Indo-Aryan) (500 to 300 BCE): Vedic Sanskrit and Classical Sanskrit, while broadly similar, are separate varieties that differ in a number of points of phonology, vocabulary, and grammar. The oldest surviving Sanskrit grammar is Pāṇini's Aṣṭādhyāyī ("Eight-Chapter Grammar"), dating to c. the 5th century BCE. At that time, Sanskrit was not thought of as a specific language but rather as a particularly refined or perfected manner of speaking. Knowledge of Sanskrit was a marker of social class and educational attainment.

3. Middle Indo-Aryan or Prakrits, Old Odia (ca. 300 BCE to 1500 CE)

Prakrit: Prakrit literally means "original, natural, artless, normal, ordinary, usual", i.e. "vernacular", in contrast to samskrta "excellently made". Outside the learned sphere of Sanskrit, vernacular dialects (Prakrits) continued to evolve. Some modern scholars include all Middle Indo-Aryan languages under the rubric of "Prakrits".

The Prakrits became literary languages, generally patronized by kings identified with the kshatriya caste. The oldest attested Prakrits are the Buddhist and Jain canonical languages, Pali and Ardha Magadhi respectively, and the earliest inscriptions in Prakrit are those of Ashoka. By medieval times, the Prakrits had diversified into various Middle Indo-Aryan dialects.

In Sanskrit drama, kings speak in Prakrit when addressing women or servants, in contrast to the Sanskrit used in reciting more formal poetic monologues.

Apabhramsas: The Prakrits were gradually transformed into Apabhraṃśas (अपभ्रंश) which were used until the 13th century CE. The term apabhraṃśa, meaning "fallen away", refers to the dialects of Northern India before the rise of modern Northern Indian languages, and implies a corrupt or non-standard language.

Apabhraṃśa connects late Middle Indo-Aryan with early Modern Indo-Aryan, spanning roughly the 6th to 13th centuries. Some of these dialects showed considerable literary production; the Sravakachar of Devasena (dated to the 930s) is now considered to be the first Hindi book. The two largest languages that formed from Apabhraṃśa were Bengali and Hindustani; others include Sindhi, Gujarati, Odia, Marathi, and Punjabi.

Under the flourishing Turco-Mongol Mughal empire, Persian became very influential as the prestige language of the Islamic courts, owing to its adoption by the Mughal emperors. However, Persian was soon displaced by Hindustani, which combines Persian, Arabic, and Turkic elements in its vocabulary with the grammar of the local dialects.

4. Early Modern Indo-Aryan (Late Medieval India)

  • early Dakkhini and emergence of Khariboli

Dravidian Language Family

The Dravidian family of languages includes approximately 73 languages that are mainly spoken in southern India and northeastern Sri Lanka, as well as in certain areas of Pakistan, Nepal, Bangladesh, eastern and central India, and parts of southern Afghanistan. Caldwell coined the term "Dravidian" from the Sanskrit drāvida, related to the word ‘Tamil’ or ‘Tamilan’, which was used in a 7th-century text to refer to the languages of southern India.

Dravidian languages

In India, this is the second largest language family. Its speakers constitute 20.82% of the Indian population. Four languages from this family are Scheduled languages: Kannada, Malayalam, Tamil and Telugu.

Though some scholars have argued that the Dravidian languages may have been brought to India by migrations in the fourth or third millennium BCE or even earlier, the Dravidian languages cannot easily be connected to any other language, and they could well be indigenous to India. Proto-Dravidian languages were spoken in India in the 4th millennium BCE and started disintegrating into various branches around the 3rd millennium BCE. The Indus Valley civilisation (3300–1900 BCE) is often identified as having been Dravidian.

The Dravidian languages are classified in four groups:

  1. North (Brahui, Kurukh, Malto)
  2. Central (Kolami–Parji)
  3. South-Central (Telugu–Kui)
  4. South Dravidian (Tamil-Kannada).

Only two Dravidian languages are exclusively spoken outside India: Brahui, in the Balochistan region of Pakistan and, to a lesser extent, Afghanistan, and Dhangar, a dialect of Kurukh, in parts of Nepal and Bhutan. Dravidian place names along the Arabian Sea coasts and Dravidian grammatical influence such as clusivity in the Indo-Aryan languages, namely Marathi, Konkani, Gujarati, Marwari, and Sindhi, suggest that Dravidian languages were once spoken more widely across the Indian subcontinent.

There are also small groups of Dravidian-speaking scheduled tribes, who live beyond the mainstream communities, such as the Kurukh in Eastern India, Kui people of Odisha and Gond tribes in Central India.

Dravidian Influence on Sanskrit

Dravidian languages show extensive lexical (vocabulary) borrowing, but only a few traits of structural (either phonological or grammatical) borrowing from Indo-Aryan, whereas Indo-Aryan shows more structural than lexical borrowings from the Dravidian languages. Many of these features are already present in the oldest known Indo-Aryan language, the language of the Rigveda (c. 1500 BCE), which also includes over a dozen words borrowed from Dravidian. In addition, a number of grammatical features of Vedic Sanskrit not found in its sister Avestan language appear to have been borrowed from Dravidian languages.

Dravidian Literature

Four Dravidian languages, Tamil, Kannada, Malayalam and Telugu, have lengthy literary traditions. Literature in Tulu and Kodava is more recent. The earliest known Dravidian inscriptions are 76 Old Tamil inscriptions on cave walls in Madurai and Tirunelveli districts in Tamil Nadu, dating from the 2nd century BCE. These inscriptions are written in a variant of the Brahmi script called Tamil Brahmi. The earliest long text in Old Tamil is the Tolkāppiyam, an early work on Tamil grammar and poetics, whose oldest layers could date from the 1st century BCE.

Austroasiatic Language Family

Austroasiatic is a family with a smaller number of speakers, constituting 1.11% of the total population of India. It consists of 14 languages, which can be divided into two groups:

  1. Khmer-Nicobarese: It consists of 2 languages - Khasi and Nicobarese.
  2. Munda: This group comprises 12 languages - Bhumij, Gadaba, Ho, Juang, Kharia, Koda/Kora, Korku, Korwa, Munda, Mundari, Santali and Savara.

The Austroasiatic language family (austro meaning South) is the autochthonous (indigenous) language family of South Asia and Southeast Asia, other language families having arrived by migration. Languages of this family are spoken in the Andaman and Nicobar Islands and in some states like Jharkhand, Madhya Pradesh, etc. Among all these languages, only Santali has been included in the Eighth Schedule, under the 92nd Constitutional Amendment in 2003.

All Austroasiatic languages on Indian territory are endangered, with the exception of Khasi and Santali.

Sino-Tibetan Language Family

The Sino-Tibetan language family is well represented in India. However, the interrelationships among its languages are not discernible, and the family has been described as "a patch of leaves on the forest floor" rather than with the conventional metaphor of a "family tree".

Sino-Tibetan languages are spoken across the Himalayas in the regions of Ladakh, Himachal Pradesh, Nepal, Sikkim, Bhutan, Arunachal Pradesh, and also in the Indian states of West Bengal, Assam (hills and autonomous councils - BTC), Meghalaya, Nagaland, Manipur, Tripura and Mizoram. In North-Eastern India, the tribal languages mainly belong to the Sino-Tibetan family, except Khasi in Meghalaya, which belongs to the Mon-Khmer group of the Austro-Asiatic family.

Among the major families, it is the smallest in number of speakers and the largest in number of languages. Its speakers constitute 1.0% of the total population of India, and it consists of 66 languages.

It has three main sub branches:

  1. Tibeto-Himalayan:  It consists of two groups - Bhotia and Himalayan.
    • Bhotia: The languages of Bhotia groups are - Balti, Bhotia, Ladakhi, Lahauli, Monpa, Sherpa and Tibetan.
    • Himalayan: It consists of 3 languages: Kinnauri, Limbu and Lepcha.
  2. North-Assam: It consists of 3 languages: Adi, Nissi/Dafla and Mishmi.
  3. Assam-Burmese: The languages are Bodo, Burmese, Kuki-Chin and Naga.

Sino-Tibetan languages spoken in India include two Scheduled languages, Meitei (Manipuri) and Bodo.

Andamanese Language Family

The extinct and endangered languages of the Andaman Islands form a fifth family - the Great Andamanese language family. While some connections have been tentatively proposed with other language families, the consensus view is currently that Andamanese languages form a separate language family — or rather, two unrelated linguistic families:

  1. Great Andamanese, comprising a number of extinct languages apart from one highly endangered language with a dwindling number of speakers.
  2. Ongan family of the southern Andaman Islands, comprising two extant languages, Önge and Jarawa, and one extinct tongue, Jangil.

Sentinelese, an unattested language of the Andaman Islands, is sometimes treated as a third group. It is generally presumed to be related to the other Andamanese languages, but because it is undocumented and its number of speakers is unknown, it remains unclassified.

By the late 18th century, when the British first settled on the Andaman islands, there were an estimated 5,000 Great Andamanese living on Great Andaman and surrounding islands, comprising 10 distinct tribes with distinct but closely related languages. By 1994 seven of the ten tribes were already extinct, and divisions among the surviving tribes (Jeru, Bo and Cari) had effectively ceased to exist due to intermarriage and resettlement. Hindi increasingly serves as their primary language, and is the only language for around half of them.

About half of the population now speak what may be considered a new language (a kind of mixed or koine language) of the Great Andamanese family, based mainly on Aka-Jeru. This modified version has been called "Present Great Andamanese" by some scholars, but also may be referred to simply as "Jero" or "Great Andamanese".

Language Isolates

Nihali, spoken in Madhya Pradesh and Maharashtra, is the only language on the Indian mainland that is considered a language isolate. Its status is ambiguous: it has variously been considered a distinct Austro-Asiatic language, a dialect of a Munda language, and a "thieves' argot" rather than a legitimate language.

The validity of the Great Andamanese language group as a language family has been questioned and it has been considered a language isolate by some authorities.

In addition, a Bantu language, Sidi, was spoken until the mid-20th century in Gujarat.

Language Scripts

Indus Script

The Indus script consists of short strings of symbols (around 400 distinct signs) associated with the Harappan civilization. It was used between 2600 and 1900 BCE and evolved from an early Indus script attested from around 3500–3300 BCE. The symbols are most commonly associated with flat, rectangular stone tablets called seals. Over 4000 symbol-bearing objects have been discovered, some as far afield as Mesopotamia. After 1500 BCE, coinciding with the final stage of Harappan civilization, use of the symbols ends. The symbols remain undeciphered, and some scholars classify them as proto-writing rather than writing proper.


Family-wise grouping of the 122 Scheduled and Non-Scheduled Languages, 2001

Language family: number of languages; percentage of total population

  1. Indo-European
    • (a) Indo-Aryan: 21; 76.86 (Assamese, Bengali, Bhili/Bhilodi, Bishnupuriya, Dogri, Gujarati, Halabi, Hindi, Kashmiri, Khandeshi, Konkani, Lahnda, Maithili, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Shina, Sindhi, Urdu)
    • (b) Iranian: 2; 0.00 (Afghani/Kabuli/Pashto, Persian)
    • (c) Germanic: 1; 0.02 (English)
  2. Dravidian: 17; 20.82 (Coorgi/Kodagu, Gondi, Jatapu, Kannada, Khond/Kondh, Kisan, Kolami, Konda, Koya, Kui, Kurukh/Oraon, Malayalam, Malto, Parji, Tamil, Telugu, Tulu)
  3. Austro-Asiatic: 14; 1.11 (Bhumij, Gadaba, Ho, Juang, Kharia, Khasi, Koda/Kora, Korku, Korwa, Munda, Mundari, Nicobarese, Santali, Savara)
  4. Tibeto-Burmese: 66; 1.0 (Adi, Anal, Angami, Ao, Balti, Bhotia, Bodo, Chakesang, Chakru/Chokri, Chang, Deori, Dimasa, Gangte, Garo, Halam, Hmar, Kabui, Karbi/Mikir, Khezha, Khiemnungan, Kinnauri, Koch, Kom, Konyak, Kuki, Ladakhi, Lahauli, Lakher, Lalung, Lepcha, Liangmei, Limbu, Lotha, Lushai/Mizo, Manipuri, Maram, Maring, Miri/Mishing, Mishmi, Mogh, Monpa, Nissi/Dafla, Nocte, Paite, Pawi, Phom, Pochury, Rabha, Rai, Rengma, Sangtam, Sema, Sherpa, Simte, Tamang, Tangkhul, Tangsa, Thado, Tibetan, Tripuri, Vaiphei, Wancho, Yimchungre, Zeliang, Zemi, Zou)
  5. Semito-Hamitic: 1; 0.01 (Arabic/Arbi)


Diversity of Tribal Languages

Tribal languages account for much of India's linguistic diversity. Some tribal languages have large numbers of speakers, but many more have very few speakers left, in some cases only a single one.

According to the findings of the 'People of India' project conducted by the Anthropological Survey of India, the tribal communities speaking Indo-Aryan languages number 163, Dravidian 107, Tibeto-Burman 143, Austro-Asiatic 30 and Andamanese 4. Despite being so numerous, these languages have very few speakers, and the number of speakers is gradually declining. One strong reason may be the pressure on speakers to become bilingual or multilingual in order to sustain their identity. To compete for dominance, speakers of tribal languages are shifting to non-tribal mother tongues. Tribal communities living in the central belt, which is a largely non-tribal, Hindi-dominant area, are more prone to assimilation. The situation is somewhat more stable in the north-eastern region of India, where communities are quite conscious of their ancestral identity.

Scheduled Languages

Total Scheduled Languages: 22

  • Indo-Aryan branch of the Indo-European family: 15 (Assamese, Bengali, Dogri, Gujarati, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Sindhi, Urdu)
  • Austro-Asiatic family: 1 (Santali)
  • Dravidian family: 4 (Kannada, Malayalam, Tamil, Telugu)
  • Tibeto-Burmese family: 2 (Bodo, Manipuri)


