Natural Language Processing Class 10
Natural Language Processing, or NLP, is the sub-field of AI that is focused on enabling computers to
understand and process human languages. 
Natural Language Processing is all about how machines try to understand and interpret human
language and operate accordingly.
Applications of Natural Language Processing
Some real-life applications of Natural Language Processing are:
1. Automatic Summarization: This is useful when we need to extract a specific, important piece of information from a huge knowledge base. It is also used to understand the emotional meaning within the information, for example when collecting data from social media.
2. Sentiment Analysis: The goal of sentiment analysis is to identify sentiment among several posts. Companies use Natural Language Processing applications, such as sentiment analysis, to identify opinion and sentiment online to help them understand what customers think about their products and services.
3. Text Classification: Text classification helps to assign predefined categories to a document and organize it, so that you can find the information you need or simplify some activities. For example, one application of text categorization is spam filtering in email.
4. Virtual Assistants: These days Google Assistant, Cortana, Siri, Alexa, etc. have become an integral part of our lives. Not only can we talk to them, but they also make our lives easier by keeping notes of our tasks, making calls for us, sending messages and a lot more.
5. Chatbots: One of the most common applications of Natural Language Processing is a chatbot. There are a lot of chatbots available, for example:
Mitsuku Bot
https://www.pandorabots.com/mitsuku/
CleverBot
https://www.cleverbot.com/
Jabberwacky
http://www.jabberwacky.com/
There are two types of chatbots:
- Script-bot
- Smart-bot
Difference between Script-bot and Smart-bot
| Script-bot | Smart-bot |
| --- | --- |
| Script-bots are less flexible and powerful | Smart-bots are more flexible and powerful |
| Script-bots work around a script which is programmed in them | Smart-bots learn with more data |
| Limited functionality | Wide functionality |
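To make the idea of a script-bot concrete, here is a minimal sketch in Python. The keywords and replies are made up for illustration and are not taken from any real chatbot. The bot can only answer what its script covers, which is why its functionality is limited.

```python
# A toy script-bot: every reply comes from a fixed, hand-written script.
# The keywords and replies below are hypothetical examples.
SCRIPT = {
    "hello": "Hello! How can I help you?",
    "hours": "We are open from 9 am to 5 pm.",
    "bye": "Goodbye!",
}


def script_bot_reply(message):
    """Return the scripted reply for the first known keyword in the message."""
    words = message.lower().split()
    for keyword, reply in SCRIPT.items():
        if keyword in words:
            return reply
    # Anything outside the script cannot be handled.
    return "Sorry, I do not understand."


print(script_bot_reply("Hello there"))                 # Hello! How can I help you?
print(script_bot_reply("What is the weather today?"))  # Sorry, I do not understand.
```

A smart-bot would replace the fixed `SCRIPT` table with a model that learns from data, which is beyond this small sketch.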
Human Language VS Computer Language
Humans communicate through language, which we process all the time. As a person speaks, the sound travels and reaches the listener’s eardrum. This sound is then converted into nerve impulses and transported to the brain for processing. After processing, the brain understands the meaning of the sound.
The computer understands the language of numbers. Everything that is sent to the machine has to be converted to numbers. And while typing, if a single mistake is made, the computer throws an error and does not process that part. The communications made by the machines are very basic and simple.
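As a small illustration of text being converted to numbers, the Python sketch below shows the numbers a computer stores for a short piece of text. The sample text is an arbitrary choice.

```python
text = "Hi!"

# Each character has a numeric Unicode code point.
code_points = [ord(ch) for ch in text]

# When stored or sent, the text is encoded as bytes (here using UTF-8).
utf8_bytes = list(text.encode("utf-8"))

print(code_points)  # [72, 105, 33]
print(utf8_bytes)   # [72, 105, 33]
```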
Difficulties faced by machines in understanding human language :
1. Arrangement of the words and meaning : There are rules in human language which provide structure to a language. There are nouns, verbs, adverbs, adjectives. A word can be a noun at one time and an adjective some other time.
2. Multiple meanings of a word : In natural language, a word can have multiple meanings, and the meaning that fits depends on the context of the statement.
3. Perfect Syntax, no Meaning : Sometimes, a statement can have a perfectly correct syntax but it does not mean anything. For example, take a look at this statement:
Chickens feed extravagantly while the moon drinks tea.
This statement is correct grammatically but does not make any sense.
How does NLP make it possible for machines to understand and speak just like humans?
We all know that the language of computers is numerical, so the very first step that comes to mind is to convert our language to numbers. This conversion happens in several steps, which are given below.
1. Text Normalisation : In Text Normalisation, we go through several steps to reduce the text to a simpler form. Text Normalisation cleans up the textual data so that its complexity is lower than that of the original data. The steps of Text Normalisation are:
a. Sentence Segmentation: In sentence segmentation, the whole corpus (the whole textual data from all the documents) is divided into sentences.
b. Tokenisation : After segmenting the sentences, each sentence is further divided into tokens. A token is any word, number or special character occurring in a sentence.
c. Removing Stopwords, Special Characters and Numbers : In this step, the tokens which are not necessary are removed from the token list.
d. Converting text to a common case : After the stopwords removal, we convert the whole text into a similar case, preferably lower case.
e. Stemming : It is the process in which the affixes of words are removed. The word we get after removing the affix is called the stem, which may or may not be meaningful.
f. Lemmatization : In both stemming and lemmatization we remove the affixes of words. The difference is that in lemmatization, the word we get after affix removal (known as the lemma) is a meaningful one. Lemmatization takes longer to execute than stemming.
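The normalisation steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the stopword list, the suffix rules used for stemming, the small lemma table and the sample corpus are all made up for this example. Real projects usually rely on an NLP library for these steps.

```python
import re

# Small illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "and", "are", "is", "to", "of", "in"}

# Tiny hand-made lemma table (an assumption; real lemmatizers use a dictionary).
LEMMAS = {"studies": "study", "went": "go", "classes": "class"}


def segment_sentences(corpus):
    """a. Split the corpus into sentences after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", corpus.strip())
    return [p for p in parts if p]


def tokenise(sentence):
    """b. Split a sentence into word, number and special-character tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def remove_unwanted(tokens):
    """c. Drop stopwords, numbers and special characters."""
    return [t for t in tokens if t.isalpha() and t.lower() not in STOPWORDS]


def stem(word):
    """e. Naive suffix stripping; the stem may not be a meaningful word."""
    for suffix in ("ing", "ies", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """f. Look the word up in the lemma table; unknown words stay unchanged."""
    return LEMMAS.get(word, word)


corpus = "Aman studies AI. Anil went to 2 classes!"

cleaned = []
for sentence in segment_sentences(corpus):
    tokens = remove_unwanted(tokenise(sentence))
    cleaned.extend(t.lower() for t in tokens)  # d. convert to a common case

stems = [stem(t) for t in cleaned]
lemmas = [lemmatize(t) for t in cleaned]

print(cleaned)  # ['aman', 'studies', 'ai', 'anil', 'went', 'classes']
print(stems)    # ['aman', 'stud', 'ai', 'anil', 'went', 'class']
print(lemmas)   # ['aman', 'study', 'ai', 'anil', 'go', 'class']
```

Notice that the stem "stud" is not a meaningful word, while the lemma "study" is. This is the difference between stemming and lemmatization described above.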
2. Bag of Words : In bag of words, we get the occurrences of each word and construct the vocabulary for the corpus. Bag of words gives us two things:
- A vocabulary of words for the corpus
- The frequency of these words (the number of times each word has occurred in the whole corpus).
Here is the step-by-step approach to implement the bag of words algorithm:
- Text Normalisation: Collect data and pre-process it.
- Create Dictionary: Make a list of all the unique words occurring in the corpus. (Vocabulary)
- Create document vectors: For each document in the corpus, find out how many times each word from the unique list of words has occurred.
- Create document vectors for all the documents.
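The steps above can be sketched in Python as follows. The three-document corpus is an assumption made for this example. It is taken as already normalised (lower case, no punctuation), and stopwords are kept so that the counts are easy to follow.

```python
# Assumed example corpus, already normalised (lower case, no punctuation).
documents = [
    "aman and anil are stressed",
    "aman went to a therapist",
    "anil went to download a health chatbot",
]

# Step 1 (simplified): split each document into tokens.
tokenised = [doc.split() for doc in documents]

# Step 2: create the dictionary (vocabulary) of unique words,
# kept in the order in which the words first appear.
vocabulary = []
for tokens in tokenised:
    for word in tokens:
        if word not in vocabulary:
            vocabulary.append(word)

# Steps 3 and 4: for every document, count how often each vocabulary word occurs.
document_vectors = [
    [tokens.count(word) for word in vocabulary] for tokens in tokenised
]

print(vocabulary)
for vector in document_vectors:
    print(vector)
# [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# [1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
# [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1]
```

Each row is the document vector of one document, and each column belongs to one word of the vocabulary.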
3. TFIDF: Term Frequency & Inverse Document Frequency
TFIDF helps us identify the value of each word. Let us understand each term one by one.
a. Term Frequency : Term frequency is the frequency of a word in one document. Term frequency can easily be found from the document vector table (created in the Bag of Words steps).
b. Inverse Document Frequency : Document frequency is the number of documents in which the word occurs, irrespective of how many times it has occurred in those documents. In the example document vector table, the document frequency of ‘aman’, ‘anil’, ‘went’, ‘to’ and ‘a’ is 2, as they occur in two documents. The rest of the words occur in just one document, so their document frequency is 1. To obtain the inverse document frequency, we put the total number of documents in the numerator and the document frequency in the denominator:
IDF(W) = Total number of documents / Document frequency of W
Finally, the formula of TFIDF for any word W becomes:
TFIDF(W) = TF(W) * log( IDF(W) )
After applying the above formula, the words have been converted to numbers. These numbers are the values of each word for each document.
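The formula can be tried out with a short Python sketch. The three-document corpus is an assumption made for this example, chosen so that ‘aman’, ‘anil’, ‘went’, ‘to’ and ‘a’ each occur in two documents. The base of the logarithm is not stated in the formula, so base 10 is assumed here.

```python
import math

# Assumed example corpus, already normalised (lower case, no punctuation).
documents = [
    "aman and anil are stressed",
    "aman went to a therapist",
    "anil went to download a health chatbot",
]
tokenised = [doc.split() for doc in documents]
total_documents = len(tokenised)
vocabulary = sorted({word for tokens in tokenised for word in tokens})

# Document frequency: the number of documents in which each word occurs.
document_frequency = {
    word: sum(1 for tokens in tokenised if word in tokens) for word in vocabulary
}


def tfidf(word, tokens):
    """TFIDF(W) = TF(W) * log(IDF(W)), with log base 10 (an assumption)."""
    tf = tokens.count(word)                             # term frequency
    idf = total_documents / document_frequency[word]    # inverse document frequency
    return tf * math.log10(idf)


# One dictionary of TFIDF values per document, rounded for readability.
tfidf_table = [
    {word: round(tfidf(word, tokens), 3) for word in vocabulary}
    for tokens in tokenised
]

print(document_frequency["aman"])  # 2
print(tfidf_table[0]["aman"])      # 0.176
print(tfidf_table[0]["and"])       # 0.477
print(tfidf_table[0]["went"])      # 0.0
```

In the first document, ‘and’ gets a higher value than ‘aman’ because it occurs in only one document. A word that occurred in every document would get the value 0, because log(1) = 0.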
Summary of the concept :
- Words that occur in all the documents with high term frequencies have the least values and are considered to be the stopwords.
- For a word to have a high TFIDF value, the word needs to have a high term frequency but a low document frequency.
- These values help the computer understand which words are to be considered while processing the natural language. The higher the value, the more important the word is for a given corpus.
Applications of TFIDF : TFIDF is commonly used in the Natural Language Processing domain. Some of its applications are:
- Document Classification : Helps in classifying the type and genre of a document.
- Topic Modelling : It helps in predicting the topic for a corpus.
- Information Retrieval System : Helps in extracting the important information out of a corpus.
- Stop word filtering : Helps in removing the unnecessary words out of a text body.
Disclaimer : I have tried to give you correct “Natural Language Processing Class 10 Notes”, but if you feel that there are mistakes in the notes given above, you can contact me directly at csiplearninghub@gmail.com. These notes were created for practice by students, and the entire content is from CBSE study material. The screenshots used in the article are taken from CBSE study material.