Natural Language Processing in Python

I am using NLTK to process the messages coming from Slack, mostly so that commands can be written in a human-readable way.

To install it, run pip install nltk.
Then start Python in the terminal by running python. In the interpreter that opens, run the following to download the data packages we need.

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

Then here is the code snippet to extract all nouns from a sentence:

import nltk

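lines = "what's of banknifty"
# Split the sentence into tokens, then tag each token with its part of speech.
tokenized = nltk.word_tokenize(lines)
tagged = nltk.pos_tag(tokenized)
# Keep the tokens whose Penn Treebank tag starts with 'NN' (NN, NNS, NNP, NNPS).
nouns = [word for (word, pos) in tagged if pos[:2] == 'NN']
print(nouns)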

It will return just banknifty. Luckily, there are no FNO stocks whose symbols are tagged as anything other than a noun.

For further reading - Penn Treebank P.O.S. Tags
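
The check is pos[:2] == 'NN' rather than pos == 'NN' because the Penn Treebank tag set has four noun tags (NN, NNS, NNP, NNPS) and all of them start with NN. Here is a minimal sketch of that filter on its own. It uses a hand-written list of (word, tag) pairs in place of real nltk.pos_tag output, so the tags are illustrative assumptions, and extract_nouns is a hypothetical helper name.

```python
# Hand-written (word, tag) pairs standing in for nltk.pos_tag output.
# These tags are illustrative assumptions, not real tagger results.
tagged = [
    ("show", "VB"),
    ("prices", "NNS"),
    ("of", "IN"),
    ("Infosys", "NNP"),
    ("stock", "NN"),
]


def extract_nouns(tagged_tokens):
    """Return the words whose tag starts with 'NN' (NN, NNS, NNP, NNPS)."""
    return [word for word, pos in tagged_tokens if pos.startswith("NN")]


nouns = extract_nouns(tagged)
print(nouns)
```

Comparing against 'NN' exactly would keep only stock from this list and drop the plural and proper nouns.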

Actually, here is how I checked that every FNO stock symbol is tagged as a noun.

import nltk
import requests

positions = requests.get('').json()
# Tag every symbol and print the tokens that are recognised as nouns.
for position in positions['data']:
    lines = position['symbol']
    tokenized = nltk.word_tokenize(lines)
    tagged = nltk.pos_tag(tokenized)
    nouns = [word for (word, pos) in tagged if pos[:2] == 'NN']
    print(nouns)

It printed -

['DABUR']
['INFY']
['UBL']
['MRF']
['TITAN']
['BOSCHLTD']
['PFC']
['PVR']
['M', 'M']
['AMARAJABAT']
['L', 'TFH']
['BHEL']
['SAIL']
['NCC']
['VEDL']

Apart from the & sign in a few stock symbols, there is no problem. The & splits a symbol into two nouns, as in ['M', 'M'] and ['L', 'TFH'], so we can write a separate classifier for it.

Here goes the classifier too.

import nltk
import requests

positions = requests.get('').json()
for position in positions['data']:
    lines = position['symbol']
    tokenized = nltk.word_tokenize(lines)
    tagged = nltk.pos_tag(tokenized)
    nouns = [word for (word, pos) in tagged if pos[:2] == 'NN']
    # A symbol containing '&' is split into two nouns, so join them back.
    if len(nouns) == 2:
        var = nouns[0] + '&' + nouns[1]
    else:
        var = nouns[0]
    # Check that the rebuilt string matches the original symbol.
    print(var == lines)


This confirms that our assumption is correct: it prints True for every symbol in the data.
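
The rejoining step can also be looked at in isolation, without the tagger or the API call. This is a minimal sketch, assuming that every piece of the symbol was tagged as a noun and that & is the only character that splits a symbol. rebuild_symbol is a hypothetical helper name, and the inputs are noun lists copied from the output printed earlier.

```python
def rebuild_symbol(nouns):
    """Join noun tokens with '&' to recover a symbol the tokenizer split."""
    return "&".join(nouns)


# Noun lists taken from the printed output above.
print(rebuild_symbol(["M", "M"]))
print(rebuild_symbol(["L", "TFH"]))
print(rebuild_symbol(["DABUR"]))
```

Using join means a single-token symbol passes through unchanged, so no separate branch is needed for symbols without an & sign.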

Terms like ltp and price are also tagged as nouns, so we need another filter to remove them.

tags = ["ltp", "price"]

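# str2 is assumed to hold the message text being processed.
# replace() leaves the string unchanged when the tag is absent.
for tag in tags:
    str2 = str2.replace(tag, '')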
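
One thing to watch with plain substring replacement is that it also cuts the keyword out of any longer word that happens to contain it. Here is a hedged sketch of a whole-word variant using the standard re module. strip_keywords is a hypothetical helper name, and priceline is a made-up word used only to show the difference, not a real symbol.

```python
import re

KEYWORDS = ["ltp", "price"]


def strip_keywords(text, keywords=KEYWORDS):
    """Remove whole-word occurrences of each keyword, ignoring case."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the removed words.
    return " ".join(cleaned.split())


print(strip_keywords("ltp of banknifty"))
print(strip_keywords("priceline price"))
```

With str.replace, the made-up word priceline would be reduced to line, while the word-boundary pattern leaves it alone and removes only the standalone price.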