dexter
September 28, 2019, 6:01pm
1
I am using NLTK to process the language coming from Slack, mostly to give commands a human-readable form.
To install it, run pip install nltk.
Then run python in the terminal to open the Python interpreter, and do the following to download the resources we need.
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
Then here is the code snippet to extract all nouns from a sentence:
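import nltk

lines = "what's of banknifty"
tokenized = nltk.word_tokenize(lines)  # split the sentence into tokens
tagged = nltk.pos_tag(tokenized)  # list of (word, tag) pairs
# Penn Treebank noun tags all start with "NN"
nouns = [word for (word, pos) in tagged if pos[:2] == 'NN']
print(nouns)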
It returns just banknifty. Luckily, no FNO stock symbol gets tagged as a non-noun.
For further reading: Penn Treebank P.O.S. Tags
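The noun filter above looks at the first two characters of the tag because every Penn Treebank noun tag (NN, NNS, NNP, NNPS) starts with NN. Here is a minimal sketch of that filter as a standalone helper. The name extract_nouns is hypothetical, and the tagged list is hand-written in the shape nltk.pos_tag returns; it is not actual tagger output.

```python
def extract_nouns(tagged):
    """Return the words whose Penn Treebank tag starts with 'NN'."""
    return [word for word, pos in tagged if pos.startswith("NN")]


# Hand-written (word, tag) pairs for illustration only; real tags come
# from nltk.pos_tag(nltk.word_tokenize(text)) and may differ.
sample_tagged = [
    ("what", "WP"),
    ("'s", "VBZ"),
    ("of", "IN"),
    ("banknifty", "NN"),
]

print(extract_nouns(sample_tagged))
```

Keeping the filter separate from the tagging step makes it easy to try on fixed input without downloading any NLTK data.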
dexter
September 28, 2019, 6:09pm
2
Actually, here is how I checked that all FNO stock symbols are tagged as nouns.
import nltk
import requests

# Fetch the F&O stock watch list from NSE
positions = requests.get('https://www.nseindia.com/live_market/dynaContent/live_watch/stock_watch/foSecStockWatch.json').json()
for row in positions['data']:
    lines = row['symbol']
    tokenized = nltk.word_tokenize(lines)
    tagged = nltk.pos_tag(tokenized)
    nouns = [word for (word, pos) in tagged if pos[:2] == 'NN']
    print(nouns)
It printed -
['IDEA']
['MANAPPURAM'] ['MFSL'] ['BHARTIARTL'] ['COLPAL']
['SUNTV'] ['ICICIPRULI'] ['BATAINDIA'] ['EQUITAS'] ['BAJFINANCE'] ['MINDTREE'] ['BERGEPAINT']
['GODREJCP'] ['SIEMENS']
['PAGEIND'] ['ITC'] ['MUTHOOTFIN'] ['BAJAJFINSV'] ['PIDILITIND'] ['BIOCON'] ['KOTAKBANK'] ['RELIANCE'] ['IOC']
['TATAGLOBAL']
['VOLTAS'] ['HEXAWARE'] ['CIPLA'] ['MARICO'] ['CASTROLIND'] ['PETRONET'] ['NTPC'] ['NMDC'] ['AXISBANK'] ['NIITTECH']
['NATIONALUM'] ['HINDPETRO'] ['HDFCBANK'] ['LICHSGFIN']
['FEDERALBNK'] ['PNB']
['TORNTPOWER'] ['GMRINFRA']
['DABUR'] ['INFY'] ['UBL']
['CHOLAFIN']
['WIPRO']
['ULTRACEMCO']
['BANKBARODA']
['UPL'] ['HCLTECH'] ['SBIN'] ['TORNTPHARM']
['ASIANPAINT'] ['ICICIBANK']
['TATAELXSI']
['SHREECEM']
['DIVISLAB']
['MRF'] ['TITAN']
['BAJAJ-AUTO'] ['EICHERMOT'] ['ACC'] ['SRF']
['POWERGRID']
['INFRATEL'] ['EXIDEIND']
['OIL']
['BHARATFORG'] ['SRTRANSFIN']
['BOSCHLTD'] ['PFC'] ['PVR']
['HAVELLS'] ['TATACHEM']
['MGL']
['UJJIVAN']
['HINDUNILVR']
['HEROMOTOCO'] ['LT'] ['CADILAHC'] ['TECHM'] ['MCDOWELL-N']
['MARUTI']
['JUSTDIAL']
['TCS']
['HDFC']
['UNIONBANK'] ['DRREDDY']
['NESTLEIND'] ['LUPIN'] ['CANBK']
['TVSMOTOR']
['AMBUJACEM']
['IGL']
['GAIL']
['CESC'] ['APOLLOHOSP']
['DLF'] ['APOLLOTYRE'] ['JSWSTEEL'] ['MOTHERSUMI']
['COALINDIA']
['ADANIPOWER'] ['ASHOKLEY']
['RECLTD']
['CENTURYTEX'] ['RAMCOCEM']
['INDIGO'] ['M', 'MFIN'] ['BALKRISIND'] ['TATAPOWER'] ['JUBLFOOD']
['BANKINDIA']
['SUNPHARMA']
['AUROPHARMA']
['M', 'M']
['ADANIENT'] ['HINDALCO'] ['GRASIM']
['BEL'] ['ESCORTS']
['BRITANNIA'] ['CUMMINSIND']
['BPCL']
['IDFCFIRSTB'] ['ADANIPORTS']
['AMARAJABAT'] ['L', 'TFH'] ['BHEL'] ['SAIL']
['TATAMOTORS'] ['GLENMARK']
['TATAMTRDVR']
['NBCC']
['ONGC']
['CONCOR'] ['ZEEL'] ['TATASTEEL'] ['INDUSINDBK']
['YESBANK']
['NCC'] ['VEDL']
['RBLBANK']
['JINDALSTEL']
['IBULHSGFIN'] ['DISHTV'] ['PEL'] ['STAR']
Apart from the & sign in a few symbols (the tokenizer splits them, e.g. M&M comes out as ['M', 'M']), there is no problem, so we can write a separate classifier for it.
dexter
September 28, 2019, 6:21pm
3
Here goes the classifier too.
import nltk
import requests

positions = requests.get('https://www.nseindia.com/live_market/dynaContent/live_watch/stock_watch/foSecStockWatch.json').json()
for row in positions['data']:
    lines = row['symbol']
    tokenized = nltk.word_tokenize(lines)
    tagged = nltk.pos_tag(tokenized)
    nouns = [word for (word, pos) in tagged if pos[:2] == 'NN']
    # Rejoin symbols that the tokenizer split on '&' (e.g. M&M)
    try:
        var = nouns[0] + '&' + nouns[1]
    except IndexError:
        var = nouns[0]
    if var == lines:
        print("true")
It confirms that our approach is correct: it prints true for every symbol.
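The same rejoin step can be tried without hitting the NSE endpoint. This is only a sketch: rejoin_symbol is a hypothetical helper, and the token lists are copied from the output printed earlier in the thread, paired with the symbols they are assumed to come from.

```python
def rejoin_symbol(nouns):
    """Join noun tokens with '&' to rebuild a symbol the tokenizer split."""
    return "&".join(nouns)


# Token lists as printed earlier in the thread, paired with the symbol
# each one is assumed to come from.
cases = [
    (["M", "M"], "M&M"),
    (["M", "MFIN"], "M&MFIN"),
    (["L", "TFH"], "L&TFH"),
    (["IDEA"], "IDEA"),
]

for nouns, symbol in cases:
    print(symbol, rejoin_symbol(nouns) == symbol)
```

Using join instead of indexing also covers a symbol with more than one &, though none shows up in the list above.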
dexter
September 28, 2019, 6:45pm
4
Terms like ltp and price are also tagged as nouns, so we need to add another classifier for them.
tags = ["ltp", "price"]
# str2 is assumed to hold the message text; strip each keyword from it
for tag in tags:
    if tag in str2:
        str2 = str2.replace(tag, '')
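One thing to watch with str.replace is that it removes the keyword anywhere it occurs, including inside a longer word. Below is a sketch of a whole-word variant, assuming the message is plain space-separated text. Both strip_keywords and the sample message are made up for illustration.

```python
def strip_keywords(text, keywords):
    """Drop words that exactly match a keyword (case-insensitive)."""
    blocked = {k.lower() for k in keywords}
    kept = [w for w in text.split() if w.lower() not in blocked]
    return " ".join(kept)


message = "ltp price of banknifty"  # hypothetical Slack message
print(strip_keywords(message, ["ltp", "price"]))  # of banknifty
```

The remaining text can then go through the noun extraction as before.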