How to install and configure NLTK on Windows. Follow the steps below:
First, check which Python version is installed.
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJJoooAuiph-oRF37oVX1XcVDdDlvtWI8d6VGso9l4xKANXGn951HcpC1c8BA-EQW_DTiKJmTZAAsHL9k3vwVyJ8orQ7oWDZOmmxwPY_CDq0iXVcZtyw02MUBnbNCXOIBrHTiDS-56hiA/s640/Capture2.PNG)
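The same check can also be made from inside the interpreter. A minimal sketch using only the standard library; the helper name `python_version_string` is made up for this example:

```python
import platform
import sys

def python_version_string():
    """Return the running interpreter's version as 'major.minor.micro'."""
    info = sys.version_info
    return f"{info.major}.{info.minor}.{info.micro}"

# Show the interpreter version and the operating system it runs on
print("Python", python_version_string(), "on", platform.system())
```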
If Python is not installed yet, download it from https://www.python.org/downloads and install it first.
Let's start the installation. Install NumPy using the following command:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKi9lpb82FpY6kDUftedwgJRguv__nLESMyeh9WeeBclsrDnK0wa-dMap3Rk4MehcLKjP8doCOIw-lLlfKbypg7IAhlC5uNz9r5B8-457JzCpVuuqIzmeKC2nAjgmH3hgP1H1YqARwWDI/s640/Capture3.PNG)
Then install NLTK using the following command:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-We_f8ExAh_iX22jPTQk-E_hJ8EBUgDLcyGe_sh-efmMnV6YcjEA65tG9FXWQPuzYj4vC36-3ESPhCqfjJOWl2TiL1GlwSiaCh0LJVdICqD6mknde3ps0U-6juS2jr8souOU0kEky11Y/s640/Capture4.PNG)
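To see whether both packages are already present before or after running the install commands, something like the following may help. It is only a sketch: `is_installed` and `pip_command` are helper names invented here, and the printed pip command is a suggestion tied to the interpreter that runs the script.

```python
import importlib.util
import sys

def is_installed(module_name):
    """Return True if the module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None

def pip_command(package):
    """Build a pip command string for the interpreter running this script."""
    return f'"{sys.executable}" -m pip install {package}'

for package in ("numpy", "nltk"):
    if is_installed(package):
        print(package, "is already installed")
    else:
        print(package, "is missing; run:", pip_command(package))
```

Using `python -m pip` rather than a bare `pip` makes it more likely that the package lands in the same Python installation you checked in the first step.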
Now download the NLTK packages using the commands below:
>>> import nltk
>>> nltk.download()
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicKKWhV0W-BPsSV8E6nwYf65HvG75QtCLGhjmergQIZQKhOLBGvc6v4m08TaqxT1D30KrCnZ1DT04v7Qf_mF9ehcjKpP4P320VF06xR0soBL3Wo2-MjjbCR0GFa1-JrjPVq7glML2wReg/s640/Capture8.PNG)
Running the download command opens an installer dialog where you select the packages to install. Select "all" to download every package.
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCVsCy0dad3bL_iZxUNNjHZrO3LZWGo2v7sNGyi5aJrI62e8csFtVbqpfN5uQxh-DNOncXZt4dgfZMJJNcJ4cFNlG2NyqXriwRWhSnGWc-b3mWxN-qR7wI1AXT_zk7k8Xgx6iD1oT2Cfk/s640/Capture.PNG)
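If you would rather skip the dialog, `nltk.download` also accepts a package identifier and returns True on success. The sketch below wraps that call; the `downloader` parameter is an addition made here so the helper can be tried without NLTK or a network connection, and "punkt" and "stopwords" are example identifiers only.

```python
def download_packages(names, downloader=None):
    """Download each named NLTK package and return {name: succeeded}."""
    if downloader is None:
        # Imported lazily so the helper can be defined without NLTK installed
        import nltk
        downloader = nltk.download
    return {name: bool(downloader(name)) for name in names}

# Example call (requires NLTK and a network connection):
# print(download_packages(["punkt", "stopwords"]))
```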
Once the download is complete, the dialog looks like this:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1vDoUmNOUvnOk9hHRtCI8abaEHKguDqrZz0qdbw7BgM23nKd1BaHZlXHtzHR0rfFsNne8_U52iLf3yqrCkvn0kJ4eAqS4P2o5wKKBMnh3drtaFP7b6dx3deUXqAj0LK71d6VQTSjCyZQ/s640/Capture1.PNG)
Once all packages are installed, click Close to return to the command prompt.
The prompt will show True.
Now you can verify whether the installation succeeded.
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyKeRVTEhqDZF8i728zlfYJ_lndfiJc4Pxw_xumv0fFvty246lBQp4W9UmjbX2z0h6dP1C6IhlfYYLzFyowACEkN-kFo_TB315mrshMnoeS47t27omUlqeZjVwyc5OahYfI3uQ_NprTRg/s640/Capture7.PNG)
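A scripted version of this check might look like the sketch below. It assumes you downloaded the stopwords corpus; `corpora/stopwords` is just an example resource path, and `nltk_status` is a name invented here.

```python
def nltk_status(resource="corpora/stopwords"):
    """Report whether NLTK imports and whether one resource is on disk."""
    try:
        import nltk
    except ImportError:
        return {"installed": False, "version": None, "resource_found": False}
    try:
        # nltk.data.find raises LookupError when the resource is missing
        nltk.data.find(resource)
        found = True
    except LookupError:
        found = False
    return {"installed": True, "version": nltk.__version__, "resource_found": found}

print(nltk_status())
```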
Yeah! It works great! You can post your questions or installation issues.
Important Links
Use for grammar detection
http://rwet.decontextualize.com/book/textblob/
Beautiful Soup parsing
https://www.dataquest.io/blog/web-scraping-tutorial-python/
Beautiful Soup Collecting data
https://www.dataquest.io/blog/web-scraping-beautifulsoup/
Sample for website scraping
import urllib3
import nltk
from bs4 import BeautifulSoup
from nltk.collections import *
import requests
import dateparser

prefixes = ["jan.", "feb.", "mar.", "apr.", "may.", "jun.", "jul.", "sept.", "oct.", "nov.", "dec.", "january", "february", "march", "april", "may", "june", "july", "august", "september", "october", "november", "december"]

# Check whether the text starts with a month prefix, e.g. "Feb. 1:"
# Returns 1 = success, 0 = fail
def check_start_with_syntax(p_tags_text):
    if p_tags_text.lower().startswith(tuple(prefixes)):
        return 1
    else:
        return 0

# Parse the event date, e.g. "Feb. 1" or "July 29/30"
# Returns the start date and the end date as parsed by dateparser
def prase_event_date(p_tags_text):
    event_parse_dt = 'un-formated'
    colon_pos = p_tags_text.find(':')
    if colon_pos != -1:
        # Keep only the date part before the colon; find() avoids the
        # ValueError that index() raises when the text has no colon
        event_parse_dt = p_tags_text[0:colon_pos]
    # Check for a start date and an end date written as e.g. 29/30
    if event_parse_dt.find('/') == -1:
        return dateparser.parse(event_parse_dt), dateparser.parse(event_parse_dt)
    else:
        event_parse_month_temp = str(event_parse_dt[0:p_tags_text.index(' ')]).strip()
        event_parse_days_temp = str(event_parse_dt.replace(event_parse_month_temp, '')).strip()
        event_parse_day_split = event_parse_days_temp.split('/')
        return dateparser.parse(str(event_parse_month_temp + ' ' + event_parse_day_split[0])), dateparser.parse(str(event_parse_month_temp + ' ' + event_parse_day_split[1]))
url = "https://calendar.html"
request = requests.get(url)
# Printing the response object shows e.g. <Response [200]>
#print(request)
# This prints the status code, e.g. 200
#print(request.status_code)
reponse_data = request.text
soup = BeautifulSoup(reponse_data, "html.parser")
# Get the full article containers
article_containers = soup.find_all('div', class_='article-content')
print(len(article_containers))
# Get all paragraphs
article_all_paragraph = soup.find_all('p')
print(len(article_all_paragraph))
# Print the parsed dates for every paragraph that starts with a month prefix
for p_tags in article_all_paragraph:
    p_text = str(p_tags.get_text()).strip()
    if check_start_with_syntax(p_text) == 1:
        print('\n')
        start_date, end_date = prase_event_date(p_text)
        print(start_date)
        print(end_date)
        print(p_tags.get_text())
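To try the date-splitting logic without dateparser or a live page, here is a standard-library sketch. It is not the commenter's function: `split_event_prefix` is a name invented here, the two event lines are made-up samples, and it returns the month and day strings rather than parsed dates.

```python
def split_event_prefix(text):
    """Split 'Month day[/day]: description' into (month, start_day, end_day).

    Returns None when the text has no 'date:' prefix in that shape.
    """
    head, colon, _ = text.partition(":")
    if not colon:
        return None
    parts = head.split()
    if len(parts) != 2:
        return None
    month, days = parts
    # "29/30" becomes start "29" and end "30"; a single day is used for both
    start_day, _, end_day = days.partition("/")
    return month, start_day, end_day or start_day

print(split_event_prefix("Feb. 1: A rocket launch"))            # ('Feb.', '1', '1')
print(split_event_prefix("July 29/30: A meteor shower peaks"))  # ('July', '29', '30')
```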
import urllib3
import nltk
from bs4 import BeautifulSoup
from nltk.collections import *
import requests
import dateparser

prefixes = ["jan.", "feb.", "mar.", "apr.", "may.", "jun.", "jul.", "sept.", "oct.", "nov.", "dec.", "january", "february", "march", "april", "may", "june", "july", "august", "september", "october", "november", "december"]

# Check whether the text starts with a month prefix, e.g. "Feb. 1:"
# Returns 1 = success, 0 = fail
def check_start_with_syntax(p_tags_text):
    if p_tags_text.lower().startswith(tuple(prefixes)):
        return 1
    else:
        return 0

# Parse the event date, e.g. "Feb. 1" or "July 29/30"
# Returns the start date and the end date as parsed by dateparser
def prase_event_date(p_tags_text):
    event_parse_dt = 'un-formated'
    print(p_tags_text)
    if p_tags_text.find(':') == -1:
        # No colon, so there is no date prefix to cut out
        return dateparser.parse(event_parse_dt), dateparser.parse(event_parse_dt)
    else:
        event_parse_dt = p_tags_text[0:p_tags_text.find(':')]
        # Check for a start date and an end date written as e.g. 29/30
        if event_parse_dt.find('/') == -1:
            return dateparser.parse(event_parse_dt), dateparser.parse(event_parse_dt)
        else:
            event_parse_month_temp = str(event_parse_dt[0:p_tags_text.index(' ')]).strip()
            event_parse_days_temp = str(event_parse_dt.replace(event_parse_month_temp, '')).strip()
            event_parse_day_split = event_parse_days_temp.split('/')
            return dateparser.parse(str(event_parse_month_temp + ' ' + event_parse_day_split[0])), dateparser.parse(str(event_parse_month_temp + ' ' + event_parse_day_split[1]))
# Let the code begin
filepath = 'InputText.txt'
with open(filepath) as fp:
    # Iterate over the file directly so the first line is not skipped
    for line in fp:
        #print(line.strip())
        if check_start_with_syntax(str(line).strip()) == 1:
            print('\n')
            start_date, end_date = prase_event_date(str(line).strip())
            print(start_date)
            print(end_date)
            print(str(line).strip())
url = "https://www.space.com/32286-space-calendar.html"
request = requests.get(url)
# This prints the response object, e.g. <Response [200]>
print(request)
# This prints the status code, e.g. 200
#print(request.status_code)
reponse_data = request.text
soup = BeautifulSoup(reponse_data, "html.parser")
# Get the full article containers
article_containers = soup.find_all('div', class_='article-content')
print(len(article_containers))
# Get all paragraphs
article_all_paragraph = soup.find_all('p')
print(len(article_all_paragraph))
# Find all <p> tags (disabled here; the file-based loop is used instead)
#for p_tags in soup.find_all('p'):
#    #result = check_start_with_syntax(p_tags.get_text())
#    if(check_start_with_syntax(str(p_tags.get_text()).strip())==1):
#        print('\n')
#        start_date, end_date = prase_event_date(str(p_tags.get_text()).strip())
#        print(start_date)
#        print(end_date)
#        print(p_tags.get_text())
#    else:
#        #Nothing to do
#        nothing = 'nothing to do'
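The file-reading loop can be tried without creating InputText.txt by feeding it an in-memory file. This is a sketch under assumptions: the prefix tuple is shortened, the sample lines are invented, and `matching_lines` is a hypothetical helper. Iterating over the file object visits every line, including the first.

```python
import io

prefixes = ("feb.", "july")  # shortened list, for illustration only

def matching_lines(fp):
    """Return the stripped lines that start with one of the prefixes."""
    matches = []
    for line in fp:
        text = line.strip()
        if text.lower().startswith(prefixes):
            matches.append(text)
    return matches

sample = io.StringIO("Feb. 1: First event\nSome other text\nJuly 29/30: Second event\n")
print(matching_lines(sample))  # ['Feb. 1: First event', 'July 29/30: Second event']
```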
How to parse words
import nltk
from nltk.util import ngrams
from collections import Counter
from itertools import chain

wordSet = "gmt a.m at"
n = 1
# Store the n-grams in a list so they can be reused for every line
# (a generator would be exhausted after the first line)
ngrams1 = list(ngrams(wordSet.split(" "), n))

# Let the code begin
filepath = 'InputText.txt'
with open(filepath) as fp:
    # Iterate over the file directly so the first line is not skipped
    for line in fp:
        #print(line.strip())
        ngrams2 = ngrams(line.strip().split(" "), n)
        counter = Counter(chain(ngrams2, ngrams1))
        # Print the words counted more than once across the line and wordSet
        print([k[0] for k, v in counter.items() if v > 1])
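For single words (n = 1), a similar result can be sketched without NLTK by using set membership. This swaps in a different technique from the Counter approach above: it reports only words that appear in the word set, whereas the Counter version also reports any word repeated within the line itself. `words_in_common` and the sample sentence are invented for this example.

```python
def words_in_common(line, word_set="gmt a.m at"):
    """Return the words of `line` that also appear in `word_set`, in order."""
    wanted = set(word_set.split(" "))
    return [word for word in line.strip().split(" ") if word in wanted]

print(words_in_common("Launch at 10:30 a.m or 1530 gmt"))  # ['at', 'a.m', 'gmt']
```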