SQL doesn't allow decimal numbers for LIMIT. Using them may still appear to work, 
but integer values are the proper way.

Then clean up your code a bit and remove the commented-out lines (#).
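
For example (a minimal sketch against the same newsinput table; LIMIT takes only 
integer offset and row-count values):

    cur.execute("SELECT * FROM newsinput LIMIT 0, 50000")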

-----Original Message-----
From: Python-list 
[mailto:python-list-bounces+joaquin.alzola=lebara....@python.org] On Behalf Of 
subhabangal...@gmail.com
Sent: 10 March 2016 18:12
To: python-list@python.org
Subject: Re: Review Request of Python Code

On Wednesday, March 9, 2016 at 9:49:17 AM UTC+5:30, subhaba...@gmail.com wrote:
> Dear Group,
>
> I am trying to write code that pulls data from MySQL at the backend, 
> annotates words, and writes the results out as separate sentences, one per 
> line. The code generally runs fine, but I feel the final step of giving out 
> sentences could be better, and while it is okay for small data sets, with 
> 50,000 news articles it is performing dead slow. I am using Python 2.7.11 
> on Windows 7 with 8GB RAM.
>
> I am trying to copy the code here, for your kind review.
>
> import MySQLdb
> import nltk
> def sql_connect_NewTest1():
>     db = MySQLdb.connect(host="localhost",
>                      user="*****",
>                      passwd="*****",
>                      db="abcd_efgh")
>     cur = db.cursor()
>     #cur.execute("SELECT * FROM newsinput limit 0,50000;") #REPORTING RUNTIME ERROR
>     cur.execute("SELECT * FROM newsinput limit 0,50;")
>     dict_open=open("/python27/NewTotalTag.txt","r") #OPENING THE DICTIONARY FILE
>     dict_read=dict_open.read()
>     dict_word=dict_read.split()
>     a4=dict_word #Assignment for code.
>     list1=[]
>     flist1=[]
>     nlist=[]
>     for row in cur.fetchall():
>         #print row[2]
>         var1=row[3]
>         #print var1 #Printing lines
>         #var2=len(var1) # Length of file
>         var3=var1.split(".") #SPLITTING INTO LINES
>         #print var3 #Printing The Lines
>         #list1.append(var1)
>         var4=len(var3) #Number of all lines
>         #print "No",var4
>         for line in var3:
>             #print line
>             #flist1.append(line)
>             linew=line.split()
>             for word in linew:
>                 if word in a4:
>                     windex=a4.index(word)
>                     windex1=windex+1
>                     word1=a4[windex1]
>                     word2=word+"/"+word1
>                     nlist.append(word2)
>                     #print list1
>                     #print nlist
>                 elif word not in a4:
>                     word3=word+"/"+"NA"
>                     nlist.append(word3)
>                     #print list1
>                     #print nlist
>                 else:
>                     print "None"
>
>     #print "###",flist1
>     #print len(flist1)
>     #db.close()
>     #print nlist
>     lol = lambda lst, sz: [lst[i:i+sz] for i in range(0, len(lst), sz)] #TRYING TO SPLIT THE RESULTS AS SENTENCES
>     nlist1=lol(nlist,7)
>     #print nlist1
>     for i in nlist1:
>         string1=" ".join(i)
>         print i
>         #print string1
>
>
> Thanks in Advance.

****************************************************************************
Dear Group,

Thank you all for your kind time and suggestions in helping me.

Thank you, Steve, for writing the whole code. It is working fully and fine, but 
speed is still an issue; we need to speed it up.

Inada, I tried changing to
cur = db.cursor(MySQLdb.cursors.SSCursor), but my System Admin said that may not 
be the issue.
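
For reference, the change I tried looks roughly like this (a sketch only; the 
connection values are as in my original code, and handle_row() is a made-up 
placeholder for the tagging work):

    import MySQLdb
    import MySQLdb.cursors

    db = MySQLdb.connect(host="localhost", user="*****", passwd="*****",
                         db="abcd_efgh",
                         cursorclass=MySQLdb.cursors.SSCursor)
    cur = db.cursor()
    cur.execute("SELECT * FROM newsinput LIMIT 0, 50000")
    for row in cur:        # rows are streamed from the server one at a time
        handle_row(row)    # hypothetical stand-in for the per-row tagging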

Freidrich, my problem is that I have a big repository of .txt files stored in 
MySQL at the backend. I have another list of words with their possible tags. The 
tags are not conventional Parts of Speech (PoS) tags, but ones defined by others.
The code is expected to read each file, line by line.
On reading each line it scans the list for the appropriate tag; if one is found 
it is assigned, else NA is assigned.
The assignment should be in the format word/tag, so that a string of n words 
looks like: w1/tag w2/tag w3/tag w4/tag ... wn/tag,

where each tag is either a tag from the list or NA, as the case may be.
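
In code terms the intended behaviour is roughly this (a sketch; tag_map is a 
hypothetical dict built from my word/tag list, not a variable in my actual code):

    tag_map = {"w1": "tag1", "w2": "tag2"}   # word -> tag lookup

    def tag_sentence(sentence):
        # attach /tag to each known word, /NA to everything else
        return " ".join(w + "/" + tag_map.get(w, "NA") for w in sentence.split())

    print tag_sentence("w1 foo w2")   # prints: w1/tag1 foo/NA w2/tag2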

This format is chosen because the files are expected to be tagged in the Brown 
Corpus format. There is a Python library named NLTK; if I want to save my data 
for use with its models, I need to follow some specifications. I want to use it 
in the Tagged Corpus format.

Now the tagged data coming out in this format should be either one tagged 
sentence per line or a lattice.

They expect the data to be saved in .pos format, but I am not doing that in this 
code at present; I may do that later.
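
If I do save it that way, I understand reading it back would look something like 
this (a sketch based on my reading of the NLTK docs; the directory name is made 
up):

    from nltk.corpus.reader import TaggedCorpusReader

    # read .pos files with one tagged sentence per line, tokens as word/tag
    reader = TaggedCorpusReader("/python27/tagged", r".*\.pos")
    print reader.tagged_sents()[0]     # first sentence as (word, tag) pairs
    print reader.tagged_words()[:5]    # first few (word, tag) pairs overall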

Please let me know if I need to give any more information.

Matt, thank you for the if...else suggestion. The data in NewTotalTag.txt is a 
simple list of words with unconventional tags, like:

w1 tag1
w2 tag2
w3 tag3
...
...
w3  tag3

like that.
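
Given that layout, I gather from the suggestions here that loading the file once 
into a dict should be much faster than scanning a list with index(); a rough 
sketch (file path as in my code, variable names made up):

    tag_map = {}
    with open("/python27/NewTotalTag.txt", "r") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:        # expect one 'word tag' pair per line
                tag_map[parts[0]] = parts[1]

    print tag_map.get("w1", "NA")      # constant-time lookup instead of list.index()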

Regards,
Subhabrata


--
https://mail.python.org/mailman/listinfo/python-list