grouping array

2005-09-29 Thread pkilambi
Hi, if I have an array, say

x = [[2, 2, 0, 0, 1, 1],
     [1, 1, 0, 0, 1, 1],
     [1, 1, 0, 0, 1, 1]]

I basically want to group the regions that are non-zero. I want the
coordinates of each non-zero region as (x1, y1, x2, y2), e.g.
[(0, 0, 2, 1), (0, 4, 2, 5)], where (x1, y1) is the top-left and
(x2, y2) the bottom-right corner of each group. Hope I am clear.
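For reference, a minimal pure-Python sketch (not from the thread, and assuming 4-connectivity and (row, col) coordinates) that flood-fills each non-zero region and records its bounding box:

```python
# Flood-fill each 4-connected non-zero region of a 2D list and report
# its bounding box as (row1, col1, row2, col2).
def nonzero_regions(grid):
    rows, cols = len(grid), len(grid[0])
    seen = set()
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and (r, c) not in seen:
                # New region: walk it with an explicit stack.
                stack = [(r, c)]
                seen.add((r, c))
                r1 = r2 = r
                c1 = c2 = c
                while stack:
                    cr, cc = stack.pop()
                    r1, r2 = min(r1, cr), max(r2, cr)
                    c1, c2 = min(c1, cc), max(c2, cc)
                    for nr, nc in ((cr - 1, cc), (cr + 1, cc),
                                   (cr, cc - 1), (cr, cc + 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and grid[nr][nc] and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            stack.append((nr, nc))
                boxes.append((r1, c1, r2, c2))
    return boxes

x = [[2, 2, 0, 0, 1, 1],
     [1, 1, 0, 0, 1, 1],
     [1, 1, 0, 0, 1, 1]]
print(nonzero_regions(x))  # → [(0, 0, 2, 1), (0, 4, 2, 5)]
```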

Thanks

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: grouping array

2005-09-29 Thread pkilambi
Sure. Basically I am looking for clustering of the non-zero groups in
that 2D list. In the above array the first non-zero cluster is the 2, 2
in row 0, the 1, 1 in row 1 and the 1, 1 in row 2. If we think of this
as one group, its first element is at (0, 0) in the list and its last
is at (2, 1), so the group is represented as (0, 0, 2, 1). Similarly
for the second non-zero group, and so on, until we finally get a result
list with the location of each group in the whole list:
[(0, 0, 2, 1), (0, 4, 2, 5), ...]

Hope I am clear.



Re: grouping array

2005-09-29 Thread pkilambi
1. Why are you creating an Image object here? Can't this be done by
handling plain lists?
2. What exactly is getprojection doing?
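A hedged guess at what PIL's getprojection computes, written with plain lists: for each column and each row, a 1 if it contains any non-zero cell, else 0. Regions can then be cut along the zero gaps in the projections.

```python
# Plain-list equivalent (an assumption, not PIL's actual source) of
# Image.getprojection(): per-column and per-row non-zero flags.
def getprojection(grid):
    # 1 for every column that has at least one non-zero cell
    xproj = [int(any(row[c] for row in grid)) for c in range(len(grid[0]))]
    # 1 for every row that has at least one non-zero cell
    yproj = [int(any(row)) for row in grid]
    return xproj, yproj

x = [[2, 2, 0, 0, 1, 1],
     [1, 1, 0, 0, 1, 1],
     [1, 1, 0, 0, 1, 1]]
print(getprojection(x))  # → ([1, 1, 0, 0, 1, 1], [1, 1, 1])
```

The zero runs in the column projection (columns 2-3) are what separate the two groups in this example.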



Re: grouping array

2005-09-30 Thread pkilambi
Fredrik's solution seems closer to what I was looking for, but I am
still not sure whether it can be done without the Image module.

Also, in your solution I cannot follow this:

  [[1, 1, 2, 1, 2, 0],
   [2, 0, 0, 2, 0, 1],
   [1, 2, 2, 0, 2, 0],
   [0, 1, 0, 0, 0, 0],
   [2, 0, 0, 1, 1, 0],
   [2, 2, 2, 0, 1, 0]]

  >>> print "\n".join(str(reg) for reg in getregions(x))
  [(0, 1), (0, 0), (0, 2), (1, 0), (0, 3), (2, 0), (1, 3), (0, 4), (2, 1), (3, 1), (2, 2)]
  [(5, 4), (4, 4), (4, 3)]
  [(5, 0), (5, 1), (4, 0), (5, 2)]
  [(1, 5)]
  [(2, 4)]

This is kind of confusing. Could you please correlate the grid to the
result and explain?
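Matching the lists against the grid suggests each one is the set of (row, col) cells of one connected non-zero region (an assumption, not stated in the thread). Overlaying each region's index on the grid makes the correspondence visible:

```python
# Assumption: each list printed by getregions is the set of (row, col)
# cells of one connected non-zero region of the 6x6 grid above.
regions = [
    [(0, 1), (0, 0), (0, 2), (1, 0), (0, 3), (2, 0), (1, 3), (0, 4),
     (2, 1), (3, 1), (2, 2)],
    [(5, 4), (4, 4), (4, 3)],
    [(5, 0), (5, 1), (4, 0), (5, 2)],
    [(1, 5)],
    [(2, 4)],
]
# Start from a grid of dots and stamp each region's index on its cells.
labels = [["."] * 6 for _ in range(6)]
for i, reg in enumerate(regions):
    for r, c in reg:
        labels[r][c] = str(i)
for row in labels:
    print(" ".join(row))
# → 0 0 0 0 0 .
#   0 . . 0 . 3
#   0 0 0 . 4 .
#   . 0 . . . .
#   2 . . 1 1 .
#   2 2 2 . 1 .
```

Every labelled cell is non-zero in the grid and every group of equal labels is 4-connected, which is consistent with that reading.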



Searching files in directories

2005-10-14 Thread pkilambi
Can anyone help me with this?

I want to search for a list of files in a given directory and, if a
file exists, copy it to a destination directory.

So what I am looking for is:

file = 'file1.txt'
source_directory = '/tmp/source/'
destination_directory = '/tmp/destination/'

If the file exists in source_directory, copy it to
destination_directory.

Hope I am clear.
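A minimal sketch with the standard library's shutil, under one reading of the post (the function name and the list-of-files argument are illustrative):

```python
import os
import shutil

def copy_existing(filenames, source_directory, destination_directory):
    """Copy each named file that exists in source_directory into
    destination_directory; silently skip missing ones."""
    for name in filenames:
        src = os.path.join(source_directory, name)
        if os.path.isfile(src):
            shutil.copy(src, destination_directory)

copy_existing(['file1.txt'], '/tmp/source/', '/tmp/destination/')
```

shutil.copy accepts a directory as its destination and keeps the file's basename, which is exactly the "cp file dir/" behaviour described.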



help make it faster please

2005-11-10 Thread pkilambi
I wrote this function, which does the following: after reading lines
from a file, it splits them and counts word occurrences with a
dictionary. For some reason this is quite slow; can someone help me
make it faster?

import string

f = open(filename)
lines = f.readlines()

def create_words(lines):
    cnt = 0
    spl_set = '[",;<>{}_&?!():-[\.=+*\t\n\r]+'
    for content in lines:
        words = content.split()
        countDict = {}
        wordlist = []
        for w in words:
            w = string.lower(w)
            if w[-1] in spl_set: w = w[:-1]
            if w != '':
                if countDict.has_key(w):
                    countDict[w] = countDict[w] + 1
                else:
                    countDict[w] = 1
            wordlist = countDict.keys()
            wordlist.sort()
        cnt += 1
        if countDict != {}:
            for word in wordlist:
                print (word + ' ' + str(countDict[word]) + '\n')



Re: help make it faster please

2005-11-10 Thread pkilambi
Oh, sorry, the indentation got messed up there. The

    wordlist = countDict.keys()
    wordlist.sort()

should be outside the word loop, like this:

import string

def create_words(lines):
    cnt = 0
    spl_set = '[",;<>{}_&?!():-[\.=+*\t\n\r]+'
    for content in lines:
        words = content.split()
        countDict = {}
        wordlist = []
        for w in words:
            w = string.lower(w)
            if w[-1] in spl_set: w = w[:-1]
            if w != '':
                if countDict.has_key(w):
                    countDict[w] = countDict[w] + 1
                else:
                    countDict[w] = 1
        wordlist = countDict.keys()
        wordlist.sort()
        cnt += 1
        if countDict != {}:
            for word in wordlist:
                print (word + ' ' + str(countDict[word]) + '\n')

OK, now this is the correct question I am asking.
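A hedged sketch of a faster variant in a modern Python (not the thread's final answer): lower-case the line and strip the punctuation characters in one pass with str.translate, then count words in a plain dict. Note it removes those characters everywhere in a word, not only a single trailing one as the original does.

```python
# Punctuation characters to strip (the original spl_set's literal
# characters, minus the regex-style brackets around them).
SPL_CHARS = '",;<>{}_&?!():-[].=+*'

def count_words(lines):
    """Return a dict mapping each word to its total count."""
    table = str.maketrans('', '', SPL_CHARS)  # delete every SPL_CHARS char
    counts = {}
    for line in lines:
        for w in line.lower().translate(table).split():
            counts[w] = counts.get(w, 0) + 1
    return counts

print(count_words(["Hello, world!", "hello world"]))
# → {'hello': 2, 'world': 2}
```

The speedup comes from translate doing the character filtering in C instead of a per-character membership test in the inner loop.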



Re: help make it faster please

2005-11-10 Thread pkilambi
Actually I create a separate wordlist for each so-called line. Here a
"line" would really be a paragraph in future, so I will have to
recreate the wordlist on each loop iteration.



Re: help make it faster please

2005-11-10 Thread pkilambi
OK, this sounds much better. Could you tell me what to do if I want to
leave characters like '@' in words? I would like to consider it part
of the word.
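One way (an illustrative sketch, not the thread's final code) is to define the word characters positively with a regular expression that includes '@', so e.g. 'user@host' survives as one word:

```python
import re

# Treat lower-case letters, digits and '@' as word characters; every
# other character acts as a separator.
WORD_RE = re.compile(r"[a-z0-9@]+")

def words_with_at(line):
    return WORD_RE.findall(line.lower())

print(words_with_at("Mail me at user@host, please!"))
# → ['mail', 'me', 'at', 'user@host', 'please']
```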



ignore specific data

2005-11-21 Thread pkilambi
Hi, I need help. When I read a file with some text content, I would
like to ignore a block of lines and consider the rest. So if the block
looks like

"start of block."
fjesdgsdhfgdlgjklfjdgkd
jhcsdfskdlgjkljgkfdjkgj
"end of block"

I want to ignore it while processing the file. This block could appear
anywhere in the file: at the start, at the end, or even in the middle
of the file content.

Hope I'm clear. Something like:

f = open("file")
clean_data = ignore_block(f)

Here ignore_block should filter out the block:

def ignore_block(f):
   .
   return data # maybe a list of the remaining lines...



Re: ignore specific data

2005-11-21 Thread pkilambi
Thanks for that. But this checks for the exact content of the
"start of block.." or "end of block" lines. What if the marker text
can appear anywhere in the line?
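If the markers may sit anywhere inside a line, substring membership can replace the equality test (a sketch along the same lines as above):

```python
def ignore_block(f, start="start of block", end="end of block"):
    """Drop every block delimited by lines that merely *contain* the
    start and end marker text."""
    kept = []
    skipping = False
    for line in f:
        if not skipping and start in line:
            skipping = True
        elif skipping and end in line:
            skipping = False
        elif not skipping:
            kept.append(line)
    return kept

lines = ['a\n', 'xx start of block yy\n', 'junk\n', 'zz end of block\n', 'b\n']
print(ignore_block(lines))  # → ['a\n', 'b\n']
```

The trade-off is false positives: any line that happens to contain the marker text will open or close a block.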



Re: ignore specific data

2005-11-21 Thread pkilambi
I tried the solutions you provided; they are not as robust as I thought
they would be. Maybe I should state the problem more clearly.

Here it goes: I have a bunch of documents, and each document has a
header which is common to all files. I read each file, process it, and
compute the frequency of words in it. Now I want to ignore the header
in each file. That would be easy if the header were always at the top,
but apparently it is not; it could be at the bottom as well. So I want
a function which goes through the file content, ignores the common
header, and returns the remaining text for computing the frequencies.
Also, the header is not just one line; it includes licences and other
stuff and may be 50 to 60 lines long. This "remove_header" has to be
efficient, as the files may be huge. Since this is a very small part of
the whole problem, I don't want it to slow down my entire code.
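Since the header is common to all files, one hedged approach (assuming its exact lines are known in advance, e.g. extracted once from any one file) is to find that contiguous block wherever it occurs and slice it out:

```python
def remove_header(lines, header_lines):
    """Return lines with the first occurrence of the contiguous
    header_lines block removed; return lines unchanged if absent."""
    n = len(header_lines)
    for i in range(len(lines) - n + 1):
        # List-slice comparison short-circuits on the first mismatch,
        # so most positions cost a single line comparison.
        if lines[i:i + n] == header_lines:
            return lines[:i] + lines[i + n:]
    return lines

header = ['# common header\n', '# licence text\n']
print(remove_header(['body A\n'] + header + ['body B\n'], header))
# → ['body A\n', 'body B\n']
```

This is effectively one pass over the file regardless of whether the header sits at the top, bottom, or middle.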
