Sorting a multidimensional array by multiple keys

2007-03-31 Thread Rehceb Rotkiv
Hello everyone,

Can I sort a multidimensional array in Python by multiple sort keys? A 
little code sample would be nice!

Thx,
Rehceb

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sorting a multidimensional array by multiple keys

2007-03-31 Thread Rehceb Rotkiv
> If you want a good answer you have to give me/us more details, and an
> example too.

OK, here is some example data:

reaction is BUT by the
sodium , BUT it is
sea , BUT it is
this manner BUT the dissolved
pattern , BUT it is
rapid , BUT it is

As each line consists of 5 words, I would break up the data into an array 
of five-field-arrays (Would you use lists or tuples or a combination in 
Python?). The word "BUT" would be in the middle, with two fields/words 
left and two fields/words right of it. I then want to sort this list by

- field 3
- field 4
- field 1
- field 0

in this hierarchy. This is the desired result:  

pattern , BUT it is
rapid , BUT it is
sea , BUT it is
sodium , BUT it is
reaction is BUT by the
this manner BUT the dissolved

The first four lines cannot be ordered by fields 3 and 4, as these are 
identical ("it", "is"), so they have been sorted first by field 1 (which 
is also identical: ",") and then by field 0:

pattern
rapid
sea
sodium

I hope I have explained this in an understandable way. It would be cool 
if you could show me how this can be done in Python!

Regards,
Rehceb


Re: Sorting a multidimensional array by multiple keys

2007-03-31 Thread Rehceb Rotkiv
Wait, I made a mistake. The correct result would be

reaction is BUT by the
pattern , BUT it is
rapid , BUT it is
sea , BUT it is
sodium , BUT it is
this manner BUT the dissolved

because "by the" comes before and "the dissolved" after "it is". Sorry 
for the confusion.
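
A minimal sketch of one way to get the corrected order above, assuming each line is split on whitespace into a list of five fields. The names `lines`, `rows` and `ordered` are illustrative and do not come from the thread. Tuples compare element by element, so a key that returns the fields in priority order performs the whole multi-key sort in one pass.

```python
from operator import itemgetter

# Sample data from the message above, one five-word line per string.
lines = [
    "reaction is BUT by the",
    "sodium , BUT it is",
    "sea , BUT it is",
    "this manner BUT the dissolved",
    "pattern , BUT it is",
    "rapid , BUT it is",
]

# Split every line into a list of five fields (words).
rows = [line.split() for line in lines]

# itemgetter(3, 4, 1, 0) returns the tuple (field 3, field 4, field 1, field 0).
# Tuples compare element by element, so this sorts by field 3 first,
# then field 4, then field 1, then field 0.
ordered = sorted(rows, key=itemgetter(3, 4, 1, 0))

for row in ordered:
    print(" ".join(row))
```

On the lists-or-tuples question, either works here, since the key function only indexes into each row. The comparison is plain string comparison, so it is case-sensitive.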


Re: Sorting a multidimensional array by multiple keys

2007-03-31 Thread Rehceb Rotkiv
Thank you all for your helpful solutions!

Regards,
Rehceb


Unicode list

2007-03-31 Thread Rehceb Rotkiv
Hello,

I have this little grep-like program:

++snip++
#!/usr/bin/python

import sys
import re

pattern = sys.argv[1]
inputfile = file(sys.argv[2], 'r')

for line in inputfile:
    matches = re.findall(pattern, line)
    if matches:
        print matches
++snip++

As written, the program prints some characters as strange escape 
sequences, which is due to the input file being encoded in utf-8. When I 
convert the result of "re.findall..." to a string and wrap "unicode()" 
around it, the matches get printed correctly. Is it possible to make 
"matches" unicode without saving it as a single string first? The 
function "unicode()" seems to work only for strings. Or is there a 
general way of telling Python to abandon the ancient and evil land of 
iso-8859 for good and use utf-8 only?

Regards,
Rehceb


Re: Unicode list

2007-04-01 Thread Rehceb Rotkiv
> When printing a list, the individual elements are converted with repr(),
> not with str(). For a string object, repr() adds escape codes for all
> bytes that are not printable ASCII characters.

Thanks Martin, you're right, it was the repr() calls that messed up the 
output. Iterating over the list as you proposed is even 1/100 s faster ;)
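
A small sketch of the effect described in the quote, written for Python 3, where a bytes object plays the role of the byte string that Python 2 reads from a file. The sample line and the pattern are hypothetical, not taken from the thread. Printing the list shows the repr() of each element, while decoding and printing the elements one by one shows the text.

```python
import re

# Hypothetical sample line, held as UTF-8 bytes the way a file opened in
# binary mode would deliver it (comparable to a Python 2 byte string).
line = "Käse und Brötchen".encode("utf-8")

# A bytes pattern: \S+ matches each run of non-whitespace bytes.
matches = re.findall(rb"\S+", line)

# Printing the list shows the repr() of every element, escapes included.
print(matches)

# Decoding each element and printing it on its own shows the text.
decoded = [m.decode("utf-8") for m in matches]
for word in decoded:
    print(word)
```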

Regards,
Rehceb


Overlapping matches

2007-04-01 Thread Rehceb Rotkiv
In the re documentation, it says that the matching functions return 
"non-overlapping" matches only, but I also need overlapping ones. Does 
anyone know how this can be done?

Regards,
Rehceb Rotkiv 


Re: Overlapping matches

2007-04-01 Thread Rehceb Rotkiv
Both methods work well, thank you!
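
The replies are not part of this archive, so the following is only a sketch of two common ways to collect overlapping matches, which may or may not be the methods referred to above. The function names are made up for illustration. The first wraps the pattern in a lookahead, which consumes no characters. The second restarts the search one character after the start of each previous match.

```python
import re

def overlapping_lookahead(pattern, text):
    """Wrap the pattern in a lookahead with a capture group.

    The lookahead itself matches the empty string, so the scan moves on
    one position at a time and group 1 records what matches there.
    """
    return [m.group(1) for m in re.finditer("(?=(%s))" % pattern, text)]

def overlapping_search(pattern, text):
    """Restart the search one character after each match start."""
    regex = re.compile(pattern)
    found = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        found.append(m.group(0))
        pos = m.start() + 1
    return found

print(re.findall("aba", "ababa"))             # non-overlapping only
print(overlapping_lookahead("aba", "ababa"))
print(overlapping_search("aba", "ababa"))
```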


Checking whether list element exists

2007-04-07 Thread Rehceb Rotkiv
I want to check whether, for example, the element myList[-3] exists. So 
far I did it like this:

index = -3
if len(myList) >= abs(index):
    print myList[index]

Another idea I had was to (ab-?)use the try...except structure:

index = -3
try:
    print myList[index]
except:
    print "Does not exist!"

Is it ok to use try...except for the test or is it bad coding style? Or 
is there another, more elegant method than these two?

Regards,
Rehceb


Re: Checking whether list element exists

2007-04-07 Thread Rehceb Rotkiv
> In the general case it won't work, because lists accept negative indexes:
> http://docs.python.org/lib/typesseq.html, 3rd note.

Yes, I know! I _want_ the "3rd last list element", i.e. list[-3]. But it 
may be that the list does not have 3 elements. In this case, list[-3] 
will throw an error, cf.:

>>> arr = ['a','b','c']
>>> print arr[-3]
a
>>> print arr[-4]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: list index out of range
>>> 

I thought maybe I could catch the error with try...except so that I do 
not need the if-test, but I don't know whether this is proper usage of 
the try...except structure.
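
A minimal sketch of both approaches, with hypothetical helper names `get_or_default` and `has_index`. Catching the exception is commonly considered acceptable Python style, provided the handler names IndexError, because a bare "except:" would also swallow unrelated errors. The explicit test is written so that it covers negative and positive indexes alike.

```python
def get_or_default(items, index, default=None):
    """Return items[index], or default if that position does not exist."""
    try:
        return items[index]
    except IndexError:  # catch only the error a missing element raises
        return default

def has_index(items, index):
    """True if items[index] exists, for negative and positive indexes."""
    return -len(items) <= index < len(items)

arr = ['a', 'b', 'c']
print(get_or_default(arr, -3))
print(get_or_default(arr, -4, "Does not exist!"))
print(has_index(arr, -3), has_index(arr, -4))
```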


Unicode problem

2007-04-07 Thread Rehceb Rotkiv
Please have a look at this little script:

#!/usr/bin/python
import sys
import codecs
fileHandle = codecs.open(sys.argv[1], 'r', 'utf-8')
fileString = fileHandle.read()
print fileString

If I call it from a Bash shell like this

$ ./test.py testfile.utf8.txt

it works just fine, but when I try to pipe the output to another process 
("|") or into a file (">"), e.g. like this

$ ./test.py testfile.utf8.txt | cat

I get an error:

Traceback (most recent call last):
  File "./test.py", line 6, in ?
    print fileString
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 538: ordinal not in range(128)

I have absolutely no idea what the problem is here. Can you help?

Thanks,
Rehceb


Re: Checking whether list element exists

2007-04-07 Thread Rehceb Rotkiv
Thanks for your many helpful tips!

Rehceb Rotkiv


Re: Unicode problem

2007-04-08 Thread Rehceb Rotkiv
On Sat, 07 Apr 2007 12:46:49 -0700, Gabriel Genellina wrote:

> You have to encode the Unicode object explicitly:
> print fileString.encode("utf-8")
> (or any other suitable one; I said utf-8 just because you read the input
> file using that)

Thanks! That's a nice little stumbling block for a newbie like me ;) Is 
there a way to make utf-8 the default encoding for every string, so that 
I do not have to encode each string explicitly?
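
One possible answer, sketched under assumptions: instead of encoding every string, the output stream can be wrapped once with codecs.getwriter, which exists in the standard library. The helper name `utf8_writer` is made up for illustration, and io.BytesIO stands in for the pipe so that the result can be inspected. The sketch also reproduces the failure from the traceback, where the ASCII codec cannot represent u'\xe4'.

```python
import codecs
import io

def utf8_writer(binary_stream):
    """Wrap a byte stream so that text written to it is encoded as UTF-8."""
    return codecs.getwriter("utf-8")(binary_stream)

# The ASCII codec cannot represent u'\xe4'; this is the error in the traceback.
try:
    u"\xe4".encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)

# io.BytesIO stands in for a pipe or file here (an assumption for the demo).
pipe = io.BytesIO()
out = utf8_writer(pipe)
out.write(u"\xe4\n")  # no explicit .encode() needed at the call site
print(repr(pipe.getvalue()))
```

In a script, the same wrapper would be applied to standard output once at startup: sys.stdout itself in Python 2, or sys.stdout.buffer in Python 3.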