Newbie problem with Python pandas

2013-01-06 Thread RueTheDay
I'm working my way through the examples in the O'Reilly book Python For 
Data Analysis and have encountered a snag.

The following code is supposed to analyze some web server log data and 
produce aggregate counts by client operating system.

###
import json # used to process json records
from pandas import DataFrame, Series
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

path = '/home/rich/code/sample.txt'
records = [json.loads(line) for line in open(path)]  # read in records one line at a time
frame = DataFrame(records)

cframe = frame[frame.a.notnull()]
operating_system = np.where(cframe['a'].str.contains('Windows'),
                            'Windows', 'Not Windows')
by_tz_os = cframe.groupby(['tz', operating_system])
agg_counts = by_tz_os.size().unstack().fillna(0)
indexer = agg_counts.sum(1).argsort()
count_subset = agg_counts.take(indexer)[-10:]
print count_subset


I am getting the following error when running on Python 2.7 on Ubuntu 
12.04:

Traceback (most recent call last):
  File "./lp1.py", line 12, in <module>
    operating_system = np.where(cframe['a'].str.contains('Windows'),'Windows', 'Not Windows')
AttributeError: 'Series' object has no attribute 'str'

Note that I was able to get the code to work fine on Windows 7, so this 
appears to be specific to Linux.

A little Googling showed that others have encountered this problem and 
suggested replacing the np.where with a find, like so:


operating_system = ['Windows' if a.find('Windows') >= 0 else 'Not Windows' 
                    for a in cframe['a']]


This appears to solve the first problem, but then it fails on the next 
line with:


Traceback (most recent call last):
  File "./lp1.py", line 14, in <module>
    by_tz_os = cframe.groupby(['tz', operating_system])
  File "/usr/lib/pymodules/python2.7/pandas/core/generic.py", line 133, in groupby
    sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 522, in groupby
    return klass(obj, by, **kwds)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 115, in __init__
    level=level, sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 705, in _get_groupings
    ping = Grouping(group_axis, gpr, name=name, level=level, sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 600, in __init__
    self.grouper = self.index.map(self.grouper)
  File "/usr/lib/pymodules/python2.7/pandas/core/index.py", line 591, in map
    return self._arrmap(self.values, mapper)
  File "generated.pyx", line 1141, in pandas._tseries.arrmap_int64 (pandas/src/tseries.c:40593)
TypeError: 'list' object is not callable

The problem looks to be with the pandas module and appears to be Linux-
specific.

Any ideas?  I'm pulling my hair out over this.
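
A side note on the find() workaround: str.find returns -1 when the substring is absent and 0 when it sits at the very start of the string, so a plain membership test with "in" is less error-prone. Below is a rough, pandas-free sketch of the same tz/OS counting, handy as a cross-check on a small file. The records and the classify_os helper are made up for illustration; they are not from the book or from my data.

```python
from collections import Counter


def classify_os(agent):
    """Label a user-agent string by whether it mentions Windows anywhere."""
    return 'Windows' if 'Windows' in agent else 'Not Windows'


# Hypothetical records standing in for the parsed JSON log lines.
sample_records = [
    {'tz': 'America/New_York', 'a': 'Mozilla/5.0 (Windows NT 6.1)'},
    {'tz': 'America/New_York', 'a': 'Mozilla/5.0 (X11; Linux x86_64)'},
    {'tz': 'Europe/London', 'a': 'Windows-Media-Player/12.0'},
    {'tz': 'Europe/London'},  # no 'a' field: skipped, like the notnull() filter
]

# Count records per (time zone, OS label) pair.
counts = Counter(
    (rec.get('tz'), classify_os(rec['a']))
    for rec in sample_records
    if rec.get('a') is not None
)

for key in sorted(counts):
    print('%s / %s: %d' % (key[0], key[1], counts[key]))
```

The third sample record starts with "Windows", which is the case where find() returns 0.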
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Newbie problem with Python pandas

2013-01-06 Thread RueTheDay
On Sun, 06 Jan 2013 08:05:59 -0800, Miki Tebeka wrote:

> On Sunday, January 6, 2013 5:57:17 AM UTC-8, RueTheDay wrote:
>> I am getting the following error when running on Python 2.7 on Ubuntu
>> 12.04:
>> AttributeError: 'Series' object has no attribute 'str'
> I would *guess* that  you have an older version of pandas on your Linux
> machine.
> Try "print(pd.__version__)" to see which version you have.
> 
> Also, try asking over at
> https://groups.google.com/forum/?fromgroups=#!forum/pydata which is more
> dedicated to pandas.

Thank you!  That was it.  I had 0.7 installed (the latest in the Ubuntu 
repository).  I downloaded and manually installed 0.10 and now it's 
working.  Coincidentally, this also fixed a problem I was having with 
running a matplotlib plot function against a pandas Data Frame (worked 
with some chart types but not others).

I'm starting to understand why people rely on easy_install and pip.  
Thanks again.
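
For anyone who hits the same thing, here is a minimal sketch of the version check Miki suggested, plus a direct test for the feature. The "0.10" threshold is only what this thread shows (0.7 failed, 0.10 worked); it is an assumption, not the actual release that introduced Series.str. The helper names version_tuple and is_older_than are my own.

```python
def version_tuple(version, width=3):
    """Parse '0.10.0' into (0, 10, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split('.')[:width]:
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad with zeros so that '0.10' and '0.10.0' compare equal.
    return tuple(parts + [0] * (width - len(parts)))


def is_older_than(installed, minimum='0.10'):
    """True if the installed version is numerically below the minimum."""
    return version_tuple(installed) < version_tuple(minimum)


try:
    import pandas as pd
except ImportError:
    print('pandas is not installed')
else:
    print('pandas version: %s' % pd.__version__)
    print('older than 0.10: %s' % is_older_than(pd.__version__))
    # Checking for the attribute itself is more direct than comparing versions.
    print('Series has .str: %s' % hasattr(pd.Series, 'str'))
```

Comparing the raw strings would get this wrong, since '0.10' sorts before '0.7' as text, which is why the sketch compares tuples of integers.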


Re: Newbie problem with Python pandas

2013-01-06 Thread RueTheDay
On Sun, 06 Jan 2013 11:45:34 -0500, Roy Smith wrote:

> In article <_dudnttyxduonxtnnz2dnuvz_ocdn...@giganews.com>,
>  RueTheDay  wrote:
> 
>> On Sun, 06 Jan 2013 08:05:59 -0800, Miki Tebeka wrote:
>> 
>> > On Sunday, January 6, 2013 5:57:17 AM UTC-8, RueTheDay wrote:
>> >> I am getting the following error when running on Python 2.7 on
>> >> Ubuntu 12.04:
>> >> AttributeError: 'Series' object has no attribute 'str'
>> > I would *guess* that  you have an older version of pandas on your
>> > Linux machine.
>> > Try "print(pd.__version__)" to see which version you have.
>> > 
>> > Also, try asking over at
>> > https://groups.google.com/forum/?fromgroups=#!forum/pydata which is
>> > more dedicated to pandas.
>> 
>> Thank you!  That was it.  I had 0.7 installed (the latest in the Ubuntu
>> repository).  I downloaded and manually installed 0.10 and now it's
>> working.  Coincidentally, this also fixed a problem I was having with
>> running a matplotlib plot function against a pandas Data Frame (worked
>> with some chart types but not others).
>> 
>> I'm starting to understand why people rely on easy_install and pip.
>> Thanks again.
> 
> Yeah, Ubuntu is a bit of a mess when it comes to pandas and the things
> it depends on.  Apt gets you numpy 1.4.1, which is really old.  Pandas
> won't even install on top of it.
> 
> I've got pandas (and numpy, and scipy, and matplotlib) running on a
> Ubuntu 12.04 box.  I installed everything with pip.  My problem at this
> point, however, is I want to replicate that setup in EMR (Amazon's
> Elastic Map-Reduce).  In theory, I could just run "pip install numpy" in
> my mrjob.conf bootstrap, but it's a really long install process,
> building a lot of stuff from source.  Not the kind of thing you want to
> put in a bootstrap for an ephemeral instance.
> 
> Does anybody know where I can find a debian package for numpy 1.6?

Go here:

http://neuro.debian.net/index.html#how-to-use-this-repository

and add one of their repositories to your sources.

Then you can use apt-get to install ALL the latest packages on your 
Ubuntu box - numpy, scipy, pandas, matplotlib, statsmodels, etc.

I wish I had found this a few days ago.