Newbie problem with Python pandas
I'm working my way through the examples in the O'Reilly book Python For Data Analysis and have encountered a snag.

The following code is supposed to analyze some web server log data and produce aggregate counts by client operating system.

###
    import json # used to process json records
    from pandas import DataFrame, Series
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np

    path = '/home/rich/code/sample.txt'
    records = [json.loads(line) for line in open(path)] #read in records one line at a time
    frame = DataFrame(records)
    cframe = frame[frame.a.notnull()]
    operating_system = np.where(cframe['a'].str.contains ('Windows'),'Windows', 'Not Windows')
    by_tz_os = cframe.groupby(['tz', operating_system])
    agg_counts = by_tz_os.size().unstack().fillna(0)
    indexer = agg_counts.sum(1).argsort()
    count_subset = agg_counts.take(indexer)[-10:]
    print count_subset

I am getting the following error when running on Python 2.7 on Ubuntu 12.04:

    Traceback (most recent call last):
      File "./lp1.py", line 12, in <module>
        operating_system = np.where(cframe['a'].str.contains ('Windows'),'Windows', 'Not Windows')
    AttributeError: 'Series' object has no attribute 'str'

Note that I was able to get the code to work fine on Windows 7, so this appears to be specific to Linux.
A little Googling showed that others have encountered this problem and suggested replacing the np.where with a find, like so:

    operating_system = ['Windows' if a.find('Windows') > 0 else 'Not Windows' for a in cframe['a']]

This appears to solve the first problem, but then it fails on the next line with:

    Traceback (most recent call last):
      File "./lp1.py", line 14, in <module>
        by_tz_os = cframe.groupby(['tz', operating_system])
      File "/usr/lib/pymodules/python2.7/pandas/core/generic.py", line 133, in groupby
        sort=sort)
      File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 522, in groupby
        return klass(obj, by, **kwds)
      File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 115, in __init__
        level=level, sort=sort)
      File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 705, in _get_groupings
        ping = Grouping(group_axis, gpr, name=name, level=level, sort=sort)
      File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 600, in __init__
        self.grouper = self.index.map(self.grouper)
      File "/usr/lib/pymodules/python2.7/pandas/core/index.py", line 591, in map
        return self._arrmap(self.values, mapper)
      File "generated.pyx", line 1141, in pandas._tseries.arrmap_int64 (pandas/src/tseries.c:40593)
    TypeError: 'list' object is not callable

The problem looks to be with the pandas module and appears to be Linux-specific. Any ideas? I'm pulling my hair out over this.

--
http://mail.python.org/mailman/listinfo/python-list
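One way to cross-check the expected counts without depending on any particular pandas version is to tally the (time zone, operating system) pairs in plain Python. The sketch below is only an illustration in Python 3: the sample records are hypothetical stand-ins for the parsed lines of sample.txt, and `classify` is a made-up helper name. It also shows why the `a.find('Windows') > 0` test in the workaround is risky: `find` returns 0 when the match is at the very start of the string, so such agents would be labelled 'Not Windows'. The `in` operator avoids that.

```python
from collections import Counter

# Hypothetical sample records standing in for the parsed lines of sample.txt.
# Only the 'a' (user agent) and 'tz' (time zone) keys from the original code are used.
records = [
    {"a": "Windows NT 6.1; WOW64", "tz": "America/New_York"},
    {"a": "Mozilla/5.0 (Windows NT 6.1)", "tz": "America/New_York"},
    {"a": "Mozilla/5.0 (Macintosh; Intel Mac OS X)", "tz": "America/Chicago"},
    {"a": "Mozilla/5.0 (X11; Linux x86_64)", "tz": "America/New_York"},
    {"tz": "Europe/London"},  # no agent string; skipped, like frame.a.notnull()
]


def classify(agent):
    """Label an agent string; `in` also matches at position 0."""
    return "Windows" if "Windows" in agent else "Not Windows"


# The pitfall in the find()-based workaround: a match at index 0 fails "> 0".
agent = "Windows NT 6.1; WOW64"
print(agent.find("Windows") > 0)   # False
print("Windows" in agent)          # True

# Tally (time zone, operating system) pairs for records that have an agent.
counts = Counter(
    (rec["tz"], classify(rec["a"]))
    for rec in records
    if rec.get("a") is not None
)

for (tz, os_name), n in sorted(counts.items()):
    print(tz, os_name, n)
```

If the pandas result on the real file disagrees with a tally like this, the difference points at the library version or at the classification rule rather than at the data.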
Re: Newbie problem with Python pandas
On Sun, 06 Jan 2013 08:05:59 -0800, Miki Tebeka wrote:

> On Sunday, January 6, 2013 5:57:17 AM UTC-8, RueTheDay wrote:
>> I am getting the following error when running on Python 2.7 on Ubuntu
>> 12.04:
>>
>> AttributeError: 'Series' object has no attribute 'str'
> I would *guess* that you have an older version of pandas on your Linux
> machine.
> Try "print(pd.__version__)" to see which version you have.
>
> Also, try asking over at
> https://groups.google.com/forum/?fromgroups=#!forum/pydata which is more
> dedicated to pandas.

Thank you! That was it. I had 0.7 installed (the latest in the Ubuntu repository). I downloaded and manually installed 0.10 and now it's working.

Coincidentally, this also fixed a problem I was having with running a matplotlib plot function against a pandas DataFrame (it worked with some chart types but not others).

I'm starting to understand why people rely on easy_install and pip. Thanks again.
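The version check suggested above can be paired with a direct test for the feature that was missing. The following is a minimal Python 3 sketch, not anything provided by pandas itself: `describe` and `attr_path` are illustrative names, and the function simply reports the module's `__version__` and whether a dotted attribute such as `Series.str` exists.

```python
import importlib


def describe(module_name, attr_path=None):
    """Return (version, has_feature) for a module.

    Returns (None, False) when the module cannot be imported.
    `describe` and `attr_path` are illustrative names, not a pandas API.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None, False

    version = getattr(module, "__version__", "unknown")

    # Walk the dotted path (for example "Series.str") one attribute at a time.
    obj = module
    for part in (attr_path.split(".") if attr_path else []):
        if not hasattr(obj, part):
            return version, False
        obj = getattr(obj, part)
    return version, True


# Does the installed pandas expose the Series.str accessor the book relies on?
print(describe("pandas", "Series.str"))
```

Running this on both machines makes a difference like the one in this thread visible at a glance: a result whose second element is False means the installed pandas predates the `.str` accessor, whatever the operating system.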
Re: Newbie problem with Python pandas
On Sun, 06 Jan 2013 11:45:34 -0500, Roy Smith wrote:

> In article <_dudnttyxduonxtnnz2dnuvz_ocdn...@giganews.com>,
> RueTheDay wrote:
>
>> On Sun, 06 Jan 2013 08:05:59 -0800, Miki Tebeka wrote:
>>
>> > On Sunday, January 6, 2013 5:57:17 AM UTC-8, RueTheDay wrote:
>> >> I am getting the following error when running on Python 2.7 on
>> >> Ubuntu 12.04:
>> >>
>> >> AttributeError: 'Series' object has no attribute 'str'
>> > I would *guess* that you have an older version of pandas on your
>> > Linux machine.
>> > Try "print(pd.__version__)" to see which version you have.
>> >
>> > Also, try asking over at
>> > https://groups.google.com/forum/?fromgroups=#!forum/pydata which is
>> > more dedicated to pandas.
>>
>> Thank you! That was it. I had 0.7 installed (the latest in the Ubuntu
>> repository). I downloaded and manually installed 0.10 and now it's
>> working. Coincidentally, this also fixed a problem I was having with
>> running a matplotlib plot function against a pandas Data Frame (worked
>> with some chart types but not others).
>>
>> I'm starting to understand why people rely on easy_install and pip.
>> Thanks again.
>
> Yeah, Ubuntu is a bit of a mess when it comes to pandas and the things
> it depends on. Apt gets you numpy 1.4.1, which is really old. Pandas
> won't even install on top of it.
>
> I've got pandas (and numpy, and scipy, and matplotlib) running on an
> Ubuntu 12.04 box. I installed everything with pip. My problem at this
> point, however, is I want to replicate that setup in EMR (Amazon's
> Elastic Map-Reduce). In theory, I could just run "pip install numpy" in
> my mrjob.conf bootstrap, but it's a really long install process,
> building a lot of stuff from source. Not the kind of thing you want to
> put in a bootstrap for an ephemeral instance.
>
> Does anybody know where I can find a Debian package for numpy 1.6?

Go here: http://neuro.debian.net/index.html#how-to-use-this-repository and add one of their repositories to your sources. Then you can use apt-get to install ALL the latest packages on your Ubuntu box - numpy, scipy, pandas, matplotlib, statsmodels, etc.

I wish I had found this a few days ago.