I'm working my way through the examples in the O'Reilly book Python For Data Analysis and have encountered a snag.
The following code is supposed to analyze some web server log data and produce aggregate counts by client operating system.

###################
import json # used to process json records
from pandas import DataFrame, Series
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
path = '/home/rich/code/sample.txt'
records = [json.loads(line) for line in open(path)] #read in records one line at a time
frame = DataFrame(records)
cframe = frame[frame.a.notnull()]
operating_system = np.where(cframe['a'].str.contains ('Windows'),'Windows', 'Not Windows')
by_tz_os = cframe.groupby(['tz', operating_system])
agg_counts = by_tz_os.size().unstack().fillna(0)
indexer = agg_counts.sum(1).argsort()
count_subset = agg_counts.take(indexer)[-10:]
print count_subset
####################

I am getting the following error when running on Python 2.7 on Ubuntu 12.04:

>>>>>>
Traceback (most recent call last):
  File "./lp1.py", line 12, in <module>
    operating_system = np.where(cframe['a'].str.contains ('Windows'),'Windows', 'Not Windows')
AttributeError: 'Series' object has no attribute 'str'
>>>>>>>

Note that I was able to get the code to work fine on Windows 7, so this appears to be specific to Linux.
A little Googling showed that others have encountered this problem and suggested replacing the np.where with a find, like so:

########
operating_system = ['Windows' if a.find('Windows') > 0 else 'Not Windows' for a in cframe['a']]
########

This appears to solve the first problem, but then it fails on the next line with:

>>>>>>>>
Traceback (most recent call last):
  File "./lp1.py", line 14, in <module>
    by_tz_os = cframe.groupby(['tz', operating_system])
  File "/usr/lib/pymodules/python2.7/pandas/core/generic.py", line 133, in groupby
    sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 522, in groupby
    return klass(obj, by, **kwds)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 115, in __init__
    level=level, sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 705, in _get_groupings
    ping = Grouping(group_axis, gpr, name=name, level=level, sort=sort)
  File "/usr/lib/pymodules/python2.7/pandas/core/groupby.py", line 600, in __init__
    self.grouper = self.index.map(self.grouper)
  File "/usr/lib/pymodules/python2.7/pandas/core/index.py", line 591, in map
    return self._arrmap(self.values, mapper)
  File "generated.pyx", line 1141, in pandas._tseries.arrmap_int64 (pandas/src/tseries.c:40593)
TypeError: 'list' object is not callable
>>>>>>>>>

The problem looks to be with the pandas module and appears to be Linux-specific. Any ideas? I'm pulling my hair out over this.
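
For reference, here is a minimal sketch of the counts I am trying to get, written with only the standard library so that pandas is out of the picture. The sample records and the names sample_lines, os_label and count_by_tz_os are made up for illustration and are not from the book. It uses `in` rather than `find(...) > 0`, since `find` returns 0 when the agent string starts with 'Windows', and it skips records with no 'tz' field, which is an assumption of this sketch.

```python
import json
from collections import Counter

# Made-up records standing in for lines of sample.txt (hypothetical data).
sample_lines = [
    '{"a": "Mozilla/5.0 (Windows NT 6.1)", "tz": "America/New_York"}',
    '{"a": "Windows-Update-Agent", "tz": "America/New_York"}',
    '{"a": "Mozilla/5.0 (Macintosh)", "tz": "America/New_York"}',
    '{"a": "Mozilla/5.0 (X11; Linux)", "tz": "Europe/London"}',
    '{"tz": "Europe/London"}',
]


def os_label(agent):
    # `in` also matches at position 0, which `find(...) > 0` would miss.
    return 'Windows' if 'Windows' in agent else 'Not Windows'


def count_by_tz_os(lines):
    # Count records per (time zone, operating system label) pair.
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        agent = record.get('a')
        tz = record.get('tz')
        if agent is None or tz is None:
            continue  # skip records missing the agent or the time zone
        counts[(tz, os_label(agent))] += 1
    return counts


counts = count_by_tz_os(sample_lines)
for (tz, os_name) in sorted(counts):
    print('%s / %s: %d' % (tz, os_name, counts[(tz, os_name)]))
```

If the numbers from this agree with what the pandas version gives on Windows 7, that would at least confirm the data file itself is being read the same way on both machines.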