Pandas GroupBy does not behave consistently

David Shi via Python-list Sun, 15 May 2016 09:18:02 -0700

Hello, Michael,
Pandas GroupBy does not behave consistently.
Last time we talked, I used groupby and it worked well.
Now I am rewriting the program so that I end up with a clean 
script.
But the problem is that a lot of columns are missing after applying groupby.
Any idea?
Regards.
David
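
One common way columns go missing after a groupby, offered only as a guess because the thread does not show the data: an aggregation restricted to numeric columns drops the text columns. The sketch below uses a made-up frame; the column names `State`, `name` and `value` are assumptions, not the real data.

```python
import pandas as pd

# Hypothetical data; none of these values come from the thread.
df = pd.DataFrame({
    'State': ['AL', 'AL', 'AR'],
    'name': ['x', 'y', 'z'],      # text column
    'value': [1.0, 2.0, 3.0],     # numeric column
})

# A numeric-only mean keeps 'value' and silently leaves out 'name'.
numeric_only = df.groupby('State').mean(numeric_only=True)
print(list(numeric_only.columns))

# Naming an aggregation per column keeps the text column as well.
both = df.groupby('State').agg({'name': 'first', 'value': 'mean'})
print(list(both.columns))
```

Comparing `df.columns` before and after the groupby is a quick way to see which columns were dropped.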

    On Saturday, 14 May 2016, 17:00, Michael Selik <[email protected]> 
wrote:

 This StackOverflow question was the first search result when I Googled for 
"Python why is there a little u":
http://stackoverflow.com/questions/11279331/what-does-the-u-symbol-mean-in-front-of-string-values
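
For reference, a minimal sketch of what the prefix means. In Python 2, u'ID' is a unicode string literal and the u appears whenever the string is printed inside a container; in Python 3 every str is unicode and the prefix is accepted but redundant. The sketch assumes Python 3, and the labels are illustrative only.

```python
# Assumes Python 3, where the u prefix is allowed but has no effect.
label = u'ID'
assert label == 'ID'          # same value as the unprefixed literal
assert type(label) is str     # an ordinary string

# Hypothetical column labels written with the prefix.
columns = [u'ID', u'State']
as_list = list(columns)       # already a list; list() simply copies it
joined = ', '.join(as_list)   # the text itself contains no u
print(joined)
```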
On Sat, May 14, 2016, 11:40 AM David Shi <[email protected]> wrote:

Hello, Michael,
Why is there a little u, as in u'ID'?
What can be done about it?  How should I handle such objects?
Can it be turned into a list easily?
Regards.
David 

    On Saturday, 14 May 2016, 15:34, Michael Selik <[email protected]> 
wrote:

 You might also be interested in "Python for Data Analysis" for a thorough 
discussion of Pandas.
http://shop.oreilly.com/product/0636920023784.do

On Sat, May 14, 2016 at 10:29 AM Michael Selik <[email protected]> wrote:

David, it sounds like you'll need a thorough introduction to the basics of 
Python. Check out the tutorial: https://docs.python.org/3/tutorial/
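
As a small illustration of inspecting an object dynamically, here is a sketch using only built-ins. The object being inspected is a made-up dictionary, not anything from the thread.

```python
# Built-in tools for inspecting an arbitrary object at the prompt.
value = {'sid': [1, 4, 5]}    # hypothetical object to inspect

kind = type(value)                    # the object's class
is_mapping = isinstance(value, dict)  # test against an expected type
# dir() lists attribute names; filter out the underscore-prefixed ones.
public_names = [name for name in dir(value) if not name.startswith('_')]

print(kind.__name__)
print(is_mapping)
print('keys' in public_names)
# help(value) prints the documentation when run interactively.
```

For a DataFrame, `df.dtypes`, `df.columns` and `df.info()` give the corresponding per-column view.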
On Sat, May 14, 2016 at 6:19 AM David Shi <[email protected]> wrote:

Hello, Michael,
I discovered that the problem is that two columns of data are put together and 
are recognised as one column.
This is very strange.  I would like to understand the subject well.
Also, how many ways are there to investigate the nature of objects 
dynamically?
Some objects are only shown as "object".  Is there anything I can type 
in Python to reveal what they are?
Regards.
David 

    On Saturday, 14 May 2016, 4:30, Michael Selik <[email protected]> 
wrote:

 What were you hoping to get from ``df[0]``? When you say it "yields nothing" do 
you mean it raised an error? What was the error message?
Have you tried a Google search for "pandas set index"?
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html
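
A minimal sketch of set_index, assuming a frame with a column named `sid` and a column named `value` (both names are assumptions; the figures are copied from the output quoted in this thread).

```python
import pandas as pd

# Hypothetical frame; 'sid' and 'value' are assumed column names.
df = pd.DataFrame({'sid': [1, 4, 5],
                   'value': [133738.70, 295256.11, 137733.09]})

indexed = df.set_index('sid')         # returns a new frame; df is unchanged
print(indexed.loc[4, 'value'])        # look up a row by its sid label

as_dict = indexed['value'].to_dict()  # mapping of sid to value
restored = indexed.reset_index()      # 'sid' becomes a regular column again
```

Going the other way, reset_index turns the index back into an ordinary column, which is useful before a groupby.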

On Fri, May 13, 2016 at 11:18 PM David Shi <[email protected]> wrote:

Hello, Michael,
I tried to discover the problem.
df[0] yields nothing
df[1] yields nothing
df[2] yields nothing
However, df[3] gives the following:
sid
-9223372036854775808          NaN
 1                      133738.70
 4                      295256.11
 5                      137733.09
 6                      409413.58
 8                      269600.97
 9                       12852.94
Can we split this back to normal, or turn it into a dictionary, so that I can 
put the values back properly?
I would like to use sid as the index in some way.
Regards.
David 

    On Friday, 13 May 2016, 22:58, Michael Selik <[email protected]> 
wrote:

 What code have you tried? What error message are you receiving?
On Fri, May 13, 2016, 5:54 PM David Shi <[email protected]> wrote:

Hello, Michael,
How do I convert a float type column into an integer, label, or string type? 
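
One possible approach, sketched under assumptions: `astype` converts a column's dtype, but a missing value (NaN) cannot be stored in a plain integer column, so it has to be dropped or filled first. The frame below is made up; only the column name `StateFIPS` is borrowed from the thread.

```python
import pandas as pd

# Hypothetical column of float codes, one of them missing.
df = pd.DataFrame({'StateFIPS': [1.0, 4.0, None, 6.0]})

# Drop the row with the missing code before converting to integers.
clean = df.dropna(subset=['StateFIPS']).copy()
clean['as_int'] = clean['StateFIPS'].astype(int)
clean['as_str'] = clean['as_int'].astype(str)

print(clean['as_int'].tolist())
print(clean['as_str'].tolist())
```

Whether to drop or fill the missing rows depends on what the NaN means in the data, which the thread does not say.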

    On Friday, 13 May 2016, 22:02, Michael Selik <[email protected]> 
wrote:

 To clarify that you're specifying the index as a label, use df.loc 
(df.iloc selects by integer position instead).
    >>> df = pd.DataFrame({'X': range(4)}, index=list('abcd'))
    >>> df
       X
    a  0
    b  1
    c  2
    d  3
    >>> df.loc['a']
    X    0
    Name: a, dtype: int64
    >>> df.iloc[0]
    X    0
    Name: a, dtype: int64
On Fri, May 13, 2016 at 4:54 PM David Shi <[email protected]> wrote:

Dear Michael,
To avoid complication, I now group by only one column.
It is OK now.  But how do I refer to the new row index?  How do I use a floating-point index?
Float64Index([ 1.0,  4.0,  5.0,  6.0,  8.0,  9.0, 10.0, 11.0, 12.0, 13.0, 16.0,
              17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0,
              28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0,
              39.0, 40.0, 41.0, 42.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0,
              51.0, 53.0, 54.0, 55.0, 56.0],
             dtype='float64', name=u'StateFIPS')
Regards.
David 

    On Friday, 13 May 2016, 21:43, Michael Selik <[email protected]> 
wrote:

 Here's an example.
    >>> import pandas as pd
    >>> df = pd.DataFrame({'group': list('AB') * 2, 'data': range(4)}, index=list('wxyz'))
    >>> df
       data group
    w     0     A
    x     1     B
    y     2     A
    z     3     B
    >>> df = df.reset_index()
    >>> df
      index  data group
    0     w     0     A
    1     x     1     B
    2     y     2     A
    3     z     3     B
    >>> df.groupby('group').max()
          index  data
    group
    A         y     2
    B         z     3
If that doesn't help, you'll need to explain what you're trying to accomplish 
in detail -- what variables you started with, what transformations you want to 
do, and what variables you hope to have when finished.
On Fri, May 13, 2016 at 4:36 PM David Shi <[email protected]> wrote:

Hello, Michael,
I changed the groupby to use one column.
The index is different.
Index([   u'AL',    u'AR',    u'AZ',    u'CA',    u'CO',    u'CT',    u'DC',
          u'DE',    u'FL',    u'GA',    u'IA',    u'ID',    u'IL',    u'IN',
          u'KS',    u'KY',    u'LA',    u'MA',    u'MD',    u'ME',    u'MI',
          u'MN',    u'MO',    u'MS',    u'MT',    u'NC',    u'ND',    u'NE',
          u'NH',    u'NJ',    u'NM',    u'NV',    u'NY',    u'OH',    u'OK',
          u'OR',    u'PA',    u'RI',    u'SC',    u'SD', u'State',    u'TN',
          u'TX',    u'UT',    u'VA',    u'VT',    u'WA',    u'WI',    u'WV',
          u'WY'],
      dtype='object', name=0)
How do I use this index?
Regards.
David 

    On Friday, 13 May 2016, 21:19, David Shi <[email protected]> wrote:

 Hello, Michael,
I typed in df.index
I got the following:
MultiIndex(levels=[[1.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0, 
11.0, 12.0, 13.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 
26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 
39.0, 40.0, 41.0, 42.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 53.0, 
54.0, 55.0, 56.0], [u'AL', u'AR', u'AZ', u'CA', u'CO', u'CT', u'DC', u'DE', 
u'FL', u'GA', u'IA', u'ID', u'IL', u'IN', u'KS', u'KY', u'LA', u'MA', u'MD', 
u'ME', u'MI', u'MN', u'MO', u'MS', u'MT', u'NC', u'ND', u'NE', u'NH', u'NJ', 
u'NM', u'NV', u'NY', u'OH', u'OK', u'OR', u'PA', u'RI', u'SC', u'SD', u'State', 
u'TN', u'TX', u'UT', u'VA', u'VT', u'WA', u'WI', u'WV', u'WY']],
           labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48], [0, 2, 1, 3, 4, 5, 7, 6, 8, 9, 
11, 12, 13, 10, 14, 15, 16, 19, 18, 17, 20, 21, 23, 22, 24, 27, 31, 28, 29, 30, 
32, 25, 26, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 45, 44, 46, 48, 47, 49]],
           names=[u'StateFIPS', 0])
Regards.
David 

    On Friday, 13 May 2016, 21:11, David Shi <[email protected]> wrote:

 Dear Michael,
I have done a number of operations in between.
Providing that information would not help you.
What I am interested in is how to reset the index after grouping and various operations.
What command can I type to inspect the current dataframe?
Regards.
David 

    On Friday, 13 May 2016, 20:58, Michael Selik <[email protected]> 
wrote:

 Just in case I misunderstood, why don't you make a little example of before 
and after the grouping? This mailing list does not accept attachments, so 
you'll have to make do with pasting a few rows of comma-separated or 
tab-separated values.
On Fri, May 13, 2016 at 3:56 PM Michael Selik <[email protected]> wrote:

In order to preserve your index after the aggregation, you need to make sure it 
is considered a data column (via reset_index) and then choose how your 
aggregation will operate on that column.
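
A minimal sketch of that idea, reusing the toy frame that appears elsewhere in this thread. The choice of 'first' for the old index and 'sum' for the data is an assumption made for illustration; any aggregation could be picked per column.

```python
import pandas as pd

# Toy frame; the per-column aggregation choices below are assumptions.
df = pd.DataFrame({'group': list('AB') * 2, 'data': range(4)},
                  index=list('wxyz'))

flat = df.reset_index()   # the old index is now a column named 'index'

# Decide explicitly how each column is aggregated within a group.
result = flat.groupby('group').agg({'index': 'first', 'data': 'sum'})
print(result)
```

Because the old index is an ordinary column at the time of grouping, it survives the aggregation instead of being discarded.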
On Fri, May 13, 2016 at 3:29 PM David Shi <[email protected]> wrote:

Hello, Michael,
Why reset_index before grouping?
Regards.
David 

  On Friday, 13 May 2016, 17:57, Michael Selik <[email protected]> wrote:

On Fri, May 13, 2016 at 12:27 PM David Shi via Python-list 
<[email protected]> wrote:

I lost my indexes after grouping in Pandas.
I managed to reset_index and got back the index column.
But how can I get back an index row?

Was the grouping an aggregation? If so, the original indexes are meaningless. 
What you could do is reset_index before the grouping and when you aggregate 
decide how to handle the formerly-known-as-index column (min, max, mean, ?).

-- 
https://mail.python.org/mailman/listinfo/python-list
