On 2018-05-14 07:05, zljubi...@gmail.com wrote:
> Hi,
>
> I have dataframe with CRM_assetID column as category dtype:
>
> df.info()
>
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 1435952 entries, 0 to 1435951
> Data columns (total 75 columns):
> startTime      1435952 non-null object
> CRM_assetID    1435952 non-null category
>
> searching a dataframe for each of three categories:
>
> df[df.CRM_assetID == 'V1254748'].shape
> (35, 75)
> df[df.CRM_assetID == 'V805722'].shape
> (45, 75)
> df[df.CRM_assetID == 'V1105400'].shape
> (34, 75)
>
>
> len(df.CRM_assetID.cat.categories.isin(['V1254748', 'V805722', 'V1105400']))
>
> Why this len is not equal to 114 (35 + 45 + 34)?
>
> Regards.
Hello,

First, this is a general Python group; not everyone here is necessarily an expert in or user of Pandas. In the future you might have more success with the pydata mailing list/group.

When you say that `len(df.CRM_assetID.cat.categories.isin(['V1254748', 'V805722', 'V1105400']))` is not equal to 114, it would be helpful to say what this length actually is.

Your usage of `df.CRM_assetID.cat.categories` refers to the *unique categories in that column*, not the actual values in that column. Calling `isin` on it returns a boolean vector with one entry per distinct category, True where that category is in your list. Its length is therefore the total number of categories in the column, regardless of how many of them match; presumably you have many more categories than the three you are checking. Even summing that vector would give at most 3 (one per matching category), not 114, because each category appears in it only once.

MMR...

--
https://mail.python.org/mailman/listinfo/python-list
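To make the distinction concrete, here is a minimal sketch on a small, made-up frame. The extra asset IDs `V999` and `V888` and all row counts are hypothetical, not the poster's data. It contrasts the two `isin` calls: on `.cat.categories` you get one boolean per distinct category, while on the column itself (`Series.isin`) you get one boolean per row. Summing the per-row mask gives the row count the question was after.

```python
import pandas as pd

# Hypothetical stand-in for the poster's frame: a category column with more
# distinct categories than the three being searched for.
df = pd.DataFrame({
    "CRM_assetID": pd.Categorical(
        ["V1254748", "V805722", "V805722", "V1105400", "V999", "V888", "V999"]
    )
})
wanted = ["V1254748", "V805722", "V1105400"]

# One boolean per *distinct category*: its length is the number of categories.
per_category = df.CRM_assetID.cat.categories.isin(wanted)

# One boolean per *row*: True where that row's value is in `wanted`.
per_row = df.CRM_assetID.isin(wanted)

n_categories = len(per_category)              # 5 distinct categories here
n_matching_categories = int(per_category.sum())  # at most len(wanted)
n_matching_rows = int(per_row.sum())          # rows whose value is in `wanted`

print(n_categories, n_matching_categories, n_matching_rows)
print(df[per_row].shape)
```

On the poster's data, `df.CRM_assetID.isin(['V1254748', 'V805722', 'V1105400']).sum()` (or `df[...].shape[0]` with that mask) would be expected to give 114, since the three IDs match disjoint sets of 35, 45 and 34 rows.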