On 2018-05-14 07:05, zljubi...@gmail.com wrote:
> Hi,
> I have a dataframe with the CRM_assetID column as category dtype:
> df.info()
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 1435952 entries, 0 to 1435951
> Data columns (total 75 columns):
> startTime                            1435952 non-null object
> CRM_assetID                          1435952 non-null category
> Searching the dataframe for each of the three categories:
> df[df.CRM_assetID == 'V1254748'].shape
> (35, 75)
> df[df.CRM_assetID == 'V805722'].shape
> (45, 75)
> df[df.CRM_assetID == 'V1105400'].shape
> (34, 75)
> len(df.CRM_assetID.cat.categories.isin(['V1254748', 'V805722', 'V1105400']))
> Why is this len not equal to 114 (35 + 45 + 34)?
> Regards.


First, this is a general Python group; not everyone here is necessarily
an expert in or user of Pandas. In the future you might have more
success with the pydata mailing list/group.

When you say that `len(df.CRM_assetID.cat.categories.isin(['V1254748',
'V805722', 'V1105400']))` is not equal to 114, it would be helpful to
say what this length actually is.

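Your usage of `df.CRM_assetID.cat.categories` refers to the *unique
categories in that column*, not the actual values in that column.
`isin` then returns a boolean array with one entry per distinct
category, marking whether that category is in your list, so its length
is simply the total number of categories, however many of them match.
Presumably that column has many more categories than the three you are
checking. Even summing that array would give at most 3, because each
category appears in it only once; to count matching rows you need to
apply `isin` to the column itself, not to its categories.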
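A minimal sketch of the difference, using a small made-up frame. The
asset IDs X1, X2, X3 and all of the counts are hypothetical, chosen only
for illustration and not taken from your data:

```python
import pandas as pd

# Hypothetical toy data: three "wanted" IDs plus three other IDs.
ids = ["V1254748"] * 3 + ["V805722"] * 4 + ["V1105400"] * 2 + ["X1", "X2", "X3"]
df = pd.DataFrame({"CRM_assetID": pd.Categorical(ids)})

wanted = ["V1254748", "V805722", "V1105400"]

# One boolean per distinct category: the length is the number of categories.
cat_mask = df.CRM_assetID.cat.categories.isin(wanted)
print(len(cat_mask))   # 6 (total categories)
print(cat_mask.sum())  # 3 (categories that are in `wanted`)

# One boolean per row: summing it counts the matching rows.
row_mask = df.CRM_assetID.isin(wanted)
print(len(row_mask))   # 12 (total rows)
print(row_mask.sum())  # 9 (3 + 4 + 2)
```

With your real frame, `df.CRM_assetID.isin(['V1254748', 'V805722',
'V1105400']).sum()` should give 114, since each row holds exactly one
asset ID and the three filters therefore cannot overlap.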


