On 2018-05-15 06:23, Zoran Ljubišić wrote:
> Matt,
>
> thanks for the info about pydata mailing group. I didn't know it exists.
> Because comp.lang.python is not appropriate group for this question, I
> will continue our conversation on gmail.
>
> I have put len(df.CRM_assetID.cat.categories.isin(['V1254748', 'V805722',
> 'V1105400'])) = 55418 in next message, after I noticed that this
> information is missing.
>
> If I want to select all rows that have categories from the list, how
> to do that?
>
> Regards,
>
> Zoran
>
Hi Zoran-

(Including python-list again, for lack of a reason not to. This conversation is still relevant and appropriate for the general Python mailing list -- I just meant that the pydata list likely has many more Pandas users/experts, so you're more likely to get a better answer, faster, from a more specialized group.)

Selecting all rows whose category is in your list is a bit simpler than what you are doing -- your issue is that you are working with the *set of distinct categories*, and not the actual vector of categories corresponding to your data. You can select the rows you're interested in with something like the following:

"""
In [1]: import pandas as pd

In [2]: s = pd.Series(['apple', 'banana', 'apple', 'pear', 'banana', 'cherry', 'pear', 'cherry']).astype('category')

In [3]: s
Out[3]:
0     apple
1    banana
2     apple
3      pear
4    banana
5    cherry
6      pear
7    cherry
dtype: category
Categories (4, object): [apple, banana, cherry, pear]

In [4]: s.isin({'apple', 'pear'})
Out[4]:
0     True
1    False
2     True
3     True
4    False
5    False
6     True
7    False
dtype: bool

In [5]: s.loc[s.isin({'apple', 'pear'})]
Out[5]:
0    apple
2    apple
3     pear
6     pear
dtype: category
Categories (4, object): [apple, banana, cherry, pear]
"""

(Note that I'm also passing a set to `isin` instead of a list -- this doesn't matter when looking for two or three values, but if you're passing 1000 values to `isin`, or 10_000, or 1_000_000, then linear-time membership testing can start to become an issue.)

Your code is instead accessing the vector of the *unique categories* in that column, like:

"""
In [6]: s.cat.categories
Out[6]: Index(['apple', 'banana', 'cherry', 'pear'], dtype='object')

In [7]: s.cat.categories.isin({'apple', 'pear'})
Out[7]: array([ True, False, False,  True])
"""

The vector `s.cat.categories` has one element for each distinct category in your column, and your column apparently contains 55418 different categories. That is why the boolean array you got back has 55418 entries: one per category, not one per row.

MMR...

-- 
https://mail.python.org/mailman/listinfo/python-list
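Applied to a DataFrame rather than a bare Series, the same idea might look like the minimal sketch below. The data is made up: I'm assuming a DataFrame `df` with a categorical `CRM_assetID` column (borrowing your column name and three of your IDs) plus a hypothetical `value` column, standing in for your real frame. It contrasts the row-level mask, which you index `df` with, against the category-level mask you were computing.

```python
import pandas as pd

# Hypothetical stand-in for the real data: a categorical CRM_assetID column
# and a made-up "value" column. The real frame has ~55k distinct IDs.
df = pd.DataFrame({
    "CRM_assetID": pd.Categorical(
        ["V1254748", "V999", "V805722", "V1105400", "V999", "V1254748"]
    ),
    "value": [10, 20, 30, 40, 50, 60],
})

wanted = {"V1254748", "V805722", "V1105400"}

# Row-level mask: one boolean per row of df. This is what selects rows.
row_mask = df["CRM_assetID"].isin(wanted)
selected = df.loc[row_mask]

# Category-level mask: one boolean per *distinct* category, not per row.
# Its length is the number of categories, not the number of rows.
cat_mask = df["CRM_assetID"].cat.categories.isin(wanted)

print(len(row_mask), len(cat_mask))  # prints: 6 4
print(selected)
```

With the toy data, `row_mask` has six entries (one per row) while `cat_mask` has four (one per distinct ID), which is the same mismatch that produced the 55418 you saw.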