[ https://issues.apache.org/jira/browse/BEAM-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Beam JIRA Bot updated BEAM-12169:
---------------------------------
    Labels: dataframe-api  (was: dataframe-api stale-assigned)

> Allow non-deferred column operations on categorical columns
> -----------------------------------------------------------
>
>                 Key: BEAM-12169
>                 URL: https://issues.apache.org/jira/browse/BEAM-12169
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-dataframe, sdk-py-core
>            Reporter: Brian Hulette
>            Priority: P3
>              Labels: dataframe-api
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> There are several operations that we currently disallow because they
> produce a variable set of output columns that depends on the data
> (non-deferred columns). However, for some dtypes (categorical, boolean) we
> can easily enumerate all the possible values that will be seen at
> execution time, so we can predict the output columns before execution.
>
> Note that we still can't implement these operations 100% correctly, as
> pandas will typically only create columns for the values that are
> _observed_, while we'd have to create a column for every possible value.
>
> We should allow these operations in these special cases.
>
> Operations in this category:
> - DataFrame.unstack, Series.unstack (can work if the unstacked level is a
>   categorical or boolean column)
> - Series.str.get_dummies
> - Series.str.split
> - Series.str.rsplit
> - DataFrame.pivot
> - DataFrame.pivot_table

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
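A minimal pandas sketch of the idea in the description, for the unstack case only. Assumptions: the helpers `possible_values` and `unstack_with_known_columns` are hypothetical names made up for illustration; they are not part of Beam or pandas, and this is not how Beam implements the change. The output columns are chosen from the dtype of the unstacked level alone, so they could be known before any data is seen, and values that are never observed become all-NaN columns. That is the divergence from pandas noted in the issue: plain `s.unstack("col")` typically only creates columns for observed values.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype, is_bool_dtype


def possible_values(dtype):
    """Enumerate every value a column (or index level) of `dtype` can hold.

    Hypothetical helper: only categorical and boolean dtypes can be
    enumerated from the schema alone; anything else is rejected, standing
    in for the existing "non-deferred columns" restriction.
    """
    if isinstance(dtype, CategoricalDtype):
        return list(dtype.categories)
    if is_bool_dtype(dtype):
        return [False, True]
    raise TypeError(f"output columns depend on the data for dtype {dtype}")


def unstack_with_known_columns(s, level):
    """Hypothetical sketch: unstack `level`, emitting one column per
    *possible* value rather than one per observed value."""
    # Columns come from the level's dtype, not from the data
    # (an empty slice of `s` yields the same list).
    columns = possible_values(s.index.get_level_values(level).dtype)
    # Values never observed become all-NaN columns, in category order.
    return s.unstack(level).reindex(columns=columns)


# "z" is a declared category that never appears in the data.
idx = pd.MultiIndex.from_arrays(
    [[0, 0, 1], pd.Categorical(["x", "y", "x"], categories=["x", "y", "z"])],
    names=["row", "col"],
)
s = pd.Series([1.0, 2.0, 3.0], index=idx)

wide = unstack_with_known_columns(s, "col")
print(list(wide.columns))  # ['x', 'y', 'z']
```

Raising `TypeError` for any other dtype keeps the current behavior for data-dependent columns. The same dtype lookup could presumably back the other listed operations (e.g. `pivot` on a categorical column), but that is not shown here.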