Hi all,

I've a requirement to split a column and fetch only the description where I
have numbers appended before that for some rows whereas other rows have
only the description -

Eg - (Description is the column header)

*Description*
Inventory Tree
Products
1. AT&T Services
2. Accessories
4. Miscellaneous
5. Service Center Items
3. 3rd Party Services
Integrated Service
6. Demo Devices
IT Department
Merchandising
2A. Impulse
2. Handsets
3. Strategic Products
1. Opportunities
5. Features
6. Rate Plans
7. Other AT&T Incentives
8. Wired
4. Rate Plan Tier Change
Integration SKUs (March 2016)
9. Financing SKUs
1. Smartphone

>From the above, I only want the words of description and remove the
numbers. But since they're abrupt, it is difficult to use a single logic to
have it done.

Trying with the following options -

1) Using split by *fullstop*, together -

*split_col = split(CategoryInp_DF['description'], '.')*

*CategoryInp_DF= CategoryInp_DF.withColumn('unneeded',
split_col.getItem(0))*
*CategoryInp_DF= CategoryInp_DF.withColumn('description',
split_col.getItem(1))*


Result -> But both the output columns come as nulls.

2) Using split by *fullstop and space*, together -

*split_col = split(CategoryInp_DF['description'], '. ')*

*CategoryInp_DF= CategoryInp_DF.withColumn('unneeded',
split_col.getItem(0))*
*CategoryInp_DF= CategoryInp_DF.withColumn('description',
split_col.getItem(1))*

Result -> This is perfectly working for the numbered rows, but is
unnecessarily splitting the non-numbered rows too.

Any help would be greatly appreciated.


Thanks,
Aakash.

Reply via email to