Hello, Given a dateframe with trips made by employees of different companies, I am trying to generate a new dataframe with only the company names. I am looking to combine the overlapping travel times from employees of the SAME company into a single row. If there are no overlapping travel times, then that row just transfers over as-is. When there are overlapping travel times, then the following will happen:
--The name field is removed b/c that is no longer relevant (company name stays), the Depart date will be the earliest date of any of the trip dates regardless of the employee, the Return date will be the latest date of any of the trip dates regardless of the employee, the charges for the trip will be summed For example, if trips had dates 01/01/20 - 01/31/20, 01/15/20 - 02/15/20, 02/01-20 - 02/28/20, then all three would be combined. The starting date will be 1/1/20 and ending as of 2/28/20. Basically, the company was on that trip from start to finish… kinda like a relay run handing off the baton. Also, the charges will be summed for each of those trips and transferred over to the single row. Here is the starting dataframe code/output (note: the row order is typically not already sorted by company name as in this example): import pandas as pd emp_trips = {'Name': ['Bob','Joe','Sue','Jack', 'Henry', 'Frank', 'Lee', 'Jack'], 'Company': ['ABC', 'ABC', 'ABC', 'HIJ', 'HIJ', 'DEF', 'DEF', 'DEF'], 'Depart' : ['01/01/2020', '01/01/2020', '01/06/2020', '01/01/2020', '05/01/2020', '01/13/2020', '01/12/2020', '01/14/2020'], 'Return' : ['01/31/2020', '02/15/2020', '02/20/2020', '03/01/2020', '05/05/2020', '01/15/2020', '01/30/2020', '02/02/2020'], 'Charges': [10.10, 20.25, 30.32, 40.00, 50.01, 60.32, 70.99, 80.87] } df = pd.DataFrame(emp_trips, columns = ['Name', 'Company', 'Depart', 'Return', 'Charges']) # Convert to date format df['Return']= pd.to_datetime(df['Return']) df['Depart']= pd.to_datetime(df['Depart']) Name Company Depart Return Charges0 Bob ABC 2020-01-01 2020-01-31 10.101 Joe ABC 2020-01-01 2020-02-15 20.252 Sue ABC 2020-01-06 2020-02-20 30.323 Jack HIJ 2020-01-01 2020-03-01 40.004 Henry HIJ 2020-05-01 2020-05-05 50.015 Frank DEF 2020-01-13 2020-01-15 60.326 Lee DEF 2020-01-12 2020-01-30 70.997 Jack DEF 2020-01-14 2020-02-02 80.87 And, here is the desired/generated dataframe: Company Depart Return Charges0 ABC 01/01/2020 02/20/2020 60.671 HIJ 01/01/2020 03/01/2020 40.002 HIJ 05/01/2020 05/05/2020 50.013 DEF 01/12/2020 02/02/2020 212.18 I have been trying to use a combination of sorting and grouping but the best I've achieved is reordering the dataframe. Even though I am able to sort/group based on values, I still run into the issues of finding overlapping date ranges and pulling out all trips based on a single company per aggregate/overlapping date range. Thank you in advance for any help! Aaron -- https://mail.python.org/mailman/listinfo/python-list