i gave it a shot, see attached Am Mi., 3. Juni 2020 um 23:38 Uhr schrieb Aaron <aaron.christen...@gmail.com >:
> Hello, > > Given a dateframe with trips made by employees of different companies, I am > trying to generate a new dataframe with only the company names. I am > looking to combine the overlapping travel times from employees of the SAME > company into a single row. If there are no overlapping travel times, then > that row just transfers over as-is. When there are overlapping travel > times, then the following will happen: > > --The name field is removed b/c that is no longer relevant (company name > stays), the Depart date will be the earliest date of any of the trip dates > regardless of the employee, the Return date will be the latest date of any > of the trip dates regardless of the employee, the charges for the trip will > be summed > > For example, if trips had dates 01/01/20 - 01/31/20, 01/15/20 - 02/15/20, > 02/01-20 - 02/28/20, then all three would be combined. The starting date > will be 1/1/20 and ending as of 2/28/20. Basically, the company was on > that trip from start to finish… kinda like a relay run handing off the > baton. Also, the charges will be summed for each of those trips and > transferred over to the single row. > > Here is the starting dataframe code/output (note: the row order is > typically not already sorted by company name as in this example): > > import pandas as pd > > > emp_trips = {'Name': ['Bob','Joe','Sue','Jack', 'Henry', 'Frank', > 'Lee', 'Jack'], > 'Company': ['ABC', 'ABC', 'ABC', 'HIJ', 'HIJ', 'DEF', 'DEF', > 'DEF'], > 'Depart' : ['01/01/2020', '01/01/2020', '01/06/2020', > '01/01/2020', '05/01/2020', '01/13/2020', '01/12/2020', '01/14/2020'], > 'Return' : ['01/31/2020', '02/15/2020', '02/20/2020', > '03/01/2020', '05/05/2020', '01/15/2020', '01/30/2020', '02/02/2020'], > 'Charges': [10.10, 20.25, 30.32, 40.00, 50.01, 60.32, 70.99, 80.87] > } > > df = pd.DataFrame(emp_trips, columns = ['Name', 'Company', 'Depart', > 'Return', 'Charges']) > # Convert to date format > df['Return']= pd.to_datetime(df['Return']) > df['Depart']= pd.to_datetime(df['Depart']) > > Name Company Depart Return Charges0 Bob ABC > 2020-01-01 2020-01-31 10.101 Joe ABC 2020-01-01 2020-02-15 > 20.252 Sue ABC 2020-01-06 2020-02-20 30.323 Jack HIJ > 2020-01-01 2020-03-01 40.004 Henry HIJ 2020-05-01 2020-05-05 > 50.015 Frank DEF 2020-01-13 2020-01-15 60.326 Lee DEF > 2020-01-12 2020-01-30 70.997 Jack DEF 2020-01-14 2020-02-02 > 80.87 > > And, here is the desired/generated dataframe: > > Company Depart Return Charges0 ABC 01/01/2020 > 02/20/2020 60.671 HIJ 01/01/2020 03/01/2020 40.002 HIJ > 05/01/2020 05/05/2020 50.013 DEF 01/12/2020 02/02/2020 > 212.18 > > I have been trying to use a combination of sorting and grouping but > the best I've achieved is reordering the dataframe. Even though I am > able to sort/group based on values, I still run into the issues of > finding overlapping date ranges and pulling out all trips based on a > single company per aggregate/overlapping date range. > > Thank you in advance for any help! > > Aaron > -- > https://mail.python.org/mailman/listinfo/python-list > -- Regards, Joseph Pareti - Artificial Intelligence consultant Joseph Pareti's AI Consulting Services https://www.joepareti54-ai.com/ cell +49 1520 1600 209 cell +39 339 797 0644 -- https://mail.python.org/mailman/listinfo/python-list