Sayth Renshaw <flebber.c...@gmail.com> writes: > Hi > > I am importing 4 columns into a dataframe from a spreadsheet. > > My goal is to create a 5th column with TRUE or False if column 4 (str) > matches column 3. > > Trying to leverage this answer https://stackoverflow.com/a/35940955/461887 > > This is my code > > import pandas as pd > > xls = pd.ExcelFile("Melbourne.xlsx") > df = xls.parse('Sheet1', skiprows= 4) > df1 = df[['UID','Name','New Leader','Current Team', 'New Team']] > df1['Difference'] = df1['Current > Team'].str.lower().str.replace('s/+',"") == df1['New > Team'].str.lower().str.replace('s/+',"") > > Which gives this error > > C:\Users\u369811\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:6: > SettingWithCopyWarning: > A value is trying to be set on a copy of a slice from a DataFrame.
I think what happens is that df1 is not a copy of the subset of df that you want, but it is a VIEW on df instead. This means that df1 shares memory with df (for memory savings reasons). But if you would change this view by adding a column, where should it be put? In df? If so, where? So the correct way to do this is to make df1 a copy rather than a view. df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')] And than it should work. Except that the str.replace is wrong for what you want. It just replaces the literal string "s/+" with an empty string instead of white space. This was wrong in the stackoverflow post. To replace whitespace it should be str.replace('\\s+',"", regex=True). But simpler is to use str.strip(): df1['Difference'] = df1['Current Team'].str.lower().str.strip() == df1['New Team'].str.lower().str.strip() -- Piet van Oostrum <pie...@vanoostrum.org> WWW: http://piet.vanoostrum.org/ PGP key: [8DAE142BE17999C4] -- https://mail.python.org/mailman/listinfo/python-list