Thank you all for your help and patience.

I have run table(duplicated(df1[, c("firm","year")])) as William Dunlap suggested, and I found repeated rows in df1.
R is always right!

I really believed that my data could not contain repeated lines. I now have another problem, which is to discover why this happened with my data, but that has nothing to do with R!
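
For anyone who lands here with the same symptom, a minimal sketch of how the duplicated keys can be listed, not just counted. The toy df1 and its sales column are made up for illustration; only the firm and year columns come from my real data.

```r
# Toy stand-in for df1 (hypothetical values; the real data has about 80000 rows).
df1 <- data.frame(
  firm  = c("A", "A", "B", "B", "B"),
  year  = c(2008, 2009, 2008, 2008, 2009),
  sales = c(10, 12, 7, 7, 9),
  stringsAsFactors = FALSE
)

key <- c("firm", "year")

# Number of rows whose key already appeared earlier in the data frame.
n_dup <- sum(duplicated(df1[, key]))

# Flag every member of a duplicated key group, not only the later copies.
in_dup_group <- duplicated(df1[, key]) | duplicated(df1[, key], fromLast = TRUE)
dup_rows <- df1[in_dup_group, , drop = FALSE]

# Rows that are exact copies in all columns, as opposed to sharing only the key.
n_exact <- sum(duplicated(df1))

print(table(duplicated(df1[, key])))
print(dup_rows)
```

Comparing n_dup with n_exact shows whether the repeats are exact copies of whole rows or different records that happen to share the same firm/year.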

Thank you again and again,

Cecília Carmo
Universidade de Aveiro
Portugal


On Sun, 22 Aug 2010 13:15:36 -0700,
 "William Dunlap" <wdun...@tibco.com> wrote:
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Cecilia Carmo
Sent: Sunday, August 22, 2010 10:24 AM
To: Erik Iverson
Cc: r-help@r-project.org; Hadley Wickham
Subject: Re: [R] problems with merge() - the output has many repeated lines

I have done
intersect(names(df1), names(df2))
[1] "firm" "year"

This is the key I used to merge
merge(df1,df2,by=c("firm","year"))

And there is just one firm/year row in df1 that matches a given firm/year row in df2. df1 has more firm/year rows than df2, and those extra rows do not match any row in df2.
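
A small sketch of what merge() does when that assumption fails, using made-up data in which one firm/year key appears twice in df1 and three times in df2. The x and y columns are hypothetical.

```r
# Hypothetical toy data: key ("A", 2008) occurs 2 times in df1 and 3 times in df2.
df1 <- data.frame(firm = c("A", "A", "B"),
                  year = c(2008, 2008, 2008),
                  x    = c(1, 1, 2),
                  stringsAsFactors = FALSE)
df2 <- data.frame(firm = c("A", "A", "A", "B"),
                  year = c(2008, 2008, 2008, 2008),
                  y    = c(5, 5, 5, 6),
                  stringsAsFactors = FALSE)

m <- merge(df1, df2, by = c("firm", "year"))

# Each df1 row with a given key is paired with each df2 row with that key,
# so a key that occurs 2 and 3 times contributes 2 * 3 rows to the result.
n_A <- sum(m$firm == "A")
n_B <- sum(m$firm == "B")
print(nrow(m))
```

With a unique key in both data frames, the merged result can never have more rows than the smaller input, so a row count above that is a sign of repeated keys.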

To get to the bottom of this you may have to show
us some of the relevant rows of data (80000 rows
per dataset would be a lot to mail out).  For starters
it would be nice to see the output of
  str(df1)
  str(df2)
  str(m) # where m is merge(df1,df2)
Then it would be nice to see the output of
  table(duplicated(df1[, c("firm","year")]))
and the same for df2 and m.

You said you saw many repeated rows in the output of
merge(df1,df2,...), which I am calling 'm'. Say the i'th
row is one of the repeated rows. What are the outputs of
  df1[ df1$firm==m$firm[i] & df1$year==m$year[i], ,drop=FALSE]
  df2[ df2$firm==m$firm[i] & df2$year==m$year[i], ,drop=FALSE]
  m[ m$firm==m$firm[i] & m$year==m$year[i], ,drop=FALSE]
?
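
A self-contained sketch of that last check on toy data, in case it helps to see what to look for. The data values, the x and y columns, and the names from_df1, from_df2 and from_m are made up for illustration; i is taken to be the first row of m that repeats an earlier row.

```r
# Hypothetical toy data in which df1 holds the same firm/year row twice.
df1 <- data.frame(firm = c("A", "A", "B"), year = c(2008, 2008, 2009),
                  x = c(1, 1, 2), stringsAsFactors = FALSE)
df2 <- data.frame(firm = c("A", "B"), year = c(2008, 2009),
                  y = c(5, 6), stringsAsFactors = FALSE)
m <- merge(df1, df2, by = c("firm", "year"))

# Index of the first row of m that is an exact copy of an earlier row.
i <- which(duplicated(m))[1]

# Pull the rows with that firm/year key from each input and from the result.
from_df1 <- df1[df1$firm == m$firm[i] & df1$year == m$year[i], , drop = FALSE]
from_df2 <- df2[df2$firm == m$firm[i] & df2$year == m$year[i], , drop = FALSE]
from_m   <- m[m$firm == m$firm[i] & m$year == m$year[i], , drop = FALSE]

print(from_df1)
print(from_df2)
print(from_m)
```

If either input shows more than one row for the key, the number of rows in from_m should equal nrow(from_df1) times nrow(from_df2).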

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
Cecília

On Sun, 22 Aug 2010 12:09:57 -0500,
  Erik Iverson <er...@ccbr.umn.edu> wrote:
> Cecilia -
>
> Find what columns you're matching on,
>
> intersect(names(df1), names(df2)),
>
> Maybe that will shed some light on the issue.
>
> On 08/22/2010 12:02 PM, Cecilia Carmo wrote:
>> Thanks, but I don't have multiple matches and the lines repeated in the
>> final dataframe are exactly equal in all columns.
>>
>> Cecília
>>
>> On Sat, 21 Aug 2010 10:58:53 -0500,
>> Hadley Wickham <had...@rice.edu> wrote:
>>> You may find a close reading of ?merge helpful, particularly this
>>> sentence: "If there is more than one match, all possible
>>> matches contribute one row each" (so check that you don't have
>>> multiple matches).
>>>
>>> Hadley
>>>
>>> On Sat, Aug 21, 2010 at 10:45 AM, Cecilia Carmo <cecilia.ca...@ua.pt>
>>> wrote:
>>>> Hi everyone,
>>>>
>>>> I have been merging many big dataframes (about 80000 rows each) and I
>>>> never had this problem, but now it happened to me and I want to know if
>>>> someone knows what could be happening.
>>>> The final dataframe has many rows, an impossible number! I have done
>>>> edit(dataframe) and I saw that there are many repeated rows (all equal).
>>>>
>>>> Thanks for any help,
>>>>
>>>> Cecília Carmo
>>>> Universidade de Aveiro
>>>> Portugal
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Assistant Professor / Dobelman Family Junior Chair
>>> Department of Statistics / Rice University
>>> http://had.co.nz/
>>
>
