Aehm, adding on this: I incorrectly *assumed* without testing that rounding would help; it doesn't:
ecdf(round(test2,0)) # a rounding that is way too rough for my application...
#Error in xy.coords(x, y) : 'x' and 'y' lengths differ

Digging deeper: the call to unique() that I mentioned initially is not very helpful, as test2 is a data frame, so I get what I deserve: an unchanged data frame with 1 row. Still, the issue remains and can even be simplified further:

> ecdf(data.frame(a=3, b=4))
Empirical CDF
Call: ecdf(data.frame(a = 3, b = 4))
 x[1:2] = 3, 4

works ok, but

> ecdf(data.frame(a=3, b=3))
Error in xy.coords(x, y) : 'x' and 'y' lengths differ

doesn't (same for a=b=1 or 2, so likely the same for any a=b). Instead,

> ecdf(c(a=3, b=3))
Empirical CDF
Call: ecdf(c(a = 3, b = 3))
 x[1:1] = 3

does the trick. From ?ecdf, I get that x should be a numeric vector, so apparently the problem is my misuse of the function by applying it to a row of a data frame (i.e. a data frame with one row). That worked ok in all my other (dozens of) cases, though, just not in this particular one. A simple unlist() helps:

> ecdf(unlist(data.frame(a=3, b=3)))
Empirical CDF
Call: ecdf(unlist(data.frame(a = 3, b = 3)))
 x[1:1] = 3

Yet, I'm even more confused than before: in my other data, there were also duplicated values in the vector (1-row data frame), and it never caused any issue. For this particular example, it does. I must be missing something fundamental...

Michael

> -----Original Message-----
> From: Meyners, Michael
> Sent: Monday, 8 June 2015 12:02
> To: 'r-help@r-project.org'
> Subject: mismatch between match and unique causing ecdf (well, approxfun) to fail
>
> All,
>
> I encountered the following issue with ecdf which was originally on a vector
> of length 10,000, but I have been able to reduce it to a minimal reproducible
> example (just to avoid questions why I'd want to do this for a vector of
> length 2...):
>
> test2 = structure(list(X817 = 3.39824670255344, X4789 = 3.39824670255344),
>     .Names = c("X817", "X4789"), row.names = 74L, class = "data.frame")
> ecdf(test2)
>
> # Error in xy.coords(x, y) : 'x' and 'y' lengths differ
>
> In an attempt to track this down, it occurs that
>
> unique(test2)
> #       X817    X4789
> #74 3.398247 3.398247
>
> while
>
> match(test2, unique(test2))
> #[1] 1 1
>
> matches both values to the first one. This causes a hiccup in the call to
> ecdf, as this uses (an equivalent to) a call to approxfun with x = test2 and
> y = cumsum(tabulate(match(test2, unique(test2)))), the latter now containing
> one entry less than the former, so xy.coords fails.
>
> I understand that the issue should be somehow related to FAQ 7.31, but I
> would have hoped that unique and match would be using the same precision
> and hence both or neither would consider the two values identical, but not
> one match while unique doesn't.
>
> Last but not least, it doesn't really cause an issue on my end (other than
> breaking my code and hence out of a loop at first place...); rounding will
> help w/o noteworthy changes to the outcome, so no need to propose a
> workaround :-) I'd rather like to raise the issue and learn whether there is a
> purpose for this behavior, and/or whether there is a generic fix to this, or
> whether I am completely missing something.
>
> Version info (under Windows 7):
> R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> Cheers, Michael
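
A minimal sketch of the length mismatch described in the quoted message, for anyone who wants to poke at it without calling ecdf() itself. The helper name ecdf_pieces is hypothetical (it is not part of R), and it only mirrors the unique()/match()/tabulate() step as described above; it skips the sorting that ecdf() does first, so it is an approximation under that assumption and not the actual implementation.

```r
# Hypothetical helper, not part of R: returns the lengths of the x and y
# pieces that, per the description above, end up in approxfun().
# Assumption: the sort() step inside ecdf() is left out on purpose.
ecdf_pieces <- function(x) {
  vals <- unique(x)                           # for a data frame: drops duplicated ROWS only
  counts <- cumsum(tabulate(match(x, vals)))  # match() compares the columns as list elements
  c(n_x = length(vals), n_y = length(counts))
}

ecdf_pieces(data.frame(a = 3, b = 3))          # n_x = 2, n_y = 1: lengths differ
ecdf_pieces(data.frame(a = 3, b = 4))          # n_x = 2, n_y = 2
ecdf_pieces(data.frame(a = 3, b = 3, c = 4))   # n_x = 3, n_y = 3, despite the duplicate
ecdf_pieces(unlist(data.frame(a = 3, b = 3)))  # n_x = 1, n_y = 1
```

In this sketch the lengths only disagree when the duplicated value sits in the last position, because tabulate() sizes its result by the largest index it sees. That might be why other one-row data frames with duplicated values went through without an error, but that is a guess and has not been checked against ecdf() itself. Converting the row with unlist() first sidesteps the question entirely.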