Can you create an issue for the nrow error? That’s almost certainly a bug.
-- John

On May 22, 2014, at 6:03 AM, Mike Innes <[email protected]> wrote:

> Link:
> http://stackoverflow.com/questions/23806758/julia-dataframes-problems-with-split-apply-combine-strategy
>
> I definitely agree that having a greater presence on SO would be useful, so
> it might be best to answer there (sorry I can't be more directly helpful, OP)
>
>
> On 22 May 2014 13:56, Paulo Castro <[email protected]> wrote:
> I asked this question on StackOverflow, but I think I will get better results
> posting it here. We should use that platform more, so Julia is more exposed
> to R/Python/Matlab users needing something like it.
>
> I have some data (from an R course assignment, but that doesn't matter) on
> which I want to use the split-apply-combine strategy, but I'm having some
> problems. The data is in a DataFrame called outcome, and each line represents
> a hospital. Each column holds a piece of information about that hospital,
> like name, location, rates, etc.
>
> My objective is to obtain the hospital with the lowest "Mortality by Heart
> Attack Rate" in each State.
>
> I was playing around with some strategies, and ran into a problem using the
> by function:
>
>     best_heart_rate(df) = sort(df, cols = :Mortality)[end,:]
>
>     best_hospitals = by(hospitals, :State, best_heart_rate)
>
> The idea was to split the hospitals DataFrame by State, sort each of the
> SubDataFrames by Mortality Rate, get the lowest one, and combine the lines in
> a new DataFrame.
>
> But when I used this strategy, I got:
>
>     ERROR: no method nrow(SubDataFrame{Array{Int64,1}})
>      in sort at /home/paulo/.julia/v0.3/DataFrames/src/dataframe/sort.jl:311
>      in sort at /home/paulo/.julia/v0.3/DataFrames/src/dataframe/sort.jl:296
>      in f at none:1
>      in based_on at /home/paulo/.julia/v0.3/DataFrames/src/groupeddataframe/grouping.jl:144
>      in by at /home/paulo/.julia/v0.3/DataFrames/src/groupeddataframe/grouping.jl:202
>
> I supposed the nrow function is not implemented for SubDataFrames for a good
> reason, so I gave up on this strategy. Then I used some nastier code:
>
>     best_heart_rate(df) = (df[sortperm(df[:,:Mortality] , rev=true), :])[1,:]
>
>     best_hospitals = by(hospitals, :State, best_heart_rate)
>
> Seems to work. But now there is an NA problem: how can I remove the rows from
> the SubDataFrames that have NA in the Mortality column? Is there a better
> strategy to accomplish my objective?
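
On the NA question at the end of the quoted message, here is a minimal sketch. It assumes a current DataFrames.jl release (1.x) rather than the v0.3-era API used in the thread: in current releases NA is spelled `missing`, and `groupby` plus `combine` take the place of `by`. The sample table and its values are hypothetical stand-ins for `hospitals`.

```julia
using DataFrames

# Hypothetical stand-in for the `hospitals` table from the question.
hospitals = DataFrame(
    Name = ["A", "B", "C", "D", "E"],
    State = ["TX", "TX", "NY", "NY", "NY"],
    Mortality = [14.2, missing, 13.1, 12.4, missing],
)

# Drop the rows whose Mortality is missing before splitting by State.
complete = dropmissing(hospitals, :Mortality)

# For each State, keep the single row with the smallest Mortality.
# argmin avoids sorting each group just to pick one row.
best_hospitals = combine(groupby(complete, :State)) do sdf
    sdf[argmin(sdf.Mortality), :]
end
```

Filtering before grouping keeps the per-group function simple; a State whose rows all have a missing Mortality would simply not appear in the result.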
