Re: [julia-users] DataFrames: Problems with Split-Apply-Combine strategy

John Myles White Thu, 22 May 2014 10:02:33 -0700

Can you create an issue for the nrow error? That’s almost certainly a bug.


 — John

On May 22, 2014, at 6:03 AM, Mike Innes <[email protected]> wrote:

> Link: 
> http://stackoverflow.com/questions/23806758/julia-dataframes-problems-with-split-apply-combine-strategy
> 
> I definitely agree that having a greater presence on SO would be useful, so 
> it might be best to answer there (sorry I can't be more directly helpful, OP)
> 
> 
> On 22 May 2014 13:56, Paulo Castro <[email protected]> wrote:
> I asked this question on StackOverflow, but I think I will get better results 
> posting it here. We should use that platform more, so Julia gets more exposure 
> to R/Python/Matlab users who need something like it.
> 
> I have some data (from an R course assignment, but that doesn't matter) on 
> which I want to use the split-apply-combine strategy, but I'm having some 
> problems. The data is in a DataFrame, called outcome, and each row represents 
> a hospital. Each column holds a piece of information about that hospital, 
> such as name, location, rates, etc.
> 
> My objective is to find the hospital with the lowest "Mortality by Heart 
> Attack Rate" in each state.
> 
> I was playing around with some strategies, and ran into a problem using the 
> by function:
> 
> best_heart_rate(df) = sort(df, cols = :Mortality)[end,:]
> best_hospitals = by(hospitals, :State, best_heart_rate)
> 
> The idea was to split the hospitals DataFrame by State, sort each of the 
> SubDataFrames by Mortality Rate, get the lowest one, and combine the rows into 
> a new DataFrame.
> 
> But when I used this strategy, I got:
> 
> ERROR: no method nrow(SubDataFrame{Array{Int64,1}})
>  in sort at /home/paulo/.julia/v0.3/DataFrames/src/dataframe/sort.jl:311
>  in sort at /home/paulo/.julia/v0.3/DataFrames/src/dataframe/sort.jl:296
>  in f at none:1
>  in based_on at /home/paulo/.julia/v0.3/DataFrames/src/groupeddataframe/grouping.jl:144
>  in by at /home/paulo/.julia/v0.3/DataFrames/src/groupeddataframe/grouping.jl:202
> 
> I suppose the nrow function is not implemented for SubDataFrames for a good 
> reason, so I gave up on this strategy. Then I used some nastier code:
> 
> best_heart_rate(df) = (df[sortperm(df[:,:Mortality] , rev=true), :])[1,:]
> best_hospitals = by(hospitals, :State, best_heart_rate)
> 
> It seems to work. But now there is an NA problem: how can I remove the rows 
> of the SubDataFrames that have NA in the Mortality column? Is there a better 
> strategy to accomplish my objective?
> 
>
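
For readers on a current DataFrames.jl release, the task described in the quoted question (drop the rows with no Mortality value, then keep the lowest-Mortality hospital in each State) might be sketched roughly as follows. This is an illustration under stated assumptions, not the API available in the 2014 thread: it assumes a DataFrames.jl version in which NA has become `missing` and `by` has been replaced by `groupby` plus `combine`. The small `hospitals` table, its `Hospital` column, and all values are made up for the example.

```julia
using DataFrames

# Hypothetical toy data standing in for the real hospitals table.
hospitals = DataFrame(
    State     = ["AL", "AL", "AK", "AK"],
    Hospital  = ["A", "B", "C", "D"],
    Mortality = [14.3, missing, 13.4, 12.1],
)

# Drop rows whose Mortality is missing (the modern counterpart of NA).
complete = dropmissing(hospitals, :Mortality)

# Split by State, pick the row with the smallest Mortality in each group,
# and combine the selected rows into a new DataFrame.
best_hospitals = combine(groupby(complete, :State)) do sdf
    sdf[argmin(sdf.Mortality), :]
end
```

Using `argmin` selects the row with the smallest value directly and avoids sorting each group. By contrast, taking the last row of an ascending sort, or the first row of a `rev=true` ordering, as in the quoted attempts, would return the hospital with the highest Mortality.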
