I don't think the Spark configuration is what you want to focus on. It's
hard to say without knowing the specifics of the job or the data volume,
but you should be able to accomplish this with the percent_rank function in
SparkSQL and a smart partitioning of the data. If your data has a lot of
skew, the window computation will bottleneck on the overloaded partitions,
so pick the partitioning key carefully.
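Something along these lines, as a sketch (the input path and the
group_key/value column names are placeholders):

from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3://bucket/events")  # placeholder; columns group_key, value

# percent_rank is a window function; partitionBy is what keeps Spark from
# pulling the whole dataset into a single partition just to sort it.
w = Window.partitionBy("group_key").orderBy("value")
ranked = df.withColumn("pct_rank", F.percent_rank().over(w))

If you leave out partitionBy, Spark moves all the data to a single
partition for the window sort (it logs a warning about it), and that is
usually where these jobs fall over.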
Currently, I'm using the percentile_approx function with Hive.
I'm looking for a better way to run this function, or another way to get
the same result with Spark, but faster and without using gigantic instances.
I'm trying to optimize this job by changing the Spark configuration. If you
have any ideas, I'd appreciate the help.
If you require higher precision, you may have to write a custom UDAF.
In my case, I ended up storing the data as a key-value ordered list of
histograms.
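Roughly like this, if it helps (just a sketch: the bin width and column
names are assumptions, and a fixed-width histogram only works if you know
the value range up front):

from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3://bucket/events")  # placeholder; columns group_key, value

BIN_WIDTH = 10.0  # assumption: choose from your value range and precision target

# Bucket the values and keep one count per (key, bucket).
buckets = (df
    .withColumn("bucket", F.floor(F.col("value") / BIN_WIDTH))
    .groupBy("group_key", "bucket")
    .count())

# Cumulative counts per key, in bucket order, give an approximate CDF.
w = Window.partitionBy("group_key").orderBy("bucket")
cdf = (buckets
    .withColumn("cum", F.sum("count").over(w))
    .withColumn("total", F.sum("count").over(Window.partitionBy("group_key"))))

# Approximate median: lower edge of the first bucket crossing 50% of the mass.
p50 = (cdf
    .where(F.col("cum") >= 0.5 * F.col("total"))
    .groupBy("group_key")
    .agg((F.min("bucket") * BIN_WIDTH).alias("approx_p50")))

The histogram state is small, so it aggregates cheaply across partitions,
and the error is bounded by the bin width.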
Thanks
Muthu
On Mon, Nov 11, 2019, 20:46 Patrick McCarthy wrote:
Depending on your tolerance for error, you could also use
percentile_approx().
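For example (this assumes Spark 2.x, where percentile_approx is only
exposed as a SQL expression rather than a DataFrame function; column names
are placeholders):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3://bucket/events")  # placeholder; columns group_key, value

# percentile_approx(col, percentages, accuracy): a larger accuracy value
# means a bigger sketch per group but tighter error bounds (10000 is the default).
quartiles = df.groupBy("group_key").agg(
    F.expr("percentile_approx(value, array(0.25, 0.5, 0.75), 10000)")
     .alias("quartiles"))

It keeps a bounded sketch per group instead of sorting everything, which is
why it tends to be much cheaper than computing the exact percentile.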
On Mon, Nov 11, 2019 at 10:14 AM Jerry Vinokurov wrote:
Do you mean that you are trying to compute the percent rank of some data?
You can use the SparkSQL percent_rank function for that, but I don't think
that's going to give you any improvement over calling the percentRank
function on the data frame. Are you currently using a user-defined function
for this?
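To illustrate, the two forms should compile to the same plan, which you can
check with .explain() (table and column names made up):

from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3://bucket/events")  # placeholder; columns group_key, value

# SQL form
df.createOrReplaceTempView("t")
via_sql = spark.sql(
    "SELECT *, percent_rank() OVER (PARTITION BY group_key ORDER BY value) AS pr FROM t")

# DataFrame form; both go through the same Catalyst window operator.
w = Window.partitionBy("group_key").orderBy("value")
via_df = df.withColumn("pr", F.percent_rank().over(w))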