RE: Enabling fully disaggregated shuffle on Spark

2019-11-27 Thread Prakhar Jain
Great work Ben. At Microsoft, we are also working on disaggregating shuffle from Spark. Please add me to the invite.

Re: Loose the requirement of "median" of the SQL metrics

2019-11-27 Thread Jungtaek Lim
Ah yes, right, I forgot about its existence. Thanks! I'm aware of some implementations for approximate calculations (I guess what we call an approximate median is an approximate percentile at 50%), but I didn't know about implementation details like support for cumulative updates. Given current source values of …
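For context, the approximate percentile being discussed is exposed directly in Spark SQL; a minimal sketch (the accuracy value here is illustrative, not from the thread):

    # Minimal sketch: Spark's built-in approximate percentile, where the
    # median is just the 50th percentile. The accuracy argument (10000 is
    # the default) trades memory per partial aggregate against error.
    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.range(1000000).selectExpr("id AS v")
    approx_median = df.select(F.expr("percentile_approx(v, 0.5, 10000)")).first()[0]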

Re: [DISCUSS] PostgreSQL dialect

2019-11-27 Thread Dongjoon Hyun
+1 Bests, Dongjoon. On Tue, Nov 26, 2019 at 3:52 PM Takeshi Yamamuro wrote: > Yea, +1, that looks pretty reasonable to me. > > Here I'm proposing to hold off the PostgreSQL dialect. Let's remove it > from the codebase before it's too late. Currently we only have 3 features > under the PostgreSQL dialect …

Re: Loose the requirement of "median" of the SQL metrics

2019-11-27 Thread Sean Owen
Yep, that's clear. That's a reasonable case. There are already approximate median computations that can be done cumulatively, as you say, implemented in Spark. I think it's reasonable to consider this for performance, as it can be faster with just a small error tolerance. But yeah, up to you if you …

Re: Loose the requirement of "median" of the SQL metrics

2019-11-27 Thread Jungtaek Lim
Thanks all for providing inputs! Maybe I wasn't clear about my intention. The issue I focus on is this: there are plenty of metrics defined in a stage for SQL, and each metric has a value for each task; these are grouped later to calculate aggregated values (e.g. the metric for "elapsed time" is shown …
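To make the setting concrete, a toy illustration (not the actual SQLAppStatusListener code; the values are made up) of how per-task values become the aggregated metrics shown in the UI:

    # Toy illustration, not the actual SQLAppStatusListener code.
    task_values = [120, 95, 400, 88, 310]  # hypothetical per-task "elapsed time" (ms)
    total = sum(task_values)
    lo, hi = min(task_values), max(task_values)
    # The exact median forces us to keep and sort every task's value:
    med = sorted(task_values)[len(task_values) // 2]
    print(f"sum={total}, min={lo}, med={med}, max={hi}")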

Debug "Java gateway process exited before sending the driver its port number"

2019-11-27 Thread Li Jin
Dear Spark devs, I am debugging a weird "Java gateway process exited before sending the driver its port number" error when creating a SparkSession with pyspark. I am running the following simple code with pytest: " from pyspark.sql import SparkSession def test_spark(): spark = SparkSession.builder…
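The mail is cut off; a plausible reconstruction of the test (everything after SparkSession.builder is a guess, not from the source) would be:

    # Plausible reconstruction of the truncated test; the body after
    # SparkSession.builder is a guess, not from the original mail.
    from pyspark.sql import SparkSession

    def test_spark():
        spark = SparkSession.builder.master("local[*]").getOrCreate()
        assert spark.range(10).count() == 10
        spark.stop()

pyspark raises this error when the spark-submit child process exits before printing its port, so the usual suspects are environment problems (JAVA_HOME pointing at a missing or incompatible JDK, or a SPARK_HOME mismatch) rather than the test code itself.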

Re: Loose the requirement of "median" of the SQL metrics

2019-11-27 Thread Sean Owen
How big is the overhead, at scale? If it has a non-trivial effect for most jobs, I could imagine reusing the existing approximate quantile support to more efficiently find a pretty-close median. On Wed, Nov 27, 2019 at 3:55 AM Jungtaek Lim wrote: > > Hi Spark devs, > > The change might be specific …
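The existing support referred to here is presumably DataFrame.approxQuantile, which is based on a variant of the Greenwald-Khanna algorithm; a minimal sketch:

    # Minimal sketch of the existing approximate-quantile support: a
    # relativeError of 0.01 bounds the rank error at 1% of the row count.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.range(1000000).selectExpr("id AS v")
    approx_median = df.approxQuantile("v", [0.5], 0.01)[0]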

Re: Loose the requirement of "median" of the SQL metrics

2019-11-27 Thread Mayur Rustagi
Another option could be to use a sketch to get an approximate median (extendable to quantiles as well). When tasks are few, the sketch would give an accurate value; for a larger number of tasks, the benefit will be significant. Regards, Mayur Rustagi Ph: +1 (650) 937 9673 http://www.sigmoid.com
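For illustration only, a toy mergeable summary in the spirit of this suggestion (a real implementation would use a proper quantile sketch such as t-digest or KLL; nothing below is from the thread):

    import random

    CAPACITY = 256  # bounded memory per task, regardless of task count

    def make_sketch(values):
        # Keep a bounded random sample of one task's metric values.
        return random.sample(values, min(len(values), CAPACITY))

    def merge(a, b):
        # Sketches merge associatively, so partial results can be
        # combined without ever holding all raw task values at once.
        combined = a + b
        return random.sample(combined, min(len(combined), CAPACITY))

    def approx_median(sketch):
        return sorted(sketch)[len(sketch) // 2]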

Loose the requirement of "median" of the SQL metrics

2019-11-27 Thread Jungtaek Lim
Hi Spark devs, The change might be specific to the SQLAppStatusListener, but given that it may change the metric values shown in the UI, I would like to hear some voices on this. When we aggregate the SQL metrics between tasks, we apply "sum", "min", "median", and "max", all of which are cumulative except …
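A short illustration of the cumulative/non-cumulative distinction being raised (the values are made up):

    # sum/min/max can be folded one task value at a time in O(1) state:
    acc = {"sum": 0, "min": float("inf"), "max": float("-inf")}
    for v in [120, 95, 400, 88, 310]:
        acc["sum"] += v
        acc["min"] = min(acc["min"], v)
        acc["max"] = max(acc["max"], v)
    # No comparable O(1) update exists for an exact median; it needs all
    # task values retained (or an approximate, mergeable summary instead).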