Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-03-13 Thread Sean Owen
Do you really need a new cluster per user? And if so, why specify N workers > M machines? I am not seeing a need for that. I don't even think 2 workers on the same host make sense, as they are both managing the same resources; that setup only exists for test purposes AFAICT. What you are trying to do sou

Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-03-13 Thread Andrew Melo
Hi Xingbo, Sean, On Fri, Mar 13, 2020 at 12:31 PM Xingbo Jiang wrote: > Andrew, could you provide more context of your use case please? Is it like > you deploy homogeneous containers on hosts with available resources, and > each container launches one worker? Or you deploy workers directly on ho

[SPARK-25299] A Discussion About Shuffle Metadata Tracking

2020-03-13 Thread Matt Cheah
Hi everyone, A working group in the community has been having ongoing discussions regarding how we can allow for flexible storage solutions for shuffle data that are compatible with containerized systems, more resilient to node failures, and can support disaggregated storage architectures. One

Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-03-13 Thread Xingbo Jiang
Andrew, could you provide more context of your use case please? Is it like you deploy homogeneous containers on hosts with available resources, and each container launches one worker? Or you deploy workers directly on hosts thus you could have multiple workers from the same application on the same

Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-03-13 Thread Sean Owen
You have multiple workers in one Spark (standalone) app? This wouldn't prevent N apps from each having a worker on a machine. On Fri, Mar 13, 2020 at 11:51 AM Andrew Melo wrote: > > Hello, > > On Fri, Feb 28, 2020 at 13:21 Xingbo Jiang wrote: >> >> Hi all, >> >> Based on my experience, there is

Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-03-13 Thread Andrew Melo
Hello, On Fri, Feb 28, 2020 at 13:21 Xingbo Jiang wrote: > Hi all, > > Based on my experience, there is no scenario that necessarily requires > deploying multiple Workers on the same node with Standalone backend. A > worker should book all the resources reserved to Spark on the host it is > laun

[DISCUSS] Null-handling of primitive-type of untyped Scala UDF in Scala 2.12

2020-03-13 Thread wuyi
Hi all, I'd like to raise a discussion here about null handling for primitive types in the untyped Scala UDF [ udf(f: AnyRef, dataType: DataType) ]. After we switched to Scala 2.12 in 3.0, the untyped Scala UDF is broken because now we can't use reflection to get the parameter types of the Scala lambda. T