I am trying to test the watermark concept in Structured Streaming using the
program below
import java.sql.Timestamp
import org.apache.spark.sql.functions.{col, expr}
import org.apache.spark.sql.streaming.Trigger
val lines_stream = spark.readStream.
  format("kafka").
  opt
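The program above is cut off, but the watermark rule it is presumably testing can be sketched on its own. Here is a toy, pure-Python model of the semantics (an illustration only, not Spark code and not the poster's program): Structured Streaming tracks the maximum event time seen so far, and an arriving event whose event time is older than that maximum minus the watermark delay is treated as too late and dropped.

```python
def run_with_watermark(events, delay):
    """events: list of (event_time, value) in arrival order.
    Returns the values accepted under the watermark rule: an event is
    dropped if its event time is older than the current watermark
    (max event time seen so far, minus the delay)."""
    max_event_time = float("-inf")
    accepted = []
    for t, v in events:
        watermark = max_event_time - delay
        if t >= watermark:
            accepted.append(v)
        # The watermark only ever advances, driven by the max event time.
        max_event_time = max(max_event_time, t)
    return accepted

# With a 10-unit delay, the event stamped t=5 that arrives after t=20
# is behind the watermark (20 - 10 = 10) and gets dropped.
events = [(10, "a"), (20, "b"), (5, "late"), (15, "c")]
print(run_with_watermark(events, delay=10))  # → ['a', 'b', 'c']
```

In real Spark this is what `withWatermark("eventTime", "10 seconds")` expresses on a streaming Dataset; the model above only captures the accept/drop decision, not state cleanup or output modes.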
I built an Aggregator that computes PCA on grouped datasets. I wanted to
use the PCA functions provided by MLlib, but they only work on a full
dataset, and I needed to do it on a grouped dataset (like a
RelationalGroupedDataset).
So I built a little Aggregator that can do that, here’s an example o
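The example itself is truncated, but the idea can be sketched independently. Below is a hypothetical pure-Python illustration (all names are mine, not the author's Aggregator) of "PCA per group": collect rows by key, build the covariance matrix of each group, and take the dominant principal component by power iteration.

```python
from collections import defaultdict

def group_rows(rows):
    """rows: list of (key, vector). Returns {key: [vector, ...]},
    mimicking what a groupBy(...) would hand to an Aggregator."""
    groups = defaultdict(list)
    for key, vec in rows:
        groups[key].append(vec)
    return groups

def covariance(vectors):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    return [[sum((v[i] - means[i]) * (v[j] - means[j]) for v in vectors) / (n - 1)
             for j in range(d)] for i in range(d)]

def dominant_component(cov, iters=100):
    """Power iteration: repeatedly multiply by the matrix and normalize
    to converge on the top eigenvector (first principal component)."""
    d = len(cov)
    x = [1.0] * d
    for _ in range(iters):
        y = [sum(cov[i][j] * x[j] for j in range(d)) for i in range(d)]
        norm = sum(c * c for c in y) ** 0.5
        x = [c / norm for c in y]
    return x

def pca_per_group(rows):
    return {k: dominant_component(covariance(vs))
            for k, vs in group_rows(rows).items()}
```

A real Spark Aggregator would instead accumulate the sufficient statistics (counts, sums, outer products) in its buffer so each group is reduced distributively; the sketch above just shows the per-group computation.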
I'll +1 removing that legacy mllib code. Many users are confused by the two
APIs, and some of them have odd behaviors (for example, in gradient descent
the intercept is regularized, which it is not supposed to be).
DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc
On Wed, Oct 17, 2018 at 10:25 AM Yin Huai wrote:
> Shane, Thank you for initiating this work! Can we do an audit of jenkins
> users and trim down the list?
>
> re pruning external (spark-specific) users w/shell and jenkins login
access: we can absolutely do this.
limiting logins for EECS studen
Shane, Thank you for initiating this work! Can we do an audit of jenkins
users and trim down the list?
Also, for packaging jobs, those branch snapshot jobs are active (for
example,
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
for publishing snapsh
:) thanks, I am wondering how I missed that :) :)
On Wed, 17 Oct 2018 at 21:58, Sean Owen wrote:
> "/" is integer division, so "x / y * y" is not x, but more like the
> biggest multiple of y that's <= x.
> On Wed, Oct 17, 2018 at 11:25 AM Sandeep Katta
> wrote:
> >
> > Hi Guys,
> >
> > I am tr
"/" is integer division, so "x / y * y" is not x, but more like the
biggest multiple of y that's <= x.
On Wed, Oct 17, 2018 at 11:25 AM Sandeep Katta
wrote:
>
> Hi Guys,
>
> I am trying to understand the structured streaming code flow; while doing so
> I came across the code flow below
>
> def nextBatchTime(
Rounding.
On Wed, Oct 17, 2018 at 6:25 PM Sandeep Katta <
sandeep0102.opensou...@gmail.com> wrote:
> Hi Guys,
>
> I am trying to understand the structured streaming code flow; while doing so
> I came across the code flow below
>
> def nextBatchTime(now: Long): Long = {
> if (intervalMs == 0) now else now /
Hi Guys,
I am trying to understand the structured streaming code flow; while doing so I
came across the code below:
def nextBatchTime(now: Long): Long = {
  if (intervalMs == 0) now else now / intervalMs * intervalMs + intervalMs
}
The else part could also have been written as
now + intervalMs
is there a
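The arithmetic in nextBatchTime is easy to check directly. Here is a Python transcription (an illustration, not Spark source) where `//` plays the role of Scala's integer division on Long: dividing first rounds `now` down to a multiple of the interval, so the result is aligned to interval boundaries, which `now + intervalMs` would not be.

```python
def next_batch_time(now, interval_ms):
    """Python transcription of Spark's nextBatchTime logic.
    // is integer division, matching Scala's Long / Long."""
    if interval_ms == 0:
        return now
    # Round down to the nearest multiple of the interval, then step forward.
    return now // interval_ms * interval_ms + interval_ms

# With a 1000 ms trigger interval:
print(next_batch_time(1234, 1000))  # → 2000 (next multiple of 1000 after 1234)
print(1234 + 1000)                  # → 2234 (not aligned to the interval)
# If `now` is already on a boundary, the next batch is one full interval later:
print(next_batch_time(2000, 1000))  # → 3000
```

So the two forms agree only when `now` is already an exact multiple of `intervalMs`; the division/multiplication is what keeps batches on a fixed schedule even if a batch starts late.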
My understanding was that the legacy mllib api was frozen, with all new dev
going to ML, but it was not going to be removed. Although removing it would
get rid of a lot of `OldXxx` shims.
On Wed, Oct 17, 2018 at 12:55 AM Marco Gaido wrote:
> Hi all,
>
> I think a very big topic on this would be:
See the discussion at https://github.com/apache/spark/pull/21588
On Wed, Oct 17, 2018 at 5:06 AM, t4 wrote:
> Has anyone got Spark jars working with Hadoop 3.1 that they can share? I am
> looking to use the latest hadoop-aws fixes from v3.1
>
Hi all,
I think a very big topic on this would be: what do we want to do with the
old mllib API? For a long time I have been told that it was going to be
removed in 3.0. Is this still the plan?
Thanks,
Marco
On Wed, Oct 17, 2018 at 03:11, Marcelo Vanzin wrote:
> Might be good to take a