If RDDs from the same DStream are not guaranteed to run on the same worker, then the question becomes:
is it possible to specify an unlimited batch duration in the StreamingContext (ssc) to get a continuous stream (as opposed to a discretized one)? Say we have a per-node streaming engine (with built-in checkpointing and recovery) that we'd like to integrate with Spark Streaming. Can we have a never-ending batch (or RDD) this way?

On Mon, Sep 28, 2015 at 4:31 PM, <mailer-dae...@apache.org> wrote:
> Hi. This is the qmail-send program at apache.org.
> I'm afraid I wasn't able to deliver your message to the following addresses.
> This is a permanent error; I've given up. Sorry it didn't work out.
>
> <u...@spark.apache.org>:
> Must be sent from an @apache.org address or a subscriber address or an
> address in LDAP.
>
> --- Below this line is a copy of the message.
>
> Date: Mon, 28 Sep 2015 16:31:42 -0700
> Subject: Re: Spark streaming DStream state on worker
> From: Renyi Xiong <renyixio...@gmail.com>
> To: Shixiong Zhu <zsxw...@gmail.com>
> Cc: "u...@spark.apache.org" <u...@spark.apache.org>
>
> you answered my question I think that RDDs from same DStream not guaranteed
> to run on same worker
>
> On Thu, Sep 24, 2015 at 1:51 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:
>
> > +user, -dev
> >
> > It's not clear about `compute` in your question. There are two `compute`
> > here.
> >
> > 1. DStream.compute: it always runs in the driver, and all RDDs are created
> > in the driver. E.g.,
> >
> > DStream.foreachRDD(rdd => rdd.count())
> >
> > "rdd.count()" is called in the driver.
> >
> > 2. RDD.compute: this will run in the executor and the location is not
> > guaranteed. E.g.,
> >
> > DStream.foreachRDD(rdd => rdd.foreach { v =>
> >     println(v)
> > })
> >
> > "println(v)" is called in the executor.
> >
> >
> > Best Regards,
> > Shixiong Zhu
> >
> > 2015-09-17 3:47 GMT+08:00 Renyi Xiong <renyixio...@gmail.com>:
> >
> >> Hi,
> >>
> >> I want to do temporal join operation on DStream across RDDs, my question
> >> is: Are RDDs from same DStream always computed on same worker (except
> >> failover) ?
> >>
> >> thanks,
> >> Renyi.
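
For reference, here is a minimal sketch of the driver/executor distinction described in the quoted reply. It is an illustration under assumptions, not code from the thread: the object name DriverVsExecutor, the local[2] master, the 1-second batch interval, and the queue-backed input are all hypothetical choices made so the example is self-contained.

```scala
import org.apache.spark.{SparkConf, TaskContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

// Hypothetical example object; the name is not from the original thread.
object DriverVsExecutor {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, for illustration only.
    // In local mode the driver and the executor share one JVM, so the
    // distinction below only becomes visible as separate hosts on a cluster.
    val conf = new SparkConf().setMaster("local[2]").setAppName("DriverVsExecutor")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumption: a queue of pre-built RDDs stands in for a real source.
    // By default queueStream consumes one queued RDD per batch interval.
    val queue = mutable.Queue[RDD[Int]]()
    for (i <- 1 to 3) {
      queue += ssc.sparkContext.makeRDD(i * 10 until i * 10 + 4, 2)
    }
    val stream = ssc.queueStream(queue)

    stream.foreachRDD { rdd =>
      // The body of foreachRDD runs in the driver, once per batch.
      // rdd.count() is called here in the driver; the counting itself
      // is distributed as tasks.
      val n = rdd.count()
      println(s"[driver] batch has $n elements")

      rdd.foreachPartition { values =>
        // This closure runs inside an executor task. Which host runs
        // which partition is decided by the scheduler and may differ
        // from batch to batch.
        val host = java.net.InetAddress.getLocalHost.getHostName
        val partition = TaskContext.get().partitionId()
        println(s"[executor on $host] partition $partition: ${values.mkString(",")}")
      }
    }

    ssc.start()
    // Run for a few batches, then shut down.
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

On a multi-node cluster, comparing the host names printed for the same partition index across batches is one way to observe that placement is not fixed. The sketch does not address the unlimited-duration question above; it only illustrates where each closure executes.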