Re: Get only updated RDDs from or after updateStateBykey

2015-09-24 Thread Bin Wang
} > > > Best Regards, > Shixiong Zhu > > 2015-09-24 17:42 GMT+08:00 Bin Wang: > >> It seems like a workaround. But I don't know how to get the database >> connection from the worker nodes. >> >> Shixiong Zhu wrote on Thu, Sep 24, 2015 at 5:37 PM: >>

Re: Get only updated RDDs from or after updateStateBykey

2015-09-24 Thread Bin Wang
}
>
> and use this overload:
>
> def updateStateByKey[S: ClassTag](
>     updateFunc: (Seq[V], Option[S]) => Option[S],
>     partitioner: Partitioner
> ): DStream[(K, S)]
>
> There is a JIRA: https://issues.apache.org/jira/browse/SPARK-2629 but
> doesn't

Re: Get only updated RDDs from or after updateStateBykey

2015-09-24 Thread Bin Wang
livered soon. > > Best Regards, > Shixiong Zhu > > 2015-09-24 13:45 GMT+08:00 Bin Wang: > >> I've read the source code and it seems to be impossible, but I'd like to >> confirm it. >> >> It is a very useful feature. For example, I need to store the state

Get only updated RDDs from or after updateStateBykey

2015-09-23 Thread Bin Wang
I've read the source code and it seems to be impossible, but I'd like to confirm it. It would be a very useful feature. For example, I need to store the state of a DStream in my database so that I can recover it on the next redeploy. But I only need to save the updated keys. Save all keys into database
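
One common way to approximate "only the updated keys" is to carry a flag inside the state itself and filter on it before writing. The sketch below is plain Scala with no Spark dependency, so the update logic can be run and checked on its own. The names `Flagged`, `UpdatedOnly` and `updateCount`, and the choice of a running count as the state, are illustrative assumptions and not taken from the thread; only the function shape `(Seq[V], Option[S]) => Option[S]` matches what `updateStateByKey` expects.

```scala
// Hypothetical wrapper: the state plus a flag saying whether the
// current batch changed it. Not part of the Spark API.
final case class Flagged[S](state: S, updated: Boolean)

object UpdatedOnly {
  // Same shape as an updateStateByKey update function:
  // (values seen for the key in this batch, previous state) => next state.
  // Here the state is a running count (an assumption for illustration).
  def updateCount(values: Seq[Int],
                  prev: Option[Flagged[Long]]): Option[Flagged[Long]] = {
    val old = prev.map(_.state).getOrElse(0L)
    if (values.isEmpty) {
      // Key not touched in this batch: keep the state, clear the flag.
      prev.map(_.copy(updated = false))
    } else {
      // Key received data: fold it in and mark the state as updated.
      Some(Flagged(old + values.sum, updated = true))
    }
  }

  def main(args: Array[String]): Unit = {
    val afterBatch1 = updateCount(Seq(1, 2), None)        // key gets data
    val afterBatch2 = updateCount(Seq.empty, afterBatch1) // key is idle
    println(afterBatch1)
    println(afterBatch2)
  }
}
```

In a streaming job, the stream returned by `updateStateByKey` would then be filtered on the `updated` flag before the database write, so idle keys are skipped. Every key still passes through the update function on each batch, so this reduces the writes, not the state-update cost.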

Re: Checkpoint directory structure

2015-09-23 Thread Bin Wang
vide the logs on when and how you are seeing this error? > > On Wed, Sep 23, 2015 at 6:32 PM, Bin Wang wrote: > >> BTW, I just killed the application and restarted it. Then the application >> could not recover from the checkpoint because some RDDs were lost. So I wonder,

Re: Checkpoint directory structure

2015-09-23 Thread Bin Wang
BTW, I just killed the application and restarted it. Then the application could not recover from the checkpoint because some RDDs were lost. So I wonder: if there is a failure in the application, is it possible that it will not be able to recover from the checkpoint? Bin Wang wrote on Wed, Sep 23, 2015 at 6:58 PM:

Checkpoint directory structure

2015-09-23 Thread Bin Wang
I find the checkpoint directory structure is like this:
-rw-r--r-- 1 root root 134820 2015-09-23 16:55 /user/root/checkpoint/checkpoint-144299850
-rw-r--r-- 1 root root 134768 2015-09-23 17:00 /user/root/checkpoint/checkpoint-144299880
-rw-r--r-- 1 root root 134895 2015-0

Re: Why there is no snapshots for 1.5 branch?

2015-09-22 Thread Bin Wang
Azuryy Yu > > > > On Sep 22, 2015, at 13:36, Bin Wang wrote: > > However I find some scripts in dev/audit-release, can I use them? > > Bin Wang wrote on Tue, Sep 22, 2015 at 1:34 PM: > >> No, I mean push Spark to my private repository. Spark doesn't have a >> build.sbt

Re: Why there is no snapshots for 1.5 branch?

2015-09-21 Thread Bin Wang
However, I found some scripts in dev/audit-release. Can I use them? Bin Wang wrote on Tue, Sep 22, 2015 at 1:34 PM: > No, I mean push Spark to my private repository. Spark doesn't have a > build.sbt as far as I can see. > > Fengdong Yu wrote on Tue, Sep 22, 2015 at 1:29 PM: > >> Do you mean you want to pu

Re: Why there is no snapshots for 1.5 branch?

2015-09-21 Thread Bin Wang
ld.sb:
>
> publishTo := {
>   val nexus = "https://YOUR_PRIVATE_REPO_HOSTS/";
>   if (version.value.endsWith("SNAPSHOT"))
>     Some("snapshots" at nexus + "content/repositories/snapshots")
>   else
>     Some("releases" at

Re: Why there is no snapshots for 1.5 branch?

2015-09-21 Thread Bin Wang
f exactly what is in that build readily available, > not just somewhat arbitrary JARs. > > On Mon, Sep 21, 2015 at 9:57 PM, Bin Wang wrote: > >> But I cannot find 1.5.1-SNAPSHOT either at >> https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-core

Re: Why there is no snapshots for 1.5 branch?

2015-09-21 Thread Bin Wang
APSHOT -- soon to be 1.5.1 release > candidates and then the 1.5.1 release. > > On Mon, Sep 21, 2015 at 9:51 PM, Bin Wang wrote: > >> I'd like to use some important bug fixes in 1.5 branch and I look for the >> apache maven host, but don't find any snapshot for 1.5

Why there is no snapshots for 1.5 branch?

2015-09-21 Thread Bin Wang
I'd like to use some important bug fixes in the 1.5 branch. I looked on the Apache Maven host but couldn't find any snapshot for the 1.5 branch. https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-core_2.10/1.5.0-SNAPSHOT/ I can find 1.4.X and 1.6.0 versions; why is there no snap

Re: QueueStream doesn't support checkpoint makes it difficult to do unit test

2015-09-17 Thread Bin Wang
Never mind. I've found a PR and it has been merged: https://github.com/apache/spark/pull/8624/commits Bin Wang wrote on Thu, Sep 17, 2015 at 4:50 PM: > I'm using Spark Streaming with updateStateByKey, which forces me to use > checkpointing. In my unit test, I create a queueStream to test. But in Spark

QueueStream doesn't support checkpoint makes it difficult to do unit test

2015-09-17 Thread Bin Wang
I'm using Spark Streaming with updateStateByKey, which forces me to use checkpointing. In my unit test, I create a queueStream to test. But in Spark 1.5, QueueStream throws an exception when it is used with checkpointing, which makes unit testing difficult. Is there an option to disable this? Though I k

Re: Optimize the first map reduce of DStream

2015-03-24 Thread Bin Wang
Hungary, 2475 Kápolnásnyék, Kossuth 6/a > > elte: HSKSJZ (ZVZOAAI.ELTE) > > 2015-03-24 11:55 GMT+01:00 Arush Kharbanda: > >> The block size is configurable and that way I think you can reduce the >> block interval, to keep the block in memory only for the limiter interv

Optimize the first map reduce of DStream

2015-03-24 Thread Bin Wang
o keep all the batch data in the memory. Something like a pipeline should be OK. Is it difficult to implement on top of the current implementation? Thanks. --- Bin Wang