Correct way to use spark streaming with apache zeppelin

2016-03-11 Thread trung kien
Hi all, I've just viewed some of Zeppelin's videos. The integration between Zeppelin and Spark is really amazing and I want to use it for my application. In my app, I will have a Spark Streaming app doing some basic realtime aggregation (intermediate data). Then I want to use Zeppelin to do som

Re: Correct way to use spark streaming with apache zeppelin

2016-03-12 Thread trung kien
analytics" -- do you mean build a report or dashboard that automatically updates as new data comes in? -- Chris Miller On Sat, Mar 12, 2016 at 3:13 PM, trung kien wrote: > Hi all, > > I've just viewed some of Zeppelin's videos. The integration between > Zeppelin and

Re: Correct way to use spark streaming with apache zeppelin

2016-03-13 Thread trung kien
>>> trigger your code to push out an updated value to any clients via the >>> websocket. You could use something like a Redis pub/sub channel to trigger >>> the web app to notify clients of an update. >>> >>> There are about 5 million other ways you could d
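The suggestion in this thread is to have the streaming job publish each updated aggregate to a Redis pub/sub channel, and let the web app subscribe and fan the update out to its dashboard clients over a websocket. A minimal sketch of the publishing side, assuming the Jedis client; the host, channel name, and JSON payload are hypothetical:

```java
import redis.clients.jedis.Jedis;

// Called from the streaming job once per batch, with the freshly
// computed aggregate already serialized as JSON by the caller.
class DashboardPublisher {
    static void publish(String aggregateJson) {
        // Hypothetical host/channel; the web app subscribes to this
        // channel and pushes each message to its websocket clients.
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.publish("dashboard-updates", aggregateJson);
        }
    }
}
```

This keeps the streaming job decoupled from the web tier: the job only knows about the channel, not about connected clients.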

Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
Hi all, Is there any way to re-compute using Spark Streaming - Kafka Direct Approach from specific time? In some cases, I want to re-compute again from specific time (e.g beginning of day)? is that possible? -- Thanks Kien

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
a kafka > improvement proposal for it but it has gotten pushed back to at least > 0.10.1 > > If you want to do this kind of thing, you will need to maintain your > own index from time to offset. > > On Wed, May 25, 2016 at 8:15 AM, trung kien wrote: > > Hi all, > >
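The reply's suggestion of maintaining "your own index from time to offset" can be sketched as a sorted map per topic-partition, recorded as batches complete and queried when you want to restart from, say, the beginning of the day. A minimal in-memory sketch; the class and method names are hypothetical:

```java
// Minimal time -> offset index for a single Kafka topic-partition;
// names are illustrative, not a real Spark/Kafka API.
class TimeOffsetIndex {
    private final java.util.TreeMap<Long, Long> byTime = new java.util.TreeMap<>();

    // Call as each batch completes, with the batch time and the
    // first offset of that batch.
    void record(long timestampMs, long offset) {
        byTime.put(timestampMs, offset);
    }

    // Offset recorded at or just before the given time (e.g. start
    // of day); 0 if nothing that old has been recorded.
    long offsetAtOrBefore(long timestampMs) {
        java.util.Map.Entry<Long, Long> e = byTime.floorEntry(timestampMs);
        return e == null ? 0L : e.getValue();
    }
}
```

In practice the index would be persisted (a small table or file), and its lookup result fed into the offset map that the overloaded `createDirectStream` accepts.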

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
Ah right, I see. Thank you very much. On May 25, 2016 11:11 AM, "Cody Koeninger" wrote: > There's an overloaded createDirectStream method that takes a map from > topicpartition to offset for the starting point of the stream. > > On Wed, May 25, 2016 at 9:59 AM, trung
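The overloaded `createDirectStream` Cody mentions (spark-streaming-kafka 0.8 API, current at the time of this thread) takes a `Map<TopicAndPartition, Long>` of starting offsets plus a message handler. A hedged sketch; the topic name, partition, and offset are placeholders that would in practice come from your own time-to-offset index:

```java
import java.util.HashMap;
import java.util.Map;
import kafka.common.TopicAndPartition;
import kafka.message.MessageAndMetadata;
import kafka.serializer.StringDecoder;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;

// Explicit starting point: partition 0 of a hypothetical "trades"
// topic, from the offset you looked up for the start of the day.
Map<TopicAndPartition, Long> fromOffsets = new HashMap<>();
fromOffsets.put(new TopicAndPartition("trades", 0), 12345L);

// Start the direct stream from those offsets instead of
// latest/earliest; the handler extracts just the message value.
JavaInputDStream<String> stream = KafkaUtils.createDirectStream(
    jssc, String.class, String.class,
    StringDecoder.class, StringDecoder.class, String.class,
    kafkaParams, fromOffsets,
    (Function<MessageAndMetadata<String, String>, String>) MessageAndMetadata::message);
```

This requires a running Spark streaming context (`jssc`) and Kafka connection properties (`kafkaParams`); it is a sketch of the call shape, not a complete job.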

Re: Spark Streaming data checkpoint performance

2015-11-07 Thread trung kien
Hmm, seems it's just a trick. Using this method, it's very hard to recover from failure, since we don't know which batches have been done. I really want to maintain the whole running stats in memory to achieve full fault tolerance. I just wonder if the performance of data checkpointing is that bad
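The "running stats in memory" the poster wants are what Spark's stateful operators provide, and the data checkpointing whose performance is being questioned is what those operators require. A sketch using `updateStateByKey` (Spark 1.x Java API, which used Guava's `Optional`); the checkpoint directory is a placeholder:

```java
// State checkpointing is mandatory for stateful streams; this
// directory is a placeholder (HDFS in production).
jssc.checkpoint("/tmp/checkpoint-dir");

// Running per-key totals, carried from batch to batch.
JavaPairDStream<String, Long> runningTotals = pairs.updateStateByKey(
    (values, state) -> {
        long sum = state.or(0L);  // Guava Optional in Spark 1.x
        for (Long v : values) sum += v;
        return com.google.common.base.Optional.of(sum);
    });
```

The checkpoint writes grow with the total state size, not the batch size, which is the performance trade-off this thread is circling around.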

RDD partition after calling mapToPair

2015-11-21 Thread trung kien
Hi all, I am having trouble understanding how an RDD will be partitioned after calling the mapToPair function. Could anyone give me more information about partitioning in this function? I have a simple application doing the following job: JavaPairInputDStream messages = KafkaUtils.createDirectStream(...

Re: RDD partition after calling mapToPair

2015-11-24 Thread trung kien
On Nov 23, 2015 12:26 AM, "Cody Koeninger" wrote: >> >>> Spark direct stream doesn't have a default partitioner. >>> >>> If you know that you want to do an operation on keys that are already >>> partitioned by kafka, just use mapPartitions or fo
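Cody's point is that the Kafka direct stream has no partitioner, so `mapToPair` does not move data between partitions; if the keys are already routed to Kafka partitions by key, per-partition operators can aggregate without a shuffle. A hedged sketch of the `mapPartitions` alternative (Spark 1.x, where the function returns an `Iterable`); the key/value types are illustrative:

```java
// Aggregate within each (Kafka-aligned) partition, avoiding the
// shuffle that reduceByKey would trigger. Only valid if all records
// for a given key live in the same Kafka partition.
JavaDStream<Map<String, Long>> perPartitionCounts =
    pairs.mapPartitions(records -> {
        Map<String, Long> counts = new HashMap<>();
        while (records.hasNext()) {
            Tuple2<String, Long> r = records.next();
            counts.merge(r._1(), r._2(), Long::sum);
        }
        // One aggregate map per partition (Iterable in Spark 1.x).
        return Collections.singletonList(counts);
    });
```

If the keys are not aligned with Kafka partitions, this produces partial counts per partition and a shuffle-based operator is needed after all.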

Kubernetes security context when submitting job through k8s servers

2018-07-09 Thread trung kien
Dear all, Is there any way to include a security context ( https://kubernetes.io/docs/tasks/configure-pod-container/security-context/) when submitting a job through the k8s servers? I'm trying to run my first Spark jobs on Kubernetes through spark-submit: bin/spark-submit --master k8s://https://API_SERVERS --

Re: Kubernetes security context when submitting job through k8s servers

2018-07-09 Thread trung kien
a custom SecurityContext > of the driver/executor pods. This will be supported by the solution to > https://issues.apache.org/jira/browse/SPARK-24434. > > On Mon, Jul 9, 2018 at 2:06 PM trung kien wrote: > >> Dear all, >> >> Is there any way to include a security cont
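SPARK-24434, referenced in the reply, is the pod-template feature that later Spark releases shipped: you supply a partial pod spec and Spark merges it into the driver/executor pods. A sketch of what a template carrying a security context looks like; field names follow the standard Kubernetes `securityContext` schema, and the specific values are placeholders:

```yaml
# pod-template.yaml -- illustrative example, not from the thread.
apiVersion: v1
kind: Pod
spec:
  securityContext:        # pod-level: applies to all containers
    runAsUser: 1000
    fsGroup: 2000
  containers:
    - name: spark-kubernetes-driver
      securityContext:    # container-level override
        allowPrivilegeEscalation: false
```

With a Spark version that includes SPARK-24434, the template is passed at submit time via `--conf spark.kubernetes.driver.podTemplateFile=pod-template.yaml` (and the analogous `spark.kubernetes.executor.podTemplateFile` for executors).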