Regards,
Anshu Shukla
Is there any formal way to do a moving average over a fixed window duration?
I calculated a simple moving average by creating a count stream and a sum
stream, then joining them and finally computing the mean. This was not per
time window, since the time periods were part of the tuples.
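A per-window variant can be sketched as below, assuming the input is a numeric JavaDStream (the class, stream, and parameter names are illustrative, not from this thread): carry (sum, count) pairs through reduceByWindow and divide at the end.

import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import scala.Tuple2;

public class WindowedAverage {
    // Moving average over a fixed window: pair each value with a count of 1,
    // sum both fields over the window, then divide sum by count.
    public static JavaDStream<Double> movingAverage(JavaDStream<Double> values,
                                                    int windowSec, int slideSec) {
        return values
            .map(v -> new Tuple2<Double, Long>(v, 1L))
            .reduceByWindow(
                (a, b) -> new Tuple2<Double, Long>(a._1() + b._1(), a._2() + b._2()),
                Durations.seconds(windowSec),
                Durations.seconds(slideSec))
            .map(t -> t._1() / t._2());
    }
}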
--
Thanks & Regards,
Anshu Shukla
creation for every Job.
2 - Can I have something like a pool to serve requests?
--
Thanks & Regards,
Anshu Shukla
I am not clear about resource allocation (CPU/core/thread-level
allocation) with respect to the parallelism set by the number of cores in
Spark standalone mode.
Are there any guidelines for that?
--
Thanks & Regards,
Anshu Shukla
if (toponame.equals("IdentityTopology")) {
    sparkConf.setExecutorEnv("SPARK_WORKER_CORES", "1");
}
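A minimal sketch of the usual knob in standalone mode (reusing the toponame check from the snippet above): setExecutorEnv only exports an environment variable to the executor JVMs and does not change how many cores the scheduler grants, whereas spark.cores.max caps what the application takes.

import org.apache.spark.SparkConf;

SparkConf sparkConf = new SparkConf().setAppName("IdentityTopology");
if (toponame.equals("IdentityTopology")) {
    sparkConf.set("spark.cores.max", "1");   // total cores this application may take across the cluster
}
// Per-worker core counts come from SPARK_WORKER_CORES in conf/spark-env.sh on
// each worker machine, read when the worker daemon starts, not from the driver.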
--
Thanks & Regards,
Anshu Shukla
streamingContext.stop()
>
> On Wed, Jul 29, 2015 at 6:55 PM, anshu shukla
> wrote:
>
>> If we want to stop the application after a fixed time period, how will
>> that work? (How do I give the duration in the logic? In my case sleep(t.s.)
>> is not working.) So I used to kill the coarseG
>>
>> Does sbin/stop-all.sh stop the context gracefully? How is it done? Is
>> there a signal sent to the driver process?
>>
>> For EMR, is there a way to terminate an EMR cluster with a graceful Spark
>> Streaming shutdown?
>>
>> Thanks!
>>
>>
>>
>
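On stopping after a fixed period, a minimal sketch (ssc and runtimeMs are illustrative names): start the context, wait up to the desired run time, then stop gracefully so in-flight batches finish. (sbin/stop-all.sh stops the standalone master and worker daemons rather than the application itself.)

import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class FixedRuntime {
    // Run the streaming job for a bounded wall-clock time, then shut down
    // gracefully (finish queued batches, then stop the SparkContext too).
    public static void runFor(JavaStreamingContext ssc, long runtimeMs) {
        ssc.start();
        ssc.awaitTerminationOrTimeout(runtimeMs);  // returns when the timeout expires or the job stops
        ssc.stop(true, true);                      // stopSparkContext = true, stopGracefully = true
    }
}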
--
Thanks & Regards,
Anshu Shukla
1 - How can I increase the level of *parallelism of a Spark Streaming custom
RECEIVER*?
2 - Will ssc.receiverStream(/**anything //) *delete the data
stored in Spark memory by the store(s)* logic?
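On (1), a minimal sketch of the usual approach, assuming a JavaStreamingContext ssc and a placeholder custom receiver class MyReceiver: create several receiver streams so several receiver tasks run in parallel, then union them into a single DStream for the rest of the pipeline.

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class ParallelReceivers {
    // Raise receiver-side parallelism by running several receiver instances
    // (MyReceiver stands in for a custom Receiver<String> implementation).
    public static JavaDStream<String> parallelInput(JavaStreamingContext ssc, int numReceivers) {
        List<JavaDStream<String>> streams = new ArrayList<>();
        for (int i = 0; i < numReceivers; i++) {
            streams.add(ssc.receiverStream(new MyReceiver()));
        }
        return ssc.union(streams.get(0), streams.subList(1, streams.size()));
    }
}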
--
Thanks & Regards,
Anshu Shukla
(",");
}
String s1=MsgIdAddandRemove.addMessageId(tuple.toString(),msgId);
store(s1);
}
--
Thanks & Regards,
Anshu Shukla
Can anyone shed some light on HOW SPARK DOES the *ORDERING OF
BATCHES*?
On Sat, Jul 11, 2015 at 9:19 AM, anshu shukla
wrote:
> Thanks Ayan ,
>
> I was curious to know *how Spark does it*. Is there any *documentation*
> where I can get the details about that? Will you
operations like reduceByKey and join, the largest
number of partitions in a parent RDD. For operations like parallelize with
no parent RDDs, it depends on the cluster manager -
- *Others: total number of cores on all executor nodes or 2, whichever
is larger*
--
Thanks & Regards,
Anshu Shukla
rtitions like a normal RDD, so following
> rdd.zipWithIndex should give a way to order them by the time they are
> received.
>
> On Sat, Jul 11, 2015 at 12:50 PM, anshu shukla
> wrote:
>
>> Hey ,
>>
>> Is there any *guarantee of fixed ordering among the batches/RDDs*
1.4 in this context.
Any comments please!!
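A minimal sketch of the zipWithIndex suggestion from the reply above, assuming a JavaDStream<String> of input lines: it numbers the records inside each batch so their within-batch arrival order can be carried downstream (it does not impose an order across batches).

import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;

public class IndexedBatches {
    // Tag every record with its position inside the batch RDD it arrived in.
    public static JavaPairDStream<String, Long> withIndex(JavaDStream<String> lines) {
        return lines.transformToPair(rdd -> rdd.zipWithIndex());
    }
}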
--
Thanks & Regards,
Anshu Shukla
Hi all,
I want to create a union of 2 DStreams: in one of them an *RDD is created
every 1 second*, the other has RDDs generated by reduceByKeyAndWindow
with *duration set to 60 sec* (slide duration also 60 sec).
- The main idea is to do some analysis over every minute of data and emit the
union, and this gives:
union$1.apply(DStream.scala:849)
at org.apache.spark.streaming.dstream.DStream$$anonfun$union$1.apply(DStream.scala:849)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
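If the failure is the requirement that both sides of a union share the same slide duration (plausible here, with 1 s batches on one side and a 60 s window on the other), one fix is to put the per-second stream onto the same 60 s cadence before the union. A minimal sketch with illustrative stream names and key/value types:

import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;

public class MatchedUnion {
    // Window the fast (per-second) stream so both streams slide every 60 s,
    // then union them.
    public static JavaPairDStream<String, Long> unionEveryMinute(
            JavaPairDStream<String, Long> perSecond,
            JavaPairDStream<String, Long> perMinute) {
        JavaPairDStream<String, Long> perSecondAs60s =
            perSecond.window(Durations.seconds(60), Durations.seconds(60));
        return perMinute.union(perSecondAs60s);
    }
}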
--
Thanks & Regards,
Anshu Shukla
>
> The edge case is when the final grouping doesn't have exactly 5 items, if
> that matters.
>
> On Mon, Jun 29, 2015 at 3:57 PM, anshu shukla
> wrote:
>
>> I want to apply some logic on the basis of a FIX count of number of
>> tuples i
I want to apply some logic on the basis of a FIXED count of the number of
tuples in each RDD, *e.g. emit one RDD for every 5 tuples of the previous
RDD*.
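A minimal sketch of one way to do this inside each batch, assuming a JavaDStream<String> (and, per the reply above, the last group may hold fewer than 5 items): index the records, key them by index / 5, and group.

import org.apache.spark.streaming.api.java.JavaDStream;
import scala.Tuple2;

public class GroupsOfFive {
    // Turn each batch RDD into an RDD of groups of (up to) 5 consecutive records.
    public static JavaDStream<Iterable<String>> groupByFive(JavaDStream<String> lines) {
        return lines.transform(rdd ->
            rdd.zipWithIndex()
               .mapToPair((Tuple2<String, Long> t) -> new Tuple2<Long, String>(t._2() / 5, t._1()))
               .groupByKey()
               .values());
    }
}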
--
Thanks & Regards,
Anshu Shukla
> After the following operation:
>
> split_lines = lines.map(_.split("\t"))
>
> what should I do to read the key/values into a dictionary?
>
>
> Thanks
>
> Ravikant
>
>
>
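A minimal sketch of the same idea in the Java API (the quoted snippet is Scala; the names and the two-column layout are assumptions): split each line on the tab, build (key, value) pairs from the first two fields, and collect them into a Map, i.e. a dictionary, on the driver.

import java.util.Map;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

public class LinesToDict {
    // Build a driver-side Map<key, value> from tab-separated lines.
    public static Map<String, String> toDict(JavaRDD<String> lines) {
        JavaPairRDD<String, String> pairs = lines.mapToPair(line -> {
            String[] parts = line.split("\t");
            return new Tuple2<String, String>(parts[0], parts[1]);
        });
        return pairs.collectAsMap();
    }
}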
--
Thanks & Regards,
Anshu Shukla
Thanks,
I am talking about streaming.
On 25 Jun 2015 5:37 am, "ayan guha" wrote:
> Can you elaborate little more? Are you talking about receiver or streaming?
> On 24 Jun 2015 23:18, "anshu shukla" wrote:
>
>> How spark guarantees that no RDD will fail /lost
How does Spark guarantee that no RDD will fail or be lost during its life cycle?
Is there something like acking in Storm, or does it do this by default?
--
Thanks & Regards,
Anshu Shukla
Thanks a lot,
I just want to log the timestamp and the unique message ID, not the full
RDD.
On Tue, Jun 23, 2015 at 12:41 PM, Akhil Das
wrote:
> Why don't you do a normal .saveAsTextFiles?
>
> Thanks
> Best Regards
>
> On Mon, Jun 22, 2015 at 11:55 PM, anshu shukla
, Void>() {
    @Override
    public Void call(JavaRDD stringJavaRDD) throws Exception {
        System.out.println(System.currentTimeMillis() + ",spoutstringJavaRDD," + stringJavaRDD.count());
        return null;
    }
});
--
Thanks & Regards,
Anshu Shukla
:
> Is spoutLog just a non-spark file writer? If you run that in the map call
> on a cluster its going to be writing in the filesystem of the executor its
> being run on. I'm not sure if that's what you intended.
>
> On Mon, Jun 22, 2015 at 1:35 PM, anshu shukla
> wro
e.printStackTrace();
throw e;
}
System.out.println("msgid,"+msgId);
return msgeditor.addMessageId(v1,msgId);
}
});
--
Thanks & Regards,
Anshu Shukla
On Mon, Jun 22, 2015 at 10:50 PM, anshu shukla
wrote:
> Can not we write some data to a
Can't we write some data to a txt file in parallel, with multiple
executors running in parallel??
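A minimal sketch, assuming a JavaDStream<String> and an illustrative output path: saving through the RDD API is the parallel path, since each executor writes one part-file per partition it holds, whereas a single FileWriter used inside map() writes on whichever executor runs that task, as the reply above points out.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.api.java.JavaDStream;

public class ParallelWrites {
    // Write each non-empty batch out in parallel: one part-file per partition.
    public static void saveBatches(JavaDStream<String> stream, final String basePath) {
        stream.foreachRDD((JavaRDD<String> rdd) -> {
            if (rdd.count() > 0) {
                rdd.saveAsTextFile(basePath + "/" + System.currentTimeMillis());
            }
            return null;   // Spark 1.x foreachRDD takes a Function<JavaRDD<T>, Void>
        });
    }
}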
--
Thanks & Regards,
Anshu Shukla
e 2015 at 21:32, Will Briggs wrote:
>
>> It sounds like accumulators are not necessary in Spark Streaming - see
>> this post (
>> http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html)
>> for more details.
>>
>
In Spark Streaming, since we already have a StreamingContext, which
does not allow us to have accumulators, we have to get a SparkContext for
initializing the accumulator value.
But having 2 Spark contexts will not solve the problem.
Please help!!
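A minimal sketch of what usually resolves this (ssc, stream, and counter are illustrative names): a second SparkContext is not needed, because the StreamingContext already wraps one, and the accumulator created from it can be updated inside foreachRDD.

import org.apache.spark.Accumulator;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingAccumulator {
    // Create the accumulator from the SparkContext that the StreamingContext
    // already owns, then update it per batch.
    public static Accumulator<Integer> countRecords(JavaStreamingContext ssc,
                                                    JavaDStream<String> stream) {
        JavaSparkContext sc = ssc.sparkContext();
        final Accumulator<Integer> counter = sc.accumulator(0);
        stream.foreachRDD((JavaRDD<String> rdd) -> {
            counter.add((int) rdd.count());
            return null;   // Spark 1.x foreachRDD takes a Function<JavaRDD<T>, Void>
        });
        return counter;
    }
}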
--
Thanks & Regards,
Anshu Shukla
etMessageId(s));
//System.out.println(msgeditor.getMessageId(s));
}
return null;
}
});
--
Thanks & Regards,
Anshu Shukla
not able to figure out whether my
job is using all the workers or not.
--
Thanks & Regards,
Anshu Shukla
SERC-IISC
How do I know, in stream processing over a cluster of 8 machines, whether
all the machines/worker nodes are being used (my cluster has 8 slaves)?
--
Thanks & Regards,
Anshu Shukla
in the online documentation.
>
> http://spark.apache.org/docs/latest/submitting-applications.html
>
> On Fri, Jun 19, 2015 at 2:29 PM, anshu shukla
> wrote:
>
>> Hey ,
>> *[For Client Mode]*
>>
>> 1- Is there any way to assign the number of workers
litter->>wordcount>>statistical analysis}
then on how many workers will it be scheduled?
--
Thanks & Regards,
Anshu Shukla
SERC-IISC
cessed,
> isnt it?
>
>
> On Thu, Jun 18, 2015 at 2:28 PM, anshu shukla
> wrote:
>
>> Thanks a lot. But I have already tried the second way; the problem with
>> that is how to identify a particular RDD from source to sink (as we
>> can do by passing a msg i
rrent timestamp in the records, and then compare them with the timestamp
> at the final step of the records. Assuming the executor and driver clocks
> are reasonably in sync, this will measure the latency between the time is
> received by the system and the result from the record is available.
ind "what" among RDD?
>
> On Thu, Jun 18, 2015 at 11:24 AM, anshu shukla
> wrote:
>
>> Is there any fixed way to find among RDD in stream processing systems ,
>> in the Distributed set-up .
>>
>> --
>> Thanks & Regards,
>> Anshu Shukla
>>
>
>
--
Thanks & Regards,
Anshu Shukla
Is there any fixed way to find among RDD in stream processing systems ,
in the Distributed set-up .
--
Thanks & Regards,
Anshu Shukla
Is there any good sample code in Java for *implementing and
using a custom actor-based receiver*?
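Not actor-based, but a minimal sketch of a plain custom receiver in Java as an alternative (the actor-based helper is a Scala trait and is awkward to use from Java; class and record names here are illustrative):

import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

public class SampleReceiver extends Receiver<String> {
    public SampleReceiver() {
        super(StorageLevel.MEMORY_ONLY());
    }

    @Override
    public void onStart() {
        new Thread(this::generate).start();   // keep onStart() non-blocking
    }

    @Override
    public void onStop() { }                  // generate() exits once isStopped() turns true

    private void generate() {
        long i = 0;
        while (!isStopped()) {
            store("event-" + i++);            // hand one record to Spark
        }
    }
}

It would be attached with ssc.receiverStream(new SampleReceiver()).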
--
Thanks & Regards,
Anshu Shukla
        // Auto-generated method stub
        // System.out.println("Called IN SPOUT### ");
        try {
            this.eventQueue.put(event);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
--
Thanks & Regards,
Anshu Shukla
JavaDStream inputStream = ssc.queueStream(rddQueue);
Can this rddQueue be dynamic in nature? If yes, how do I make it keep
running until rddQueue is finished?
Is there any other way to get rddQueue from a dynamically updatable normal Queue?
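A minimal sketch, assuming ssc is the JavaStreamingContext and jsc its JavaSparkContext: queueStream accepts any java.util.Queue, so a thread-safe queue can keep being filled from another thread while the context runs; with oneAtATime = true each batch interval pulls at most one RDD, and batches that find the queue empty are simply empty. There is no built-in notion of the queue being "finished"; the job runs until the StreamingContext is stopped.

import java.util.Arrays;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class DynamicQueueStream {
    // A thread-safe queue that a feeder thread can keep filling after start().
    private static final Queue<JavaRDD<String>> rddQueue = new ConcurrentLinkedQueue<>();

    public static JavaDStream<String> attach(JavaStreamingContext ssc) {
        return ssc.queueStream(rddQueue, true);   // oneAtATime: at most one queued RDD per batch
    }

    public static void feed(JavaSparkContext jsc) {
        rddQueue.add(jsc.parallelize(Arrays.asList("a", "b", "c")));   // sample data
    }
}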
--
Thanks & Regards,
SERC-IISC
Anshu Shukla
How to take the union of a JavaPairDStream and a
JavaDStream?
*>>>>>a.union(b) works only with DStreams of the same type.*
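A minimal sketch, since union needs both sides to carry the same element type (the key "na" and the Integer value type are illustrative): map the plain DStream into a pair DStream first, then union.

import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

public class MixedUnion {
    // Convert the plain stream to (key, value) pairs so the element types match.
    public static JavaPairDStream<String, Integer> unionMixed(JavaPairDStream<String, Integer> a,
                                                              JavaDStream<Integer> b) {
        JavaPairDStream<String, Integer> bAsPairs =
            b.mapToPair(v -> new Tuple2<String, Integer>("na", v));
        return a.union(bAsPairs);
    }
}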
--
Thanks & Regards,
Anshu Shukla
;
> I've had good success with splunk generator.
> https://github.com/coccyx/eventgen/blob/master/README.md
>
>
> On May 11, 2015, at 00:05, Akhil Das wrote:
>
> Have a look over here https://storm.apache.org/community.html
>
> Thanks
> Best Regards
>
http://stackoverflow.com/questions/30149868/generate-events-tuples-using-csv-file-with-timestamps
--
Thanks & Regards,
Anshu Shukla
"
On Fri, May 8, 2015 at 2:42 AM, anshu shukla wrote:
> One of the best discussions on the mailing list :-) ... Please help me
> conclude --
>
> The whole discussion concludes that -
>
> 1- Framework does not support increasing parallelism of any task just
> by an
PM
> *To:* user@spark.apache.org
> *Subject:* Map one RDD into two RDD
>
>
>
> Hi all,
>
> I have a large RDD that I map a function over. Based on the nature of
> each record in the input RDD, I will generate two types of data. I would
> like to save each type into its own RDD, but I can't seem to find an
> efficient way to do it. Any suggestions?
>
>
>
> Many thanks.
>
>
>
>
>
> Bill
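A minimal sketch of the common workaround (the startsWith predicate and the output paths are placeholders): cache the parent once and filter it twice, so each filter yields one of the two outputs without recomputing the input.

import org.apache.spark.api.java.JavaRDD;

public class SplitByType {
    // Cache the parent once, then derive the two typed RDDs with two filters.
    public static void split(JavaRDD<String> input, String pathA, String pathB) {
        JavaRDD<String> cached = input.cache();                          // avoid recomputing for both passes
        JavaRDD<String> typeA = cached.filter(r -> r.startsWith("A"));   // placeholder predicate
        JavaRDD<String> typeB = cached.filter(r -> !r.startsWith("A"));
        typeA.saveAsTextFile(pathA);
        typeB.saveAsTextFile(pathB);
    }
}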
--
Thanks & Regards,
Anshu Shukla
rence-apps/blob/master/twitter_classifier/predict.md
--
Thanks & Regards,
Anshu Shukla
"
scalaVersion := "2.11.6"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.3.1"
--
Thanks & Regards,
Anshu Shukla
Indian Institute of Science
sm without a Central Scheduler
>
>
>
> *From:* Juan Rodríguez Hortalá [mailto:juan.rodriguez.hort...@gmail.com]
> *Sent:* Wednesday, May 6, 2015 11:20 AM
> *To:* Evo Eftimov
> *Cc:* anshu shukla; ayan guha; user@spark.apache.org
>
> *Subject:* Re: Creating topology in spark streaming
>
>
>
0/improvements-to-kafka-integration-of-spark-streaming.html
>
> Hope that helps.
>
> Greetings,
>
> Juan
>
> 2015-05-06 10:32 GMT+02:00 anshu shukla :
>
>> But the main problem is how to increase the level of parallelism for any
>> particular bolt's logic.
>
on a dstream will create another dstream. You may
> want to take a look at foreachrdd? Also, kindly share your code so people
> can help better
> On 6 May 2015 17:54, "anshu shukla" wrote:
>
>> Please help guys, Even After going through all the examples given i
>>
level of parallelism, since the logic of the topology is not clear.
--
Thanks & Regards,
Anshu Shukla
Indian Institute of Science
hat helps,
>
> Greetings,
>
> Juan
>
> 2015-05-01 9:30 GMT+02:00 anshu shukla :
>
>>
>>
>>
>>
>> I have the real DEBS-TAxi data in csv file , in order to operate over it
>> how to simulate a "Spout" kind of thing as event generato
Exception in thread "main" java.lang.RuntimeException:
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot
communicate with client version 4
I am not using any Hadoop facility (not even HDFS), so why is it giving
this error?
--
Thanks & Regards,
Anshu Shukla
I have the real DEBS-Taxi data in a CSV file; in order to operate over it,
how do I simulate a "Spout" kind of thing as an event generator using the
timestamps in the CSV file?
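A minimal sketch of one way to do this with a custom receiver (the class name, the timestamp sitting in the first CSV column, and the epoch-millisecond format are all assumptions): read the file in timestamp order, sleep for the gap between consecutive rows, and hand each row to Spark with store().

import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

public class CsvReplayReceiver extends Receiver<String> {
    private final String path;

    public CsvReplayReceiver(String path) {
        super(StorageLevel.MEMORY_ONLY());
        this.path = path;
    }

    @Override
    public void onStart() {
        new Thread(this::replay).start();   // keep onStart() non-blocking
    }

    @Override
    public void onStop() { }                // replay() checks isStopped()

    private void replay() {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            long previousTs = -1;
            while (!isStopped() && (line = reader.readLine()) != null) {
                long ts = Long.parseLong(line.split(",")[0]);   // assumed: epoch millis in column 0
                if (previousTs >= 0 && ts > previousTs) {
                    Thread.sleep(ts - previousTs);              // replay with the original inter-event gaps
                }
                previousTs = ts;
                store(line);                                    // emit the event to Spark
            }
        } catch (Exception e) {
            restart("Error replaying " + path, e);              // let Spark restart the receiver
        }
    }
}

Attaching it with ssc.receiverStream(new CsvReplayReceiver("/path/to/taxi.csv")) then yields a stream whose events roughly follow the CSV timestamps.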
--
Thanks & Regards,
Anshu Shukla
Hey,
I didn't find any documentation regarding support for cycles in a Spark
topology, although Storm supports this using manual configuration in the
acker function logic (setting it to a particular count). By cycles I
don't mean infinite loops.
--
Thanks & Regards,
Anshu Shukla