Regards,
Anshu Shukla
Is there any formal way to do a moving average over a fixed window duration?
I calculated a simple moving average by creating a count stream and a sum
stream, then joining them and finally computing the mean. This was not per
time window, since the time periods were part of the tuples.
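A per-window variant can be sketched as below, assuming the input is a numeric JavaDStream (the class, stream, and parameter names are illustrative, not from this thread): carry (sum, count) pairs through reduceByWindow and divide at the end.

import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import scala.Tuple2;

public class WindowedAverage {
    // Moving average over a fixed window: pair each value with a count of 1,
    // sum both fields over the window, then divide sum by count.
    public static JavaDStream<Double> movingAverage(JavaDStream<Double> values,
                                                    int windowSec, int slideSec) {
        return values
            .map(v -> new Tuple2<Double, Long>(v, 1L))
            .reduceByWindow(
                (a, b) -> new Tuple2<Double, Long>(a._1() + b._1(), a._2() + b._2()),
                Durations.seconds(windowSec),
                Durations.seconds(slideSec))
            .map(t -> t._1() / t._2());
    }
}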
--
Thanks & Regards,
Anshu Shukla
creation for every Job.
2 - Can I have something like a pool to serve requests?
--
Thanks & Regards,
Anshu Shukla
I am not clear about resource allocation (CPU/core/thread-level
allocation) with respect to the parallelism set by the number of cores in
Spark standalone mode.
Are there any guidelines for that?
--
Thanks & Regards,
Anshu Shukla
if (toponame.equals("IdentityTopology")) {
    sparkConf.setExecutorEnv("SPARK_WORKER_CORES", "1");
}
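A minimal sketch of the usual knob in standalone mode (reusing the toponame check from the snippet above): setExecutorEnv only exports an environment variable to the executor JVMs and does not change how many cores the scheduler grants, whereas spark.cores.max caps what the application takes.

import org.apache.spark.SparkConf;

SparkConf sparkConf = new SparkConf().setAppName("IdentityTopology");
if (toponame.equals("IdentityTopology")) {
    sparkConf.set("spark.cores.max", "1");   // total cores this application may take across the cluster
}
// Per-worker core counts come from SPARK_WORKER_CORES in conf/spark-env.sh on
// each worker machine, read when the worker daemon starts, not from the driver.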
--
Thanks & Regards,
Anshu Shukla
streamingContext.stop()
>
> On Wed, Jul 29, 2015 at 6:55 PM, anshu shukla
> wrote:
>
>> If we want to stop the application after a fixed time period, how will
>> that work? (How do I give the duration in the logic? In my case sleep(t.s.)
>> is not working.) So I used to kill the coarseG
>>
>> Does sbin/stop-all.sh stop the context gracefully? How is it done? Is
>> there a signal sent to the driver process?
>>
>> For EMR, is there a way to terminate an EMR cluster with a graceful Spark
>> Streaming shutdown?
>>
>> Thanks!
>>
>>
>>
>
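On stopping after a fixed period, a minimal sketch (ssc and runtimeMs are illustrative names): start the context, wait up to the desired run time, then stop gracefully so in-flight batches finish. (sbin/stop-all.sh stops the standalone master and worker daemons rather than the application itself.)

import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class FixedRuntime {
    // Run the streaming job for a bounded wall-clock time, then shut down
    // gracefully (finish queued batches, then stop the SparkContext too).
    public static void runFor(JavaStreamingContext ssc, long runtimeMs) {
        ssc.start();
        ssc.awaitTerminationOrTimeout(runtimeMs);  // returns when the timeout expires or the job stops
        ssc.stop(true, true);                      // stopSparkContext = true, stopGracefully = true
    }
}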
--
Thanks & Regards,
Anshu Shukla
1 - How can I increase the level of *parallelism of a Spark Streaming custom
RECEIVER*?
2 - Will ssc.receiverStream(/**anything //) *delete the data
stored in Spark memory by the store(s)* logic?
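On (1), a minimal sketch of the usual approach, assuming a JavaStreamingContext ssc and a placeholder custom receiver class MyReceiver: create several receiver streams so several receiver tasks run in parallel, then union them into a single DStream for the rest of the pipeline.

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class ParallelReceivers {
    // Raise receiver-side parallelism by running several receiver instances
    // (MyReceiver stands in for a custom Receiver<String> implementation).
    public static JavaDStream<String> parallelInput(JavaStreamingContext ssc, int numReceivers) {
        List<JavaDStream<String>> streams = new ArrayList<>();
        for (int i = 0; i < numReceivers; i++) {
            streams.add(ssc.receiverStream(new MyReceiver()));
        }
        return ssc.union(streams.get(0), streams.subList(1, streams.size()));
    }
}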
--
Thanks & Regards,
Anshu Shukla
(",");
}
String s1=MsgIdAddandRemove.addMessageId(tuple.toString(),msgId);
store(s1);
}
--
Thanks & Regards,
Anshu Shukla
Can anyone shed some light on HOW SPARK DOES the *ORDERING OF
BATCHES*?
On Sat, Jul 11, 2015 at 9:19 AM, anshu shukla
wrote:
> Thanks Ayan ,
>
> I was curious to know *how Spark does it*. Is there any *documentation*
> where I can get the details about that? Will you
operations like reduceByKey and join, the largest
number of partitions in a parent RDD. For operations like parallelize with
no parent RDDs, it depends on the cluster manager -
- *Others: total number of cores on all executor nodes or 2, whichever
is larger*
--
Thanks & Regards,
Anshu Shukla
rtitions like a normal RDD, so following
> rdd.zipWithIndex should give a way to order them by the time they are
> received.
>
> On Sat, Jul 11, 2015 at 12:50 PM, anshu shukla
> wrote:
>
>> Hey ,
>>
>> Is there any *guarantee of fixed ordering among the batches/RDDs*
1.4 in this context.
Any comments please!!
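A minimal sketch of the zipWithIndex suggestion from the reply above, assuming a JavaDStream<String> of input lines: it numbers the records inside each batch so their within-batch arrival order can be carried downstream (it does not impose an order across batches).

import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;

public class IndexedBatches {
    // Tag every record with its position inside the batch RDD it arrived in.
    public static JavaPairDStream<String, Long> withIndex(JavaDStream<String> lines) {
        return lines.transformToPair(rdd -> rdd.zipWithIndex());
    }
}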
--
Thanks & Regards,
Anshu Shukla
Hi all,
I want to create a union of 2 DStreams: in one of them an *RDD is created
every 1 second*, the other has RDDs generated by reduceByKeyAndWindow
with *duration set to 60 sec* (slide duration also 60 sec).
- The main idea is to do some analysis over every minute of data and emit the
union, and this gives:
union$1.apply(DStream.scala:849)
at org.apache.spark.streaming.dstream.DStream$$anonfun$union$1.apply(DStream.scala:849)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
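If the failure is the requirement that both sides of a union share the same slide duration (plausible here, with 1 s batches on one side and a 60 s window on the other), one fix is to put the per-second stream onto the same 60 s cadence before the union. A minimal sketch with illustrative stream names and key/value types:

import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;

public class MatchedUnion {
    // Window the fast (per-second) stream so both streams slide every 60 s,
    // then union them.
    public static JavaPairDStream<String, Long> unionEveryMinute(
            JavaPairDStream<String, Long> perSecond,
            JavaPairDStream<String, Long> perMinute) {
        JavaPairDStream<String, Long> perSecondAs60s =
            perSecond.window(Durations.seconds(60), Durations.seconds(60));
        return perMinute.union(perSecondAs60s);
    }
}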
--
Thanks & Regards,
Anshu Shukla
>
> The edge case is when the final grouping doesn't have exactly 5 items, if
> that matters.
>
> On Mon, Jun 29, 2015 at 3:57 PM, anshu shukla
> wrote:
>
>> I want to apply some logic on the basis of a FIX count of number of
>> tuples i
I want to apply some logic on the basis of a FIXED count of the number of
tuples in each RDD, *e.g. emit one RDD for every 5 tuples of the previous
RDD*.
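A minimal sketch of one way to do this inside each batch, assuming a JavaDStream<String> (and, per the reply above, the last group may hold fewer than 5 items): index the records, key them by index / 5, and group.

import org.apache.spark.streaming.api.java.JavaDStream;
import scala.Tuple2;

public class GroupsOfFive {
    // Turn each batch RDD into an RDD of groups of (up to) 5 consecutive records.
    public static JavaDStream<Iterable<String>> groupByFive(JavaDStream<String> lines) {
        return lines.transform(rdd ->
            rdd.zipWithIndex()
               .mapToPair((Tuple2<String, Long> t) -> new Tuple2<Long, String>(t._2() / 5, t._1()))
               .groupByKey()
               .values());
    }
}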
--
Thanks & Regards,
Anshu Shukla
> After the following operation:
>
> split_lines = lines.map(_.split("\t"))
>
> what should I do to read the key/values into a dictionary?
>
>
> Thanks
>
> Ravikant
>
>
>
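A minimal sketch of the same idea in the Java API (the quoted snippet is Scala; the names and the two-column layout are assumptions): split each line on the tab, build (key, value) pairs from the first two fields, and collect them into a Map, i.e. a dictionary, on the driver.

import java.util.Map;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

public class LinesToDict {
    // Build a driver-side Map<key, value> from tab-separated lines.
    public static Map<String, String> toDict(JavaRDD<String> lines) {
        JavaPairRDD<String, String> pairs = lines.mapToPair(line -> {
            String[] parts = line.split("\t");
            return new Tuple2<String, String>(parts[0], parts[1]);
        });
        return pairs.collectAsMap();
    }
}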
--
Thanks & Regards,
Anshu Shukla
Thanks,
I am talking about streaming.
On 25 Jun 2015 5:37 am, "ayan guha" wrote:
> Can you elaborate little more? Are you talking about receiver or streaming?
> On 24 Jun 2015 23:18, "anshu shukla" wrote:
>
>> How spark guarantees that no RDD will fail /lost
How does Spark guarantee that no RDD will fail or be lost during its life cycle?
Is there something like acking in Storm, or does it do this by default?
--
Thanks & Regards,
Anshu Shukla
Thanks a lot,
I just want to log the timestamp and the unique message ID, not the full
RDD.
On Tue, Jun 23, 2015 at 12:41 PM, Akhil Das
wrote:
> Why don't you do a normal .saveAsTextFiles?
>
> Thanks
> Best Regards
>
> On Mon, Jun 22, 2015 at 11:55 PM, anshu shukla
, Void>() {
    @Override
    public Void call(JavaRDD stringJavaRDD) throws Exception {
        System.out.println(System.currentTimeMillis() + ",spoutstringJavaRDD," + stringJavaRDD.count());
        return null;
    }
});
--
Thanks & Regards,
Anshu Shukla
:
> Is spoutLog just a non-spark file writer? If you run that in the map call
> on a cluster its going to be writing in the filesystem of the executor its
> being run on. I'm not sure if that's what you intended.
>
> On Mon, Jun 22, 2015 at 1:35 PM, anshu shukla
> wro
e.printStackTrace();
throw e;
}
System.out.println("msgid,"+msgId);
return msgeditor.addMessageId(v1,msgId);
}
});
--
Thanks & Regards,
Anshu Shukla
On Mon, Jun 22, 2015 at 10:50 PM, anshu shukla
wrote:
> Can not we write some data to a
Can't we write some data to a txt file in parallel, with multiple
executors running in parallel??
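A minimal sketch, assuming a JavaDStream<String> and an illustrative output path: saving through the RDD API is the parallel path, since each executor writes one part-file per partition it holds, whereas a single FileWriter used inside map() writes on whichever executor runs that task, as the reply above points out.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.api.java.JavaDStream;

public class ParallelWrites {
    // Write each non-empty batch out in parallel: one part-file per partition.
    public static void saveBatches(JavaDStream<String> stream, final String basePath) {
        stream.foreachRDD((JavaRDD<String> rdd) -> {
            if (rdd.count() > 0) {
                rdd.saveAsTextFile(basePath + "/" + System.currentTimeMillis());
            }
            return null;   // Spark 1.x foreachRDD takes a Function<JavaRDD<T>, Void>
        });
    }
}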
--
Thanks & Regards,
Anshu Shukla
e 2015 at 21:32, Will Briggs wrote:
>
>> It sounds like accumulators are not necessary in Spark Streaming - see
>> this post (
>> http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html)
>> for more details.
>>
>
In Spark Streaming, since we already have a StreamingContext, which
does not allow us to have accumulators, we have to get a SparkContext for
initializing the accumulator value.
But having 2 Spark contexts will not solve the problem.
Please help!!
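A minimal sketch of what usually resolves this (ssc, stream, and counter are illustrative names): a second SparkContext is not needed, because the StreamingContext already wraps one, and the accumulator created from it can be updated inside foreachRDD.

import org.apache.spark.Accumulator;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingAccumulator {
    // Create the accumulator from the SparkContext that the StreamingContext
    // already owns, then update it per batch.
    public static Accumulator<Integer> countRecords(JavaStreamingContext ssc,
                                                    JavaDStream<String> stream) {
        JavaSparkContext sc = ssc.sparkContext();
        final Accumulator<Integer> counter = sc.accumulator(0);
        stream.foreachRDD((JavaRDD<String> rdd) -> {
            counter.add((int) rdd.count());
            return null;   // Spark 1.x foreachRDD takes a Function<JavaRDD<T>, Void>
        });
        return counter;
    }
}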
--
Thanks & Regards,
Anshu Shukla
etMessageId(s));
//System.out.println(msgeditor.getMessageId(s));
}
return null;
}
});
--
Thanks & Regards,
Anshu Shukla
not able to figure out whether my
job is using all the workers or not.
--
Thanks & Regards,
Anshu Shukla
SERC-IISC
How do I know, in stream processing over a cluster of 8 machines, whether
all the machines/worker nodes are being used (my cluster has 8 slaves)?
--
Thanks & Regards,
Anshu Shukla
in the online documentation.
>
> http://spark.apache.org/docs/latest/submitting-applications.html
>
> On Fri, Jun 19, 2015 at 2:29 PM, anshu shukla
> wrote:
>
>> Hey ,
>> *[For Client Mode]*
>>
>> 1- Is there any way to assign the number of workers
litter->>wordcount>>statistical analysis}
then on how many workers will it be scheduled?
--
Thanks & Regards,
Anshu Shukla
SERC-IISC
cessed,
> isnt it?
>
>
> On Thu, Jun 18, 2015 at 2:28 PM, anshu shukla
> wrote:
>
>> Thanks a lot. But I have already tried the second way; the problem with
>> that is how to identify a particular RDD from source to sink (as we
>> can do by passing a msg i
rrent timestamp in the records, and then compare them with the timestamp
> at the final step of the records. Assuming the executor and driver clocks
> are reasonably in sync, this will measure the latency between the time is
> received by the system and the result from the record is available.
ind "what" among RDD?
>
> On Thu, Jun 18, 2015 at 11:24 AM, anshu shukla
> wrote:
>
>> Is there any fixed way to find among RDD in stream processing systems ,
>> in the Distributed set-up .
>>
>> --
>> Thanks & Regards,
>> Anshu Shukla
>>
>
>
--
Thanks & Regards,
Anshu Shukla
Is there any fixed way to find among RDD in stream processing systems ,
in the Distributed set-up .
--
Thanks & Regards,
Anshu Shukla
Is there any good sample code in Java for *implementing and
using a custom actor-based receiver*?
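Not actor-based, but a minimal sketch of a plain custom receiver in Java as an alternative (the actor-based helper is a Scala trait and is awkward to use from Java; class and record names here are illustrative):

import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

public class SampleReceiver extends Receiver<String> {
    public SampleReceiver() {
        super(StorageLevel.MEMORY_ONLY());
    }

    @Override
    public void onStart() {
        new Thread(this::generate).start();   // keep onStart() non-blocking
    }

    @Override
    public void onStop() { }                  // generate() exits once isStopped() turns true

    private void generate() {
        long i = 0;
        while (!isStopped()) {
            store("event-" + i++);            // hand one record to Spark
        }
    }
}

It would be attached with ssc.receiverStream(new SampleReceiver()).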
--
Thanks & Regards,
Anshu Shukla
        // Auto-generated method stub
        // System.out.println("Called IN SPOUT### ");
        try {
            this.eventQueue.put(event);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
--
Thanks & Regards,
Anshu Shukla
JavaDStream inputStream = ssc.queueStream(rddQueue);
Can this rddQueue be dynamic in nature? If yes, how do I make it keep
running until rddQueue is finished?
Is there any other way to get rddQueue from a dynamically updatable normal Queue?
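A minimal sketch, assuming ssc is the JavaStreamingContext and jsc its JavaSparkContext: queueStream accepts any java.util.Queue, so a thread-safe queue can keep being filled from another thread while the context runs; with oneAtATime = true each batch interval pulls at most one RDD, and batches that find the queue empty are simply empty. There is no built-in notion of the queue being "finished"; the job runs until the StreamingContext is stopped.

import java.util.Arrays;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class DynamicQueueStream {
    // A thread-safe queue that a feeder thread can keep filling after start().
    private static final Queue<JavaRDD<String>> rddQueue = new ConcurrentLinkedQueue<>();

    public static JavaDStream<String> attach(JavaStreamingContext ssc) {
        return ssc.queueStream(rddQueue, true);   // oneAtATime: at most one queued RDD per batch
    }

    public static void feed(JavaSparkContext jsc) {
        rddQueue.add(jsc.parallelize(Arrays.asList("a", "b", "c")));   // sample data
    }
}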
--
Thanks & Regards,
SERC-IISC
Anshu Shukla
How to take the union of a JavaPairDStream and a
JavaDStream?
*>>>>>a.union(b) works only with DStreams of the same type.*
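A minimal sketch, since union needs both sides to carry the same element type (the key "na" and the Integer value type are illustrative): map the plain DStream into a pair DStream first, then union.

import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

public class MixedUnion {
    // Convert the plain stream to (key, value) pairs so the element types match.
    public static JavaPairDStream<String, Integer> unionMixed(JavaPairDStream<String, Integer> a,
                                                              JavaDStream<Integer> b) {
        JavaPairDStream<String, Integer> bAsPairs =
            b.mapToPair(v -> new Tuple2<String, Integer>("na", v));
        return a.union(bAsPairs);
    }
}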
--
Thanks & Regards,
Anshu Shukla
;
> I've had good success with splunk generator.
> https://github.com/coccyx/eventgen/blob/master/README.md
>
>
> On May 11, 2015, at 00:05, Akhil Das wrote:
>
> Have a look over here https://storm.apache.org/community.html
>
> Thanks
> Best Regards
>
http://stackoverflow.com/questions/30149868/generate-events-tuples-using-csv-file-with-timestamps
--
Thanks & Regards,
Anshu Shukla
"
On Fri, May 8, 2015 at 2:42 AM, anshu shukla wrote:
> One of the best discussions on the mailing list :-) ... Please help me
> conclude --
>
> The whole discussion concludes that -
>
> 1- Framework does not support increasing parallelism of any task just
> by an
PM
> *To:* user@spark.apache.org
> *Subject:* Map one RDD into two RDD
>
>
>
> Hi all,
>
> I have a large RDD that I map a function over. Based on the nature of
> each record in the input RDD, I will generate two types of data. I would
> like to save each type into its own RDD, but I can't seem to find an
> efficient way to do it. Any suggestions?
>
>
>
> Many thanks.
>
>
>
>
>
> Bill
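A minimal sketch of the common workaround (the startsWith predicate and the output paths are placeholders): cache the parent once and filter it twice, so each filter yields one of the two outputs without recomputing the input.

import org.apache.spark.api.java.JavaRDD;

public class SplitByType {
    // Cache the parent once, then derive the two typed RDDs with two filters.
    public static void split(JavaRDD<String> input, String pathA, String pathB) {
        JavaRDD<String> cached = input.cache();                          // avoid recomputing for both passes
        JavaRDD<String> typeA = cached.filter(r -> r.startsWith("A"));   // placeholder predicate
        JavaRDD<String> typeB = cached.filter(r -> !r.startsWith("A"));
        typeA.saveAsTextFile(pathA);
        typeB.saveAsTextFile(pathB);
    }
}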
--
Thanks & Regards,
Anshu Shukla
rence-apps/blob/master/twitter_classifier/predict.md
--
Thanks & Regards,
Anshu Shukla
"
scalaVersion := "2.11.6"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.3.1"
--
Thanks & Regards,
Anshu Shukla
Indian Institute of Science
sm without a Central Scheduler
>
>
>
> *From:* Juan Rodríguez Hortalá [mailto:juan.rodriguez.hort...@gmail.com]
> *Sent:* Wednesday, May 6, 2015 11:20 AM
> *To:* Evo Eftimov
> *Cc:* anshu shukla; ayan guha; user@spark.apache.org
>
> *Subject:* Re: Creating topology in spark streaming
>
>
>
0/improvements-to-kafka-integration-of-spark-streaming.html
>
> Hope that helps.
>
> Greetings,
>
> Juan
>
> 2015-05-06 10:32 GMT+02:00 anshu shukla :
>
>> But the main problem is how to increase the level of parallelism for any
>> particular bolt's logic.
>
on a dstream will create another dstream. You may
> want to take a look at foreachrdd? Also, kindly share your code so people
> can help better
> On 6 May 2015 17:54, "anshu shukla" wrote:
>
>> Please help guys, Even After going through all the examples given i
>>
level of parallelism, since the logic of the topology is not clear.
--
Thanks & Regards,
Anshu Shukla
Indian Institute of Science
hat helps,
>
> Greetings,
>
> Juan
>
> 2015-05-01 9:30 GMT+02:00 anshu shukla :
>
>>
>>
>>
>>
>> I have the real DEBS-TAxi data in csv file , in order to operate over it
>> how to simulate a "Spout" kind of thing as event generato
Exception in thread "main" java.lang.RuntimeException:
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot
communicate with client version 4
I am not using any Hadoop facility (not even HDFS), so why is it giving
this error?
--
Thanks & Regards,
Anshu Shukla
I have the real DEBS-Taxi data in a CSV file; in order to operate over it,
how do I simulate a "Spout" kind of thing as an event generator using the
timestamps in the CSV file?
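A minimal sketch of one way to do this with a custom receiver (the class name, the timestamp sitting in the first CSV column, and the epoch-millisecond format are all assumptions): read the file in timestamp order, sleep for the gap between consecutive rows, and hand each row to Spark with store().

import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

public class CsvReplayReceiver extends Receiver<String> {
    private final String path;

    public CsvReplayReceiver(String path) {
        super(StorageLevel.MEMORY_ONLY());
        this.path = path;
    }

    @Override
    public void onStart() {
        new Thread(this::replay).start();   // keep onStart() non-blocking
    }

    @Override
    public void onStop() { }                // replay() checks isStopped()

    private void replay() {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            long previousTs = -1;
            while (!isStopped() && (line = reader.readLine()) != null) {
                long ts = Long.parseLong(line.split(",")[0]);   // assumed: epoch millis in column 0
                if (previousTs >= 0 && ts > previousTs) {
                    Thread.sleep(ts - previousTs);              // replay with the original inter-event gaps
                }
                previousTs = ts;
                store(line);                                    // emit the event to Spark
            }
        } catch (Exception e) {
            restart("Error replaying " + path, e);              // let Spark restart the receiver
        }
    }
}

Attaching it with ssc.receiverStream(new CsvReplayReceiver("/path/to/taxi.csv")) then yields a stream whose events roughly follow the CSV timestamps.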
--
Thanks & Regards,
Anshu Shukla
Hey,
I didn't find any documentation regarding support for cycles in a Spark
topology, although Storm supports this using manual configuration in the
acker function logic (setting it to a particular count). By cycles I
don't mean infinite loops.
--
Thanks & Regards,
Anshu Shukla