Yes we do something very similar and it's working well:
Kafka ->
Spark Streaming (write temp files, serialized RDDs) ->
Spark Batch Application (build partitioned Parquet files on HDFS; this is
needed because building Parquet files of a reasonable size is too slow for
streaming) ->
query with Spark
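As an illustration of the partitioned layout the batch step produces, here is a minimal Python sketch of the `key=value` directory convention Spark uses for partitioned Parquet output (not Spark code; paths and the partition column are illustrative):

```python
from pathlib import Path
import tempfile

# Sketch: Parquet output partitioned by a column is laid out as
# key=value subdirectories, e.g. base/date=2016-09-14/part-00000.parquet.
# This just builds the directory structure to show the convention.
base = Path(tempfile.mkdtemp())
for date in ["2016-09-13", "2016-09-14"]:
    part_dir = base / f"date={date}"
    part_dir.mkdir()
    (part_dir / "part-00000.parquet").touch()

print(sorted(p.name for p in base.iterdir()))
# -> ['date=2016-09-13', 'date=2016-09-14']
```

Queries that filter on the partition column can then skip whole directories, which is why compacting many small streaming files into fewer, larger partitioned files pays off.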
Partition: 4  Leader: 89  Replicas: 89  Isr: 89
> *From:* Jeff Nadler [mailto:jnad...@srcginc.com]
> *Sent:* Wednesday, September 14, 2016 12:46 PM
> *To:* Rachana Srivastava
> *Cc:* user@spark.apache.org; d...@spark.apache.org
> *Subject:* Re: Not all
Have you checked your Kafka brokers to be certain that data is going to all
5 partitions? We use something very similar (but in Scala) and have no
problems.
Also you might not get the best response blasting both user+dev lists like
this. Normally you'd want to use 'user' only.
-Jeff
On Fri, Sep 9, 2016 at 5:54 PM, Jeff Nadler wrote:
> Yes I'll test that next.
>
> On Sep 9, 2016 5:36 PM, "Cody Koeninger" wrote:
>
>> Does the same thing happen if you're only using direct stream plus back
>> pressure, not the receiver stream?
>>
Yes I'll test that next.
On Sep 9, 2016 5:36 PM, "Cody Koeninger" wrote:
> Does the same thing happen if you're only using direct stream plus back
> pressure, not the receiver stream?
>
> On Sep 9, 2016 6:41 PM, "Jeff Nadler" wrote:
>
>> Mayb
mption until it is eventually consuming 1 record / second / partition.
This happens even though there's no scheduling delay, and the
receiver-based stream does not appear to be throttled.
Anyone ever see anything like this?
Thanks!
Jeff Nadler
Aerohive Networks
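The rate-limiting behavior under discussion is driven by standard Spark Streaming configuration properties; a sketch of the relevant knobs (values are illustrative, not recommendations):

```
# Back pressure: let Spark adapt the ingestion rate to processing speed
spark.streaming.backpressure.enabled        true
# Upper bound for receiver-based streams (records/sec per receiver)
spark.streaming.receiver.maxRate            10000
# Upper bound for the Kafka direct stream (records/sec per partition)
spark.streaming.kafka.maxRatePerPartition   10000
```

Comparing behavior with only the direct stream plus back pressure, as suggested above, isolates whether the receiver path is the one being throttled.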
Your receiver must extend Receiver[String]. Try changing it to extend
Receiver[Message]?
On Mon, Oct 12, 2015 at 2:03 PM, Something Something <
mailinglist...@gmail.com> wrote:
> In my custom receiver for Spark Streaming I've code such as this:
>
> messages.toArray().foreach(ms
Gerard - any chance this is related to task locality waiting? Can you
try (just as a diagnostic) something like this, does the unexpected delay
go away?
.set("spark.locality.wait", "0")
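The same diagnostic can be applied cluster-wide via spark-defaults.conf instead of in code; a sketch (revert after testing, since 0 disables locality preference entirely):

```
# spark-defaults.conf -- diagnostic only; 0 means never wait for locality
spark.locality.wait            0
# finer-grained variants also exist if you want to narrow it down:
# spark.locality.wait.process  0
# spark.locality.wait.node     0
# spark.locality.wait.rack     0
```

If the unexpected delay disappears with the wait set to 0, the scheduler was holding tasks back waiting for data-local executors.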
On Tue, Oct 6, 2015 at 12:00 PM, Gerard Maas wrote:
> Hi Cody,
>
> The job is doing ETL from Kafka record
sed
> by core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
>
> FYI
>
> On Tue, Oct 6, 2015 at 10:08 AM, shahid qadri
> wrote:
>
>> hi Jeff
>> Thanks
>> More specifically I need the REST API to submit a pyspark job, can you
>> point me to Sp
Spark standalone doesn't come with a UI for submitting jobs. Some Hadoop
distros might; for example, EMR in AWS has a job submit UI.
Spark submit just calls a REST API, you could build any UI you want on top
of that...
On Tue, Oct 6, 2015 at 9:37 AM, shahid qadri
wrote:
> Hi Folks
>
> How i c
While investigating performance challenges in a Streaming application using
UpdateStateByKey, I found that serialization of state was a meaningful (not
dominant) portion of our execution time.
In StateDStream.scala, serialized persistence is required:
super.persist(StorageLevel.MEMORY_ONLY_SER)
You can run multiple Spark clusters against one ZK cluster. Just use this
config to set independent ZK roots for each cluster:
spark.deploy.zookeeper.dir
The directory in ZooKeeper to store recovery state (default: /spark).
-Jeff
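As a concrete sketch, two standalone masters sharing one ZooKeeper ensemble might be configured like this (hostnames and root paths are illustrative):

```
# spark-defaults.conf for cluster A
spark.deploy.recoveryMode     ZOOKEEPER
spark.deploy.zookeeper.url    zk1:2181,zk2:2181,zk3:2181
spark.deploy.zookeeper.dir    /spark-clusterA

# cluster B points at the same ensemble under a different root:
# spark.deploy.zookeeper.dir  /spark-clusterB
```

Keeping each cluster's recovery state under its own ZK root prevents the masters from seeing (and fighting over) each other's leader-election and worker state.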
From: Sean Owen
To: Akhil Das
Cc: Michal Klos , Use
scala compiler isn't optimizing waitToPush into
a loop? Looks like tail recursion, no?
Thanks-
Jeff Nadler
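For reference, a self-contained sketch of the compiler behavior in question (this is not Spark's actual waitToPush code, just an illustration of the pattern): a tail-recursive polling loop that scalac compiles down to iteration, with @tailrec making the compiler enforce it:

```scala
import scala.annotation.tailrec

object TailRecSketch {
  // A waitToPush-style polling loop written as tail recursion.
  // @tailrec makes compilation fail unless scalac can rewrite the
  // call into an iterative loop (so no stack growth at runtime).
  @tailrec
  def poll(attemptsLeft: Int, processed: Int): Int =
    if (attemptsLeft <= 0) processed
    else poll(attemptsLeft - 1, processed + 1)

  def main(args: Array[String]): Unit = {
    // A million levels of naive recursion would overflow the stack;
    // compiled as a loop, this runs fine.
    println(poll(1000000, 0))
  }
}
```

If the recursive call were not in tail position (e.g. wrapped in further arithmetic), @tailrec would reject it and the stack-growth concern would be real.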
mLike is
> used for Java code, returning a Scala DStream is not reasonable. You can fix
> this by submitting a PR, or I can help you to fix this.
>
>
>
> Thanks
>
> Jerry
>
>
>
> *From:* Jeff Nadler [mailto:jnad...@srcginc.com]
> *Sent:* Monday, January 19,
Duration
): DStream[T] = {
  dstream.reduceByWindow(reduceFunc, windowDuration, slideDuration)
}
So I'm just a noob. Is this a bug or am I missing something?
Thanks!
Jeff Nadler