Hi,
I started a Spark Streaming job with 96 executors which reads from 96 Kafka
partitions and applies mapWithState to the incoming DStream.
Why would it cache only 77 partitions? Do I have to allocate more memory?
Currently each executor gets 10 GB, and it is not clear why it can't cache all 96.
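For reference, a minimal sketch of this kind of mapWithState job; the Kafka source is replaced here by an in-memory queue stream and the state function is a simple running count, both assumptions for illustration only:
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

val conf = new SparkConf().setAppName("MapWithStatePartitions")
val ssc = new StreamingContext(conf, Seconds(5))
ssc.checkpoint("/tmp/mws-checkpoint")   // mapWithState requires a checkpoint directory

// Stand-in source; in the real job this would be the 96-partition Kafka direct stream.
val queue = mutable.Queue[RDD[(String, Long)]]()
val pairs = ssc.queueStream(queue)

// Keep a running count per key.
def update(key: String, value: Option[Long], state: State[Long]): Option[(String, Long)] = {
  val total = state.getOption.getOrElse(0L) + value.getOrElse(0L)
  state.update(total)
  Some((key, total))
}

// numPartitions pins the number of state partitions (and hence the number of
// cached MapWithStateRDD partitions) instead of leaving it to the default parallelism.
val stateStream = pairs.mapWithState(StateSpec.function(update _).numPartitions(96))
stateStream.print()

ssc.start()
ssc.awaitTermination()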
Hi,
What is the recommended way to share datasets across multiple
spark-streaming applications, so that the incoming data can be looked up
against this shared dataset?
The shared dataset is also incrementally refreshed and stored on S3. Below
is the scenario.
Streaming App-1 consumes data from S
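One common pattern, sketched below under the assumption that the shared dataset is Parquet on S3 and small enough to broadcast (paths, schema, and the incoming DStream are placeholders), is to re-read the reference data in each micro-batch and join it with the incoming records:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("StreamingLookupJoin").getOrCreate()
import spark.implicits._

// Placeholder S3 location that another job refreshes incrementally.
val lookupPath = "s3a://shared-bucket/reference/"

// Assuming `incomingStream` is the application's DStream[(String, String)] of (key, payload).
incomingStream.foreachRDD { rdd =>
  val incoming = rdd.toDF("key", "payload")
  // Re-read the latest snapshot so refreshes on S3 become visible to this batch.
  val lookup = spark.read.parquet(lookupPath)
  val enriched = incoming.join(broadcast(lookup), Seq("key"), "left_outer")
  enriched.write.mode("append").parquet("s3a://shared-bucket/output/")   // placeholder sink
}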
Hey buddy,
Thanks a TON! Issue resolved.
Thanks again,
Aakash.
On 30-Oct-2017 11:44 PM, "Hondros, Constantine (ELS-AMS)" <c.hond...@elsevier.com> wrote:
> You should just use regexp_replace to remove all the leading number
> information (assuming it ends with a full-stop, and catering for the
Hi Gurus,
The parameter spark.yarn.executor.memoryOverhead is explained as follows:
spark.yarn.executor.memoryOverhead
Default: executorMemory * 0.10, with a minimum of 384
The amount of off-heap memory (in megabytes) to be allocated per executor. This
is memory that accounts for things like VM overheads, interned strings, and other native overheads.
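For what it's worth, a sketch of how the overhead can be raised explicitly; the 2048 MB value is purely illustrative, not a recommendation:
// On the command line:
//   spark-submit --master yarn --executor-memory 10g \
//     --conf spark.yarn.executor.memoryOverhead=2048 ...
// Or programmatically (client mode), before the SparkContext is created:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("OverheadExample")
  .config("spark.yarn.executor.memoryOverhead", "2048")   // MB; overrides the 10% default
  .getOrCreate()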
You should just use regexp_replace to remove all the leading number information
(assuming it ends with a full-stop, and catering for the possibility of a
capital letter).
This is untested, but it should do the trick based on your examples so far:
df.withColumn("new_column", regexp_replace($"Desc
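A self-contained version of the same idea, using the sample rows from this thread; the column name and regex here are an illustration, not the original author's exact code:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.regexp_replace

val spark = SparkSession.builder().appName("StripLeadingNumbers").getOrCreate()
import spark.implicits._

val df = Seq("Inventory Tree", "Products", "1. AT&T Services", "2. Accessories", "4. Misc")
  .toDF("Description")

// Remove an optional leading "<digits>. " prefix; rows without the prefix pass through unchanged.
val cleaned = df.withColumn("new_column", regexp_replace($"Description", "^\\d+\\.\\s*", ""))
cleaned.show(false)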
Hi All,
Using Spark 2.2.0 on YARN cluster.
I am running the Kafka Direct Stream wordcount example code (pasted below my
signature). My topic consists of 400 partitions, and the Spark job tracker page
shows 26 executors processing the corresponding 400 tasks.
When I check the execution timeline
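For context, the Kafka 0.10 direct-stream word count being referred to looks roughly like the following sketch; broker and topic names are placeholders:
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

val conf = new SparkConf().setAppName("DirectKafkaWordCount")
val ssc = new StreamingContext(conf, Seconds(2))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",                  // placeholder
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "wordcount-example"
)

val messages = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams))   // placeholder topic

// Classic word count over the message values.
val wordCounts = messages.map(_.value)
  .flatMap(_.split(" "))
  .map(word => (word, 1L))
  .reduceByKey(_ + _)
wordCounts.print()

ssc.start()
ssc.awaitTermination()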
Hi all,
I have a requirement to split a column and fetch only the description; some
rows have a number prepended before the description, whereas other rows have
only the description -
Eg - (Description is the column header)
*Description*
Inventory Tree
Products
1. AT&T Services
2. Accessories
4. Misc
Hello,
I'm using Stratio/spark-rabbitmq to read messages from RabbitMQ and save them to
Kafka, and I want to "commit" the RabbitMQ message only when it is safe on
Kafka's broker. For efficiency purposes I'm using the Kafka buffer and a
callback object, which should have the RabbitMQ message delivery tag to
acknowledge it.
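The general pattern is sketched below with the plain Kafka and RabbitMQ Java clients (the Stratio receiver's own API may differ): acknowledge the RabbitMQ delivery from the Kafka producer callback, only once the broker has confirmed the write.
import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}
import com.rabbitmq.client.Channel

// Hypothetical helper: forwards one message to Kafka and acks the RabbitMQ
// delivery only after the Kafka broker has confirmed it.
def forward(producer: KafkaProducer[String, String],
            channel: Channel,
            deliveryTag: Long,
            topic: String,
            payload: String): Unit = {
  val record = new ProducerRecord[String, String](topic, payload)
  producer.send(record, new Callback {
    override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit = {
      if (exception == null) {
        // Safe on the Kafka broker: acknowledge the original RabbitMQ message.
        channel.basicAck(deliveryTag, false)
      } else {
        // Delivery failed: reject and requeue so it can be retried.
        channel.basicNack(deliveryTag, false, true)
      }
    }
  })
}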
I saw that it is possible to truncate a date with MM or YY, but it is not
possible to truncate by WEEK, HOUR, or MINUTE.
Am I right? Is there any objection to supporting it, or is it just not
implemented yet?
Thanks David
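Until finer-grained truncation is supported, one workaround (a sketch only; it assumes a DataFrame `df` with a timestamp column "ts") is to round the epoch seconds down to the desired boundary:
import org.apache.spark.sql.functions.{col, floor, unix_timestamp}

// Truncate to the hour: round epoch seconds down to a 3600-second boundary.
val byHour = df.withColumn(
  "ts_hour",
  (floor(unix_timestamp(col("ts")) / 3600) * 3600).cast("timestamp"))

// Same idea for minutes (60-second boundary).
val byMinute = byHour.withColumn(
  "ts_minute",
  (floor(unix_timestamp(col("ts")) / 60) * 60).cast("timestamp"))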
On 27 Oct 2017, at 19:24, Sean Owen <so...@cloudera.com> wrote:
Certainly, Scala 2.12 support precedes Java 9 support. A lot of the work is in
place already, and the last issue is dealing with how Scala closures are now
implemented quite differently with lambdas / invokedynamic. This affe
Hi,
I have a Spark Cluster running in client mode. I programmatically submit
jobs to spark cluster. Under the hood, I am using spark-submit.
If my cluster is overloaded and I start a context, the driver JVM keeps
waiting for executors. The executors are in a waiting state because the cluster
does no
Hi Nick,
Both approaches worked and I realized my silly mistake too. Thank you so
much.
@Xu, thanks for the update.
Best,
Regards,
*Md. Rezaul Karim*, BSc, MSc
Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business
Yes, I am working on this. Sorry for the delay; I will try to submit a PR ASAP.
Thanks!
On Mon, Oct 30, 2017 at 5:19 PM, Nick Pentreath wrote:
> For now, you must follow this approach of constructing a pipeline
> consisting of a StringIndexer for each categorical column. See
> https://issues.apache.
For now, you must follow this approach of constructing a pipeline
consisting of a StringIndexer for each categorical column. See
https://issues.apache.org/jira/browse/SPARK-11215 for the related JIRA to
allow multiple columns for StringIndexer, which is being worked on
currently.
The reason you're
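For anyone looking for the shape of that pipeline, a rough sketch follows; the DataFrame `df` and the column names are made up for illustration:
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.StringIndexer

// Hypothetical categorical columns in the input DataFrame `df`.
val categoricalCols = Seq("color", "brand", "country")

// One StringIndexer stage per categorical column.
val indexers = categoricalCols.map { c =>
  new StringIndexer()
    .setInputCol(c)
    .setOutputCol(s"${c}_indexed")
}

val pipeline = new Pipeline().setStages(indexers.toArray[PipelineStage])
val model = pipeline.fit(df)
val indexed = model.transform(df)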