Hi Daniel,
Did you solve your problem?
I ran into the same problem when running reduceByKey on massive data on YARN.
Instead of pushing your requests to a socket, why don't you push them to Kafka or another message queue and use Spark Streaming to process them?
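For example, a minimal sketch of that approach (the broker address, topic name, batch interval, and processing step here are placeholders, not from your setup):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("RequestConsumer")
val ssc = new StreamingContext(conf, Seconds(5))

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")  // placeholder broker
val topics = Set("requests")                                     // placeholder topic

val requests = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

// Process each request payload instead of handling raw socket connections yourself
requests.map(_._2).foreachRDD { rdd =>
  rdd.foreach(request => println(request))  // replace with your real handling
}

ssc.start()
ssc.awaitTermination()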
Thanks
Best Regards
On Mon, Oct 5, 2015 at 6:46 PM, wrote:
> Hi,
> I am using Scala, doing a socket program to catch multiple requests at sa
Hi Rachana,
Are you by any chance saying something like this in your code?
"sparkConf.setMaster("yarn-cluster");"
SparkContext is not supported with yarn-cluster mode.
I think you are hitting this bug:
https://issues.apache.org/jira/browse/SPARK-7504. This got fixed in Spark 1.4.0.
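For illustration, a sketch of the usual pattern (the app name, class, and jar below are made up): leave the master out of the application code and let spark-submit supply it, e.g. spark-submit --master yarn-cluster --class com.example.MyApp app.jar:

import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf().setAppName("MyApp")  // no setMaster("yarn-cluster") here
val sc = new SparkContext(sparkConf)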
I am running Spark locally to understand how countByValueAndWindow works.
val Array(brokers, topics) = Array("192.XX.X.XX:9092", "test1")
// Create context with 2 second batch interval
val sparkConf = new SparkConf().setAppName("ReduceByWindowExample").setMaster("local[1,1]")
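For reference, here is a minimal self-contained sketch of countByValueAndWindow (the socket source, port, and window/slide durations are stand-ins, not taken from your code):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("ReduceByWindowExample").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(2))
// Window operations with an inverse reduce (which countByValueAndWindow uses) need checkpointing
ssc.checkpoint("/tmp/countByValueAndWindow-checkpoint")

val lines = ssc.socketTextStream("localhost", 9999)  // stand-in source
// Count occurrences of each distinct line over the last 10 seconds, sliding every 2 seconds
val counts = lines.countByValueAndWindow(Seconds(10), Seconds(2))
counts.print()

ssc.start()
ssc.awaitTermination()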
You probably have to read the source code; I am not sure if there are any .ppt files or slides.
Hao
From: VJ Anand [mailto:vjan...@sankia.com]
Sent: Monday, October 12, 2015 11:43 AM
To: Cheng, Hao
Cc: Raajay; user@spark.apache.org
Subject: Re: Join Order Optimization
Hi - Is there a design document
If the data is also on-demand, Spark as a back end is also a good option.
Sent from Outlook Mail for Windows 10 phone
From: Akhil Das
Sent: Sunday, October 11, 2015 1:32 AM
To: unk1102
Cc: user@spark.apache.org
Subject: Re: Best practices to call small spark jobs as part of REST api
One approach
I am trying to submit a job in yarn-cluster mode using the spark-submit command.
My code works fine when I use yarn-client mode.
Cloudera Version:
CDH-5.4.7-1.cdh5.4.7.p0.3
Command Submitted:
spark-submit --class "com.markmonitor.antifraud.ce.KafkaURLStreaming" \
--driver-java-options
"-Dlog4j
Hi - Is there a design document for those operations that have been
implemented in 1.4.0? If so, where can I find them?
-VJ
On Sun, Oct 11, 2015 at 7:27 PM, Cheng, Hao wrote:
> Yes, I think the SPARK-2211 should be the right place to follow the CBO
> stuff, but probably that will not happen right
Yes, I think SPARK-2211 should be the right place to follow the CBO stuff, but probably that will not happen right away.
The JIRA issue that introduced the statistics info can be found at:
https://issues.apache.org/jira/browse/SPARK-2393
Hao
From: Raajay [mailto:raaja...@gmail.com]
Sent: Monday, O
Hi Cheng,
Could you point me to the JIRA that introduced this change?
Also, is this SPARK-2211 the right issue to follow for cost-based
optimization?
Thanks
Raajay
On Sun, Oct 11, 2015 at 7:57 PM, Cheng, Hao wrote:
> Spark SQL supports very basic join reordering optimization, based on the
Thank you Ted, that's very informative. From the DB optimization point of view, cost-based join re-ordering and multi-way joins do provide better performance;
but from the API design point of view, 2 arguments (relations) for JOIN in the DF API are probably enough for the multiple table
Spark SQL supports very basic join reordering optimization, based on the raw table data size; this was added a couple of major releases back.
And the "EXPLAIN EXTENDED query" command is a very informative tool to verify whether the optimization is taking effect.
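As a small illustration (the tables a, b, c and the id column are hypothetical, and an existing SQLContext such as the spark-shell sqlContext is assumed):

import org.apache.spark.sql.SQLContext

def showPlans(sqlContext: SQLContext): Unit = {
  // The SQL form: EXPLAIN EXTENDED returns the plans as rows of text
  sqlContext.sql(
    "EXPLAIN EXTENDED SELECT * FROM a JOIN b ON a.id = b.id JOIN c ON b.id = c.id")
    .collect().foreach(row => println(row.getString(0)))

  // The DataFrame form: explain(true) prints parsed/analyzed/optimized/physical plans
  sqlContext.table("a")
    .join(sqlContext.table("b"), "id")
    .join(sqlContext.table("c"), "id")
    .explain(true)
}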
From: Raajay [mailto:raaja...@gmail.com]
Some weekend reading:
http://stackoverflow.com/questions/20022196/are-left-outer-joins-associative
Cheers
On Sun, Oct 11, 2015 at 5:32 PM, Cheng, Hao wrote:
> A join B join C === (A join B) join C
>
> Semantically they are equivalent, right?
>
>
>
> *From:* Richard Eggert [mailto:richard.egg...
A join B join C === (A join B) join C
Semantically they are equivalent, right?
From: Richard Eggert [mailto:richard.egg...@gmail.com]
Sent: Monday, October 12, 2015 5:12 AM
To: Subhajit Purkayastha
Cc: User
Subject: Re: Spark 1.5 - How to join 3 RDDs in a SQL DF?
It's the same as joining 2. Join
One option is to read the data via JDBC; however, that's probably the worst option, as you likely need some hacky work to enable parallel reading in Spark SQL.
Another option is to copy the hive-site.xml of your Hive Server to $SPARK_HOME/conf; then Spark SQL will see everything that Hive sees.
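A minimal sketch of the second option, assuming hive-site.xml has already been copied into $SPARK_HOME/conf and that a table named my_table (a placeholder) exists in the metastore:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("HiveRemoteRead"))
val hiveContext = new HiveContext(sc)

// Once the metastore is visible, Hive tables can be queried directly
val df = hiveContext.sql("SELECT * FROM my_table LIMIT 10")
df.show()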
Hi,
I have created a Python UDF to call an API which requires an expiring OAuth token that needs refreshing every 600 seconds, which is longer than any given stage.
Due to the nature of threads and local state, if I use a global variable, the variable goes out of scope regularly.
I look int
It's the same as joining 2. Join two together, and then join the third one
to the result of that.
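For instance (df1, df2, df3 and the shared "id" column below are hypothetical):

import org.apache.spark.sql.DataFrame

def joinThree(df1: DataFrame, df2: DataFrame, df3: DataFrame): DataFrame =
  df1.join(df2, "id")  // join the first two
     .join(df3, "id")  // then join the third to that result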
On Oct 11, 2015 2:57 PM, "Subhajit Purkayastha" wrote:
> Can I join 3 different RDDs together in a Spark SQL DF? I can find
> examples for 2 RDDs but not 3.
>
>
>
> Thanks
>
>
>
In our case, we do not actually need partition inference, so the workaround was easy: instead of using paths like rootpath/batch_id=333/..., we changed them to rootpath/333/. This works for us because we compute the set of HDFS paths manually and then register a DataFrame into a SQLContext.
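For example, a sketch of that style of workaround (the paths, file format, and table name are made up):

import org.apache.spark.sql.SQLContext

def registerBatches(sqlContext: SQLContext): Unit = {
  // Compute the HDFS paths explicitly instead of relying on partition inference
  val paths = Seq("hdfs:///rootpath/333", "hdfs:///rootpath/334")
  val df = sqlContext.read.parquet(paths: _*)
  df.registerTempTable("batches")
}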
Can I join 3 different RDDs together in a Spark SQL DF? I can find examples
for 2 RDDs but not 3.
Thanks
The simplest approach would be to push the streaming data (after the computations) to a SQL-like DB and then let your visualization piece pull it from the DB. Another approach would be to make your visualization piece a web-socket (if you are using D3.js etc.) and then from your streaming application you
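As a rough sketch of the first approach (the DB URL, credentials, table, and the shape of the results stream are all assumptions):

import java.sql.DriverManager
import org.apache.spark.streaming.dstream.DStream

// Hypothetical: results is a DStream[(String, Long)] of (key, count) pairs produced upstream
def saveToDb(results: DStream[(String, Long)]): Unit = {
  results.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      // One connection per partition; URL, credentials, and table are placeholders
      val conn = DriverManager.getConnection(
        "jdbc:postgresql://dbhost:5432/dashboard", "user", "pass")
      val stmt = conn.prepareStatement("INSERT INTO word_counts(word, count) VALUES (?, ?)")
      partition.foreach { case (word, count) =>
        stmt.setString(1, word)
        stmt.setLong(2, count)
        stmt.addBatch()
      }
      stmt.executeBatch()
      stmt.close()
      conn.close()
    }
  }
}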
Hi,
How can we read data from an external Hive server? The Hive server is running and I want to read the data remotely using Spark. Is there any example?
Thanks
Did you try setting the SPARK_LOCAL_IP in the conf/spark-env.sh file on
each node?
Thanks
Best Regards
On Fri, Oct 2, 2015 at 4:18 AM, markluk wrote:
> I'm running a standalone Spark cluster of 1 master and 2 slaves.
>
> My slaves file under /conf lists the fully qualified domain names of the 2
>
Hello all,
I have a Spark Streaming job that runs and produces results successfully.
However, after a few days the job stops producing any output. I can see the job is still running (polling data from Flume, completing jobs and their subtasks); however, it is failing to produce any output. I have to r
One approach would be to make your Spark job push the computed results (JSON) to a database, and your REST server can pull it from there and power the UI.
Thanks
Best Regards
On Wed, Sep 30, 2015 at 12:26 AM, unk1102 wrote:
> Hi, I would like to know any best practices to call Spark jobs in rest
It turns out that Mesos can override the OS ulimit -n setting, so we increased the ulimit -n setting on the Mesos slaves.