I've finally been able to pick this up again, after upgrading to Spark
1.4.1, because my code used the HiveContext, which runs fine in the REPL
(be it via Zeppelin or the shell) but won't work with spark-submit.
With 1.4.1, I have actually managed to get a result with the Spark shell,
but after
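For reference on the HiveContext point above, a minimal sketch of how a HiveContext is typically constructed in an application launched with spark-submit (assuming Spark 1.4.x built with Hive support; the object name and query below are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveContextApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HiveContextApp")
    val sc = new SparkContext(conf)
    // Unlike the REPL/Zeppelin, a submitted application has to create its own HiveContext.
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SHOW TABLES").show()
    sc.stop()
  }
}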
Hi Spark Users,
I am running some Spark jobs that run every hour. After running for
12 hours, the master is getting killed with the exception
*java.lang.OutOfMemoryError: GC overhead limit exceeded*
It looks like there is some memory issue in the Spark master.
The Spark master issue is a blocker. Anyone plea
Hi forum
I have just upgraded Spark from 1.4.0 to 1.5.0 and am running my old (1.4.0)
jobs on 1.5.0 using the 'spark://ip:7077' cluster URL. But the job does not seem
to start and errors out at the server with the incompatible-class exception below:
15/09/28 11:20:07 INFO Master: 10.0.0.195:34702 got disassoc
Is it possible to run FP-growth on stream data in its current version, or
is there a way around it?
I mean is it possible to use/augment the old tree with the new incoming
data and find the new set of frequent patterns?
Thanks
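For context on the question above: in current MLlib releases FP-growth runs as a batch algorithm over a static RDD of transactions, roughly as sketched below (the minimum support and input data are made up for illustration):

import org.apache.spark.mllib.fpm.FPGrowth

// One array of items per transaction
val transactions = sc.parallelize(Seq(
  Array("a", "b", "c"),
  Array("a", "b"),
  Array("b", "c")))

val model = new FPGrowth()
  .setMinSupport(0.5)
  .setNumPartitions(4)
  .run(transactions)

model.freqItemsets.collect().foreach { itemset =>
  println(itemset.items.mkString("[", ",", "]") + " -> " + itemset.freq)
}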
You could try a couple of things
a) use Kafka for stream processing: store current incoming events and Spark
Streaming job output in Kafka rather than on HDFS, and dual-write to HDFS too
(in a micro-batched mode), so every x minutes. Kafka is more suited to
processing lots of small events/
b) Coales
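A rough sketch of option (a) above, i.e. writing the Spark Streaming output to Kafka from foreachRDD (the broker list, topic name and String payloads are assumptions for illustration; assumes the kafka-clients producer API):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.streaming.dstream.DStream

def writeToKafka(events: DStream[String], brokers: String, topic: String): Unit = {
  events.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      // Create one producer per partition so it is not serialized with the closure.
      val props = new Properties()
      props.put("bootstrap.servers", brokers)
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      val producer = new KafkaProducer[String, String](props)
      partition.foreach(event => producer.send(new ProducerRecord[String, String](topic, event)))
      producer.close()
    }
  }
}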
Hello Akhil,
I do not see how that would work for a YARN cluster mode execution
since the local directories used by the Spark executors and the Spark
driver are the local directories that are configured for YARN
(yarn.nodemanager.local-dirs). If you specify a different path with
SPARK_LOCAL_DIRS,
Hi All,
Would some expert help me out with this issue...
I shall appreciate your kind help very much!
Thank you!
Zhiliang
On Sunday, September 27, 2015 7:40 PM, Zhiliang Zhu wrote:
Hi Alexis, Gavin,
Thanks very much for your kind comment. My spark command is:
spark-submit
Eugene,
The SparkR RDD API is private for now
(https://issues.apache.org/jira/browse/SPARK-7230).
You can use the SparkR::: prefix to access those private functions.
-Original Message-
From: Eugene Cao [mailto:eugene...@163.com]
Sent: Monday, September 28, 2015 8:02 AM
To: user@spark.apache.org
Error: no methods for 'textFile'
when I run the second command below after SparkR is initialized:
sc <- sparkR.init(appName = "RwordCount")
lines <- textFile(sc, args[[1]])
But the following command works:
lines2 <- SparkR:::textFile(sc, "C:\\SelfStudy\\SPARK\\sentences2.txt")
In addition, it says
No, you would just have to do another select to pull out the fields you are
interested in.
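A minimal sketch of that, assuming a DataFrame df with an array-of-struct column purchase_items whose structs contain hypothetical fields product_id and price:

import org.apache.spark.sql.functions.{col, explode}

// One row per element of purchase_items
val exploded = df.select(explode(col("purchase_items")).as("item"))

// Another select pulls out just the fields of interest
val fields = exploded.select(col("item.product_id"), col("item.price"))
fields.show()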
On Sat, Sep 26, 2015 at 11:11 AM, Jerry Lam wrote:
> Hi Michael,
>
> Thanks for the tip. With dataframe, is it possible to explode some
> selected fields in each purchase_items?
> Since purchase_items is a
While case classes no longer have the 22-element limitation as of Scala
2.11, tuples are still limited to 22 elements. For various technical
reasons, this limitation probably won't be removed any time soon.
However, you can nest tuples, like case classes, in most contexts. So, the
last bit of your
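For illustration, the nesting idea with made-up fields: instead of one flat tuple that would exceed 22 elements, group related fields into inner tuples (or case classes):

// A flat Tuple23 does not exist, but tuples nest freely
val record: ((Int, String, Double), (String, String, Boolean)) =
  ((1, "order-42", 9.99), ("alice", "NL", true))

// Access goes through the nesting
val price = record._1._3   // 9.99
val country = record._2._2 // "NL"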
I would suggest not writing small files to HDFS; rather, you can hold them
in memory, maybe off-heap, and then flush them to HDFS using another
job, similar to https://github.com/ptgoetz/storm-hdfs (not sure if Spark
already has something like it).
On Sun, Sep 27, 2015 at 11:36 PM, wrote:
>
Hello,
I'm still investigating the small-file generation problem caused by my Spark
Streaming jobs.
Indeed, my Spark Streaming jobs are receiving a lot of small events (avg 10 kB),
and I have to store them in HDFS in order to process them with Pig jobs
on demand.
The problem is the fact that I
Hi Alexis, Gavin,
Thanks very much for your kind comment. My spark command is:
spark-submit --class com.zyyx.spark.example.LinearRegression --master
yarn-client LinearRegression.jar
Neither spark-shell nor spark-submit will run; everything hangs during this stage:
15/09/27 19:18:06 INFO yarn.Clie