I have a key-value RDD whose key is a timestamp (femtosecond resolution, so
grouping buys me nothing), and I want to reduce it in chronological order.
How do I do that in Spark?
I am fine with reducing contiguous sections of the set separately and then
aggregating the resulting objects locally.
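A minimal sketch of one way to do this (Scala RDD API; combine stands in for
whatever order-sensitive merge the values need, and the RDD is assumed
non-empty): sortByKey range-partitions the data so each partition holds a
contiguous, ordered slice of timestamps; each slice is reduced independently,
and the per-partition results are merged on the driver in partition order,
since collect() returns results indexed by partition.

    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD

    def reduceChronologically[V: ClassTag](rdd: RDD[(Long, V)],
                                           combine: (V, V) => V): V = {
      val partials = rdd
        .sortByKey()                       // contiguous key ranges per partition
        .mapPartitions { it =>
          if (it.hasNext) Iterator(it.map(_._2).reduce(combine))
          else Iterator.empty              // skip empty partitions
        }
        .collect()                         // partial results, in partition order
      partials.reduce(combine)             // local, in-order aggregation
    }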
Hi,
After restarting the Spark HistoryServer, it failed to come up. I checked the
Spark HistoryServer logs and found the following messages:
2015-05-21 11:38:03,790 WARN org.apache.spark.scheduler.ReplayListenerBus:
Log path provided contains no log files.
2015-05-21 11:38:52,319 INFO org.apache.spark.
This got resolved after cleaning "/user/spark/applicationHistory/*".
Hi,
Suddenly Spark jobs started failing with the following error:
Exception in thread "main" java.io.FileNotFoundException:
/user/spark/applicationHistory/application_1432824195832_1275.inprogress (No
such file or directory)
Full trace here:
[21:50:04 x...@hadoop-client01.dev:~]$ spark-submit --clas
Hi,
Is there a way to get the YARN application ID inside a Spark application when
running the job on YARN?
Thanks
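A minimal sketch for reference (Scala; SparkContext.applicationId is a real
API in the 1.x line, and on YARN it returns the YARN application ID):

    import org.apache.spark.{SparkConf, SparkContext}

    object AppIdExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("app-id-example"))
        // When submitted with --master yarn, prints something like
        // "application_1432824195832_1275".
        println(s"YARN application ID: ${sc.applicationId}")
        sc.stop()
      }
    }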
Hi,
Our Spark job on YARN suddenly started failing silently, without showing any
error. The following is the trace:
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property:
spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property:
spark.exec
Hi,
Suddenly our Spark job on YARN started failing silently, without showing any
error. The following is the trace in verbose mode:
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property:
spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property:
I am getting the following error for a simple Spark job.
I am running the following command:
spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode
cluster --master yarn
/opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples-1.2.0-cdh5.3.1-hadoop2.5.0-cdh5.3.1.jar
but the job doesn't show any progress.
We tried "--master yarn-client" with no different result.
Hi, we are using CDH 5.4.0 with Spark 1.5.2 (which doesn't come with CDH 5.4.0).
I am following this link
https://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html to
try to test/create a new algorithm with Mahout item-similarity.
I am running the following command:
./bin/mahout sp
Hi,
We are running Spark 1.3 on CDH 5.4.1 on top of YARN. We want to know how to
control the task timeout when a node fails, so that the tasks running on it
are restarted on another node. At present the job waits approximately 10
minutes to restart the tasks that were running on the failed node.
http://spark.apache.
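A hedged sketch of the knobs usually involved: the roughly 10-minute delay
matches YARN's default NodeManager liveness expiry
(yarn.nm.liveness-monitor.expiry-interval-ms, 600000 ms), and on the Spark
side spark.network.timeout (a real property, specified in seconds) governs
how long the driver waits before declaring remote endpoints dead. The value
below is only an illustration, not a recommendation:

    import org.apache.spark.SparkConf

    // Detect lost executors sooner so their tasks are rescheduled; too low a
    // value risks false positives under GC pauses or transient network issues.
    val conf = new SparkConf()
      .setAppName("fail-fast-example")
      .set("spark.network.timeout", "60")  // seconds; newer releases also accept "60s"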
Hi,
We have Python 2.6 (the default) on the cluster, and we have also installed
Python 2.7.
I was looking for a way to set the Python version in spark-submit.
Does anyone know how to do this?
Thanks
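A hedged sketch (expressed as Scala conf settings, equivalent to --conf flags
on spark-submit; both property prefixes are real Spark mechanisms,
PYSPARK_PYTHON is the environment variable PySpark consults, and the
interpreter path is a placeholder):

    import org.apache.spark.SparkConf

    // Point the YARN application master and the executors at the alternate Python.
    val conf = new SparkConf()
      .set("spark.yarn.appMasterEnv.PYSPARK_PYTHON", "/usr/bin/python2.7")
      .set("spark.executorEnv.PYSPARK_PYTHON", "/usr/bin/python2.7")

For the driver in client mode, exporting PYSPARK_PYTHON in the shell before
running spark-submit has the same effect.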
Hi,
Is there any way to make the Spark driver run inside a YARN container rather
than on the gateway/client machine?
At present, even with the parameters --master yarn and --deploy-mode cluster,
the driver runs on the gateway/client machine.
We are on CDH 5.4.1 with YARN and Spark 1.3.
Any help on this?
Thanks
Does anyone know if this is even possible?
Thanks...
Roy
Hi,
I am using a CDH 5.3.2 package installation through Cloudera Manager 5.3.2.
I am trying to run a Spark job with the following command:
PYTHONPATH=~/code/utils/ spark-submit --master yarn --executor-memory 3G
--num-executors 30 --driver-memory 2G --executor-cores 2 --name=analytics
/home/abc/co
...
Roy
On Mon, Mar 23, 2015 at 12:13 PM, Ted Yu wrote:
> InputSplit is in hadoop-mapreduce-client-core jar
>
> Please check that the jar is in your classpath.
>
> Cheers
>
> On Mon, Mar 23, 2015 at 8:10 AM, Roy wrote:
>
>> Hi,
>>
>>
>> I
thanks
roy
> Do a netstat -pnat | grep 404 and see what processes are
> running.
>
> Thanks
> Best Regards
>
> On Wed, Mar 25, 2015 at 1:13 AM, Roy wrote:
>
>> I get the following message each time I run a Spark job:
>>
>>
>>1. 15/03/24 15:35:56 WARN Abstr
We have Spark on YARN, with Cloudera Manager 5.3.2 and CDH 5.3.2.
The Jobs link on the Spark History Server doesn't open and shows the following
message:
HTTP ERROR: 500
Problem accessing /history/application_1425934191900_87572. Reason:
Server Error
--
Powered by Jetty://
at
org.spark-project.guava.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
at
org.spark-project.guava.common.cache.LocalCache.get(LocalCache.java:4000)
thanks
On Thu, Mar 26, 2015 at 7:27 PM, Roy wrote:
> We have Spark on YARN, with Cloudera Manager 5.3.2 and CDH 5.3.2
>
> Jobs link on spar
use zip
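For context, a minimal sketch of what that suggestion looks like (zip requires
both RDDs to have the same number of partitions and the same number of
elements per partition):

    val a = sc.parallelize(Seq(1, 2, 3), 2)
    val b = sc.parallelize(Seq("x", "y", "z"), 2)
    val zipped = a.zip(b)  // RDD[(Int, String)]: (1,"x"), (2,"y"), (3,"z")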
How do I build the Spark SQL Avro library for Spark 1.2?
I was following https://github.com/databricks/spark-avro and was able to
build spark-avro_2.10-1.0.0.jar by simply running sbt/sbt package from the
project root,
but we are on Spark 1.2 and need a compatible spark-avro jar.
Any idea how to do this?
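A hedged sketch of the usual approach (standard sbt; whether spark-avro's own
build exposes a Spark version setting may differ, so this shows the general
dependency pin):

    // In build.sbt: depend on the cluster's Spark version explicitly.
    // "provided" keeps Spark itself out of the packaged jar.
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.2.0" % "provided"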
Hi,
We have a cluster running CDH 5.3.2 and Spark 1.2 (the current version in
CDH 5.3.2), but we want to try Spark 1.3 without breaking the existing setup.
Is it possible to have Spark 1.3 on the existing setup?
Thanks
Hi,
How do I get a Spark job progress-style report on the console?
I tried to set --conf spark.ui.showConsoleProgress=true but it
Thanks
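A minimal sketch of the usual gotcha (spark.ui.showConsoleProgress is a real
property, but the console progress bar is only created when the root log
level is above INFO at the time the SparkContext starts, since INFO output
would interleave with the bar):

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumes conf/log4j.properties already sets log4j.rootCategory=WARN, console;
    // with INFO logging going to the console, the bar is suppressed.
    val sc = new SparkContext(
      new SparkConf()
        .setAppName("progress-example")
        .set("spark.ui.showConsoleProgress", "true"))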
Hi,
My Spark job is failing with the following error message:
org.apache.spark.shuffle.FetchFailedException:
/mnt/ephemeral12/yarn/nm/usercache/abc/appcache/application_1429353954024_1691/spark-local-20150418132335-0723/28/shuffle_3_1_0.index
(No such file or directory)
at
org.apache.spark.s
Hi,
I recently enabled log4j.rootCategory=WARN, console in the Spark
configuration, but after that spark.logConf=true became ineffective.
I just want to confirm: is this because of log4j.rootCategory=WARN?
Thanks
Hi,
When we start a Spark job, it starts a new HTTP server for each job.
Is it possible to disable this HTTP server for each job?
Thanks
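A hedged sketch, assuming the server in question is the per-application web
UI (spark.ui.enabled is a real property; the internal HTTP file server that
ships jars and files to executors is a separate component and is not
switchable this way):

    import org.apache.spark.SparkConf

    // Disable the per-application web UI (normally bound on port 4040 and up).
    val conf = new SparkConf().set("spark.ui.enabled", "false")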
I want to load an HBase table into Spark.
JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
    sc.newAPIHadoopRDD(conf, TableInputFormat.class,
        ImmutableBytesWritable.class, Result.class);
When I call hBaseRDD.count(), I get this error:
Caused by: java.lang.IllegalStateException: The input format instance has
not been properly initialized.
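A hedged sketch of the usual cause (Scala; TableInputFormat.INPUT_TABLE is
the real constant HBase's TableInputFormat reads the table name from, and
"my_table" is a placeholder): the Configuration handed to newAPIHadoopRDD
must carry the table name before the RDD is computed.

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat

    val hbaseConf = HBaseConfiguration.create()
    // Without this property, TableInputFormat fails with
    // "The input format instance has not been properly initialized".
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")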
The problem statement and an approach to solving it using windows are
described here:
https://stackoverflow.com/questions/52509498/given-events-with-start-and-end-times-how-to-count-the-number-of-simultaneous-e
I am looking for more elegant/performant solutions, if they exist. TIA!
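For reference, a minimal sketch of the window-based idea from the linked
question (Scala DataFrame API; the events data and its start/end columns are
assumptions): emit +1 at each event's start and -1 at its end, then take a
running sum ordered by time.

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    import spark.implicits._  // assumes an active SparkSession named spark
    val events = Seq((1L, 5L), (2L, 6L), (4L, 7L)).toDF("start", "end")

    val deltas = events.select(col("start").as("t"), lit(1).as("delta"))
      .union(events.select(col("end").as("t"), lit(-1).as("delta")))

    // Running count of simultaneous events at each boundary time.
    val concurrent = deltas.withColumn(
      "concurrent", sum("delta").over(Window.orderBy("t")))

The single unpartitioned window funnels every row through one task, which is
exactly the performance concern that makes this approach feel inelegant at
scale.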
-of-join-of-two-datasets-in-apache-spark
2. Snapshot of state with time to state with effective start and end
time:
https://stackoverflow.com/questions/53928372/given-dataset-of-state-snapshots-at-time-t-how-to-transform-it-into-dataset-with/53928400#53928400
Thanks in advance!
Roy
Thanks all, and Matei.
TL;DR of the conclusion for my particular case:
Qualitatively, while Catalyst[1] tries to mitigate learning curve and
maintenance burden, it lacks the dynamic programming approach used by
Calcite[2] and risks falling into local minima.
Quantitatively, there is no reproducibl
> also can improve the existing CBO and make it more general. The paper of
> Spark SQL was published 5 years ago. A lot of great contributions were made
> in the past 5 years.
>
> Cheers,
>
> Xiao
>
> Debajyoti Roy wrote on Wed, Jan 15, 2020, 9:23 AM:
>
>> Thanks all, and
Hi, does anyone know the behavior of dropping managed tables in the case of
an external Hive metastore:
does deletion of the data (e.g. from the object store) happen from Spark SQL,
or from the external Hive metastore?
I am confused by the local mode and remote mode code paths.
[two embedded architecture-diagram images]
Hi guys~
Comparing these two architectures, why does BDAS put YARN and Mesos under
HDFS? Is there any special consideration, or is it just an easy way to depict
the AMPLab stack?
Best regards!
Hi,
Is there any plan to remove the limitation mentioned below?
"Streaming aggregation doesn't support group aggregate pandas UDF"
We want to run our data modelling jobs in real time using Spark 3.0 and Kafka
2.4, and need support for custom group aggregate pandas UDFs on stream
windows.
Is there