I have a key-value RDD whose key is a timestamp (femtosecond resolution, so
grouping buys me nothing), and I want to reduce it in chronological order.
How do I do that in Spark?
I am fine with reducing contiguous sections of the set separately and then
aggregating the resulting objects locally.
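A minimal sketch of one way to do this (Scala RDD API; combine stands in for
whatever order-sensitive merge the values need, and the RDD is assumed
non-empty): sortByKey range-partitions the data so each partition holds a
contiguous, ordered slice of timestamps; each slice is reduced independently,
and the per-partition results are merged on the driver in partition order,
since collect() returns results indexed by partition.

    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD

    def reduceChronologically[V: ClassTag](rdd: RDD[(Long, V)],
                                           combine: (V, V) => V): V = {
      val partials = rdd
        .sortByKey()                       // contiguous key ranges per partition
        .mapPartitions { it =>
          if (it.hasNext) Iterator(it.map(_._2).reduce(combine))
          else Iterator.empty              // skip empty partitions
        }
        .collect()                         // partial results, in partition order
      partials.reduce(combine)             // local, in-order aggregation
    }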
Hi,
After restarting the Spark HistoryServer, it failed to come up. I checked the
Spark HistoryServer logs and found the following messages:
2015-05-21 11:38:03,790 WARN org.apache.spark.scheduler.ReplayListenerBus:
Log path provided contains no log files.
2015-05-21 11:38:52,319 INFO org.apache.spark.
This got resolved after cleaning "/user/spark/applicationHistory/*".
Hi,
Suddenly Spark jobs started failing with the following error:
Exception in thread "main" java.io.FileNotFoundException:
/user/spark/applicationHistory/application_1432824195832_1275.inprogress (No
such file or directory)
Full trace here:
[21:50:04 x...@hadoop-client01.dev:~]$ spark-submit --clas
Hi,
Is there a way to get the YARN application ID inside a Spark application when
running the job on YARN?
Thanks
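A minimal sketch for reference (Scala; SparkContext.applicationId is a real
API in the 1.x line, and on YARN it returns the YARN application ID):

    import org.apache.spark.{SparkConf, SparkContext}

    object AppIdExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("app-id-example"))
        // When submitted with --master yarn, prints something like
        // "application_1432824195832_1275".
        println(s"YARN application ID: ${sc.applicationId}")
        sc.stop()
      }
    }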
Hi,
Our Spark job on YARN suddenly started failing silently, without showing any
error. The following is the trace:
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property:
spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property:
spark.exec
Hi,
Suddenly our Spark job on YARN started failing silently, without showing any
error. The following is the trace in verbose mode:
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property:
spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property:
I am getting the following error for a simple Spark job.
I am running the following command:
spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode
cluster --master yarn
/opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples-1.2.0-cdh5.3.1-hadoop2.5.0-cdh5.3.1.jar
but the job doesn't show any progress.
We tried "--master yarn-client" with no different result.
Hi, we are using CDH 5.4.0 with Spark 1.5.2 (which doesn't come with CDH 5.4.0).
I am following this link
https://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html to
try to test/create a new algorithm with Mahout item-similarity.
I am running the following command:
./bin/mahout sp
Hi,
We are running Spark 1.3 on CDH 5.4.1 on top of YARN. We want to know how to
control the task timeout when a node fails, so that the tasks running on it
are restarted on another node. At present the job waits approximately 10
minutes to restart the tasks that were running on the failed node.
http://spark.apache.
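A hedged sketch of the knobs usually involved: the roughly 10-minute delay
matches YARN's default NodeManager liveness expiry
(yarn.nm.liveness-monitor.expiry-interval-ms, 600000 ms), and on the Spark
side spark.network.timeout (a real property, specified in seconds) governs
how long the driver waits before declaring remote endpoints dead. The value
below is only an illustration, not a recommendation:

    import org.apache.spark.SparkConf

    // Detect lost executors sooner so their tasks are rescheduled; too low a
    // value risks false positives under GC pauses or transient network issues.
    val conf = new SparkConf()
      .setAppName("fail-fast-example")
      .set("spark.network.timeout", "60")  // seconds; newer releases also accept "60s"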
Hi,
We have Python 2.6 (the default) on the cluster, and we have also installed
Python 2.7.
I was looking for a way to set the Python version in spark-submit.
Does anyone know how to do this?
Thanks
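A hedged sketch (expressed as Scala conf settings, equivalent to --conf flags
on spark-submit; both property prefixes are real Spark mechanisms,
PYSPARK_PYTHON is the environment variable PySpark consults, and the
interpreter path is a placeholder):

    import org.apache.spark.SparkConf

    // Point the YARN application master and the executors at the alternate Python.
    val conf = new SparkConf()
      .set("spark.yarn.appMasterEnv.PYSPARK_PYTHON", "/usr/bin/python2.7")
      .set("spark.executorEnv.PYSPARK_PYTHON", "/usr/bin/python2.7")

For the driver in client mode, exporting PYSPARK_PYTHON in the shell before
running spark-submit has the same effect.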
Hi,
Is there any way to make the Spark driver run inside a YARN container rather
than on the gateway/client machine?
At present, even with the parameters --master yarn and --deploy-mode cluster,
the driver runs on the gateway/client machine.
We are on CDH 5.4.1 with YARN and Spark 1.3.
Any help on this?
Thanks
Does anyone know if this is even possible?
Thanks...
Roy
Hi,
I am using a CDH 5.3.2 package installation through Cloudera Manager 5.3.2.
I am trying to run a Spark job with the following command:
PYTHONPATH=~/code/utils/ spark-submit --master yarn --executor-memory 3G
--num-executors 30 --driver-memory 2G --executor-cores 2 --name=analytics
/home/abc/co
...
Roy
On Mon, Mar 23, 2015 at 12:13 PM, Ted Yu wrote:
> InputSplit is in hadoop-mapreduce-client-core jar
>
> Please check that the jar is in your classpath.
>
> Cheers
>
> On Mon, Mar 23, 2015 at 8:10 AM, Roy wrote:
>
>> Hi,
>>
>>
>> I
thanks
roy
> Do a netstat -pnat | grep 404 and see what processes are
> running.
>
> Thanks
> Best Regards
>
> On Wed, Mar 25, 2015 at 1:13 AM, Roy wrote:
>
>> I get the following message each time I run a Spark job:
>>
>>
>>1. 15/03/24 15:35:56 WARN Abstr
We have Spark on YARN, with Cloudera Manager 5.3.2 and CDH 5.3.2.
The Jobs link on the Spark History Server doesn't open and shows the following
message:
HTTP ERROR: 500
Problem accessing /history/application_1425934191900_87572. Reason:
Server Error
--
Powered by Jetty://
at
org.spark-project.guava.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
at
org.spark-project.guava.common.cache.LocalCache.get(LocalCache.java:4000)
thanks
On Thu, Mar 26, 2015 at 7:27 PM, Roy wrote:
> We have Spark on YARN, with Cloudera Manager 5.3.2 and CDH 5.3.2
>
> Jobs link on spar
use zip
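For context, a minimal sketch of what that suggestion looks like (zip requires
both RDDs to have the same number of partitions and the same number of
elements per partition):

    val a = sc.parallelize(Seq(1, 2, 3), 2)
    val b = sc.parallelize(Seq("x", "y", "z"), 2)
    val zipped = a.zip(b)  // RDD[(Int, String)]: (1,"x"), (2,"y"), (3,"z")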
How do I build the Spark SQL Avro library for Spark 1.2?
I was following https://github.com/databricks/spark-avro and was able to
build spark-avro_2.10-1.0.0.jar by simply running sbt/sbt package from the
project root,
but we are on Spark 1.2 and need a compatible spark-avro jar.
Any idea how to do this?
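A hedged sketch of the usual approach (standard sbt; whether spark-avro's own
build exposes a Spark version setting may differ, so this shows the general
dependency pin):

    // In build.sbt: depend on the cluster's Spark version explicitly.
    // "provided" keeps Spark itself out of the packaged jar.
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.2.0" % "provided"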
Hi,
We have a cluster running CDH 5.3.2 and Spark 1.2 (the current version in
CDH 5.3.2), but we want to try Spark 1.3 without breaking the existing setup.
Is it possible to have Spark 1.3 on the existing setup?
Thanks
Hi,
How do I get a Spark job progress-style report on the console?
I tried to set --conf spark.ui.showConsoleProgress=true but it
Thanks
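A minimal sketch of the usual gotcha (spark.ui.showConsoleProgress is a real
property, but the console progress bar is only created when the root log
level is above INFO at the time the SparkContext starts, since INFO output
would interleave with the bar):

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumes conf/log4j.properties already sets log4j.rootCategory=WARN, console;
    // with INFO logging going to the console, the bar is suppressed.
    val sc = new SparkContext(
      new SparkConf()
        .setAppName("progress-example")
        .set("spark.ui.showConsoleProgress", "true"))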
Hi,
My Spark job is failing with the following error message:
org.apache.spark.shuffle.FetchFailedException:
/mnt/ephemeral12/yarn/nm/usercache/abc/appcache/application_1429353954024_1691/spark-local-20150418132335-0723/28/shuffle_3_1_0.index
(No such file or directory)
at
org.apache.spark.s
Hi,
I recently enabled log4j.rootCategory=WARN, console in the Spark
configuration, but after that spark.logConf=true became ineffective.
I just want to confirm: is this because of log4j.rootCategory=WARN?
Thanks
Hi,
When we start a Spark job, it starts a new HTTP server for each job.
Is it possible to disable this HTTP server for each job?
Thanks
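A hedged sketch, assuming the server in question is the per-application web
UI (spark.ui.enabled is a real property; the internal HTTP file server that
ships jars and files to executors is a separate component and is not
switchable this way):

    import org.apache.spark.SparkConf

    // Disable the per-application web UI (normally bound on port 4040 and up).
    val conf = new SparkConf().set("spark.ui.enabled", "false")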
I want to load an HBase table into Spark.
JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
    sc.newAPIHadoopRDD(conf, TableInputFormat.class,
        ImmutableBytesWritable.class, Result.class);
When I call hBaseRDD.count(), I get this error:
Caused by: java.lang.IllegalStateException: The input format instance has
not been properly initialized.
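A hedged sketch of the usual cause (Scala; TableInputFormat.INPUT_TABLE is
the real constant HBase's TableInputFormat reads the table name from, and
"my_table" is a placeholder): the Configuration handed to newAPIHadoopRDD
must carry the table name before the RDD is computed.

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat

    val hbaseConf = HBaseConfiguration.create()
    // Without this property, TableInputFormat fails with
    // "The input format instance has not been properly initialized".
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")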
The problem statement and an approach to solving it using windows are
described here:
https://stackoverflow.com/questions/52509498/given-events-with-start-and-end-times-how-to-count-the-number-of-simultaneous-e
I am looking for more elegant/performant solutions, if they exist. TIA!
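For reference, a minimal sketch of the window-based idea from the linked
question (Scala DataFrame API; the events data and its start/end columns are
assumptions): emit +1 at each event's start and -1 at its end, then take a
running sum ordered by time.

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    import spark.implicits._  // assumes an active SparkSession named spark
    val events = Seq((1L, 5L), (2L, 6L), (4L, 7L)).toDF("start", "end")

    val deltas = events.select(col("start").as("t"), lit(1).as("delta"))
      .union(events.select(col("end").as("t"), lit(-1).as("delta")))

    // Running count of simultaneous events at each boundary time.
    val concurrent = deltas.withColumn(
      "concurrent", sum("delta").over(Window.orderBy("t")))

The single unpartitioned window funnels every row through one task, which is
exactly the performance concern that makes this approach feel inelegant at
scale.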
-of-join-of-two-datasets-in-apache-spark
2. Snapshot of state with time to state with effective start and end
time:
https://stackoverflow.com/questions/53928372/given-dataset-of-state-snapshots-at-time-t-how-to-transform-it-into-dataset-with/53928400#53928400
Thanks in advance!
Roy
Thanks all, and Matei.
TL;DR of the conclusion for my particular case:
Qualitatively, while Catalyst[1] tries to mitigate learning curve and
maintenance burden, it lacks the dynamic programming approach used by
Calcite[2] and risks falling into local minima.
Quantitatively, there is no reproducibl
> also can improve the existing CBO and make it more general. The paper of
> Spark SQL was published 5 years ago. A lot of great contributions were made
> in the past 5 years.
>
> Cheers,
>
> Xiao
>
> Debajyoti Roy wrote on Wed, Jan 15, 2020, 9:23 AM:
>
>> Thanks all, and
Hi, does anyone know the behavior of dropping managed tables in the case of
an external Hive metastore:
does deletion of the data (e.g. from the object store) happen from Spark SQL,
or from the external Hive metastore?
I am confused by the local mode and remote mode code paths.
[two embedded architecture-diagram images]
Hi guys~
Comparing these two architectures, why does BDAS put YARN and Mesos under
HDFS? Is there any special consideration, or is it just an easy way to depict
the AMPLab stack?
Best regards!
Hi,
Is there any plan to remove the limitation mentioned below?
"Streaming aggregation doesn't support group aggregate pandas UDF"
We want to run our data modelling jobs in real time using Spark 3.0 and Kafka
2.4, and need support for custom group aggregate pandas UDFs on stream
windows.
Is there