Hi,
I'm using Spark 1.5.0. I wrote a custom Hadoop InputFormat to load data
from a third-party data source; the data type mapping has been taken care of in
my code, but when I issued the query below,
SELECT * FROM ( SELECT count(*) as failures from test WHERE state !=
'success' ) as tmp WHERE ( COALESCE(f
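A minimal sketch of the surrounding setup, assuming the custom InputFormat yields LongWritable/Text pairs and the query above is run against a temp table built from it. MyInputFormat, the one-column schema, and the tail of the COALESCE predicate are hypothetical, since the original query is truncated:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.*;
import org.apache.spark.sql.*;
import org.apache.spark.sql.types.*;

// Load the third-party data through the custom InputFormat and expose it to SQL.
JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("custom-if-sql"));
SQLContext sqlContext = new SQLContext(sc);

JavaPairRDD<LongWritable, Text> raw = sc.newAPIHadoopRDD(
    new Configuration(), MyInputFormat.class, LongWritable.class, Text.class);

// Hypothetical single-column mapping; the real code handles the type mapping itself.
JavaRDD<Row> rows = raw.map(t -> RowFactory.create(t._2().toString()));
StructType schema = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("state", DataTypes.StringType, false) });
sqlContext.createDataFrame(rows, schema).registerTempTable("test");

// Same shape as the query above; the COALESCE predicate is completed here only as a guess.
DataFrame result = sqlContext.sql(
    "SELECT * FROM (SELECT count(*) AS failures FROM test WHERE state != 'success') AS tmp "
    + "WHERE COALESCE(failures, 0) > 0");
result.show();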
Hi,
Here's the problem I'm facing: I have a standalone Java application which
periodically submits Spark jobs to my YARN cluster; by the way, I'm not using
'spark-submit' or 'org.apache.spark.launcher' to submit my jobs. These jobs
are successful and I can see them on the YARN RM web UI, but when I want to
f
Sorry, I have to re-send it again as I did not get an answer.
Here's the problem I'm facing: I have a standalone Java application which
periodically submits Spark jobs to my YARN cluster; by the way, I'm not using
'spark-submit' or 'org.apache.spark.launcher' to submit my jobs. These jobs
are successfu
Sorry, I have to re-send it again as I did not get an answer.
Here's the problem I'm facing: I'm using the Spark 1.5.0 release. I have a
standalone Java application which periodically submits Spark jobs to my
YARN cluster; by the way, I'm not using 'spark-submit' or
'org.apache.spark.launcher' to submit my
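A minimal sketch of one way such an application can submit to YARN without spark-submit, assuming the driver is embedded in the server JVM and HADOOP_CONF_DIR/YARN_CONF_DIR are visible to it; the app name and job body are hypothetical:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Embedded driver: yarn-client mode asks the ResourceManager for executors directly,
// so the job shows up on the YARN RM web UI without going through spark-submit.
SparkConf conf = new SparkConf()
    .setAppName("periodic-backend-job")   // hypothetical name
    .setMaster("yarn-client");            // yarn-cluster cannot be started from inside a JVM this way in 1.5
JavaSparkContext sc = new JavaSparkContext(conf);
try {
    long n = sc.textFile("hdfs:///tmp/input").count();   // hypothetical job body
    System.out.println("records: " + n);
} finally {
    sc.stop();   // release the YARN containers when the job finishes
}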
Hi,
Here's my situation: I have some kind of offline dataset, but I want to
form a virtual data stream feeding into Spark Streaming. My code looks like
this:
// sort offline data by time
1) JavaRDD sortedByTime = offlineDataRDD.sortBy( );
// compute a list of JavaRDD, each element JavaRDD
Hi,
Here's my situation: I have some kind of offline dataset that I have
loaded into Spark as an RDD, but I want to form a virtual data stream
feeding into Spark Streaming. My code looks like this:
// sort offline data by time, the dataset spans 2 hours
1) JavaRDD sortedByTime = offlineData
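A minimal sketch of one way to build such a virtual stream, assuming an existing JavaSparkContext sc and that the time-sorted data has already been split into one JavaRDD per batch interval (perIntervalRdds and the 10-second interval are hypothetical); the replay itself uses queueStream:

import java.util.LinkedList;
import java.util.Queue;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Each queued RDD is served as one micro-batch, so the 2-hour offline dataset
// is replayed as if it were arriving live.
Queue<JavaRDD<String>> queue = new LinkedList<>(perIntervalRdds);  // hypothetical List<JavaRDD<String>>

JavaStreamingContext jssc = new JavaStreamingContext(sc, Durations.seconds(10));
JavaDStream<String> virtualStream = jssc.queueStream(queue);

virtualStream.print();   // or any other streaming transformation

jssc.start();
jssc.awaitTermination();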
Hi,
I just want to understand the cost of DataFrame.registerTempTable(String):
is it just a trivial operation (like creating an object reference) in the
master (Driver) JVM? And can I have multiple tables with different names
referencing the same DataFrame?
Thanks
--
--Anfernee
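On the second part of the question, a minimal sketch assuming a DataFrame df and an SQLContext already exist (the table names are hypothetical). registerTempTable only records a name-to-logical-plan entry in the driver-side catalog, so it is cheap and both names resolve to the same DataFrame:

// Two temp-table names pointing at the same DataFrame; neither call copies
// or materializes any data.
df.registerTempTable("metrics_a");   // hypothetical names
df.registerTempTable("metrics_b");

DataFrame fromA = sqlContext.sql("SELECT count(*) FROM metrics_a");
DataFrame fromB = sqlContext.sql("SELECT count(*) FROM metrics_b");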
Hi,
I have a pretty large data set (2M entities) in my RDD. The data has already
been partitioned by a specific key, and the key has a range (its type is long).
Now I want to create a bunch of key buckets; for example, if the key has the range
1 -> 100,
I will break the whole range into the buckets below
1
Thanks Yong for your response.
Let me see if I understand what you're suggesting: for the whole data set,
when I load it into Spark (I'm using a custom Hadoop InputFormat), I will
add an extra field to each element in the RDD, like bucket_id.
For example
Key:
1 - 10, bucket_id=1
11-20, bucket_id=
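A minimal sketch of that bucket_id idea, assuming records is a JavaRDD<MyRecord>, each record exposes a long key, and the bucket width is fixed at 10 (MyRecord and getKey are hypothetical):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

// Tag each element with its bucket_id (keys 1-10 -> bucket 1, 11-20 -> bucket 2, ...),
// then group by that bucket.
final long bucketWidth = 10L;
JavaPairRDD<Long, MyRecord> tagged = records.mapToPair(rec ->
    new Tuple2<>((rec.getKey() - 1) / bucketWidth + 1, rec));

JavaPairRDD<Long, Iterable<MyRecord>> byBucket = tagged.groupByKey();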
Hi Spark experts,
First of all, happy Thanksgiving!
Now to my question: I have implemented a custom Hadoop InputFormat to load
millions of entities from my data source into Spark (as a JavaRDD, then
transformed to a DataFrame). The approach I took in implementing the custom
Hadoop RDD is loading all ID'
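A minimal skeleton of such a custom InputFormat, assuming splits are built from ID ranges fetched up front; the class name is hypothetical and the method bodies are elided:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical skeleton: one split per ID range; each split becomes one
// partition of the RDD created with newAPIHadoopRDD.
public class MyInputFormat extends InputFormat<LongWritable, Text> {

    @Override
    public List<InputSplit> getSplits(JobContext context)
            throws IOException, InterruptedException {
        List<InputSplit> splits = new ArrayList<>();
        // e.g. query the data source for its ID ranges and add one split per range
        return splits;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // return a RecordReader that fetches only the entities in this split's ID range
        throw new UnsupportedOperationException("sketch only");
    }
}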
Hi,
I have a doubt regarding yarn-cluster mode and
spark.driver.allowMultipleContexts for the use cases below.
I have a long-running backend server where I will create a short-lived
Spark job in response to each user request, based on the fact that by
default multiple Spark Contexts cannot be created
ultipleContexts", "true")
> .set("spark.driver.allowMultipleContexts", "true"))
> ./core/src/test/scala/org/apache/spark/SparkContextSuite.scala
>
> FYI
>
> On Tue, Dec 1, 2015 at 3:32 PM, Anfernee Xu wrote:
>
>> Hi,
>>
>> I have
PM, Anfernee Xu wrote:
> > I have a long running backend server where I will create a short-lived Spark
> > job in response to each user request, based on the fact that by default
> > multiple Spark Contexts cannot be created in the same JVM, looks like I have
> > 2 choic
If multiple users are looking at the same data set, then it's a good choice
to share the SparkContext.
But my use cases are different: users are looking at different data (I use a
custom Hadoop InputFormat to load data from my data source based on the
user input), and the data might not have any overlap. F
Hi Spark experts,
I'm coming across these terminologies and have some confusion; could you
please help me understand them better?
For instance, I have implemented a Hadoop InputFormat to load my external
data into Spark; in turn my custom InputFormat will create a bunch of
InputSplits. My questi
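As a point of reference while sorting out the terminology, a minimal sketch assuming an existing JavaSparkContext sc and that the custom InputFormat is consumed through newAPIHadoopRDD (MyInputFormat is the hypothetical class from earlier): each InputSplit returned by getSplits() becomes exactly one partition of the resulting RDD.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.api.java.JavaPairRDD;

// One InputSplit -> one RDD partition -> (usually) one task when an action runs.
JavaPairRDD<LongWritable, Text> rdd = sc.newAPIHadoopRDD(
    new Configuration(), MyInputFormat.class, LongWritable.class, Text.class);

System.out.println("partitions = " + rdd.partitions().size());  // == number of InputSplits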