Re: Spark Implementation of XGBoost

2015-10-27 Thread DB Tsai
Hi Meihua, For categorical features, the ordinal issue can be solved by trying all 2^(q-1) - 1 possible partitions of the q values into two groups. However, it's computationally expensive. In Hastie's book, section 9.2.4, the trees can be trained by sorting the residuals and being learnt as if they
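
A minimal sketch of that idea (in Scala; the names and data shapes are assumptions, not code from the thread): ordering the categories by mean residual turns the exponential subset search into a linear scan over q - 1 ordered split points.

    // Sketch: order categories by mean residual, then scan the q - 1
    // ordered prefix splits instead of enumerating all 2^(q-1) - 1 subsets.
    val samples: Seq[(String, Double)] = Seq(
      ("a", 0.3), ("b", -0.1), ("a", 0.5), ("c", 0.2), ("b", 0.0))

    // Mean residual per category.
    val meanByCat = samples.groupBy(_._1)
      .mapValues(vs => vs.map(_._2).sum / vs.size)
      .toSeq
      .sortBy(_._2) // the categories now behave like an ordered feature

    // Candidate binary splits: prefixes of the sorted category list.
    val candidateSplits = (1 until meanByCat.size)
      .map(i => meanByCat.take(i).map(_._1).toSet)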

Re: [VOTE] Release Apache Spark 1.5.2 (RC1)

2015-10-27 Thread Sean Owen
Ah, good point. I also see it still reads 1.5.1. I imagine we just need another sweep to update all the version strings. On Tue, Oct 27, 2015 at 3:08 AM, Krishna Sankar wrote: > Guys, > The sc.version returns 1.5.1 in python and scala. Is anyone getting the > same results? Probably I am doin

Re: [VOTE] Release Apache Spark 1.5.2 (RC1)

2015-10-27 Thread Reynold Xin
Yup looks like I missed that. I will build a new one. On Tuesday, October 27, 2015, Sean Owen wrote: > Ah, good point. I also see it still reads 1.5.1. I imagine we just need > another sweep to update all the version strings. > > On Tue, Oct 27, 2015 at 3:08 AM, Krishna Sankar > wrote: > >> Guy

Exception when using some aggregate operators

2015-10-27 Thread Shagun Sodhani
Hi! I was trying out some aggregate functions in Spark SQL and I noticed that certain aggregate operators are not working. This includes: approxCountDistinct, countDistinct, mean, sumDistinct. For example, using countDistinct results in an error saying Exception in thread "main" org.apache.spark.sql.

Re: Exception when using some aggregate operators

2015-10-27 Thread Shagun Sodhani
Oops, seems I made a mistake. The error message is: Exception in thread "main" org.apache.spark.sql.AnalysisException: undefined function countDistinct On 27 Oct 2015 15:49, "Shagun Sodhani" wrote: > Hi! I was trying out some aggregate functions in Spark SQL and I noticed > that certain aggregate

Re: Exception when using some aggregate operators

2015-10-27 Thread Reynold Xin
Try count(distinct columnname). In SQL, distinct is not part of the function name. On Tuesday, October 27, 2015, Shagun Sodhani wrote: > Oops, seems I made a mistake. The error message is: Exception in thread > "main" org.apache.spark.sql.AnalysisException: undefined function > countDistinct > On
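
A minimal illustration of the distinction (table and column names are hypothetical; a Spark 1.5-era API is assumed):

    // In SQL, DISTINCT is a modifier inside the call, not part of the name:
    sqlContext.sql("SELECT count(DISTINCT a) FROM t")    // resolves
    // sqlContext.sql("SELECT countDistinct(a) FROM t")  // undefined function

    // In the DataFrame API, countDistinct is a function name:
    import org.apache.spark.sql.functions.countDistinct
    df.agg(countDistinct("a"))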

Re: Exception when using some aggregate operators

2015-10-27 Thread Shagun Sodhani
Will try in a while when I get back. I assume this applies to all functions other than mean. Also, countDistinct is defined along with all the other SQL functions, so I don't get the "distinct is not part of the function name" part. On 27 Oct 2015 19:58, "Reynold Xin" wrote: > Try > > count(distinct columnname

Re: Exception when using some aggregate operators

2015-10-27 Thread Shagun Sodhani
So I tried @Reynold's suggestion. I could get countDistinct and sumDistinct running, but mean and approxCountDistinct do not work (I guess I am using the wrong syntax for approxCountDistinct). For mean, I think the registry entry is missing. Can someone clarify that as well? On Tue, Oct 27, 2015 a
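
For reference, a hedged sketch of the DataFrame-side equivalents, which live in org.apache.spark.sql.functions in the 1.5 line (df and the column "a" are hypothetical):

    import org.apache.spark.sql.functions.{avg, approxCountDistinct, sumDistinct}

    // These resolve as Scala functions even if the SQL function registry
    // lacks a same-named entry (e.g. avg is registered where mean may not be).
    df.agg(avg("a"), approxCountDistinct("a"), sumDistinct("a"))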

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-27 Thread Sjoerd Mulder
I have disabled it because it started generating ERRORs when upgrading from Spark 1.4 to 1.5.1:

2015-10-27T20:50:11.574+0100 ERROR TungstenSort.newOrdering() - Failed to generate ordering, fallback to interpreted
java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: o
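
For context, a sketch of how the mode can be toggled in the 1.5 line (assuming the spark.sql.tungsten.enabled flag, which governed this at the time):

    // Sketch: disable Tungsten for a SQLContext in Spark 1.5.x.
    sqlContext.setConf("spark.sql.tungsten.enabled", "false")
    // or at submit time:
    //   spark-submit --conf spark.sql.tungsten.enabled=false ...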

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-27 Thread Josh Rosen
Hi Sjoerd, Did your job actually *fail* or did it just generate many spurious exceptions? While the stacktrace that you posted does indicate a bug, I don't think that it should have stopped query execution because Spark should have fallen back to an interpreted code path (note the "Failed to gener

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-27 Thread Sjoerd Mulder
No, the job actually doesn't fail, but since our tests are generating all these stacktraces I have disabled the Tungsten mode just to be sure (and so we don't have a gazillion stacktraces in production). 2015-10-27 20:59 GMT+01:00 Josh Rosen : > Hi Sjoerd, > > Did your job actually *fail* or did it just gen

Pickle Spark DataFrame

2015-10-27 Thread agg212
Hi, I'd like to "pickle" a Spark DataFrame object and have tried the following:

    import pickle
    data = sparkContext.jsonFile(data_file)  # load file
    with open('out.pickle', 'wb') as handle:
        pickle.dump(data, handle)

If I convert "data" to a Pandas DataFrame (e.g.,
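
A pickled handle can't round-trip here, since a PySpark DataFrame is a thin wrapper around JVM-side state rather than a local data structure; persisting the data itself and re-reading it is the usual route. A minimal sketch of that round-trip (shown in Scala to match the other examples; the paths are hypothetical, and the same write/read API is mirrored in Python):

    // Sketch: persist the data rather than the DataFrame object.
    df.write.parquet("/tmp/out.parquet")
    val restored = sqlContext.read.parquet("/tmp/out.parquet")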

Re: Spark.Executor.Cores question

2015-10-27 Thread Richard Marscher
Hi Mark, if you know your cluster's number of workers and cores per worker you can set this up when you create a SparkContext and shouldn't need to tinker with the 'spark.executor.cores' setting. That setting is for running multiple executors per application per worker, which you are saying you do
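
A minimal sketch of that setup for a standalone cluster (the sizes and app name are hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    // Cap the application's total cores instead of tuning per-executor cores.
    val conf = new SparkConf()
      .setAppName("example")
      .set("spark.cores.max", "16")        // total cores across the cluster
      .set("spark.executor.memory", "4g")  // memory per executor
    val sc = new SparkContext(conf)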

Re: Spark.Executor.Cores question

2015-10-27 Thread mkhaitman
Hi Richard, Thanks for the response. I should have added that the specific case where this becomes a problem is when one of the executors for that application is lost/killed prematurely, and the application attempts to spawn a new executor without consideration as to whether an executor alrea

Re: Spark.Executor.Cores question

2015-10-27 Thread Richard Marscher
Ah I see, that's a bit more complicated =). If it's possible, would using `spark.executor.memory` to set the available worker memory used by executors help alleviate the problem of running on a node that already has an executor on it? I would assume that would have a constant worst case overhead pe

Re: Exception when using some aggregate operators

2015-10-27 Thread Ted Yu
Have you tried using avg in place of mean?

    (1 to 5).foreach { i =>
      val df = (1 to 1000).map(j => (j, s"str$j"))
        .toDF("a", "b")
        .save(s"/tmp/partitioned/i=$i")
    }
    sqlContext.sql("""
      CREATE TEMPORARY TABLE partitionedParquet
      USING org.apache.spark.sql.parquet
      OPTIONS (
        path '/tm

Re: Spark Implementation of XGBoost

2015-10-27 Thread Meihua Wu
Hi DB Tsai, Thank you again for your insightful comments! 1) I agree the sorting method you suggested is a very efficient way to handle the unordered categorical variables in binary classification and regression. I propose we have a Spark ML Transformer to do the sorting and encoding, bringing th

Filter applied on merged Parquet schemas with new column fails.

2015-10-27 Thread Hyukjin Kwon
When enabling mergedSchema and predicate filtering, this fails since Parquet filters are pushed down regardless of the schema of each split (or rather, each file). Dominic Ricard reported this issue (https://issues.apache.org/jira/browse/SPARK-11103). Even though this would work okay by setting spark.sq
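
A sketch of the 1.5-era knobs involved (the path is hypothetical, and the truncated workaround above is not confirmed by this example):

    // Read with Parquet schema merging enabled.
    val df = sqlContext.read
      .option("mergeSchema", "true")
      .parquet("/data/table")

    // Predicate pushdown into Parquet is governed by this flag; disabling it
    // is one way to sidestep pushdown against mismatched per-file schemas
    // (an assumption, not the thread's confirmed fix).
    sqlContext.setConf("spark.sql.parquet.filterPushdown", "false")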

Re: Exception when using some aggregate operators

2015-10-27 Thread Shagun Sodhani
Yup, avg works fine. So we have alternate functions to use in place of the functions pointed out earlier. But my point is: are those original aggregate functions not supposed to be used, am I using them the wrong way, or is it a bug, as I asked in my first mail? On Wed, Oct 28, 2015 at 3:20

Task not serializable exception

2015-10-27 Thread Rohith Parameshwara
I am getting this "Task not serializable" exception when running spark-submit in standalone mode. I am trying to use Spark Streaming, which gets its stream from Kafka queues, but it is not able to process the mapping actions on the RDDs from the stream. The code where the serialization exception
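
The usual culprit is a closure that captures a non-serializable enclosing object. A minimal Scala sketch of the common pattern and fix (the class and field names are hypothetical, not from the original code):

    // "Task not serializable" often means the lambda drags in the
    // enclosing class via `this`.
    class Handler(prefix: String) { // not Serializable
      def tag(stream: org.apache.spark.streaming.dstream.DStream[String]) = {
        // Bad: `prefix` below really means `this.prefix`, capturing `this`:
        //   stream.map(s => prefix + s)

        // Fix: copy the needed field into a local val first.
        val p = prefix
        stream.map(s => p + s)
      }
    }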

Re: using JavaRDD in spark-redis connector

2015-10-27 Thread Rohith P
Got it, thank you!