Hi Meihua,
For categorical features, the ordinal issue can be solved by trying all
2^(q-1) - 1 possible partitions of the q values into two groups. However,
that is computationally expensive. In Hastie's book, section 9.2.4, the
trees can be trained by sorting on the residuals and learning the split as
if they
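A small standalone sketch of the two approaches, assuming a toy list of
(category, residual) pairs; the function names and data below are
illustrative only, not Spark code:

from itertools import combinations

def exhaustive_splits(categories):
    """Enumerate all 2^(q-1) - 1 ways to split q categories into two non-empty groups."""
    cats = list(categories)
    rest = cats[1:]
    splits = []
    # Pin the first category to the "left" side so mirror-image splits are not counted twice.
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {cats[0], *combo}
            right = set(cats) - left
            if right:
                splits.append((left, right))
    return splits  # len(splits) == 2**(len(cats) - 1) - 1

def order_by_mean_residual(samples):
    """Hastie et al., section 9.2.4: order the categories by their mean residual,
    so only the q - 1 split points of that ordering need to be evaluated."""
    sums, counts = {}, {}
    for cat, residual in samples:
        sums[cat] = sums.get(cat, 0.0) + residual
        counts[cat] = counts.get(cat, 0) + 1
    return sorted(sums, key=lambda c: sums[c] / counts[c])

samples = [("a", 1.0), ("b", -0.5), ("c", 0.2), ("a", 0.8), ("c", -0.1)]
print(len(exhaustive_splits({"a", "b", "c"})))  # 3, i.e. 2^(3-1) - 1
print(order_by_mean_residual(samples))          # ['b', 'c', 'a']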
Ah, good point. I also see it still reads 1.5.1. I imagine we just need
another sweep to update all the version strings.
On Tue, Oct 27, 2015 at 3:08 AM, Krishna Sankar wrote:
> Guys,
> The sc.version returns 1.5.1 in Python and Scala. Is anyone getting the
> same results? Probably I am doin
Yup, looks like I missed that. I will build a new one.
On Tuesday, October 27, 2015, Sean Owen wrote:
> Ah, good point. I also see it still reads 1.5.1. I imagine we just need
> another sweep to update all the version strings.
>
> On Tue, Oct 27, 2015 at 3:08 AM, Krishna Sankar wrote:
>
>> Guy
Hi! I was trying out some aggregate functions in Spark SQL and I noticed
that certain aggregate operators are not working. These include:
approxCountDistinct
countDistinct
mean
sumDistinct
For example, using countDistinct results in an error saying:
Exception in thread "main" org.apache.spark.sql.
Oops, seems I made a mistake. The error message is: Exception in thread
"main" org.apache.spark.sql.AnalysisException: undefined function
countDistinct
On 27 Oct 2015 15:49, "Shagun Sodhani" wrote:
> Hi! I was trying out some aggregate functions in SparkSql and I noticed
> that certain aggregate
Try
count(distinct columnName)
In SQL, distinct is not part of the function name.
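A quick sketch of that syntax, assuming the pyspark shell's pre-defined
sqlContext and a registered temp table t with a column a (table and column
names are placeholders):

# SQL text: distinct is a modifier inside count(), not part of the function name.
sqlContext.sql("SELECT count(distinct a) FROM t").show()

# The camelCase names live in the DataFrame API instead.
from pyspark.sql import functions as F
sqlContext.table("t").agg(F.countDistinct("a")).show()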
On Tuesday, October 27, 2015, Shagun Sodhani
wrote:
> Oops seems I made a mistake. The error message is : Exception in thread
> "main" org.apache.spark.sql.AnalysisException: undefined function
> countDistinct
> On
Will try in a while when I get back. I assume this applies to all functions
other than mean. Also, countDistinct is defined along with all the other SQL
functions, so I don't get the "distinct is not part of the function name" part.
On 27 Oct 2015 19:58, "Reynold Xin" wrote:
> Try
>
> count(distinct columnName
So I tried @Reynold's suggestion. I could get countDistinct and sumDistinct
running, but mean and approxCountDistinct do not work. (I guess I am using
the wrong syntax for approxCountDistinct.) For mean, I think the
registry entry is missing. Can someone clarify that as well?
On Tue, Oct 27, 2015 a
I have disabled it because it started generating ERRORs when upgrading
from Spark 1.4 to 1.5.1:
2015-10-27T20:50:11.574+0100 ERROR TungstenSort.newOrdering() - Failed to
generate ordering, fallback to interpreted
java.util.concurrent.ExecutionException: java.lang.Exception: failed to
compile: o
Hi Sjoerd,
Did your job actually *fail* or did it just generate many spurious
exceptions? While the stacktrace that you posted does indicate a bug, I
don't think that it should have stopped query execution because Spark
should have fallen back to an interpreted code path (note the "Failed to
gener
No, the job doesn't actually fail, but since our tests were generating all
these stacktraces I have disabled Tungsten mode just to be sure (and so we
don't have a gazillion stacktraces in production).
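For reference, a minimal sketch of what disabling it looks like, assuming
the Spark 1.5-era spark.sql.tungsten.enabled flag is the switch being
referred to:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="tungsten-off-example")  # app name is a placeholder
sqlContext = SQLContext(sc)
# Fall back to the pre-Tungsten code paths for Spark SQL execution.
sqlContext.setConf("spark.sql.tungsten.enabled", "false")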
2015-10-27 20:59 GMT+01:00 Josh Rosen:
> Hi Sjoerd,
>
> Did your job actually *fail* or did it just gen
Hi, I'd like to "pickle" a Spark DataFrame object and have tried the
following:
import pickle
data = sparkContext.jsonFile(data_file)  # load file
with open('out.pickle', 'wb') as handle:
    pickle.dump(data, handle)
If I convert "data" to a Pandas DataFrame (e.g.,
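A minimal sketch of the toPandas route this message appears to be heading
toward, assuming the pyspark shell's sqlContext, data small enough to
collect to the driver, and placeholder file names:

import pickle

df = sqlContext.read.json("data.json")  # in older 1.x releases, sqlContext.jsonFile(...)
pdf = df.toPandas()                     # pandas DataFrames pickle cleanly; Spark DataFrames do not

with open("out.pickle", "wb") as handle:
    pickle.dump(pdf, handle)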
Hi Mark,
if you know your cluster's number of workers and cores per worker you can
set this up when you create a SparkContext and shouldn't need to tinker
with the 'spark.executor.cores' setting. That setting is for running
multiple executors per application per worker, which you are saying you
do
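A minimal sketch of setting this up when the SparkContext is created, with
placeholder numbers and a placeholder master URL (standalone-mode property
names assumed):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("executor-sizing-example")
        .setMaster("spark://master:7077")      # placeholder master URL
        .set("spark.cores.max", "32")          # cap on total cores this app takes cluster-wide
        .set("spark.executor.memory", "8g"))   # memory per executor
sc = SparkContext(conf=conf)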
Hi Richard,
Thanks for the response.
I should have added that the specific case where this becomes a problem is
when one of the executors for that application is lost/killed prematurely,
and the application attempts to spawn a new executor without
consideration as to whether an executor alrea
Ah I see, that's a bit more complicated =). If it's possible, would using
`spark.executor.memory` to set the available worker memory used by
executors help alleviate the problem of running on a node that already has
an executor on it? I would assume that would have a constant worst case
overhead pe
Have you tried using avg in place of mean?
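A one-line sketch of that substitution, again assuming the pyspark shell's
sqlContext and a placeholder temp table t:

# avg in SQL text; the thread above suggests mean is not registered for SQL here.
sqlContext.sql("SELECT avg(a) FROM t").show()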
(1 to 5).foreach { i =>
  (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b").save(s"/tmp/partitioned/i=$i")
}
sqlContext.sql("""
CREATE TEMPORARY TABLE partitionedParquet
USING org.apache.spark.sql.parquet
OPTIONS (
path '/tm
Hi DB Tsai,
Thank you again for your insightful comments!
1) I agree the sorting method you suggested is a very efficient way to
handle the unordered categorical variables in binary classification
and regression. I propose we have a Spark ML Transformer to do the
sorting and encoding, bringing th
When enabling mergedSchema and the predicate filter, this fails since Parquet
filters are pushed down regardless of the schema of each split (or rather,
each file).
Dominic Ricard reported this issue
(https://issues.apache.org/jira/browse/SPARK-11103).
Even though this would work okay by setting spark.sq
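A minimal sketch of the combination being described, assuming the pyspark
shell's sqlContext and placeholder paths/columns (the option and config
names are the 1.5-era ones):

sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")  # pushdown enabled
df = sqlContext.read.option("mergeSchema", "true").parquet("/tmp/partitioned")
# The filter is pushed down to every Parquet file regardless of that file's
# own schema, which is the failure described above.
df.filter(df["a"] > 10).count()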
Yup, avg works fine. So we have alternate functions to use in place of the
ones pointed out earlier. But my point is: are those original aggregate
functions not supposed to be used, am I using them the wrong way, or is it a
bug, as I asked in my first mail?
On Wed, Oct 28, 2015 at 3:20
I am getting a Spark not-serializable exception when running spark-submit in
standalone mode. I am trying to use Spark Streaming, which gets its stream from
Kafka queues, but it is not able to process the mapping actions on the RDDs
from the stream. The code where the serialization exception
Got it, thank you.