Hi,
I'd like to append a column of a dataframe to another DF (using Spark
1.5.2):
DataFrame outputDF = unlabelledDF.withColumn("predicted_label",
predictedDF.col("predicted"));
I get the following exception:
java.lang.IllegalArgumentException: requirement failed: DataFrame must have
the same sc
edDF.join(predictedDF.select("id", "predicted"), "id")
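A minimal sketch of that join-based workaround in Java, assuming both DataFrames carry a shared "id" key column (variable and column names follow the question quoted below):

import org.apache.spark.sql.DataFrame;

// Sketch only: withColumn() cannot take a Column that belongs to a different
// DataFrame, so the predicted column is brought over with an equi-join on "id".
DataFrame outputDF = unlabelledDF
    .join(predictedDF.select("id", "predicted"), "id")
    .withColumnRenamed("predicted", "predicted_label");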
>
> On 11 February 2016 at 10:12, Zsolt Tóth wrote:
>
>> Hi,
>>
>> I'd like to append a column of a dataframe to another DF (using Spark
>> 1.5.2):
>>
>> DataFrame outputDF = unlabelledDF.withColumn("pre
org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:304)
Regards,
Zsolt
2016-02-12 13:11 GMT+01:00 Ted Yu :
> Can you pastebin the full error with all column types ?
>
> There should be a difference between some column(s).
>
> Cheers
>
> > On Feb 11, 2016, at 2:12 AM, Zsolt Tóth
> wrote:
Hi,
I'm trying to throw an exception of my own exception class (MyException
extends SparkException) on one of the executors. This works fine on Spark
1.3.x and 1.4.x but throws a deserialization/ClassNotFound exception on
Spark 1.5.x. This happens only when I throw it on an executor; on the driver it
succ
Hi,
this is exactly the same as my issue; it seems to be a bug in 1.5.x.
(see my thread for details)
2015-11-19 11:20 GMT+01:00 Jeff Zhang :
> Seems your jdbc url is not correct. Should be jdbc:mysql://192.168.41.229:3306
>
> On Thu, Nov 19, 2015 at 6:03 PM, wrote:
>
>> hi guy,
>>
>>I a
Hi Tamás,
the exception class is in the application jar; I'm using the spark-submit
script.
2015-11-19 11:54 GMT+01:00 Tamas Szuromi :
> Hi Zsolt,
>
> How do you load the jar and how do you prepend it to the classpath?
>
> Tamas
>
>
>
>
> On 19 November 2015 a
Hi,
I have a Spark job with many transformations (a sequence of maps and
mapPartitions) and only one action at the end (DataFrame.write()). The
transformations return an RDD, so I need to create a DataFrame from it.
To be able to use sqlContext.createDataFrame() I need to know the schema of
the Row but for
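For what it's worth, a minimal sketch of declaring the schema by hand and applying it to the final RDD of Rows; the column names and types here are made up for illustration, and rowRdd stands for the output of the map/mapPartitions chain:

import java.util.Arrays;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical schema for the rows produced by the transformation chain.
StructType schema = DataTypes.createStructType(Arrays.asList(
    DataTypes.createStructField("id", DataTypes.LongType, false),
    DataTypes.createStructField("predicted", DataTypes.DoubleType, true)));

// rowRdd is assumed to be the JavaRDD<Row> coming out of the map/mapPartitions chain.
DataFrame df = sqlContext.createDataFrame(rowRdd, schema);
df.write().parquet("/tmp/output");  // the single action mentioned above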
Hi,
I ran your example on Spark-1.4.1 and 1.5.0-rc3. It succeeds on 1.4.1 but
throws the OOM on 1.5.0. Do any of you know which PR introduced this
issue?
Zsolt
2015-09-07 16:33 GMT+02:00 Zoltán Zvara :
> Hey, I'd try to debug, profile ResolvedDataSource. As far as I know, your
> write will b
Hi,
in Spark 1.6 the glm's predict() method returned a DataFrame with 0/1
prediction values. In 2.0, however, the same code returns confidence-like
values, e.g. 0.5320209312.
Can anyone tell me what caused the change here? Is it possible to get the
old, binary values with Spark 2.0?
Regards,
Zsol
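In case a workaround helps: a hedged sketch of recovering 0/1 labels by thresholding the 2.0 output, assuming the confidence-like values land in a column called "prediction" and that 0.5 is the intended cut-off:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.when;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Sketch only: turn the confidence-like predictions back into binary labels.
Dataset<Row> binaryDF = predictionsDF.withColumn("binary_prediction",
    when(col("prediction").geq(0.5), 1.0).otherwise(0.0));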
Hi,
I ran some tests regarding Spark's Delegation Token renewal mechanism. As I
see it, the concept here is simple: if I give my keytab file and client
principal to Spark, it starts a token renewal thread, and renews the
namenode delegation tokens after some time. This works fine.
Then I tried to ru
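For reference, a hedged sketch of how a keytab and principal can be handed to Spark at submission (launcher-based submission shown only as an illustration; the jar, class and paths are placeholders):

import org.apache.spark.launcher.SparkLauncher;

// Sketch only: with spark.yarn.principal/spark.yarn.keytab set, the application
// can keep obtaining fresh HDFS delegation tokens after the old ones expire.
Process spark = new SparkLauncher()
    .setMaster("yarn-cluster")
    .setAppResource("/path/to/my-app.jar")          // placeholder
    .setMainClass("com.example.TokenRenewalTest")   // placeholder
    .setConf("spark.yarn.principal", "user@EXAMPLE.COM")
    .setConf("spark.yarn.keytab", "/path/to/user.keytab")
    .launch();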
Any ideas about this one? Am I missing something here?
2016-11-03 15:22 GMT+01:00 Zsolt Tóth :
> Hi,
>
> I ran some tests regarding Spark's Delegation Token renewal mechanism. As
> I see it, the concept here is simple: if I give my keytab file and client
> principal to Spar
e definitely have run into it. So
> if you're not hitting it, it's most definitely an issue with your test
> configuration.
>
> On Thu, Nov 3, 2016 at 7:22 AM, Zsolt Tóth
> wrote:
> > Hi,
> >
> > I ran some tests regarding Spark's Delegation Token r
extend its lifetime. The feature you're talking about is for
> creating *new* delegation tokens after the old ones expire and cannot
> be renewed anymore (i.e. the max-lifetime configuration).
>
> On Thu, Nov 3, 2016 at 2:02 PM, Zsolt Tóth
> wrote:
> > Yes, I did change dfs.
based on the renew-interval instead of the max-lifetime?
2016-11-04 2:37 GMT+01:00 Marcelo Vanzin :
> On Thu, Nov 3, 2016 at 3:47 PM, Zsolt Tóth
> wrote:
> > What is the purpose of the delegation token renewal (the one that is done
> > automatically by Hadoop libraries, afte
Hi,
I need to run a map() and a mapPartitions() on my input DF. As a
side-effect of the map(), a partition-local variable should be updated,
which is then used in the mapPartitions() afterwards.
I can't use a Broadcast variable, because it's shared between partitions on
the same executor.
Where can I def
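One possible workaround, sketched under assumptions: fold the map() step into the mapPartitions() call itself, so the variable lives inside the per-partition function. The sketch below assumes Spark 2.x's Java API (FlatMapFunction returning an Iterator), buffers each partition for simplicity, and uses stand-ins for the real per-row update and output:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;

// Sketch only: partitionState is local to each partition's call, so it is never
// shared between partitions that happen to run on the same executor.
JavaRDD<Row> result = inputDF.javaRDD().mapPartitions((Iterator<Row> it) -> {
    long partitionState = 0L;
    List<Row> buffered = new ArrayList<>();
    while (it.hasNext()) {
        Row row = it.next();
        partitionState++;       // stand-in for the real map() side effect
        buffered.add(row);      // stand-in for the real map() output
    }
    // the former mapPartitions() logic can read partitionState here
    return buffered.iterator();
});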
Any comment on this one?
On Nov 16, 2016 at 12:59 PM, "Zsolt Tóth" wrote:
> Hi,
>
> I need to run a map() and a mapPartitions() on my input DF. As a
> side-effect of the map(), a partition-local variable should be updated,
> which is then used in the mapPartitions()
Hi,
I'm trying to replace values in a nested column in a JSON-based dataframe
using withColumn().
This syntax works for select, filter, etc., returning only the nested "country"
column:
df.select('body.payload.country')
but if I use the same syntax in withColumn(), it will create a new column with the name
"body.payload.countr
Hi,
I use DecisionTree for multi-class classification.
I can get the probability of the predicted label for every node in the
decision tree from node.predict().prob(). Is it possible to retrieve or
count the probability of every possible label class in the node?
To be more clear:
Say in Node A the
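Not an answer to the mllib node API itself, but a hedged sketch of one way to get a full per-class distribution: the DataFrame-based DecisionTreeClassifier (spark.ml, 2.x API shown) exposes it in a "probability" column, unlike Node.predict().prob(), which only carries the probability of the predicted label. The column names and DataFrames below are placeholders:

import org.apache.spark.ml.classification.DecisionTreeClassificationModel;
import org.apache.spark.ml.classification.DecisionTreeClassifier;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Sketch only: "probability" holds a vector with one entry per label class.
DecisionTreeClassifier dt = new DecisionTreeClassifier()
    .setLabelCol("label")
    .setFeaturesCol("features");
DecisionTreeClassificationModel model = dt.fit(trainingDF);
Dataset<Row> predictions = model.transform(testDF);
predictions.select("prediction", "probability").show();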
Hi,
I'm using Spark in yarn-cluster mode and submitting the jobs programmatically
from the client in Java. I ran into a few issues when I tried to set the
resource allocation properties.
1. It looks like setting spark.executor.memory, spark.executor.cores and
spark.executor.instances has no effect bec
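A hedged sketch of the direction that usually applies here: in yarn-cluster mode these settings are consumed at submit time, so they need to be on the SparkConf (or equivalent --conf flags) that the submitting client builds, not set later inside the already-running driver. The values below are placeholders:

import org.apache.spark.SparkConf;

// Sketch only: executor sizing must be known when the client/AM requests the
// YARN containers, so it is set on the configuration used for submission.
SparkConf conf = new SparkConf()
    .set("spark.executor.memory", "4g")
    .set("spark.executor.cores", "2")
    .set("spark.executor.instances", "10");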
One more question: is there a reason why Spark throws an error when
requesting too much memory instead of capping it at the maximum value (as
YARN would do by default)?
Thanks!
2015-02-10 17:32 GMT+01:00 Zsolt Tóth :
> Hi,
>
> I'm using Spark in yarn-cluster mode and s
Hi,
I submit Spark jobs in yarn-cluster mode remotely from Java code by calling
Client.submitApplication(). For some reason I want to use 1.3.0 jars on the
client side (e.g. spark-yarn_2.10-1.3.0.jar) but I have
spark-assembly-1.2.1* on the cluster.
The problem is that the ApplicationMaster can't f
Hi,
I use sc.hadoopFile(directory, OrcInputFormat.class, NullWritable.class,
OrcStruct.class) to read data in ORC format as an RDD. I did some
benchmarking on ORC input vs. text input for MLlib and ran into a few
issues with ORC.
Setup: yarn-cluster mode, 11 executors, 4 cores, 9g executor memory
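For reference, the call spelled out with its Java types; the package names assume the OrcInputFormat/OrcStruct classes shipped with Hive's ql module, and the toString() mapping is only a placeholder for real field extraction:

import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
import org.apache.hadoop.io.NullWritable;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

// Sketch only: keys are NullWritable, each ORC row arrives as an OrcStruct value.
JavaPairRDD<NullWritable, OrcStruct> orcRdd = sc.hadoopFile(
    directory, OrcInputFormat.class, NullWritable.class, OrcStruct.class);

// Placeholder for real field extraction before building the MLlib input.
JavaRDD<String> rows = orcRdd.values().map(new Function<OrcStruct, String>() {
    @Override
    public String call(OrcStruct struct) {
        return struct.toString();
    }
});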
Hi,
I have a simple Spark application: it creates an input RDD with
sc.textFile(), and calls flatMapToPair, reduceByKey and map on it. The
output RDD is small, a few MBs. Then I call collect() on the output.
If the text file is ~50GB, it finishes in a few minutes. However, if it's
larger (~100GB
the SQL data source API:
> https://github.com/apache/spark/pull/3753. You can try pulling that PR
> and help test it. -Xiangrui
>
> On Wed, Mar 25, 2015 at 5:03 AM, Zsolt Tóth
> wrote:
> > Hi,
> >
> > I use sc.hadoopFile(directory, OrcInputFormat.class, NullWrita
size huge, you can simply do a count() to
> trigger the execution.
>
> Can you paste your exception stack trace so that we'll know whats
> happening?
>
> Thanks
> Best Regards
>
> On Fri, Mar 27, 2015 at 9:18 PM, Zsolt Tóth
> wrote:
>
>> Hi,
>>
>
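A hedged sketch of the count()-instead-of-collect() idea suggested above; the flatMapToPair body is a made-up word-count-style stand-in for the real logic, written with Java 8 lambdas for brevity and assuming the Spark 1.x function signatures:

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

// Sketch only: keep the result on the cluster and pull back a single number
// instead of collect()-ing the whole output to the driver.
JavaRDD<String> input = sc.textFile("hdfs:///data/input");   // placeholder path
JavaPairRDD<String, Integer> reduced = input
    .flatMapToPair(line -> {
        List<Tuple2<String, Integer>> pairs = new ArrayList<Tuple2<String, Integer>>();
        for (String token : line.split("\\s+")) {
            pairs.add(new Tuple2<String, Integer>(token, 1));
        }
        return pairs;        // Spark 1.x signature; on 2.x return pairs.iterator()
    })
    .reduceByKey((a, b) -> a + b);
long n = reduced.count();    // triggers execution without a driver-side collect()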
I use EMR 3.3.1 which comes with Java 7. Do you think that this may cause
the issue? Did you test it with Java 8?
me, or it might point to spark internals.
>
> On Wed, Apr 8, 2015 at 3:45 AM, Zsolt Tóth
> wrote:
>
>> I use EMR 3.3.1 which comes with Java 7. Do you think that this may cause
>> the issue? Did you test it with Java 8?
>>
>
>
Hi all,
it looks like the 1.2.2 pre-built version for hadoop2.4 is not available on
the mirror sites. Am I missing something?
Regards,
Zsolt