I think you are right that there is no way to call a Java UDF without
registering it right now. Adding another 20 methods to functions would be
scary. Maybe the best way is to have a companion object
for UserDefinedFunction, and define UDFs there?
e.g.
object UserDefinedFunction {
def define(f: org
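For illustration, a rough sketch of that idea (the define name, the package
placement, and the (f, dataType) constructor used by functions.udf are
assumptions, not a settled API):

package org.apache.spark.sql

import scala.reflect.runtime.universe.TypeTag
import org.apache.spark.sql.catalyst.ScalaReflection

// Hypothetical companion-style entry point: build a UDF that can be used
// directly on Columns without registering it in the function registry.
// Placed in the sql package because UserDefinedFunction's constructor is
// protected[sql]; the return DataType is inferred the same way functions.udf
// does it today.
object UserDefinedFunctions {
  def define[RT: TypeTag, A1: TypeTag](f: A1 => RT): UserDefinedFunction =
    UserDefinedFunction(f, ScalaReflection.schemaFor[RT].dataType)
}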
Was it a clean compilation?
TD
On Fri, May 29, 2015 at 10:48 PM, Ted Yu wrote:
> Hi,
> I ran the following command on 1.4.0 RC3:
>
> mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive package
>
> I saw the following failure:
>
> StreamingContextSuite:
> - from no conf co
Yeah, it would be great to support a Column. Can you create a JIRA, and possibly
a pull request?
On Fri, May 29, 2015 at 2:45 AM, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
> Actually, the Scala API, too, is only based on the column name
>
> On Fri, May 29, 2015 at 11:23 AM, Olivier Girardot <
>
I downloaded the source tarball and ran a command similar to the following:
clean package -DskipTests
Then I ran the following command.
FYI
> On May 30, 2015, at 12:42 AM, Tathagata Das wrote:
>
> Was it a clean compilation?
>
> TD
>
>> On Fri, May 29, 2015 at 10:48 PM, Ted Yu wrote:
>
No 1.4.0 Blockers at this point, which is great. Forking this thread
to discuss something else.
There are 92 issues targeted for 1.4.0, 28 of which are marked
Critical. Many are procedural issues like "update docs for 1.4" or
"check X for 1.4". Are these resolved? They sound like things that are
d
The idea of asking for both the argument and return class is interesting. I
don't think we do that for the Scala APIs currently, right? In
functions.scala, we only use the TypeTag for RT.
def udf[RT: TypeTag, A1: TypeTag](f: Function1[A1, RT]): UserDefinedFunction = {
  UserDefinedFunction(f,
JIRA done: https://issues.apache.org/jira/browse/SPARK-7969
I've already started working on it, but it's less trivial than it seems
because I don't know exactly how the catalog works internally,
or how to get the qualified name of a column to match it against the
schema/catalog.
Regards,
Olivier
If I do the following
df2 = df.withColumn('y', df['x'] * 7)
df3 = df2.withColumn('z', df2.y * 3)
df3.explain()
Then the result is
> Project [date#56,id#57,timestamp#58,x#59,(x#59 * 7.0) AS y#64,((x#59
* 7.0) AS y#64 * 3.0) AS z#65]
> PhysicalRDD [date#56,id#57,timestamp#58,x
On second thought, perhaps this can be done by writing a rule that builds
the dag of dependencies between expressions, then converts it into several
layers of projections, where each new layer is allowed to depend on
expression results from previous projections?
Are there any pitfalls to this appro
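For illustration, here is what "layering" would mean at the DataFrame level
for the example above (a manual rewrite, not the proposed optimizer rule; it
assumes a Scala DataFrame with the same columns as the earlier PySpark df):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Manual "layering": the first select materializes y once; the second refers
// to y by name instead of repeating (x * 7.0). Whether the optimizer keeps
// such layers separate, rather than collapsing them back into one Project,
// is exactly the open question here.
def layered(df: DataFrame): DataFrame = {
  val layer1 = df.select(col("date"), col("id"), col("timestamp"), col("x"),
    (col("x") * 7).as("y"))
  layer1.select(col("*"), (col("y") * 3).as("z"))
}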
I think just matching the Column’s expr as an UnresolvedAttribute and using
the UnresolvedAttribute’s name to match the schema’s field name is enough.
There seems to be no need to treat the expr as a more general expression. :)
On May 30, 2015 at 11:14:05 PM, Girardot Olivier
(o.girar...@lateral-thoughts.com) wrote:
Jira done : ht
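For illustration, a rough sketch of that matching, assuming it runs inside
the org.apache.spark.sql package (Column.expr is not public) and that exact
name matching is all that is needed; the helper names are made up:

import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.types.StructType

// Pull the attribute name out of a Column when it is a plain (unresolved)
// column reference, and check it against the DataFrame's schema.
def columnName(col: Column): Option[String] = col.expr match {
  case u: UnresolvedAttribute => Some(u.name)
  case _ => None
}

def matchesSchema(col: Column, schema: StructType): Boolean =
  columnName(col).exists(schema.fieldNames.contains)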
Hi,
I'm trying to aggregate the values within each partition.
For example,
Before:
  worker 1: 1, 2, 1
  worker 2: 2, 1, 2
After:
  worker 1: (1->2), (2->1)
  worker 2: (1->1), (2->2)
I'm trying to use mappartitions,
object MyTest {
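For reference, a minimal sketch of that per-partition aggregation with
mapPartitions (the truncated MyTest code above is not reconstructed here; the
names are made up):

import org.apache.spark.{SparkConf, SparkContext}

object PartitionCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partition-count"))
    // Two partitions: (1, 2, 1) and (2, 1, 2), as in the example above.
    val data = sc.parallelize(Seq(1, 2, 1, 2, 1, 2), 2)

    // Count occurrences of each value within a partition, without a shuffle.
    // Note the method name is mapPartitions (capital P), as pointed out below.
    val perPartition = data.mapPartitions { iter =>
      val counts = scala.collection.mutable.Map.empty[Int, Int].withDefaultValue(0)
      iter.foreach(v => counts(v) += 1)
      counts.iterator
    }
    perPartition.collect().foreach(println)
    sc.stop()
  }
}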
Thanks Josh and Yin.
Created following JIRA for the same :-
https://issues.apache.org/jira/browse/SPARK-7970
Thanks
-Nitin
bq. val result = fDB.mappartitions(testMP).collect
Not sure if you pasted the above code verbatim - there was a typo: the method
name should be mapPartitions.
Cheers
On Sat, May 30, 2015 at 9:44 AM, unioah wrote:
> Hi,
>
> I'm trying to aggregate the values within each partition.
> For example,
>
> Bef
Name resolution is not as easy as that, I think. Wenchen can maybe give you
some advice on resolution for this one.
On Sat, May 30, 2015 at 9:37 AM, Yijie Shen
wrote:
> I think just match the Column’s expr as UnresolvedAttribute and use
> UnresolvedAttribute’s name to match schema’s field name is eno
I think you are looking for
http://en.wikipedia.org/wiki/Common_subexpression_elimination in the
optimizer.
One thing to note is that as we do more and more optimization like this,
the optimization time might increase. Do you see a case where this can
bring you substantial performance gains?
On
We added all the TypeTags for arguments but haven't gotten around to using
them yet. I think it'd make sense to have them and do the auto cast, but we can
have rules in analysis to forbid certain casts (e.g. don't auto cast double
to int).
On Sat, May 30, 2015 at 7:12 AM, Justin Uang wrote:
> The id
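For illustration, a rough sketch of what using the argument TypeTags could
look like (this is an assumption about a possible approach, not current
behavior; the helpers are made up):

import scala.reflect.runtime.universe.TypeTag
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.{DataType, DoubleType, IntegerType}

// Derive the expected input DataTypes from the argument TypeTags so the
// analyzer could insert casts on the UDF's inputs.
def inputTypes[A1: TypeTag, A2: TypeTag]: Seq[DataType] = Seq(
  ScalaReflection.schemaFor[A1].dataType,
  ScalaReflection.schemaFor[A2].dataType)

// A separate analysis rule could still forbid lossy casts such as
// double -> int (made-up helper, not an existing rule).
def allowAutoCast(from: DataType, to: DataType): Boolean = (from, to) match {
  case (DoubleType, IntegerType) => false
  case _ => true
}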
Thank you for your reply.
But the typo is not the reason for the problem.
+1 (non-binding, of course)
1. Compiled on OSX 10.10 (Yosemite) OK. Total time: 17:07 min
mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -DskipTests
2. Tested pyspark, mllib - ran them and compared results with 1.3.1
2.1. statistics (min,max,mean,Pearson,Spe
FYI we merged a patch that improves unit test log debugging. In order for
that to work, all test suites have been changed to extend SparkFunSuite
instead of ScalaTest's FunSuite. We also added a rule in the Scala style
checker to fail Jenkins if FunSuite is used.
The patch that introduced SparkFun
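For reference, what the change looks like in a test suite (a sketch; it
assumes the suite lives under Spark's own test sources, since SparkFunSuite
is internal to the org.apache.spark packages):

import org.apache.spark.SparkFunSuite

// Extends SparkFunSuite instead of ScalaTest's FunSuite directly, so the
// suite picks up the improved per-test logging described above.
class MyFeatureSuite extends SparkFunSuite {
  test("an example test") {
    assert(1 + 1 === 2)
  }
}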
I solved the problem.
It was caused by using the spark-core_2.11 Maven artifact.
When I compiled against spark-core_2.10, the problem no longer showed up.
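For anyone hitting the same thing, a sketch of keeping the Scala versions
aligned in an sbt build (the version numbers here are just examples):

// build.sbt: %% appends the project's Scala binary version (_2.10 here),
// so spark-core and the application are compiled against the same one.
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" % "provided"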
I think this is likely something that we'll want to do during the code
generation phase, though it's probably not the lowest-hanging fruit at this
point.
On Sun, May 31, 2015 at 5:02 AM, Reynold Xin wrote:
> I think you are looking for
> http://en.wikipedia.org/wiki/Common_subexpression_eliminat