Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

2016-06-07 Thread franklyn
Thanks, Ted! I'm using https://github.com/apache/spark/commit/8f5a04b6299e3a47aca13cbb40e72344c0114860 and building with scala-2.10. I can confirm that it works with scala-2.11.

Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

2016-06-07 Thread franklyn
Thanks for reproducing it, Ted. Should I make a Jira issue?

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-19 Thread Franklyn D'souza
-1. https://issues.apache.org/jira/browse/SPARK-18589 hasn't been resolved by this release and is a blocker in our adoption of Spark 2.0. I've updated the issue with some steps to reproduce the error. On Mon, Dec 19, 2016 at 4:37 AM, Sean Owen wrote: > PS, here are the open issues for 2.1.0. Forg

Handling nulls in vector columns is non-trivial

2017-06-21 Thread Franklyn D'souza
with real-world data. I'd like to know how other users are dealing with this and what plans there are to extend vector support for DataFrames. Thanks, Franklyn

Re: Handling nulls in vector columns is non-trivial

2017-06-21 Thread Franklyn D'souza
hon/ml/imputer_example.py > > which should at least partially address the problem. > > On 06/22/2017 03:03 AM, Franklyn D'souza wrote: > > I just wanted to highlight some of the rough edges around using > > vectors in columns in dataframes. > > > > If there is a n

Re: Handling nulls in vector columns is non-trivial

2017-06-22 Thread Franklyn D'souza
to give it more of a first class support in dataframes by having it work with the lit column expression. On Wed, Jun 21, 2017 at 9:30 PM, Franklyn D'souza < franklyn.dso...@shopify.com> wrote: > From the documentation it states that ` The input columns should be of > DoubleType or

Re: Handling nulls in vector columns is non-trivial

2017-06-23 Thread Franklyn D'souza
=schema) df = df.crossJoin(empty_vector) df = df.withColumn('feature', F.coalesce('feature', '_empty_vector')) On Thu, Jun 22, 2017 at 11:54 AM, Franklyn D'souza < franklyn.dso...@shopify.com> wrote: > We've developed Scala UDFs internally t

Operations on DataFrames with User Defined Types in pyspark

2016-02-11 Thread Franklyn D'souza
I'm using the UDT API to work with a custom Money datatype in DataFrames. Here's how I have it set up: class StringUDT(UserDefinedType): @classmethod def sqlType(self): return StringType() @classmethod def module(cls): return cls.__module__ @classmethod def

Nulls getting converted to 0 with spark 2.0 SNAPSHOT

2016-03-07 Thread Franklyn D'souza
udf(df._tmp_col))
df = df.drop("_tmp_col")
# None gets converted to 0
df.collect()  # [Row(b=u'one', a=1), Row(b=u'two', a=0)]

Thanks, Franklyn

Can't compile 2.0-preview with scala 2.10

2016-06-06 Thread Franklyn D'souza
ncies failed with message: Found Banned Dependency: org.scala-lang.modules:scala-xml_2.11:jar:1.0.2 Found Banned Dependency: org.scalatest:scalatest_2.11:jar:2.2.6 Is Scala 2.10 not being supported going forward? If so, the profile should probably be removed from the master pom.xml. Thanks, Franklyn

Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

2016-06-07 Thread Franklyn D'souza
I've built spark-2.0-preview (8f5a04b) with scala-2.10 using the following:

    ./dev/change-version-to-2.10.sh
    ./dev/make-distribution.sh -DskipTests -Dzookeeper.version=3.4.5 -Dcurator.version=2.4.0 -Dscala-2.10 -Phadoop-2.6 -Pyarn -Phive

and then ran the following code in a pyspark shell

Spark Assembly jar ?

2016-06-14 Thread Franklyn D'souza
Just wondering where the spark-assembly jar has gone in 2.0. I've been reading that it's been removed, but I'm not sure what the new workflow is.