Unless you are directly concerned with query optimization, you needn't
modify Catalyst or any of the core Spark SQL code.
You can simply create a new project with Spark SQL as a dependency and do
what MLlib does for Vectors (as of 1.3; newer versions do the same for
matrices as well).
Use the @SQLUserDefinedType annotation and create a new UDT class that
converts your object to and from Spark SQL types. Note that the annotated
class must be at the top level of a file (not inside an object); otherwise
it won't be seen by Spark SQL.
@SQLUserDefinedType(udt = classOf[VectorUDT])
sealed trait Vector extends Serializable

private[spark] class VectorUDT extends UserDefinedType[Vector]
See the implementation details in the following file:
https://github.com/apache/spark/blob/branch-1.3/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
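To make the pattern concrete, here is a minimal sketch of a UDT for a
hypothetical 2-D Point type, modeled on VectorUDT from the file above. The
Point class and PointUDT are illustrative names, not part of Spark; the
overridden methods (sqlType, serialize, deserialize, userClass) are the
ones the Spark 1.3 UserDefinedType API expects.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Hypothetical user type. The annotation ties it to its UDT; both
// declarations must sit at the top level of the file.
@SQLUserDefinedType(udt = classOf[PointUDT])
class Point(val x: Double, val y: Double) extends Serializable

class PointUDT extends UserDefinedType[Point] {

  // How the type is represented inside Spark SQL.
  override def sqlType: DataType =
    StructType(Seq(
      StructField("x", DoubleType, nullable = false),
      StructField("y", DoubleType, nullable = false)))

  // Convert the user object into the SQL representation.
  override def serialize(obj: Any): Any = obj match {
    case p: Point => Row(p.x, p.y)
  }

  // Convert the SQL representation back into the user object.
  override def deserialize(datum: Any): Point = datum match {
    case row: Row => new Point(row.getDouble(0), row.getDouble(1))
  }

  override def userClass: Class[Point] = classOf[Point]
}
```

With this in place, a DataFrame column of Point values round-trips through
Spark SQL using the StructType representation declared in sqlType.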
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Building-Spark-Adding-new-DataType-in-Catalyst-tp22604p22623.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.