Unless you are directly concerned with query optimization, you needn't
modify Catalyst or any of the core Spark SQL code.
You can simply create a new project with Spark SQL as a dependency and do
what MLlib does for Vectors (as of 1.3; newer versions do the same for
matrices as well).
Use the @SQLUserDefinedType annotation and create a new UDT class that
converts your object to and from Spark SQL types. Note that the annotated
class must be at the top level of a file (not inside an object); otherwise
it won't be seen by Spark SQL.
@SQLUserDefinedType(udt = classOf[VectorUDT])
sealed trait Vector extends Serializable

private[spark] class VectorUDT extends UserDefinedType[Vector]
See the implementation details in the following file:
https://github.com/apache/spark/blob/branch-1.3/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
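To make the pattern concrete, here is a minimal sketch of a UDT for a
hypothetical 2-D Point type, modeled on VectorUDT from the file above. The
Point class and PointUDT are illustrative names, not part of Spark; the
overridden methods (sqlType, serialize, deserialize, userClass) are the
ones the Spark 1.3 UserDefinedType API expects.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Hypothetical user type. The annotation ties it to its UDT; both
// declarations must sit at the top level of the file.
@SQLUserDefinedType(udt = classOf[PointUDT])
class Point(val x: Double, val y: Double) extends Serializable

class PointUDT extends UserDefinedType[Point] {

  // How the type is represented inside Spark SQL.
  override def sqlType: DataType =
    StructType(Seq(
      StructField("x", DoubleType, nullable = false),
      StructField("y", DoubleType, nullable = false)))

  // Convert the user object into the SQL representation.
  override def serialize(obj: Any): Any = obj match {
    case p: Point => Row(p.x, p.y)
  }

  // Convert the SQL representation back into the user object.
  override def deserialize(datum: Any): Point = datum match {
    case row: Row => new Point(row.getDouble(0), row.getDouble(1))
  }

  override def userClass: Class[Point] = classOf[Point]
}
```

With this in place, a DataFrame column of Point values round-trips through
Spark SQL using the StructType representation declared in sqlType.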
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Building-Spark-Adding-new-DataType-in-Catalyst-tp22604p22623.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.