Hi Jao,

You're right that defining serialize and deserialize is the main task in implementing a UDT. These two methods translate between your native representation (ByteImage) and SQL DataTypes. The sqlType you defined looks correct, and you're correct to use a row of length 4. Other than that, it should just require copying data to and from SQL Rows. There are quite a few examples of that in the codebase; I'd recommend searching based on the particular DataTypes you're using (IntegerType and BinaryType in your case).
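To make the "copying data to and from SQL Rows" part concrete, here is a minimal sketch of what the two methods could look like for ByteImage. It is only an illustration under a few assumptions: the imports follow the Spark 1.3 package layout (in 1.2 the types and the annotation live under org.apache.spark.sql.catalyst), the constructor parameters are turned into `val`s so the UDT can read them, and `private[spark]` is dropped so the class compiles outside the org.apache.spark package. Please check the signatures against the Spark version you build against.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
import org.apache.spark.sql.types._

// Assumption: fields are declared as vals so that serialize can read them.
@SQLUserDefinedType(udt = classOf[ByteImageUDT])
class ByteImage(
    val channels: Int,
    val width: Int,
    val height: Int,
    val data: Array[Byte])

class ByteImageUDT extends UserDefinedType[ByteImage] {

  // One struct field per constructor argument, in a fixed order.
  override def sqlType: StructType = StructType(Seq(
    StructField("channels", IntegerType, nullable = false),
    StructField("width", IntegerType, nullable = false),
    StructField("height", IntegerType, nullable = false),
    StructField("data", BinaryType, nullable = false)))

  // Copy each field of the image into the matching slot of a 4-element row.
  override def serialize(obj: Any): Row = obj match {
    case img: ByteImage =>
      val row = new GenericMutableRow(4)
      row.setInt(0, img.channels)
      row.setInt(1, img.width)
      row.setInt(2, img.height)
      row.update(3, img.data)
      row
  }

  // Read the slots back in the same order and rebuild the image.
  override def deserialize(datum: Any): ByteImage = datum match {
    case row: Row =>
      require(row.length == 4,
        s"ByteImageUDT.deserialize expects a row of length 4 but got ${row.length}")
      new ByteImage(
        row.getInt(0),
        row.getInt(1),
        row.getInt(2),
        row.getAs[Array[Byte]](3))
  }

  override def userClass: Class[ByteImage] = classOf[ByteImage]
}
```

The only real constraint is that the slot order in serialize and deserialize matches the field order in sqlType. I left out pyUDT in this sketch; it should only be overridden if there is a matching Python class.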
Are there particular issues you're running into?

Joseph

On Mon, Jan 19, 2015 at 12:59 AM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:

> Hi all,
>
> I'm trying to implement a pipeline for computer vision based on the latest
> ML package in Spark. The first step of my pipeline is to decode an image
> (JPEG, for instance) stored in a Parquet file.
> For this, I began by creating a UserDefinedType that represents a decoded
> image stored in an array of bytes. Here is my first attempt:
>
> @SQLUserDefinedType(udt = classOf[ByteImageUDT])
> class ByteImage(channels: Int, width: Int, height: Int, data: Array[Byte])
>
> private[spark] class ByteImageUDT extends UserDefinedType[ByteImage] {
>
>   override def sqlType: StructType = {
>     // type: 0 = sparse, 1 = dense
>     // We only use "values" for dense vectors, and "size", "indices", and "values" for sparse
>     // vectors. The "values" field is nullable because we might want to add binary vectors later,
>     // which uses "size" and "indices", but not "values".
>     StructType(Seq(
>       StructField("channels", IntegerType, nullable = false),
>       StructField("width", IntegerType, nullable = false),
>       StructField("height", IntegerType, nullable = false),
>       StructField("data", BinaryType, nullable = false)))
>   }
>
>   override def serialize(obj: Any): Row = {
>     val row = new GenericMutableRow(4)
>     val img = obj.asInstanceOf[ByteImage]
>     ...
>   }
>
>   override def deserialize(datum: Any): Vector = {
>     ....
>     }
>   }
>
>   override def pyUDT: String = "pyspark.mllib.linalg.VectorUDT"
>
>   override def userClass: Class[Vector] = classOf[Vector]
> }
>
> I took the VectorUDT as a starting point, but there are a lot of things that
> I don't really understand. So any help on defining the serialize and
> deserialize methods would be appreciated.
>
> Best Regards,
>
> Jao