Hi,

I think it would be great if we could do the string parsing only once and
then just apply the transformation for each interval (reducing the
processing overhead for short intervals).

Also, one issue with the approach above is that transform() has the
following signature (the (RDD[T], Time) overload used in my example
likewise returns DStream[U]):

  def transform(transformFunc: RDD[T] => RDD[U]): DStream[U]

and therefore, in my example

  val result = lines.transform((rdd, time) => {
    // execute statement
    rdd.registerAsTable("data")
    sqlc.sql(query)
  })

the variable `result` is of type DStream[Row]. That is, the
meta-information from the SchemaRDD is lost and, from what I understand,
there is then no way to learn about the column names of the returned data,
as this information is only encoded in the SchemaRDD. I would love to see a
fix for this.
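
To make the problem concrete, here is a minimal sketch in plain Scala with
no Spark dependency. SimpleRDD, SimpleSchemaRDD, SimpleDStream and runQuery
are stand-ins I made up for this illustration; they are not Spark's classes
and only mimic the subtype relationship between SchemaRDD and RDD[Row].

```scala
// Stand-in types for illustration only; none of these are Spark's API.
object SchemaLossSketch {
  type Row = Seq[Any]

  // Hypothetical analogue of RDD[T]: just a wrapper around a sequence.
  class SimpleRDD[T](val items: Seq[T])

  // Hypothetical analogue of SchemaRDD: rows plus their column names.
  class SimpleSchemaRDD(items: Seq[Row], val columns: Seq[String])
      extends SimpleRDD[Row](items)

  // Hypothetical analogue of DStream[T], reduced to a fixed list of batches.
  class SimpleDStream[T](val batches: Seq[SimpleRDD[T]]) {
    // Same shape as the signature quoted above: the result type only
    // mentions SimpleRDD[U], so any richer subtype is upcast.
    def transform[U](f: SimpleRDD[T] => SimpleRDD[U]): SimpleDStream[U] =
      new SimpleDStream[U](batches.map(f))
  }

  // Hypothetical stand-in for sqlc.sql(query): returns rows with a schema.
  def runQuery(rdd: SimpleRDD[String]): SimpleSchemaRDD =
    new SimpleSchemaRDD(
      rdd.items.map(line => Seq[Any](line, line.length)),
      Seq("line", "length"))

  val lines = new SimpleDStream[String](
    Seq(new SimpleRDD[String](Seq("a", "bcd"))))

  // The static type is SimpleDStream[Row]; `columns` is no longer visible.
  val result: SimpleDStream[Row] = lines.transform[Row](rdd => runQuery(rdd))

  // result.batches.head.columns  // would not compile

  // In this toy model the column names can only be reached by a downcast.
  val recovered: Option[Seq[String]] = result.batches.head match {
    case s: SimpleSchemaRDD => Some(s.columns)
    case _                  => None
  }
}
```

The downcast at the end works in this toy model only because the batches
are held in memory unchanged; the sketch says nothing about whether
something similar is possible in Spark itself.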

Thanks
Tobias
