I think it is fairly hard to support recursive data types. What I've seen
done in another proprietary system is to let the user specify the maximum
depth of the nested data type, and then expand the struct/map/list
definition out to that depth.
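
Something along these lines (a rough sketch; the "value"/"children" node
shape and the helper name are just for illustration):

import org.apache.spark.sql.types._

// Expand a hypothetical recursive node (a value plus children of the same
// shape) into a concrete StructType, nested at most maxDepth levels deep.
// At the deepest level the "children" field is simply dropped.
def unrollSchema(maxDepth: Int): StructType = {
  val valueField = StructField("value", StringType, nullable = true)
  if (maxDepth <= 1) {
    StructType(Seq(valueField))
  } else {
    StructType(Seq(
      valueField,
      StructField("children", ArrayType(unrollSchema(maxDepth - 1)), nullable = true)
    ))
  }
}

// e.g. unrollSchema(3) gives a struct expanded three levels deep.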

Would this solve your problem?




On Wed, May 20, 2015 at 6:07 PM, Jeremy Lucas <jeremyalu...@gmail.com>
wrote:

> Hey Rakesh,
>
> To clarify, what I was referring to is when doing something like this:
>
> sqlContext.applySchema(rdd, mySchema)
>
> mySchema must be a well-defined StructType, which presently does not allow
> for a recursive type.
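>
> For example, what I'd like to be able to express (hypothetical field
> names, not valid Spark SQL today) is roughly:
>
> val nodeType: StructType = StructType(Seq(
>   StructField("value", StringType),
>   StructField("children", ArrayType(nodeType))  // no way to refer back to the enclosing type
> ))
>
> but StructField only takes an already-constructed DataType, so the schema
> has no way to refer to itself.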
>
>
> On Wed, May 20, 2015 at 5:39 PM Rakesh Chalasani <vnit.rak...@gmail.com>
> wrote:
>
>> Hi Jeremy:
>>
>> Row is a collection of 'Any', so it can be used as a recursive data type.
>> Is this what you were looking for?
>>
>> Example:
>> import org.apache.spark.sql.Row
>>
>> val x = sc.parallelize(Array.range(0, 10)).map(i => Row(Row(i), Row(i.toString)))
>>
>> Rakesh
>>
>>
>>
>> On Wed, May 20, 2015 at 7:23 PM Jeremy Lucas <jeremyalu...@gmail.com>
>> wrote:
>>
>>> Spark SQL has proven to be quite useful in applying a partial schema to
>>> large JSON logs and being able to write plain SQL to perform a wide variety
>>> of operations over this data. However, one small thing that keeps coming
>>> back to haunt me is the lack of support for recursive data types, whereby a
>>> member of a complex/struct value can be of the same type as the
>>> complex/struct value itself.
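>>>
>>> (Think, for instance, of a hypothetical comment tree, where each record
>>> carries replies of the same shape:
>>>
>>> case class Comment(author: String, body: String, replies: Seq[Comment])
>>>
>>> A StructType for this would need a "replies" field whose element type is
>>> the struct itself.)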
>>>
>>> I am hoping someone may be able to point me in the right direction of
>>> where to start to build out such capabilities, as I'd be happy to
>>> contribute, but am very new to this particular component of the Spark
>>> project.
>>>
>>
