[Spark SQL] experimental join strategy

2017-10-23 Thread Jonathan Gray
Hi,

I'm trying to implement a custom join strategy using the experimental
extraStrategies extension point.  I have created a very simple
implementation (see the gist below) but it is currently failing and I don't
understand why.

https://gist.github.com/jongray/a413b5401c4edd47b7d8f559e5b2f79b

I hope the gist demonstrates what I'm trying to do: I'd like the steps I've
done explicitly (distinct the foreign keys on the 'left' and then use them
to filter the 'right' join table before joining it back to the left) to be
implemented in the strategy.
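
In DataFrame terms, the explicit version is roughly the following sketch
(the table and column names fact, dim, fk and pk are placeholders for
illustration, not the ones used in the gist):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("explicit-prefilter-join").getOrCreate()
    import spark.implicits._

    val fact = spark.table("fact")  // the 'left' side, carries the foreign key fk
    val dim  = spark.table("dim")   // the 'right' side, carries the primary key pk

    // 1. Distinct the foreign keys on the left.
    val keys = fact.select($"fk").distinct()

    // 2. Use them to filter the right table (a semi join keeps only matching rows).
    val prunedDim = dim.join(keys, dim("pk") === keys("fk"), "leftsemi")

    // 3. Join the filtered right side back to the left.
    val result = fact.join(prunedDim, fact("fk") === prunedDim("pk"))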

This initial implementation takes a brute-force approach to the problem (in
my particular use case I happen to want all joins to follow this pattern),
but I would like to extend it later to take a cost-based approach (not
now).

Could someone review it and point out anything that looks wrong?  The final
physical explain plans displayed for the explicit join and the strategy
join look nearly identical, but the results from using my join strategy are
incorrect - they look like a cartesian product of the filtered sets.
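
For context, the extension point itself is wired up roughly as in the
skeleton below (the object name is a placeholder and the real planning
logic is in the gist; this compiles against Spark 2.x, where Join has four
fields).  The comment marks the spot where the join condition has to be
carried into the physical operator:

    import org.apache.spark.sql.{SparkSession, Strategy}
    import org.apache.spark.sql.catalyst.plans.Inner
    import org.apache.spark.sql.catalyst.plans.logical.{Join, LogicalPlan}
    import org.apache.spark.sql.execution.SparkPlan

    object PreFilterJoinStrategy extends Strategy {
      override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
        case Join(left, right, Inner, Some(condition)) =>
          // Plan the children with planLater(left) / planLater(right) and wrap them
          // in a physical join that still evaluates `condition`; if the condition is
          // dropped here the output degenerates into a cartesian product of the sides.
          Nil  // placeholder: returning Nil hands the join back to the built-in strategies
        case _ => Nil
      }
    }

    val spark = SparkSession.builder().getOrCreate()
    spark.experimental.extraStrategies = Seq(PreFilterJoinStrategy)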

Thanks,
Jon


Re: Nested/Chained case statements generate codegen over 64k exception

2016-07-25 Thread Jonathan Gray
I came back to this to try and investigate further using the latest version
of the project.  However, I don't have enough experience with the code base
to fully understand what is now happening.  Could someone take a look at
the test case attached to this JIRA and run it on the latest version of the
code base?

It currently appears that one branch of the code receives the code
compilation exception and so applies the fallback.  However, a similar
exception is subsequently thrown for different branches of the code (does
the non-compilable code get put into a cache somewhere?).  So, where it
should now be falling back to non-codegen, it doesn't appear to do so
completely.
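
For reference, the configuration knobs that control this behaviour
(assuming the usual Spark 2.x key names) are shown below; they only toggle
the fallback, they don't address the 64k limit itself:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()

    // Fall back to interpreted execution when the generated Java fails to compile.
    spark.conf.set("spark.sql.codegen.fallback", "true")

    // Or switch whole-stage code generation off entirely while investigating.
    spark.conf.set("spark.sql.codegen.wholeStage", "false")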

On 19 May 2016 at 09:25, Jonathan Gray  wrote:

> That makes sense, I will take a look there first. That will at least give
> a clearer understanding of the problem space to determine when to fall back.
> On 15 May 2016 3:02 am, "Reynold Xin"  wrote:
>
>> It might be best to fix this with fallback first, and then figure out how
>> we can do it more intelligently.
>>
>>
>>
>> On Sat, May 14, 2016 at 2:29 AM, Jonathan Gray 
>> wrote:
>>
>>> Hi,
>>>
>>> I've raised JIRA SPARK-15258 (with code attached to reproduce the problem)
>>> and would like to have a go at fixing it but don't really know where to
>>> start.  Could anyone provide some pointers?
>>>
>>> I've looked at the code associated with SPARK-13242 but was hoping to
>>> find a way to avoid the codegen fallback.  Is this something that is
>>> possible?
>>>
>>> Thanks,
>>> Jon
>>>
>>
>>


Nested/Chained case statements generate codegen over 64k exception

2016-05-14 Thread Jonathan Gray
Hi,

I've raised JIRA SPARK-15258 (with code attached to reproduce the problem)
and would like to have a go at fixing it but don't really know where to
start.  Could anyone provide some pointers?
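
To illustrate the shape of the problem (this is an approximation, not the
test case attached to the JIRA): chaining enough when/otherwise branches
builds one deeply nested CASE WHEN, the generated method for it exceeds the
JVM's 64KB bytecode limit, and Janino then refuses to compile it.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("spark-15258-sketch").getOrCreate()
    import spark.implicits._

    val df = spark.range(100).toDF("id")

    // Fold many branches into each other to build a deeply nested CASE WHEN;
    // the number of branches needed to hit the limit depends on the Spark version.
    val manyBranches = (1 to 500).foldLeft(lit(null).cast("string")) { (acc, i) =>
      when($"id" === i, s"value_$i").otherwise(acc)
    }

    df.select(manyBranches.as("bucketed")).show()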

I've looked at the code associated with SPARK-13242 but was hoping to find
a way to avoid the codegen fallback.  Is this possible?

Thanks,
Jon


Re: Nested/Chained case statements generate codegen over 64k exception

2016-05-19 Thread Jonathan Gray
That makes sense, I will take a look there first. That will at least give a
clearer understanding of the problem space to determine when to fall back.
On 15 May 2016 3:02 am, "Reynold Xin"  wrote:

> It might be best to fix this with fallback first, and then figure out how
> we can do it more intelligently.
>
>
>
> On Sat, May 14, 2016 at 2:29 AM, Jonathan Gray 
> wrote:
>
>> Hi,
>>
>> I've raised JIRA SPARK-15258 (with code attached to reproduce the problem)
>> and would like to have a go at fixing it but don't really know where to
>> start.  Could anyone provide some pointers?
>>
>> I've looked at the code associated with SPARK-13242 but was hoping to
>> find a way to avoid the codegen fallback.  Is this something that is
>> possible?
>>
>> Thanks,
>> Jon
>>
>
>