[Spark SQL] experimental join strategy
Hi,

I'm trying to implement a custom join strategy using the experimental extraStrategies extension point. I have created a very simple implementation (see the gist below), but it is currently failing and I don't understand why.

https://gist.github.com/jongray/a413b5401c4edd47b7d8f559e5b2f79b

I hope the gist demonstrates what I'm trying to do: I'd like the steps I currently perform explicitly (take the distinct foreign keys on the 'left', then use them to filter the 'right' join table before joining it to the left) to be implemented in the strategy.

This initial implementation takes a brute-force approach to the problem (in my particular use case I happen to want all joins to follow this pattern), but I would like to extend it later to take a cost-based approach.

Could someone review it and point out anything that looks wrong? The final physical explain plans displayed for the explicit join and the strategy join look nearly identical, but the results from my join strategy are incorrect: they look like a cartesian product of the filtered sets.

Thanks,
Jon
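For readers without access to the gist, the three steps described above (distinct the left-side foreign keys, filter the right side by them, then join) can be sketched in plain Python. This is only an illustration of the intended semantics, not the Spark strategy itself; the function name `semi_join_reduce`, the column names, and the sample rows are all hypothetical.

```python
def semi_join_reduce(left, right, left_key, right_key):
    """Inner-join two lists of dicts, pre-filtering `right` by the distinct keys of `left`."""
    # Step 1: collect the distinct foreign keys on the left side.
    keys = {row[left_key] for row in left}

    # Step 2: keep only the right-side rows whose key is referenced by the left.
    filtered_right = [row for row in right if row[right_key] in keys]

    # Step 3: equi-join. The key comparison must still be applied here;
    # filtering the right side does not replace the join condition.
    index = {}
    for row in filtered_right:
        index.setdefault(row[right_key], []).append(row)

    joined = []
    for l_row in left:
        for r_row in index.get(l_row[left_key], []):
            joined.append({**l_row, **r_row})
    return joined


# Hypothetical sample data: three orders referencing two of three customers.
left = [
    {"order_id": 1, "cust_id": 10},
    {"order_id": 2, "cust_id": 10},
    {"order_id": 3, "cust_id": 30},
]
right = [
    {"id": 10, "name": "a"},
    {"id": 20, "name": "b"},
    {"id": 30, "name": "c"},
]

result = semi_join_reduce(left, right, "cust_id", "id")
```

With this data the filtered right side has two rows and the correct join has three rows. Pairing every left row with every filtered right row without comparing keys would give six rows instead, which matches the "cartesian product of the filtered sets" symptom. Whether a dropped join condition is the actual cause in the gist is an assumption this sketch cannot confirm.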
Re: Nested/Chained case statements generate codegen over 64k exception
I came back to this to investigate further using the latest version of the project. However, I don't have enough experience with the code base to fully understand what is now happening. Could someone take a look at the test case attached to this JIRA and run it on the latest version of the code base?

It currently appears that one branch of the code receives the code compilation exception and so applies the fallback. However, a similar exception is subsequently thrown for different branches of the code (does the non-compilable code get put into a cache somewhere?). So where it should now be falling back to non-codegen, it doesn't appear to do so completely.

On 19 May 2016 at 09:25, Jonathan Gray wrote:

> That makes sense, I will take a look there first. That will at least give
> a clearer understanding of the problem space to determine when to fallback.
>
> On 15 May 2016 3:02 am, "Reynold Xin" wrote:
>
>> It might be best to fix this with fallback first, and then figure out how
>> we can do it more intelligently.
>>
>> On Sat, May 14, 2016 at 2:29 AM, Jonathan Gray wrote:
>>
>>> Hi,
>>>
>>> I've raised JIRA SPARK-15258 (with code attached to re-produce problem)
>>> and would like to have a go at fixing it but don't really know where to
>>> start. Could anyone provide some pointers?
>>>
>>> I've looked at the code associated with SPARK-13242 but was hoping to
>>> find a way to avoid the codegen fallback. Is this something that is
>>> possible?
>>>
>>> Thanks,
>>> Jon
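To make the fallback idea discussed in this thread concrete, here is a minimal Python sketch of "try to compile, fall back to interpretation when the generated code is too large". It is not Spark's implementation: `compile_branches`, `interpret`, `build_evaluator`, and `CompileError` are hypothetical names, and the small `max_size` is only a stand-in for the JVM's 64 KB per-method bytecode limit.

```python
class CompileError(Exception):
    """Raised by the stand-in compiler when the generated code is too large."""


def interpret(branches, x):
    """Interpreted path: test the (value, result) branches one by one."""
    for value, result in branches:
        if x == value:
            return result
    return None


def compile_branches(branches, max_size):
    """Hypothetical codegen: emit one chained conditional and reject it if oversized."""
    parts = [f"{result!r} if x == {value!r} else" for value, result in branches]
    source = "lambda x: " + " ".join(parts + ["None"])
    if len(source) > max_size:
        raise CompileError(f"generated source is {len(source)} chars (limit {max_size})")
    return eval(source)


def build_evaluator(branches, max_size=200):
    """Decide per expression: use the compiled form if possible, else interpret."""
    try:
        return compile_branches(branches, max_size), "codegen"
    except CompileError:
        return (lambda x: interpret(branches, x)), "interpreted"


small = [(1, "a"), (2, "b"), (3, "c")]          # short CASE-like chain
large = [(i, f"v{i}") for i in range(100)]      # long chain, exceeds the limit

small_eval, small_mode = build_evaluator(small)
large_eval, large_mode = build_evaluator(large)
```

In this sketch the decision is made independently for each expression, so a failure for `large` does not affect `small`. The behaviour described above, where a second branch still hits the exception after the first one has fallen back, would correspond to some path that calls the compile step without going through a guard like `build_evaluator`; that reading is an assumption, not a diagnosis of the Spark code.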
Nested/Chained case statements generate codegen over 64k exception
Hi,

I've raised JIRA SPARK-15258 (with code attached to reproduce the problem) and would like to have a go at fixing it, but I don't really know where to start. Could anyone provide some pointers?

I've looked at the code associated with SPARK-13242 but was hoping to find a way to avoid the codegen fallback. Is that possible?

Thanks,
Jon
Re: Nested/Chained case statements generate codegen over 64k exception
That makes sense, I will take a look there first. That will at least give a clearer understanding of the problem space to determine when to fall back.

On 15 May 2016 3:02 am, "Reynold Xin" wrote:

> It might be best to fix this with fallback first, and then figure out how
> we can do it more intelligently.
>
> On Sat, May 14, 2016 at 2:29 AM, Jonathan Gray wrote:
>
>> Hi,
>>
>> I've raised JIRA SPARK-15258 (with code attached to re-produce problem)
>> and would like to have a go at fixing it but don't really know where to
>> start. Could anyone provide some pointers?
>>
>> I've looked at the code associated with SPARK-13242 but was hoping to
>> find a way to avoid the codegen fallback. Is this something that is
>> possible?
>>
>> Thanks,
>> Jon