Re: Analyzing and reusing cached Datasets

2016-11-20 Thread Jacek Laskowski
Hi Michael, Thanks a lot for your prompt answer. I greatly appreciate it. Having said that, I think we might be...cough...cough...wrong :) I think the "issue" is in QueryPlan.sameResult [1] as its scaladoc says: * Since its likely undecidable to generally determine if two given plans will pr

Re: Analyzing and reusing cached Datasets

2016-11-19 Thread Michael Armbrust
You are hitting a weird optimization in withColumn. Specifically, to avoid building up huge trees with chained calls to this method, we collapse projections eagerly (instead of waiting for the optimizer). Typically we look for cached data in between analysis and optimization, so that optimization