Inner join with the table itself

2018-01-15 Thread Michael Shtelma
Hi all, When I join a table with itself on its join columns, I get the following error: "Join condition is missing or trivial. Use the CROSS JOIN syntax to allow cartesian products between these relations.;" This is not true: my join is not trivial and is not a real cross join. I

Re: Inner join with the table itself

2018-01-15 Thread Jacek Laskowski
Hi Michael, -dev +user What's the query? How do you "fool spark"? Regards, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/ma

Limit the block size of data received by Spark Streaming receiver

2018-01-15 Thread Xilang Yan
Hey, We use a custom receiver to receive data from our MQ. We used to call def store(dataItem: T) to store data; however, I found that the block size varies widely, from 0.5K to 5M, so the processing time per data partition varies widely too. Shuffle is an option, but I want to avoid it.
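
One way to picture the goal described above (blocks of roughly equal size without a shuffle) is to group records into size-bounded batches before they are stored. The following is only a minimal sketch in plain Python, not Spark code and not something proposed in the thread: the function name bounded_blocks and the parameter max_block_bytes are hypothetical, and it assumes each record is a bytes payload whose length is its size. In a Scala receiver, each yielded batch could presumably be passed to one of the multi-record store overloads (for example store(dataBuffer: ArrayBuffer[T])), but whether that fits this particular setup is an assumption.

```python
from typing import Iterable, Iterator, List


def bounded_blocks(items: Iterable[bytes], max_block_bytes: int) -> Iterator[List[bytes]]:
    """Group items into blocks whose total payload stays at or under max_block_bytes.

    Hypothetical helper for illustration only. An item that is larger than
    the limit on its own is emitted alone in its own block.
    """
    if max_block_bytes <= 0:
        raise ValueError("max_block_bytes must be positive")

    block: List[bytes] = []
    size = 0
    for item in items:
        # Close the current block if adding this item would exceed the limit.
        if block and size + len(item) > max_block_bytes:
            yield block
            block, size = [], 0
        block.append(item)
        size += len(item)

    # Flush whatever is left once the input is exhausted.
    if block:
        yield block


if __name__ == "__main__":
    records = [b"a" * 400, b"b" * 400, b"c" * 400, b"d" * 2000]
    for batch in bounded_blocks(records, max_block_bytes=1000):
        print(len(batch), sum(len(r) for r in batch))
```

This caps the upper end of the block size only; a lower bound would need some form of buffering with a timeout, which is left out here.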

Re: Join Strategies

2018-01-15 Thread Herman van Hövell tot Westerflier
Hey Marco, A Cartesian product is an inner join by definition :). The current cartesian product operator does not support outer joins, so we use the only operator that does: BroadcastNestedLoopJoinExec. This is far from great, and it does have the potential to OOM; there are some safety nets in th

Broken SQL Visualization?

2018-01-15 Thread Tomasz Gawęda
Hi, today I updated my test cluster to the current Spark master; after that, my SQL Visualization page started to crash with the following error in JS: [inline screenshot not included] The screenshot was cut for readability and to hide internal server names ;) It may be caused by upgrade or b

Re: Broken SQL Visualization?

2018-01-15 Thread Ted Yu
Did you include a picture? It looks like the picture didn't go through. Please use a third-party site. Thanks Original message From: Tomasz Gawęda Date: 1/15/18 2:07 PM (GMT-08:00) To: dev@spark.apache.org, u...@spark.apache.org Subject: Broken SQL Visualization? Hi, today I hav

Re: Broken SQL Visualization?

2018-01-15 Thread Wenchen Fan
Hi, thanks for reporting. Can you include the steps to reproduce this bug? On Tue, Jan 16, 2018 at 7:07 AM, Ted Yu wrote: > Did you include any picture ? > > Looks like the picture didn't go thru. > > Please use third party site. > > Thanks > > Original message > From: Tomasz G

Thoughts on Cloudpickle Update

2018-01-15 Thread Bryan Cutler
Hi All, I've seen a couple of issues lately related to cloudpickle, notably https://issues.apache.org/jira/browse/SPARK-22674, and would like to get some feedback on updating the version in PySpark, which should fix these issues and allow us to remove some workarounds. Spark is currently using a fork

Re: Thoughts on Cloudpickle Update

2018-01-15 Thread Hyukjin Kwon
Hi Bryan, Yup, I support matching the version. I have pushed changes a few times before, both to Spark's copy to match it with https://github.com/cloudpipe/cloudpickle and to cloudpickle itself with a few fixes. I believe our copy is closest to 0.4.1. I have been trying to follow up the changes in c