Re: Distinct on Map data type -- SPARK-19893

2018-01-13 Thread ckhari4u
Wan, thanks a lot! I see the issue now. Do we have any JIRAs open for the future work to be done on this?

Distinct on Map data type -- SPARK-19893

2018-01-12 Thread ckhari4u
I see SPARK-19893 is backported to Spark 2.1 and 2.0.1 as well. I do not see a clear justification for why SPARK-19893 is important and needed. I have a sample table which works fine with an earlier build of Spark 2.1.0. Now that the latest build includes the backport of SPARK-19893, it's failing.
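For context, a minimal sketch of the kind of statement whose behavior changes with SPARK-19893 (the schema and names below are made up for illustration, not the poster's actual table): a DataFrame with a MapType column run through distinct(). On builds that include SPARK-19893 this is expected to fail analysis, since DISTINCT/set operations on map columns are disallowed there; on earlier 2.1.0 builds it ran.

// Assumes an existing spark-shell session (the `spark` SparkSession is in scope).
import spark.implicits._

// Two identical rows, one of whose columns is a map.
val df = Seq(
  (1, Map("k" -> "v")),
  (1, Map("k" -> "v"))
).toDF("id", "props")

// With SPARK-19893 in place this is expected to throw an AnalysisException
// about map type columns in set operations; without the patch it simply
// deduplicated the two rows.
df.distinct().show()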

Re: Result obtained before the completion of Stages

2017-12-27 Thread ckhari4u
That's a good catch. I just checked the jstack and ps -ef output of the executor processes. They are progressing and completing well after the result is generated.

Re: Result obtained before the completion of Stages

2017-12-26 Thread ckhari4u
Hi Reynold, I am running a Spark SQL query:

val df = spark.sql("select * from table1 t1 join table2 t2 on t1.col1 = t2.col1")
df.count()
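Spelled out as a self-contained spark-shell sketch (table1, table2, and col1 are the placeholders from the message above, not verified schemas):

// Assumes an existing spark-shell session and that table1/table2 are
// registered in the catalog (e.g. as Hive tables); both are placeholders.
val df = spark.sql(
  "select * from table1 t1 join table2 t2 on t1.col1 = t2.col1")

// count() is the action that actually launches the stages seen in the Web UI.
println(df.count())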

Re: Result obtained before the completion of Stages

2017-12-26 Thread ckhari4u
Hi Sean, thanks for the reply. I believe I am not facing the scenarios you mentioned. Timestamp conflict: I see the Spark driver logs on the console (tried with INFO and DEBUG). In all the scenarios, I see the result getting printed, and the application execution continues for 4 more minutes.

Result obtained before the completion of Stages

2017-12-26 Thread ckhari4u
I found this interesting behavior while running an ad hoc analysis query. I have a Spark SQL query where I am joining 2 tables and then performing a count operation. In the Spark Web UI, I see 4 stages getting launched. The interesting part is that I see the result before the stages complete.
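One way to pin down the timing question (a sketch, not something from this thread): register a SparkListener that timestamps each stage completion and compare those timestamps with the moment the count is printed.

// Assumes a spark-shell session; table1/table2/col1 are placeholders.
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

spark.sparkContext.addSparkListener(new SparkListener {
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    println(s"stage ${stageCompleted.stageInfo.stageId} completed at ${System.currentTimeMillis()} ms")
  }
})

val df = spark.sql("select * from table1 t1 join table2 t2 on t1.col1 = t2.col1")
println(s"count = ${df.count()} printed at ${System.currentTimeMillis()} ms")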

Spark History Server does not redirect to Yarn aggregated logs for container logs

2017-06-07 Thread ckhari4u
Hey guys, I am hitting the issue below when trying to access the STDOUT/STDERR logs in the Spark History Server for the executors of a Spark application executed in Yarn mode. I have enabled Yarn log aggregation. Repro steps: 1) Run the spark-shell in yarn client mode, or run the Pi job in Yarn mode (a minimal stand-in job is sketched below).
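A minimal workload for the repro (a sketch only: it assumes spark-shell was started with --master yarn --deploy-mode client, and that yarn.log-aggregation-enable is set to true in yarn-site.xml, which is not spelled out in the original message):

// Stand-in workload; paste into a spark-shell started in yarn client mode so
// executor containers get launched and produce STDOUT/STDERR logs that Yarn
// can aggregate once the application finishes.
val n = 100000
val inside = spark.sparkContext.parallelize(1 to n, 10).map { _ =>
  val x = math.random * 2 - 1
  val y = math.random * 2 - 1
  if (x * x + y * y <= 1) 1 else 0
}.reduce(_ + _)
println(s"Pi is roughly ${4.0 * inside / n}")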