Re: [SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?

2017-11-16 Thread Shixiong(Ryan) Zhu
SQL metrics are collected using SparkListener. If there are no tasks, org.apache.spark.sql.execution.ui.SQLListener cannot collect any metrics.

On Thu, Nov 16, 2017 at 1:53 AM, Jacek Laskowski wrote:
> Hi,
>
> I seem to have figured out why the metric is not in the web UI for the
> query, but wi
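
To make the point in this reply concrete, here is a toy model in plain Python of listener-driven metric collection. It is only a sketch of the mechanism described above, not Spark code: TaskEnd, ToyMetricsListener, and run_query are hypothetical names invented for illustration, and the real SQLListener is considerably more involved. The assumption is simply that metric values reach the listener only through task events, so a query that launches no tasks leaves nothing to aggregate.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskEnd:
    """Hypothetical event: one finished task reporting its metric updates."""
    execution_id: int
    metric_updates: dict  # metric name -> value contributed by this task


class ToyMetricsListener:
    """Toy stand-in (not Spark's SQLListener) that sums metrics per execution."""

    def __init__(self):
        self.metrics = defaultdict(lambda: defaultdict(int))

    def on_task_end(self, event):
        # Metrics only ever arrive here, i.e. via task events.
        for name, value in event.metric_updates.items():
            self.metrics[event.execution_id][name] += value

    def metrics_for(self, execution_id):
        # .get avoids creating an entry for executions that never posted events.
        return dict(self.metrics.get(execution_id, {}))


def run_query(listener, execution_id, rows_per_task):
    """Post one TaskEnd per task; an empty list models a query with no tasks."""
    for rows in rows_per_task:
        listener.on_task_end(TaskEnd(execution_id, {"numOutputRows": rows}))


listener = ToyMetricsListener()
run_query(listener, execution_id=0, rows_per_task=[2, 3])  # two tasks ran
run_query(listener, execution_id=1, rows_per_task=[])      # no tasks at all

print(listener.metrics_for(0))  # {'numOutputRows': 5}
print(listener.metrics_for(1))  # {}
```

In this model the second execution has no metrics to show, which mirrors the behaviour reported for the local-only query.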

[ML] Spark Package Release: Deep Learning Pipelines 0.2.0

2017-11-16 Thread Siddharth Murching
Hi all, Just wanted to announce that Deep Learning Pipelines 0.2.0 has been released, providing utilities for transfer learning, parallelized hyperparameter tuning of Keras models, and applying neural networks to DataFrames as SQL UDFs. Spark packages: https://spark-packages.org/package/databrick

Re: Faster and Lower memory implementation toPandas

2017-11-16 Thread Reynold Xin
Please send a PR. Thanks for looking at this.

On Thu, Nov 16, 2017 at 7:27 AM Andrew Andrade wrote:
> Hello devs,
>
> I know a lot of great work has been done recently with pandas to spark
> dataframes and vice versa using Apache Arrow, but I faced a specific pain
> point on a low memory setup w

Faster and Lower memory implementation toPandas

2017-11-16 Thread Andrew Andrade
Hello devs, I know a lot of great work has been done recently on converting between pandas and Spark DataFrames using Apache Arrow, but I hit a specific pain point on a low-memory setup without Arrow. Specifically, I was getting a driver OOM when running toPandas on a small dataset (<100 MB compressed

Re: HashingTFModel/IDFModel in Structured Streaming

2017-11-16 Thread Davis Varghese
Since we are on Spark 2.2, I backported/fixed it. Here is the diff file comparing against https://github.com/apache/spark/blob/73fe1d8087cfc2d59ac5b9af48b4cf5f5b86f920/mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala

24c24
< import org.apache.spark.ml.param.{Param, ParamMap, P

Re: [SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?

2017-11-16 Thread Jacek Laskowski
Hi,

I seem to have figured out why the metric is not in the web UI for the query, but wish I knew how to explain it for any metric and operator. It seems that the numOutputRows metric is not displayed in the web UI when a query runs no Spark jobs.

val names = Seq("Jacek", "Agata").toDF("name") // no