Re: How to convert spark data-frame to datasets?

2016-11-21 Thread Minudika Malshan
Hi, Thanks all for the support, and sorry for mistakenly posting here instead of on the user list. :) BR On Tue, Nov 22, 2016 at 10:33 AM, Sachith Withana wrote: > Hi Minudika, > > To add to what Oscar said, this blog post [1] should clarify it for you. > And this should be posted in the us

Re: How to convert spark data-frame to datasets?

2016-11-21 Thread Sachith Withana
Hi Minudika, To add to what Oscar said, this blog post [1] should clarify it for you. And this should be posted in the user list, not the dev list. [1] https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html Cheers, Sachith On Thu, Aug 18, 2016 at 8:

Re: [SPARK-16654][CORE][WIP] Add UI coverage for Application Level Blacklisting

2016-11-21 Thread Reynold Xin
You can submit a pull request against Imran's branch for the pull request. On Mon, Nov 21, 2016 at 7:33 PM Jose Soltren wrote: > Hi - I'm proposing a patch set for UI coverage of Application Level > Blacklisting: > > https://github.com/jsoltren/spark/pull/1 > > This patch set builds on top of Im

Please limit commits for branch-2.1

2016-11-21 Thread Joseph Bradley
To committers and contributors active in MLlib, Thanks to everyone who has started helping with the QA tasks in SPARK-18316! I'd like to request that we stop committing non-critical changes to MLlib, including the Python and R APIs, since still-changing public APIs make it hard to QA. We need to have a

Re: Memory leak warnings in Spark 2.0.1

2016-11-21 Thread Nicholas Chammas
I'm also curious about this. Is there something we can do to help troubleshoot these leaks and file useful bug reports? On Wed, Oct 12, 2016 at 4:33 PM vonnagy wrote: > I am getting excessive memory leak warnings when running multiple mapping > and > aggregations and using DataSets. Is there any

RE: MinMaxScaler behaviour

2016-11-21 Thread Joeri Hermans
I see. I think I read the documentation a little too quickly :) My apologies. Kind regards, Joeri From: Sean Owen [so...@cloudera.com] Sent: 21 November 2016 21:32 To: Joeri Hermans; dev@spark.apache.org Subject: Re: MinMaxScaler behaviour It's a dege

Re: MinMaxScaler behaviour

2016-11-21 Thread Sean Owen
It's a degenerate case of course. 0, 0.5 and 1 all make about as much sense. Is there a strong convention elsewhere to use 0? Min/max scaling is the wrong thing to do for a data set like this anyway. What you probably intend to do is scale each image so that its max intensity is 1 and min intensit
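
The two scaling schemes contrasted in this reply can be sketched in plain Python. This is a minimal illustration, not Spark's implementation: the function names and the tiny three-pixel "images" are hypothetical, and returning the midpoint for a constant feature is just one of the conventions mentioned above (0, 0.5 and 1 are all candidates).

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Rescale a list of numbers linearly into [lo, hi]."""
    e_min, e_max = min(values), max(values)
    if e_max == e_min:
        # Degenerate case: every value is identical, so there is no range
        # to stretch. Assumption: use the midpoint of [lo, hi].
        return [0.5 * (lo + hi)] * len(values)
    return [(v - e_min) / (e_max - e_min) * (hi - lo) + lo for v in values]


def scale_columns(rows, lo=0.0, hi=1.0):
    """Per-feature scaling: each pixel position is scaled across all images."""
    columns = [min_max_scale(list(col), lo, hi) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def scale_rows(rows, lo=0.0, hi=1.0):
    """Per-image scaling: each image is scaled by its own min and max."""
    return [min_max_scale(list(row), lo, hi) for row in rows]


# Hypothetical data: two "images" of three pixels each.
# The first pixel is 0 in every image, so that column is degenerate.
images = [
    [0.0, 0.0, 128.0],
    [0.0, 255.0, 64.0],
]

per_feature = scale_columns(images)
per_image = scale_rows(images)
```

With per-feature scaling, the always-zero first pixel falls into the degenerate branch; with per-image scaling, it stays at the image's minimum, which is closer to what the reply suggests for image data.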

MinMaxScaler behaviour

2016-11-21 Thread Joeri Hermans
Hi all, I observed some weird behaviour while applying some feature transformations using MinMaxScaler. More specifically, I was wondering whether this behaviour is intended and makes sense, especially because I explicitly defined min and max. Basically, I am preprocessing the MNIST dataset, and the

Re: OutOfMemoryError on parquet SnappyDecompressor

2016-11-21 Thread Ryan Blue
It's unlikely that you're hitting this, unless you have several tasks writing at once on the same executor. Parquet does have high memory consumption, so the most likely explanation is either that you're close to the memory limit for other reasons, or that you need to increase the amount of overhea

Re: OutOfMemoryError on parquet SnappyDecompressor

2016-11-21 Thread Aniket
Thanks Ryan. I am running into this rarer issue. For now, I have moved away from Parquet, but I will file a bug in JIRA if I am able to produce code that easily reproduces this. Thanks, Aniket On Mon, Nov 21, 2016, 3:24 PM Ryan Blue [via Apache Spark Developers List] < ml-node+s1001551n19972.

Re: OutOfMemoryError on parquet SnappyDecompressor

2016-11-21 Thread Ryan Blue
Aniket, The solution was to add a sort so that only one file is written at a time, which minimizes the memory footprint of columnar formats like Parquet. That's been released for quite a while, so memory issues caused by Parquet are more rare now. If you're using Parquet default settings and a rec
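
Why a sort keeps only one file open at a time can be shown with a toy model. This is a simplified sketch and not Spark's or Parquet's writer code: it assumes each record is routed to an output file by a key (the date-like keys and the function name are hypothetical), and that every open writer holds its own buffer in memory.

```python
def peak_open_writers(keys, close_on_key_change):
    """Return the largest number of writers open at the same time.

    keys: the output-file key of each record, in arrival order.
    close_on_key_change: close the previous writer whenever the key changes.
        This is only safe when the input is sorted by key, because a closed
        file never needs to be reopened.
    """
    open_writers = set()
    peak = 0
    previous = None
    for key in keys:
        if close_on_key_change and previous is not None and key != previous:
            open_writers.discard(previous)
        open_writers.add(key)
        peak = max(peak, len(open_writers))
        previous = key
    return peak


# Hypothetical record stream that interleaves three output files.
keys = [
    "2016-11-21", "2016-11-19", "2016-11-20",
    "2016-11-21", "2016-11-19", "2016-11-20",
]

# Unsorted input: no writer can be closed early, so all three stay open.
unsorted_peak = peak_open_writers(keys, close_on_key_change=False)

# Sorted input: records for one file arrive together, so one writer suffices.
sorted_peak = peak_open_writers(sorted(keys), close_on_key_change=True)
```

In this model the memory held by writers grows with the peak count, so the sorted run needs one buffer where the unsorted run needs three.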

Re: How is the order ensured in the jdbc relation provider when inserting data from multiple executors

2016-11-21 Thread Maciej Szymkiewicz
In commonly used RDBMSs, relations have no fixed order, and the physical location of records can change during routine maintenance operations. Unless you explicitly order the data during retrieval, the order you see is incidental and not guaranteed. Conclusion: the order of inserts just doesn't matter. O
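
The point about ordering at retrieval time can be illustrated with Python's built-in sqlite3 module. This is a small sketch, not the Spark JDBC code path: SQLite stands in for the target RDBMS, the `cnt` column name and the sample rows are made up, and the two batches only mimic executors inserting in arbitrary order.

```python
import sqlite3

# In-memory database standing in for the target RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (value TEXT, cnt INTEGER)")

# Two batches arriving in an order unrelated to the sort key,
# as could happen when several executors insert independently.
batches = [
    [("c", 3), ("d", 1)],
    [("a", 2), ("b", 5)],
]
for batch in batches:
    conn.executemany("INSERT INTO table2 (value, cnt) VALUES (?, ?)", batch)
conn.commit()

# Only an explicit ORDER BY on retrieval guarantees the row order.
ordered = conn.execute(
    "SELECT value, cnt FROM table2 ORDER BY value"
).fetchall()
conn.close()
```

Here `ordered` is sorted by `value` regardless of which batch was inserted first; a plain `SELECT` without `ORDER BY` carries no such guarantee.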

How is the order ensured in the jdbc relation provider when inserting data from multiple executors

2016-11-21 Thread Niranda Perera
Hi, Say I have a table with 1 column and 1000 rows. I want to save the result in an RDBMS table using the jdbc relation provider. So I run the following query, "insert into table table2 select value, count(*) from table1 group by value order by value" While debugging, I found that the resultant