Re: Block Transfer Service encryption support

2015-03-16 Thread turp1twin
Hey Aaron, That is what I do, except I add the Netty SslHandler in the TransportServer and the TransportClientFactory. I do this because the Server pipeline is a bit different, as I have to add a Netty ChunkedWriteHandler... Again, this is a "rough" prototype, just to get something working... Ch

Re: Block Transfer Service encryption support

2015-03-16 Thread Aaron Davidson
Out of curiosity, why could we not use Netty's SslHandler injected into the TransportContext pipeline? On Mon, Mar 16, 2015 at 7:56 PM, turp1twin wrote: > Hey Patrick, > > Sorry for the delay, I was at Elastic{ON} last week and well, my day job > has > been keeping me busy... I went ahead and op

Re: Block Transfer Service encryption support

2015-03-16 Thread turp1twin
Hey Patrick, Sorry for the delay, I was at Elastic{ON} last week and well, my day job has been keeping me busy... I went ahead and opened a Jira feature request, https://issues.apache.org/jira/browse/SPARK-6373. In it I reference a commit I made in my fork which is a "rough" implementation, defini

Re: SparkSQL 1.3.0 cannot read parquet files from different file system

2015-03-16 Thread Pei-Lun Lee
Looks like this is already solved in https://issues.apache.org/jira/browse/SPARK-6330 On Mon, Mar 16, 2015 at 6:43 PM, Cheng Lian wrote: > Oh sorry, I misread your question. I thought you were trying something > like parquetFile(“s3n://file1,hdfs://file2”). Yeah, it’s a valid bug. > Thanks for

Re: broadcast hang out

2015-03-16 Thread Reynold Xin
It would be great to add a timeout. Do you mind submitting a pull request? On Sun, Mar 15, 2015 at 10:41 PM, lonely Feb wrote: > Anyone can help? Thanks a lot ! > > 2015-03-16 11:45 GMT+08:00 lonely Feb : > > > yes > > > > 2015-03-16 11:43 GMT+08:00 Mridul Muralidharan : > > > >> Cross region a
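The timeout Reynold suggests can be sketched in plain JDK code: wrap the blocking fetch in a Future and bound the wait, so a lost remote end fails the task instead of hanging it forever. Note this is a generic illustration, not Spark's actual broadcast code; fetchBlock() is a made-up stand-in for the real remote fetch.

```java
import java.util.concurrent.*;

class BroadcastTimeout {
    // Made-up stand-in for the blocking broadcast-block fetch.
    static byte[] fetchBlock() {
        return new byte[]{1, 2, 3, 4};
    }

    // Run the fetch on a worker thread and bound the wait; a hang
    // surfaces as a TimeoutException instead of blocking forever.
    static byte[] fetchWithTimeout(long millis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<byte[]> f = pool.submit(BroadcastTimeout::fetchBlock);
            return f.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null; // caller can retry or fail the task
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The `Future.get(long, TimeUnit)` overload is what gives the bounded wait; the caller decides whether a null result means retry or task failure.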

Re: enum-like types in Spark

2015-03-16 Thread Aaron Davidson
It's unrelated to the proposal, but Enum#ordinal() should be much faster, assuming it's not serialized to JVMs with different versions of the enum :) On Mon, Mar 16, 2015 at 12:12 PM, Kevin Markey wrote: > In some applications, I have rather heavy use of Java enums which are > needed for related
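Aaron's trade-off can be shown with any JDK enum: `ordinal()` is a plain int lookup (fast), but it encodes declaration order, so it is only safe when every JVM in the job sees the same version of the enum; `name()` stays stable across versions at the cost of a String. The two tiny helpers below are just for illustration.

```java
class EnumKeys {
    // Fast, but breaks if constants are added or reordered
    // between enum versions on different JVMs.
    static int byOrdinal(Enum<?> e) { return e.ordinal(); }

    // Stable across enum versions; slightly slower (String).
    static String byName(Enum<?> e) { return e.name(); }
}
```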

Re: enum-like types in Spark

2015-03-16 Thread Patrick Wendell
Hey Xiangrui, Do you want to write up a straw man proposal based on this line of discussion? - Patrick On Mon, Mar 16, 2015 at 12:12 PM, Kevin Markey wrote: > In some applications, I have rather heavy use of Java enums which are needed > for related Java APIs that the application uses. And unf

Re: enum-like types in Spark

2015-03-16 Thread Kevin Markey
In some applications, I have rather heavy use of Java enums, which are needed for related Java APIs that the application uses. And unfortunately, they are also used as keys. As such, using the native hash codes makes any function over keys unstable and unpredictable, so we now use Enum.name() a
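Kevin's instability issue, sketched: a Java enum's `hashCode()` is the default identity hash, which differs between JVM runs, so partitioning by raw enum keys is non-deterministic across executors. Keying by `name()` (or its hash) gives every JVM the same answer. The `Color` enum here is invented for the example.

```java
class StableEnumKey {
    enum Color { RED, GREEN, BLUE } // made-up example enum

    // Stable, version-tolerant key for shuffles and joins:
    // name() is the same string on every JVM.
    static String stableKey(Enum<?> e) { return e.name(); }

    // String.hashCode() is specified by the JLS, so this hash
    // is identical across JVM runs, unlike the enum's own hashCode().
    static int stableHash(Enum<?> e) { return e.name().hashCode(); }
}
```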

Re: enum-like types in Spark

2015-03-16 Thread Xiangrui Meng
In MLlib, we use strings for enum-like types in Python APIs, which is quite common in Python and easy for py4j. On the JVM side, we implement `fromString` to convert them back to enums. -Xiangrui On Wed, Mar 11, 2015 at 12:56 PM, RJ Nowling wrote: > How do these proposals affect PySpark? I think
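A hedged sketch of the pattern Xiangrui describes: the Python caller passes a plain string through py4j, and the JVM side maps it back to a typed constant with a `fromString` helper. `Solver` and its values are hypothetical, not MLlib's actual API.

```java
class Solvers {
    enum Solver { SGD, LBFGS } // hypothetical enum-like type

    // Python sends "sgd" or "l-bfgs" as a plain string; the JVM
    // side converts it back, rejecting anything unrecognized.
    static Solver fromString(String s) {
        switch (s.toLowerCase()) {
            case "sgd":    return Solver.SGD;
            case "l-bfgs": return Solver.LBFGS;
            default:
                throw new IllegalArgumentException("Unknown solver: " + s);
        }
    }
}
```

Lower-casing on entry keeps the Python-facing API case-insensitive, which is a common courtesy for string-typed options.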

Re: extended jenkins downtime monday, march 16th, plus some hints at the future

2015-03-16 Thread shane knapp
ok, we're back up and building. upgrading the github plugin (and possibly EnvInject) caused the stacktraces, so i've kept those at the old versions that were working before. jenkins and the rest of the plugins are updated and we're g2g. i'll be, of course, keeping an eye on things today and will

Re: extended jenkins downtime monday, march 16th, plus some hints at the future

2015-03-16 Thread shane knapp
looks like we're having some issues w/the pull request builder and cron stacktraces in the logs. i'll be investigating further and will update when i figure out what's going on. On Mon, Mar 16, 2015 at 7:51 AM, shane knapp wrote: > this is starting now. > > On Fri, Mar 13, 2015 at 10:12 AM, sha

Re: extended jenkins downtime monday, march 16th, plus some hints at the future

2015-03-16 Thread shane knapp
this is starting now. On Fri, Mar 13, 2015 at 10:12 AM, shane knapp wrote: > i'll be taking jenkins down for some much-needed plugin updates, as well > as potentially upgrading jenkins itself. > > this will start at 730am PDT, and i'm hoping to have everything up by noon. > > the move to the ana

Re: Typo in 1.3.0 release notes: s/extended renamed/renamed/

2015-03-16 Thread Sean Owen
Here's the sentence: As part of stabilizing the Spark SQL API, the SchemaRDD class has been extended renamed to DataFrame. Yes, I can remove the word 'extended' On Mon, Mar 16, 2015 at 1:18 PM, Joe Halliwell wrote: > Cheers, > Joe > > Best regards, Joe

Typo in 1.3.0 release notes: s/extended renamed/renamed/

2015-03-16 Thread Joe Halliwell
Cheers, Joe Best regards, Joe

Re: problems with Parquet in Spark 1.3.0

2015-03-16 Thread Gil Vernik
I just noticed this one: https://issues.apache.org/jira/browse/SPARK-6351 https://github.com/apache/spark/pull/5039 I verified it, and this resolves my issues with Parquet and the swift:// name space.

problems with Parquet in Spark 1.3.0

2015-03-16 Thread Gil Vernik
Hi, I am storing Parquet files in OpenStack Swift and accessing those files from Spark. This worked perfectly in Spark prior to 1.3.0, but in 1.3.0 I am getting this error: Is there some configuration I missed? I am not sure where this error comes from; does Spark 1.3.0 require Parquet files to b

Re: Wrong version on the Spark documentation page

2015-03-16 Thread Cheng Lian
Patrick, Ted - My bad, yeah, it's because of browser cache. On 3/16/15 2:31 AM, Ted Yu wrote: When I enter http://spark.apache.org/docs/latest/ into Chrome address bar, I saw 1.3.0 Cheers On Sun, Mar 15, 2015 at 11:12 AM, Patrick Wendell wrote: Cheng - what

Re: SparkSQL 1.3.0 cannot read parquet files from different file system

2015-03-16 Thread Cheng Lian
Oh sorry, I misread your question. I thought you were trying something like parquetFile(“s3n://file1,hdfs://file2”). Yeah, it’s a valid bug. Thanks for opening the JIRA ticket and the PR! Cheng On 3/16/15 6:39 PM, Cheng Lian wrote: Hi Pei-Lun, We intentionally disallowed passing multiple

Re: SparkSQL 1.3.0 cannot read parquet files from different file system

2015-03-16 Thread Cheng Lian
Hi Pei-Lun, We intentionally disallowed passing multiple comma-separated paths in 1.3.0. One of the reasons is that users reported that this fails when a file path contains an actual comma in it. In your case, you may do something like this: val s3nDF = parquetFile("s3n://...") val hdfsDF =
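The ambiguity Cheng mentions is easy to demonstrate: a naive split cannot tell a path separator from a comma that is part of a file name, so a single file can be wrongly read as two paths. The paths below are invented for illustration.

```java
class PathSplit {
    // Naive comma splitting, as the old multi-path API had to do.
    static String[] splitPaths(String spec) {
        return spec.split(",");
    }
}
```

`splitPaths("hdfs://nn/sales,2015.parquet")` names one file but splits into two bogus paths, which is exactly why the comma-list form was dropped in favor of loading each source separately.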

Re: Which OutputCommitter to use for S3?

2015-03-16 Thread Pei-Lun Lee
Hi, I created a JIRA and PR for supporting an S3-friendly output committer for saveAsParquetFile: https://issues.apache.org/jira/browse/SPARK-6352 https://github.com/apache/spark/pull/5042 My approach is to add a DirectParquetOutputCommitter class in the spark-sql package and use a boolean config variabl

SparkSQL 1.3.0 cannot read parquet files from different file system

2015-03-16 Thread Pei-Lun Lee
Hi, I am using Spark 1.3.0, where I cannot load parquet files from more than one file system, say one s3n://... and another hdfs://..., which worked in older versions, or if I set spark.sql.parquet.useDataSourceApi=false in 1.3. One way to fix this is, instead of getting a single FileSystem from the defaul