Re: Unable to access Resource Manager /Name Node on port 9026 / 9101 on a Spark EMR Cluster

2016-04-15 Thread Wei-Shun Lo
Hi Chanda, You may want to use nmap to check whether the port and service are correctly started locally, e.g. nmap localhost. If the port is already open internally, it might be related to the inbound/outbound traffic control in your security group settings. Just FYI. On Fri, Ap
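A minimal sketch of the same local check from the JVM side, swapping nmap for a plain TCP connect attempt with java.net.Socket. The class name PortProbe and the 500 ms timeout are illustrative assumptions. Run it on the master node itself, so that a security-group rule cannot be what blocks the connection.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: reports whether something accepts TCP connections on host:port.
final class PortProbe {
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // connect() succeeds only if a listener accepts (or queues) the connection
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // refused, timed out, or host unresolvable
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports to probe; adjust to the UIs you expect on the node.
        for (int port : new int[] {8088, 50070, 9026, 9101}) {
            boolean open = isListening("localhost", port, 500);
            System.out.println(port + (open ? " open" : " closed"));
        }
    }
}
```

8088 and 50070 are the emr-4.x ResourceManager and NameNode UI ports given in Jonathan Kelly's reply in this thread. If a probe succeeds locally but the UI is still unreachable from outside, that points at the security group, as suggested above.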

Will not store rdd_16_4383 as it would require dropping another block from the same RDD

2016-04-15 Thread Alexander Pivovarov
I run Spark 1.6.1 on YARN (EMR-4.5.0). I call RDD.count on a MEMORY_ONLY_SER cached RDD (spark.serializer is KryoSerializer). After the count task is done, I noticed that the Spark UI shows RDD Fraction Cached is only 6%, Size in Memory = 65.3 GB. I looked at Executors stderr on Spark UI and saw lots of
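The log line in the subject states the rule at work: the memory store will evict blocks of other RDDs to make room, but not blocks of the RDD being cached. A toy model of that rule, written for illustration only (ToyMemoryStore, its oldest-first eviction order and the rdd_&lt;id&gt;_&lt;partition&gt; parsing are assumptions, not Spark's MemoryStore code), shows how a large MEMORY_ONLY_SER RDD can end up only partially cached once its own earlier partitions fill the store.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical) of the eviction rule quoted in the log line; not Spark's MemoryStore.
final class ToyMemoryStore {
    private final long capacity;
    private long used = 0;
    // blockId -> size; insertion order doubles as the assumed oldest-first eviction order
    private final LinkedHashMap<String, Long> blocks = new LinkedHashMap<>();

    ToyMemoryStore(long capacity) {
        this.capacity = capacity;
    }

    // "rdd_16_4383" -> "rdd_16": the RDD a block belongs to
    static String rddOf(String blockId) {
        return blockId.substring(0, blockId.lastIndexOf('_'));
    }

    long used() { return used; }

    boolean contains(String blockId) { return blocks.containsKey(blockId); }

    // Returns false ("Will not store ...") when room could only be made by
    // dropping blocks that belong to the same RDD as blockId.
    boolean put(String blockId, long size) {
        if (size > capacity) {
            return false;
        }
        String rdd = rddOf(blockId);
        List<String> victims = new ArrayList<>();
        long freed = 0;
        for (Map.Entry<String, Long> e : blocks.entrySet()) {
            if (used - freed + size <= capacity) {
                break; // enough room already
            }
            if (!rddOf(e.getKey()).equals(rdd)) {
                victims.add(e.getKey()); // only other RDDs' blocks are eligible
                freed += e.getValue();
            }
        }
        if (used - freed + size > capacity) {
            return false; // everything left to drop belongs to the same RDD
        }
        for (String victim : victims) {
            used -= blocks.remove(victim);
        }
        blocks.put(blockId, size);
        used += size;
        return true;
    }
}
```

With a capacity of 10 and blocks of size 4, a partition of rdd_16 can displace a cached rdd_7 block, but the third rdd_16 partition is refused, because making room would mean dropping rdd_16's own first or second partition.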

Re: Skipping Type Conversion and using InternalRows for UDF

2016-04-15 Thread Michael Armbrust
This would also probably improve performance: https://github.com/apache/spark/pull/9565 On Fri, Apr 15, 2016 at 8:44 AM, Hamel Kothari wrote: > Hi all, > > So we have these UDFs which take <1ms to operate and we're seeing pretty > poor performance around them in practice, the overhead being >10m

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mridul Muralidharan
On Friday, April 15, 2016, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > Yeah in support of this statement I think that my primary interest in > this Spark Extras and the good work by Luciano here is that anytime we > take bits out of a code base and “move it to GitHub” I see

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Cody Koeninger
100% agree with Sean & Reynold's comments on this. Adding this as a TLP would just cause more confusion as to "official" endorsement. On Fri, Apr 15, 2016 at 11:50 AM, Sean Owen wrote: > On Fri, Apr 15, 2016 at 5:34 PM, Luciano Resende wrote: >> I know the name might be confusing, but I also

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mattmann, Chris A (3980)
Yeah in support of this statement I think that my primary interest in this Spark Extras and the good work by Luciano here is that anytime we take bits out of a code base and “move it to GitHub” I see a bad precedent being set. Creating this project at the ASF creates a synergy between *Apache Spar

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mattmann, Chris A (3980)
Hey Reynold, Thanks. Getting to the heart of this, I think that this project would be successful if the Apache Spark PMC decided to participate and there was some overlap. As much as I think it would be great to stand up another project, the goal here from Luciano and crew (myself included) would

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mattmann, Chris A (3980)
Yeah, so it’s the *Apache Spark* project. Just to clarify. Not once did you say Apache Spark below. On 4/15/16, 9:50 AM, "Sean Owen" wrote: >On Fri, Apr 15, 2016 at 5:34 PM, Luciano Resende wrote: >> I know the name might be confusing, but I also think that the projects have >> a very big

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Jean-Baptiste Onofré
+1 Regards JB On 04/15/2016 06:41 PM, Mattmann, Chris A (3980) wrote: Yeah in support of this statement I think that my primary interest in this Spark Extras and the good work by Luciano here is that anytime we take bits out of a code base and “move it to GitHub” I see a bad precedent being set

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Sean Owen
On Fri, Apr 15, 2016 at 5:34 PM, Luciano Resende wrote: > I know the name might be confusing, but I also think that the projects have > a very big synergy, more like sibling projects, where "Spark Extras" extends > the Spark community and develop/maintain components for, and pretty much > only for

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Reynold Xin
Anybody is free and welcomed to create another ASF project, but I don't think "Spark extras" is a good name. It unnecessarily creates another tier of code that ASF is "endorsing". On Friday, April 15, 2016, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > Yeah in support of this

ClassFormatError in latest spark 2 SNAPSHOT build

2016-04-15 Thread Koert Kuipers
not sure why, but i am getting this today using spark 2 snapshots... i am on java 7 and scala 2.11 16/04/15 12:35:46 WARN TaskSetManager: Lost task 2.0 in stage 3.0 (TID 15, localhost): java.lang.ClassFormatError: Duplicate field name&signature in class file org/apache/spark/sql/catalyst/expressio

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Sean Owen
I think this is meant to be understood as a community site, and as a directory listing pointers to third-party projects. It's not a project of its own, and not part of Spark itself, with no special status. At least, I think that's how it should be presented and pretty much seems to come across that wa

Re: ClassFormatError in latest spark 2 SNAPSHOT build

2016-04-15 Thread Reynold Xin
Can you post the generated code? df.queryExecution.debug.codeGen() (Or something similar to that) On Friday, April 15, 2016, Koert Kuipers wrote: > not sure why, but i am getting this today using spark 2 snapshots... > i am on java 7 and scala 2.11 > > 16/04/15 12:35:46 WARN TaskSetManager: Lo
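java.lang.ClassFormatError: Duplicate field name&amp;signature means the generated class declared two fields with the same name and type. Once the generated code has been dumped, a quick scan can locate the clash. The helper below is a hypothetical sketch: it assumes one field declaration per line with an explicit access modifier, which may not match Spark's codegen output exactly, and it does not actually parse Java.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic: finds "type name" pairs declared more than once.
final class DuplicateFieldCheck {
    // modifier [static] [final] type name [= initializer];  (one declaration per line)
    private static final Pattern FIELD = Pattern.compile(
        "^\\s*(?:private|protected|public)\\s+(?:static\\s+)?(?:final\\s+)?"
            + "([\\w.$\\[\\]<>]+)\\s+(\\w+)\\s*(?:=.*)?;\\s*$",
        Pattern.MULTILINE);

    static Set<String> findDuplicateFields(String source) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        Matcher m = FIELD.matcher(source);
        while (m.find()) {
            String key = m.group(1) + " " + m.group(2); // same name AND same type
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String generated = "private boolean isNull1;\n"
            + "private int value1 = -1;\n"
            + "private boolean isNull1;\n";
        System.out.println(findDuplicateFields(generated));
    }
}
```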

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Luciano Resende
On Fri, Apr 15, 2016 at 9:34 AM, Cody Koeninger wrote: > Given that not all of the connectors were removed, I think this > creates a weird / confusing three tier system > > 1. connectors in the official project's spark/extras or spark/external > 2. connectors in "Spark Extras" > 3. connectors in

Re: BytesToBytes and unaligned memory

2016-04-15 Thread Ted Yu
I am curious if all Spark unit tests pass with the forced true value for unaligned. If that is the case, it seems we can add s390x to the known architectures. It would also give us some more background if you can describe how java.nio.Bits#unaligned() is implemented on s390x. Josh / Andrew / Davi

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Luciano Resende
On Fri, Apr 15, 2016 at 9:18 AM, Sean Owen wrote: > Why would this need to be an ASF project of its own? I don't think > it's possible to have a yet another separate "Spark Extras" TLP (?) > > There is already a project to manage these bits of code on Github. How > about all of the interested par

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Cody Koeninger
Given that not all of the connectors were removed, I think this creates a weird / confusing three tier system 1. connectors in the official project's spark/extras or spark/external 2. connectors in "Spark Extras" 3. connectors in some random organization's github On Fri, Apr 15, 2016 at 11:18 A

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Chris Fregly
and how does this all relate to the existing 1-and-a-half-class citizen known as spark-packages.org? support for this citizen is buried deep in the Spark source (which was always a bit odd, in my opinion): https://github.com/apache/spark/search?utf8=%E2%9C%93&q=spark-packages On Fri, Apr 15, 20

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Sean Owen
Why would this need to be an ASF project of its own? I don't think it's possible to have a yet another separate "Spark Extras" TLP (?) There is already a project to manage these bits of code on Github. How about all of the interested parties manage the code there, under the same process, under the

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Luciano Resende
After some collaboration with other community members, we have created an initial draft for Spark Extras, which is available for review at https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing We would like to invite other community members to participate

Re: BytesToBytes and unaligned memory

2016-04-15 Thread Adam Roberts
Ted, yeah, with the forced true value the tests in that suite all pass, and I know they're being executed thanks to prints I've added. Cheers, From: Ted Yu To: Adam Roberts/UK/IBM@IBMGB Cc: "dev@spark.apache.org" Date: 15/04/2016 16:43 Subject: Re: BytesToBytes and unaligned

Skipping Type Conversion and using InternalRows for UDF

2016-04-15 Thread Hamel Kothari
Hi all, So we have these UDFs which take <1ms to operate and we're seeing pretty poor performance around them in practice, the overhead being >10ms for the projections (this data is deeply nested with ArrayTypes and MapTypes so that could be the cause). Looking at the logs and code for ScalaUDF, I

Re: BytesToBytes and unaligned memory

2016-04-15 Thread Ted Yu
Can you clarify whether BytesToBytesMapOffHeapSuite passed or failed with the forced true value for unaligned? If the test failed, please pastebin the failure(s). Thanks On Fri, Apr 15, 2016 at 8:32 AM, Adam Roberts wrote: > Ted, yep I'm working from the latest code which includes that unalig

Re: Unable to access Resource Manager /Name Node on port 9026 / 9101 on a Spark EMR Cluster

2016-04-15 Thread Jonathan Kelly
Ever since emr-4.x, the service ports have been synced as much as possible with open source, so the YARN ResourceManager UI is on port 8088, and the NameNode UI is on port 50070. See http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-release-differences.html#d0e23719 for more infor

Re: BytesToBytes and unaligned memory

2016-04-15 Thread Adam Roberts
Ted, yep I'm working from the latest code which includes that unaligned check, for experimenting I've modified that code to ignore the unaligned check (just go ahead and say we support it anyway, even though our JDK returns false: the return value of java.nio.Bits.unaligned()). My Platform.java

Re: BytesToBytes and unaligned memory

2016-04-15 Thread Ted Yu
I assume you tested 2.0 with SPARK-12181. Related code from Platform.java, used if java.nio.Bits#unaligned() throws an exception: // We at least know x86 and x64 support unaligned access. String arch = System.getProperty("os.arch", ""); //noinspection DynamicRegexReplaceableByCompiledPa
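The fallback quoted above can be sketched as a standalone check for experimentation. This is a simplified reconstruction, not a copy of Platform.java: the class and method names are illustrative, the exact architecture pattern is cut off in the snippet, so the one below is an assumption, and s390x is appended only to show what Ted's suggestion would look like.

```java
import java.lang.reflect.Method;

// Hypothetical reconstruction of an "is unaligned access supported?" check.
final class UnalignedCheck {
    // Assumed fallback pattern; "s390x" is added purely to illustrate the proposal in this thread.
    private static final String KNOWN_ARCHS = "^(i[3-6]86|x86(_64)?|x64|amd64|s390x)$";

    static boolean archSupportsUnaligned(String arch) {
        return arch.matches(KNOWN_ARCHS);
    }

    static boolean unalignedSupported() {
        try {
            // Ask the JDK first, via reflection on the non-public java.nio.Bits.unaligned()
            Class<?> bits = Class.forName("java.nio.Bits", false, ClassLoader.getSystemClassLoader());
            Method unaligned = bits.getDeclaredMethod("unaligned");
            unaligned.setAccessible(true);
            return Boolean.TRUE.equals(unaligned.invoke(null));
        } catch (Throwable t) {
            // Only if the reflective call fails do we fall back to the architecture list
            return archSupportsUnaligned(System.getProperty("os.arch", ""));
        }
    }

    public static void main(String[] args) {
        System.out.println(System.getProperty("os.arch") + " -> " + unalignedSupported());
    }
}
```

In this structure the architecture list is consulted only when the reflective call fails, so a JDK whose java.nio.Bits.unaligned() returns false (as Adam reports) still yields false; extending the list alone would not change that outcome.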

Unable to access Resource Manager /Name Node on port 9026 / 9101 on a Spark EMR Cluster

2016-04-15 Thread Chadha Pooja
Hi, We have set up a Spark cluster (3 nodes) on Amazon EMR. We aren't able to use ports 9026 and 9101 on the existing Spark EMR cluster, which are part of the web UIs offered with Amazon EMR. I was able to use other ports like the Zeppelin port, 8890, HUE, etc. We checked that the security settings cu

BytesToBytes and unaligned memory

2016-04-15 Thread Adam Roberts
Hi, I'm testing Spark 2.0.0 on various architectures and have a question: are we sure that core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java really is attempting to use unaligned memory access (for the BytesToBytesMapOffHeapSuite tests specifically)? Our JDKs on z

Re: Should localProperties be inheritable? Should we change that or document it?

2016-04-15 Thread Marcin Tustin
It would be a pleasure. That said, what do you think about adding the non-inheritable feature? I think that would be a big win for everything that doesn't specifically need inheritability. On Friday, April 15, 2016, Reynold Xin wrote: > I think this was added a long time ago by me in order to ma

Re: Should localProperties be inheritable? Should we change that or document it?

2016-04-15 Thread Reynold Xin
I think this was added a long time ago by me in order to make certain things work for Shark (good old times ...). You are probably right that by now some apps depend on the fact that this is inheritable, and changing that could break them in weird ways. Do you mind documenting this, and also add a
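Reynold's point about apps depending on inheritance comes down to how InheritableThreadLocal behaves: a thread created while a value is set starts with a copy of it, whereas a plain ThreadLocal gives the new thread nothing. The JDK-only sketch below contrasts the two; it assumes, as the subject implies, that SparkContext's local properties sit in an InheritableThreadLocal, and the helper name seenByChild is illustrative.

```java
import java.util.concurrent.atomic.AtomicReference;

// Contrasts inheritable and non-inheritable thread-local values (JDK only, no Spark).
final class LocalPropertyInheritance {
    // Sets `value` in the calling thread, then reports what a freshly created child thread sees.
    static String seenByChild(ThreadLocal<String> local, String value) {
        local.set(value);
        try {
            AtomicReference<String> observed = new AtomicReference<>();
            // An InheritableThreadLocal is copied into the child when the Thread is constructed
            Thread child = new Thread(() -> observed.set(local.get()));
            child.start();
            child.join(); // join() makes the child's write visible here
            return observed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            local.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println(seenByChild(new InheritableThreadLocal<>(), "pool-a"));
        System.out.println(seenByChild(new ThreadLocal<>(), "pool-a"));
    }
}
```

The non-inheritable feature Marcin proposes would correspond to the plain ThreadLocal case. Inheritance also happens only at thread construction, so threads created before the property is set, such as pre-started pool threads, do not see the value either way.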