Re: Spark Improvement Proposals (Internet mail)
There's no need to compare with Flink's streaming model. Spark should focus more on how to go beyond itself. From the beginning, Spark's success has come from its unified model, which can satisfy SQL, streaming, machine learning and graph jobs, all in one. But from 1.6 to 2.0, the move from the RDD abstraction to DataFrames has brought no substantial progress to those two important areas (ML and Graph). Most of the work has gone into SQL and Streaming, which forces Spark to compete with Flink. But guys, that is not the battle Spark is supposed to be fighting.

SIP is a good start. Voices from the technical community should be heard and accepted, not buried in PR threads. These days Spark does not lack committers or contributors. The right direction and focus areas will decide where it goes, which competitors it meets, and finally what it can become.

--- Sincerely, Andy

Original message
From: Debasish Das
To: Tomasz Gawęda
Cc: dev@spark.apache.org; Cody Koeninger
Sent: Monday, October 17, 2016, 10:21
Subject: Re: Spark Improvement Proposals (Internet mail)

Thanks Cody for bringing up a valid point... I picked up Spark in 2014 as soon as I looked into it, since compared to writing Java map-reduce and Cascading code, Spark made writing distributed code fun... But now that we have gone deeper with Spark and the real-time streaming use case gets more prominent, I think it is time to bring a messaging model in conjunction with the batch/micro-batch API that Spark is good at... akka-streams close integration with Spark's micro-batching APIs looks like a great direction to stay in the game with Apache Flink... Spark 2.0 integrated streaming with batch on the assumption that micro-batching is sufficient to run SQL commands on a stream, but do we really have time to do SQL processing on streaming data within 1-2 seconds?

After reading the email chain, I started to look into the Flink documentation, and if you compare it with the Spark documentation, I think we have major work to do detailing Spark internals so that more people from the community start to take an active role in improving the issues, so that Spark stays strong compared to Flink.

https://cwiki.apache.org/confluence/display/SPARK/Spark+Internals

https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals

Spark is no longer an engine that works only for micro-batch and batch... We (and I am sure many others) are pushing Spark as an engine for stream and query processing. We need to make it a state-of-the-art engine for high-speed streaming data and user queries as well!

On Sun, Oct 16, 2016 at 1:30 PM, Tomasz Gawęda <tomasz.gaw...@outlook.com> wrote:

Hi everyone,

I'm quite late with my answer, but I think my suggestions may help a little bit. :) Many technical and organizational topics were mentioned, but I want to focus on the negative posts about Spark and about "haters".

I really like Spark. Ease of use, speed, a very good community - it's all here. But every project has to fight on the "framework market" to stay number 1. I'm following many Spark and Big Data communities; maybe my mail will inspire someone :)

You (every Spark developer; so far I didn't have enough time to join in contributing to Spark) have done an excellent job. So why are some people saying that Flink (or another framework) is better, as was posted on this mailing list? No, not because that framework is better in all cases. In my opinion, many of these discussions were started after Flink marketing-like posts. Please look at the StackOverflow "Flink vs " posts; almost every one is "won" by Flink.
The answers sometimes say nothing about other frameworks; Flink users (often PMC members) just post the same information about real-time streaming, delta iterations, etc. It looks smart and is very often marked as the answer, even if - in my opinion - the whole truth wasn't told.

My suggestion: I don't have enough money and knowledge to perform a big performance test. Maybe some company that supports Spark (Databricks, Cloudera? - just saying, you're the most visible in the community :) ) could run performance tests of:

- the streaming engine - Spark will probably lose because of the mini-batch model, however currently the difference should be much lower than in previous versions
- Machine Learning models
- batch jobs
- Graph jobs
- SQL queries

People will see that Spark is evolving and is also a modern framework, because after reading the posts mentioned above people may think "it is outdated, the future is in framework X". Matei Zaharia posted an excellent blog post about how Spark Structured Streaming beats every other framework in terms of ease of use and reliability. Performance tests, done in various environments (for example: laptop, small 2-node cluster, 10-node cluster, 20-node cluster), could also be very good marketing material to say "hey, you're telling us that you're better, but Spark is still faster and is still getting even faster!". This would be based on facts (just numbers), not opinions.
trying to use Spark applications with modified Kryo
Hi,

I want to run some Spark applications with some changes in the Kryo serializer.

Please correct me, but I think I need to recompile Spark (instead of just the Spark applications) in order to use the newly built Kryo serializer?

I obtained the Kryo 3.0.3 source and built it (mvn package install).

Next, I took the source code for Spark 2.0.1 and built it (build/mvn -X -DskipTests -Dhadoop.version=2.6.0 clean package).

I then compiled the Spark applications. However, I am not seeing my Kryo changes when I run the Spark applications.

Please let me know if my assumptions and steps are correct.

Thank you
Prasun
Custom Monitoring of Spark applications
Hi all,

I am trying to write a custom Source for counting errors and to output that via the Spark sink mechanism (CSV or JMX), and I'm having some problems understanding how this works.

1. I defined the Source, added counters created with MetricRegistry, and registered the Source via SparkEnv.get().metricsSystem().registerSource(this)
2. Used that counter (I could print out the value in the driver)
3. With CsvSink my counter is reported, but the value is 0!!

I have the following questions:

- I expect that the Codahale Counter is serialized and registered, but because the driver and executor objects are different, the one being incremented is not the registered counter. I have a version with an accumulator that works fine; I'm just a little worried about performance (and design). Is there another way of doing this? Maybe static fields?
- When running on YARN, how many sink objects will be created?
- If I create some singleton object and register that counter in Spark, the counting is right but it will never report from an executor. How do I enable reporting from executors when running on YARN?

My custom Source:

> public class CustomMonitoring implements Source {
>     private MetricRegistry metricRegistry = new MetricRegistry();
>
>     public CustomMonitoring(List<String> counts) {
>         for (String count : counts) {
>             metricRegistry.counter(count);
>         }
>         SparkEnv.get().metricsSystem().registerSource(this);
>     }
>
>     @Override
>     public String sourceName() {
>         return TURBINE_CUSTOM_MONITORING;
>     }
>
>     public MetricRegistry metricRegistry() {
>         return metricRegistry;
>     }
> }

metrics.properties:

> *.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
> *.sink.csv.directory=/tmp/csvSink/
> *.sink.csv.period=60
> *.sink.csv.unit=seconds

Thank you,
Nicolae R.
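For the counting part on its own, the accumulator variant mentioned above can stay very small. Below is a minimal sketch, not the actual code from this thread: it assumes Spark 2.0's LongAccumulator, and the parse-error condition and class name are made up just to have something to count.

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.util.LongAccumulator;

    public class ErrorCounting {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("error-counting").getOrCreate();
            JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

            // LongAccumulator increments made inside executor tasks are merged
            // back on the driver once the action completes.
            LongAccumulator errorCount = spark.sparkContext().longAccumulator("parseErrors");

            // "oops" stands in for whatever error condition you are really counting.
            JavaRDD<String> input = jsc.parallelize(Arrays.asList("1", "2", "oops", "4"));
            input.foreach(s -> {
                try {
                    Integer.parseInt(s);
                } catch (NumberFormatException e) {
                    errorCount.add(1L);   // counted on the executor
                }
            });

            System.out.println("parse errors = " + errorCount.value());
            spark.stop();
        }
    }

This sidesteps the metrics Sink machinery entirely, which is exactly the trade-off described above: the merged value is only visible on the driver, not reported per executor through CsvSink or JMX.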
Re: trying to use Spark applications with modified Kryo
On 17 Oct 2016, at 10:02, Prasun Ratn mailto:prasun.r...@gmail.com>> wrote: Hi I want to run some Spark applications with some changes in Kryo serializer. Please correct me, but I think I need to recompile spark (instead of just the Spark applications) in order to use the newly built Kryo serializer? I obtained Kryo 3.0.3 source and built it (mvn package install). Next, I took the source code for Spark 2.0.1 and built it (build/mvn -X -DskipTests -Dhadoop.version=2.6.0 clean package) I then compiled the Spark applications. However, I am not seeing my Kryo changes when I run the Spark applications. Kryo versions are very brittle. You'll -need to get an up to date/consistent version of Chill, which is where the transitive dependency on Kryo originates -rebuild spark depending on that chill release if you want hive integration, probably also rebuild Hive to be consistent too; the main reason Spark has its own Hive version is that Kryo version sharing. https://github.com/JoshRosen/hive/commits/release-1.2.1-spark2 Kryo has repackaged their class locations between versions. This lets the versions co-exist, but probably also explains why your apps aren't picking up the diffs. Finally, keep an eye on this github PR https://github.com/twitter/chill/issues/252
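As a side note: if the goal were only to change how particular classes are serialized, rather than to patch Kryo's internals, a custom registrator avoids rebuilding Spark at all. A rough sketch follows; it registers Kryo's stock JavaSerializer for java.math.BigDecimal purely as a stand-in for whatever custom Serializer you actually want to plug in, and the class names are illustrative.

    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.serializers.JavaSerializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.serializer.KryoRegistrator;

    public class KryoRegistratorExample {

        // Instantiated in every JVM (driver and executors) that creates a Kryo instance.
        public static class MyRegistrator implements KryoRegistrator {
            @Override
            public void registerClasses(Kryo kryo) {
                // JavaSerializer here is only a placeholder for a real custom serializer.
                kryo.register(java.math.BigDecimal.class, new JavaSerializer());
            }
        }

        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("kryo-registrator-example")
                    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                    .set("spark.kryo.registrator", KryoRegistratorExample.MyRegistrator.class.getName());
            JavaSparkContext jsc = new JavaSparkContext(conf);
            // ... run jobs as usual; shuffled and cached data now goes through the registrator.
            jsc.stop();
        }
    }

Changes inside Kryo itself, though, really do need the rebuilt chill/Spark (and possibly Hive) chain described above.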
Re: trying to use Spark applications with modified Kryo
Thanks a lot Steve! On Mon, Oct 17, 2016 at 4:59 PM, Steve Loughran wrote: > > On 17 Oct 2016, at 10:02, Prasun Ratn wrote: > > Hi > > I want to run some Spark applications with some changes in Kryo serializer. > > Please correct me, but I think I need to recompile spark (instead of > just the Spark applications) in order to use the newly built Kryo > serializer? > > I obtained Kryo 3.0.3 source and built it (mvn package install). > > Next, I took the source code for Spark 2.0.1 and built it (build/mvn > -X -DskipTests -Dhadoop.version=2.6.0 clean package) > > I then compiled the Spark applications. > > However, I am not seeing my Kryo changes when I run the Spark applications. > > > Kryo versions are very brittle. > > You'll > > -need to get an up to date/consistent version of Chill, which is where the > transitive dependency on Kryo originates > -rebuild spark depending on that chill release > > if you want hive integration, probably also rebuild Hive to be consistent > too; the main reason Spark has its own Hive version is that > Kryo version sharing. > > https://github.com/JoshRosen/hive/commits/release-1.2.1-spark2 > > Kryo has repackaged their class locations between versions. This lets the > versions co-exist, but probably also explains why your apps aren't picking > up the diffs. > > Finally, keep an eye on this github PR > > https://github.com/twitter/chill/issues/252 > - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
Re: Spark Improvement Proposals
I think narrowly focusing on Flink or benchmarks is missing my point. My point is evolve or die. Spark's governance and organization is hampering its ability to evolve technologically, and it needs to change. On Sun, Oct 16, 2016 at 9:21 PM, Debasish Das wrote: > Thanks Cody for bringing up a valid point...I picked up Spark in 2014 as > soon as I looked into it since compared to writing Java map-reduce and > Cascading code, Spark made writing distributed code fun...But now as we went > deeper with Spark and real-time streaming use-case gets more prominent, I > think it is time to bring a messaging model in conjunction with the > batch/micro-batch API that Spark is good atakka-streams close > integration with spark micro-batching APIs looks like a great direction to > stay in the game with Apache Flink...Spark 2.0 integrated streaming with > batch with the assumption is that micro-batching is sufficient to run SQL > commands on stream but do we really have time to do SQL processing at > streaming data within 1-2 seconds ? > > After reading the email chain, I started to look into Flink documentation > and if you compare it with Spark documentation, I think we have major work > to do detailing out Spark internals so that more people from community start > to take active role in improving the issues so that Spark stays strong > compared to Flink. > > https://cwiki.apache.org/confluence/display/SPARK/Spark+Internals > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals > > Spark is no longer an engine that works for micro-batch and batch...We (and > I am sure many others) are pushing spark as an engine for stream and query > processing.we need to make it a state-of-the-art engine for high speed > streaming data and user queries as well ! > > On Sun, Oct 16, 2016 at 1:30 PM, Tomasz Gawęda > wrote: >> >> Hi everyone, >> >> I'm quite late with my answer, but I think my suggestions may help a >> little bit. :) Many technical and organizational topics were mentioned, >> but I want to focus on these negative posts about Spark and about "haters" >> >> I really like Spark. Easy of use, speed, very good community - it's >> everything here. But Every project has to "flight" on "framework market" >> to be still no 1. I'm following many Spark and Big Data communities, >> maybe my mail will inspire someone :) >> >> You (every Spark developer; so far I didn't have enough time to join >> contributing to Spark) has done excellent job. So why are some people >> saying that Flink (or other framework) is better, like it was posted in >> this mailing list? No, not because that framework is better in all >> cases.. In my opinion, many of these discussions where started after >> Flink marketing-like posts. Please look at StackOverflow "Flink vs " >> posts, almost every post in "winned" by Flink. Answers are sometimes >> saying nothing about other frameworks, Flink's users (often PMC's) are >> just posting same information about real-time streaming, about delta >> iterations, etc. It look smart and very often it is marked as an aswer, >> even if - in my opinion - there wasn't told all the truth. >> >> >> My suggestion: I don't have enough money and knowledgle to perform huge >> performance test. Maybe some company, that supports Spark (Databricks, >> Cloudera? 
>> - just saying you're the most visible in the community :) ) could perform performance tests of:
>>
>> - streaming engine - probably Spark will lose because of the mini-batch model, however currently the difference should be much lower than in previous versions
>> - Machine Learning models
>> - batch jobs
>> - Graph jobs
>> - SQL queries
>>
>> People will see that Spark is evolving and is also a modern framework, because after reading the posts mentioned above people may think "it is outdated, the future is in framework X".
>>
>> Matei Zaharia posted an excellent blog post about how Spark Structured Streaming beats every other framework in terms of ease of use and reliability. Performance tests, done in various environments (for example: laptop, small 2-node cluster, 10-node cluster, 20-node cluster), could also be very good marketing material to say "hey, you're telling us that you're better, but Spark is still faster and is still getting even faster!". This would be based on facts (just numbers), not opinions. It would be good for companies, for marketing purposes and for every Spark developer.
>>
>> Second: real-time streaming. I wrote some time ago about real-time streaming support in Spark Structured Streaming. Some work should be done to make SSS more low-latency, but I think it's possible. Maybe Spark could look at Gearpump, which is also built on top of Akka? I don't know yet; it is a good topic for a SIP. However, I think that Spark should have real-time streaming support. Currently I see many posts/comments that "Spark has too big latency". Spark Streaming is doing a very good job with micro-batching
Re: cutting 2.0.2?
SPARK-17841 - three-line bugfix that has a week-old PR

SPARK-17812 - being able to specify starting offsets is a must-have for a Kafka MVP in my opinion; already has a PR

SPARK-17813 - I can put in a PR for this tonight if it'll be considered

On Mon, Oct 17, 2016 at 12:28 AM, Reynold Xin wrote:
> Since 2.0.1, there have been a number of correctness fixes as well as some
> nice improvements to the experimental structured streaming (notably basic
> Kafka support). I'm thinking about cutting 2.0.2 later this week, before
> Spark Summit Europe. Let me know if there are specific things (bug fixes)
> you really want to merge into branch-2.0.
>
> Cheers.
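For context, SPARK-17812 is about the structured streaming Kafka source; the kind of usage it is meant to unlock looks roughly like the sketch below. Treat it as tentative: the option name and accepted values are taken from the discussion at the time and could still change before the PR lands, the broker address and topic are placeholders, and the spark-sql-kafka-0-10 artifact is assumed to be on the classpath.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class KafkaStartingOffsetsExample {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder().appName("kafka-offsets").getOrCreate();

            // Read the topic from the beginning instead of only from the latest offsets.
            Dataset<Row> stream = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "localhost:9092")
                    .option("subscribe", "events")
                    .option("startingOffsets", "earliest")
                    .load();

            stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
                  .writeStream()
                  .format("console")
                  .start()
                  .awaitTermination();
        }
    }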
Odp.: Spark Improvement Proposals
Maybe my mail was not clear enough. I didn't want to say "let's focus on Flink" or any other framework. The idea with benchmarks was to show two things:

- why some people are doing bad PR for Spark
- how, in an easy way, we can change that and show that Spark is still on top

No more, no less. Benchmarks will be helpful, but I don't think they're the most important thing for Spark :) On the Spark main page there is still the "Spark vs Hadoop" chart. It is important to show that the framework is not the same old Spark with another API, but one that is much faster and more optimized, comparable to or even faster than other frameworks.

About real-time streaming: I think it would simply be good to see it in Spark. I like the current Spark model very much, but many voices say "we need more" - the community should listen to them too and try to help. With SIPs that would be easier; I posted this example as a "thing that could be changed with a SIP". I really like the unification via Datasets, but there are a lot of algorithms underneath - let's keep the API easy, but with strong background material (articles, benchmarks, descriptions, etc.) that shows Spark is still a modern framework.

Maybe now my intention is clearer :) As I said, the organizational ideas were already mentioned and I agree with them; my mail was just meant to show some aspects from my side, that is, from the side of a developer and a person who is trying to help others with Spark (via StackOverflow or other ways).

Best regards,
Tomasz

From: Cody Koeninger
Sent: 17 October 2016 16:46
To: Debasish Das
Cc: Tomasz Gawęda; dev@spark.apache.org
Subject: Re: Spark Improvement Proposals

I think narrowly focusing on Flink or benchmarks is missing my point. My point is evolve or die. Spark's governance and organization are hampering its ability to evolve technologically, and that needs to change.
Re: cutting 2.0.2?
I would very much like to see SPARK-16962 included in 2.0.2 as it addresses unaligned memory access patterns that crash non-x86 platforms. I believe this falls in the category of "correctness fix". We (Oracle SAE) have applied the fixes for SPARK-16962 to branch-2.0 and have not encountered any problems on SPARC or x86 architectures attributable to unaligned accesses. Including this fix will allow Oracle SPARC customers to run Apache Spark without fear of crashing, expanding the reach of Apache Spark and making my life a little easier :)

erik.oshaughne...@oracle.com
Fwd: Large variation in Spark Task Deserialization Time
Hi Devs/All,

I am seeing a huge variation in Spark Task Deserialization Time for my collect and reduce operations. While most tasks complete within 100 ms, a few take more than a couple of seconds, which slows the entire program down. I have attached a screenshot of the web UI where you can see the variation: the Task Deserialization Time has a max of 7 s and a 75th percentile of 0.3 seconds.

Does anyone know what might cause these kinds of numbers? Any help would be greatly appreciated.

Best Regards,
Pulasthi

--
Pulasthi S. Wickramasinghe
Graduate Student | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
cell: 224-386-9035
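One frequent cause of exactly this pattern - offered only as a guess, since the screenshot doesn't survive in the archive - is a large object captured in the task closure, which every task then has to deserialize. Broadcasting it keeps the closures small; a rough sketch, with a deliberately tiny stand-in for the "large" object:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.broadcast.Broadcast;
    import org.apache.spark.sql.SparkSession;

    public class BroadcastInsteadOfClosure {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("broadcast-example").getOrCreate();
            JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

            // Pretend this lookup table is large; referencing it directly inside the
            // lambda would ship a serialized copy with every task.
            Map<Integer, String> bigLookup = new HashMap<>();
            bigLookup.put(1, "one");
            bigLookup.put(2, "two");

            // Broadcast it instead: fetched and deserialized once per executor, not per task.
            Broadcast<Map<Integer, String>> lookup = jsc.broadcast(bigLookup);

            long hits = jsc.parallelize(Arrays.asList(1, 2, 3, 4))
                           .filter(x -> lookup.value().containsKey(x))
                           .count();

            System.out.println("hits = " + hits);
            spark.stop();
        }
    }

Other usual suspects are GC pauses and classloading on the first task to hit a fresh executor, so the sketch above only helps if closure size really is the problem.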
Indexing w spark joins?
Hi,

Apologies if I've asked this question before, but I didn't see it in the list, and I'm certain that my last surviving brain cell has gone on strike over my attempt to reduce my caffeine intake… Posting this to both user and dev because I think the question/topic jumps into both camps. Again, since I'm a relative newbie on Spark, I may be missing something, so apologies up front…

With respect to Spark SQL: in pre-2.0.x, were there only hash joins? In post-2.0.x you have hash, semi-hash, and sorted-list merge. For the sake of simplicity, let's forget about cross-product joins…

Has anyone looked at how we could use inverted tables to improve query performance? The issue is that when you have a data sewer (lake), what happens when your use-case query is orthogonal to how your data is stored? That means full table scans. By using secondary indexes we can reduce this, albeit at the cost of increasing your storage footprint by the size of the index.

Are there any JIRAs open that discuss this? Indexes to assist in terms of 'predicate push downs' (using the index when a field in a where clause is indexed) rather than performing a full table scan, and indexes to assist in the actual join if the join column is an indexed column? In the first case, you would use an inverted table to produce a sort-ordered set of row keys that you would then use in the join process (the same as if you had produced the subset based on the filter).

To put this in perspective, here's a dummy use case… CCCis (CCC) is the middle man in the insurance industry. They have a piece of software that sits in the repair shop (e.g. Joe's Auto Body) and works with multiple insurance carriers. The primary key in their data is going to be Insurance Company | Claim ID. This makes it very easy to find a specific claim for further processing. Now let's say I want to do some analysis on determining the average cost of repairing a front-end collision of a Volvo S80, or break down the number and types of accidents by car manufacturer, model and color (then see if there is any correlation between car color and the number and type of accidents). As you can see, all of these queries are orthogonal to my storage. So I need to create secondary indexes to help sift through the data efficiently.

Does this make sense?

Please Note: I did some work for CCC back in the late 90's. Any resemblance to their big data efforts is pure coincidence, and you can replace CCC with Allstate, Progressive, StateFarm or some other auto insurance company…

Thx

-Mike
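For what it's worth, Spark has no secondary indexes today, so the closest built-in workaround for the "query orthogonal to storage" problem is to also write the data partitioned (or, in 2.x, bucketed) by the columns you expect to filter and join on, which lets the planner prune files instead of scanning everything. A rough sketch of that trade-off, with made-up paths and column names:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class PartitionPruningExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("partition-pruning").getOrCreate();

            // Claims stored by the operational key (insurer | claim id).
            Dataset<Row> claims = spark.read().parquet("/data/claims");

            // Write a second copy laid out by the analysis dimensions. Like a secondary
            // index, this costs extra storage, but filters on these columns can now
            // prune whole directories instead of scanning the full table.
            claims.write()
                  .partitionBy("manufacturer", "model")
                  .parquet("/data/claims_by_vehicle");

            Dataset<Row> volvoS80 = spark.read()
                    .parquet("/data/claims_by_vehicle")
                    .where("manufacturer = 'Volvo' AND model = 'S80'");

            volvoS80.groupBy("damage_type").avg("repair_cost").show();
        }
    }

It is not an index - the pruning is only as fine-grained as the partition layout - but it is what exists in Spark SQL today.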
[build system] jenkins downtime for backups delayed by a hung build
i just noticed that jenkins was still in quiet mode this morning due to a hung build. i killed the build, backups happened, and the queue is now happily building. sorry for any delay!

shane
Re: source for org.spark-project.hive:1.2.1.spark2
Are these changes that the Hive community has rejected? I don't see a compelling reason to have a long-term Spark fork of Hive.

rb

On Sat, Oct 15, 2016 at 5:27 AM, Steve Loughran wrote:
>
> On 15 Oct 2016, at 01:28, Ryan Blue wrote:
>
> The Spark 2 branch is based on this one:
> https://github.com/JoshRosen/hive/commits/release-1.2.1-spark2
>
> Didn't know this had moved. I had an outstanding PR against Patrick's
> which should really go in, if not already taken up (HIVE-11720;
> https://github.com/pwendell/hive/pull/2)
>
> IMO I think it would make sense if -somehow- that hive fork were in the
> ASF; it's got to be in sync with Spark releases, and it's not been ideal for
> me in terms of getting one or two fixes in, the other one being culling
> groovy 2.4.4 as an export
> (https://github.com/steveloughran/hive/tree/stevel/SPARK-13471-groovy-2.4.4)
>
> I don't know if the hive team themselves would be up to having it in their
> repo, or if committership logistics would suit it anyway. Otherwise,
> approaching infra@ and asking for a forked repo is likely to work with a
> bit of prodding
>
> rb
>
> On Fri, Oct 14, 2016 at 4:33 PM, Ethan Aubin wrote:
>
>> In an email thread [1] from Aug 2015, it was mentioned that the source
>> to org.spark-project.hive was at
>> https://github.com/pwendell/hive/commits/release-1.2.1-spark .
>> That branch has a 1.2.1.spark version but spark 2.0.1 uses
>> 1.2.1.spark2. Could anyone point me to the repo for 1.2.1.spark2?
>> Thanks --Ethan
>>
>> [https://mail-archives.apache.org/mod_mbox/spark-dev/201508.mbox/%3ca0aa8b38-deee-476a-93ff-92fead06e...@hortonworks.com%3E]
>
> --
> Ryan Blue
> Software Engineer
> Netflix

--
Ryan Blue
Software Engineer
Netflix
Re: source for org.spark-project.hive:1.2.1.spark2
IIRC this was all about shading of dependencies, not changes to the source. On Mon, Oct 17, 2016 at 6:26 PM Ryan Blue wrote: > Are these changes that the Hive community has rejected? I don't see a > compelling reason to have a long-term Spark fork of Hive. > > rb > > On Sat, Oct 15, 2016 at 5:27 AM, Steve Loughran > wrote: > > > On 15 Oct 2016, at 01:28, Ryan Blue wrote: > > The Spark 2 branch is based on this one: > https://github.com/JoshRosen/hive/commits/release-1.2.1-spark2 > > > Didn't know this had moved I had an outstanding PR against patricks > which should really go in, if not already taken up ( HIVE-11720 ; > https://github.com/pwendell/hive/pull/2 ) > > > IMO I think it would make sense if -somehow- that hive fork were in the > ASF; it's got to be in sync with Spark releases, and its not been ideal for > me in terms of getting one or two fixes in, the other one being culling > groovy 2.4.4 as an export ( > https://github.com/steveloughran/hive/tree/stevel/SPARK-13471-groovy-2.4.4 > ) > > I don't know if the hive team themselves would be up to having it in their > repo, or if committership logistics would suit it anyway. Otherwise, > approaching infra@ and asking for a forked repo is likely to work with a > bit of prodding > > rb > > On Fri, Oct 14, 2016 at 4:33 PM, Ethan Aubin > wrote: > > In an email thread [1] from Aug 2015, it was mentioned that the source > to org.spark-project.hive was at > https://github.com/pwendell/hive/commits/release-1.2.1-spark . > That branch has a 1.2.1.spark version but spark 2.0.1 uses > 1.2.1.spark2. Could anyone point me to the repo for 1.2.1.spark2? > Thanks --Ethan > > [ > https://mail-archives.apache.org/mod_mbox/spark-dev/201508.mbox/%3ca0aa8b38-deee-476a-93ff-92fead06e...@hortonworks.com%3E > ] > > - > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > > > > > -- > Ryan Blue > Software Engineer > Netflix > > > > > > -- > Ryan Blue > Software Engineer > Netflix >
Re: cutting 2.0.2?
(I don't think 2.0.2 will be released for a while if at all but that's not what you're asking I think) It's a fairly safe change, but also isn't exactly a fix in my opinion. Because there are some other changes to make it all work for SPARC, I think it's more realistic to look to the 2.1.0 release anyway, which is likely to come first. On Mon, Oct 17, 2016 at 4:09 PM Erik O'Shaughnessy < erik.oshaughne...@oracle.com> wrote: > I would very much like to see SPARK-16962 included in 2.0.2 as it addresses > unaligned memory access patterns that crash non-x86 platforms. I believe > this falls in the category of "correctness fix". We (Oracle SAE) have > applied the fixes for SPARK-16962 to branch-2.0 and have not encountered > any > problems on SPARC or x86 architectures attributable to unaligned accesses. > Including this fix will allow Oracle SPARC customers to run Apache Spark > without fear of crashing, expanding the reach of Apache Spark and making my > life a little easier :) > > erik.oshaughne...@oracle.com > > > > -- > View this message in context: > http://apache-spark-developers-list.1001551.n3.nabble.com/cutting-2-0-2-tp19473p19482.html > Sent from the Apache Spark Developers List mailing list archive at > Nabble.com. > > - > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > >
[VOTE] Release Apache Spark 1.6.3 (RC1)
Please vote on releasing the following candidate as Apache Spark version 1.6.3. The vote is open until Thursday, Oct 20, 2016 at 18:00 PDT and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.6.3
[ ] -1 Do not release this package because ...

The tag to be voted on is v1.6.3-rc1 (7375bb0c825408ea010dcef31c0759cf94ffe5c2)

This release candidate addresses 50 JIRA tickets: https://s.apache.org/spark-1.6.3-jira

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.3-rc1-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1205/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.3-rc1-docs/

==========================================
How can I help test this release?
==========================================
If you are a Spark user, you can help us test this release by taking an existing Spark workload and running on this release candidate, then reporting any regressions from 1.6.2.

==========================================
What justifies a -1 vote for this release?
==========================================
This is a maintenance release in the 1.6.x series. Bugs already present in 1.6.2, missing features, or bugs related to new features will not necessarily block this release.