Re: What's the meaning when the partitions is zero?
There are almost no cases in which you'd want a zero-partition RDD. The only one I can think of is an empty RDD, where the number of partitions is irrelevant. Still, I would not be surprised if other parts of the code assume at least 1 partition. Maybe this check could be tightened. It would be interesting to see if the tests catch any scenario where a 0-partition RDD is created, and why.

On Fri, Sep 16, 2016 at 7:54 AM, WangJianfei wrote:
> class HashPartitioner(partitions: Int) extends Partitioner {
>   require(partitions >= 0, s"Number of partitions ($partitions) cannot be negative.")
>
> The source code has require(partitions >= 0), but I don't know why it makes sense when partitions is 0.
>
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/What-s-the-meaning-when-the-partitions-is-zero-tp18957.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
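For reference, a minimal sketch of the empty-RDD case (the local master and app name are arbitrary): sc.emptyRDD really does report zero partitions and still behaves sanely, because no partitions means no tasks are ever run.

    import org.apache.spark.{SparkConf, SparkContext}

    object EmptyRddSketch extends App {
      val sc = new SparkContext(new SparkConf().setMaster("local[1]").setAppName("zero-partitions"))
      // An empty RDD is the one natural case of a zero-partition RDD.
      val empty = sc.emptyRDD[Int]
      println(empty.partitions.length) // 0
      println(empty.count())           // 0 -- no partitions, so no tasks run
      sc.stop()
    }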
Re: Why we get 0 when the key is null?
"null" is a valid value in an RDD, so it has to be partition-able. On Fri, Sep 16, 2016 at 2:26 AM, WangJianfei wrote: > When the key is not In the rdd, I can also get an value , I just feel a > little strange. > > > > -- > View this message in context: > http://apache-spark-developers-list.1001551.n3.nabble.com/Why-we-get-0-when-the-key-is-null-tp18952p18955.html > Sent from the Apache Spark Developers List mailing list archive at Nabble.com. > > - > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
Re: What's the meaning when the partitions is zero?
They are valid, especially in partition pruning.

On Friday, September 16, 2016, Sean Owen wrote:
> There are almost no cases in which you'd want a zero-partition RDD. The only one I can think of is an empty RDD, where the number of partitions is irrelevant. Still, I would not be surprised if other parts of the code assume at least 1 partition.
>
> Maybe this check could be tightened. It would be interesting to see if the tests catch any scenario where a 0-partition RDD is created, and why.
>
> On Fri, Sep 16, 2016 at 7:54 AM, WangJianfei wrote:
> > class HashPartitioner(partitions: Int) extends Partitioner {
> >   require(partitions >= 0, s"Number of partitions ($partitions) cannot be negative.")
> >
> > The source code has require(partitions >= 0), but I don't know why it makes sense when partitions is 0.
> >
> > --
> > View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/What-s-the-meaning-when-the-partitions-is-zero-tp18957.html
> > Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
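A hedged sketch of the partition-pruning case mentioned above (PartitionPruningRDD is a DeveloperApi; the input size and partition count are arbitrary): pruning every partition away leaves a usable 0-partition RDD.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.PartitionPruningRDD

    object PruneToZeroSketch extends App {
      val sc = new SparkContext(new SparkConf().setMaster("local[1]").setAppName("prune-to-zero"))
      val rdd = sc.parallelize(1 to 100, 4)
      // Keep no partitions at all: the result is a valid RDD with zero partitions.
      val pruned = PartitionPruningRDD.create(rdd, _ => false)
      println(pruned.partitions.length) // 0
      println(pruned.count())           // 0
      sc.stop()
    }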
Spark 2.0.1 release?
Hi all,

Apologies if I've missed anything, but are we likely to see a 2.0.1 bug fix release, or does a jump to 2.1.0 with additional features seem more probable?

The issues for 2.0.1 seem pretty much done here: https://issues.apache.org/jira/browse/SPARK/fixforversion/12336857/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel

And there are a lot of good bug fixes in there that I'd love to be able to use without deploying our own builds.

Thanks,
Ewan

This email and any attachments to it may contain confidential information and are intended solely for the addressee. If you are not the intended recipient of this email or if you believe you have received this email in error, please contact the sender and remove it from your system. Do not use, copy or disclose the information contained in this email or in any attachment. RealityMine Limited may monitor email traffic data including the content of email for the purposes of security. RealityMine Limited is a company registered in England and Wales. Registered number: 07920936 Registered office: Warren Bruce Court, Warren Bruce Road, Trafford Park, Manchester M17 1LB
Re: Spark 2.0.1 release?
2.0.1 is definitely coming soon. I was going to tag an RC yesterday but ran into some issues. I will try to do it early next week for the RC.

On Fri, Sep 16, 2016 at 11:16 AM, Ewan Leith wrote:
> Hi all,
>
> Apologies if I've missed anything, but are we likely to see a 2.0.1 bug fix release, or does a jump to 2.1.0 with additional features seem more probable?
>
> The issues for 2.0.1 seem pretty much done here: https://issues.apache.org/jira/browse/SPARK/fixforversion/12336857/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel
>
> And there are a lot of good bug fixes in there that I'd love to be able to use without deploying our own builds.
>
> Thanks,
> Ewan
Re: Compatibility of 1.6 spark.eventLog with a 2.0 History Server
The problem is that we backported the SQL tab UI changes from 2.0 into our 1.6.1; they changed a parameter name in SQLMetricInfo. So the community version is compatible, but ours is not.

> On Sep 15, 2016, at 11:08 AM, Mario Ds Briggs wrote:
>
> I checked in 1.6.2 and it doesn't; I didn't check lower versions. The history server logs do show a 'Replaying log path: file:xxx.inprogress' message when the file is changed, but a refresh of the UI doesn't show the new jobs/stages/tasks.
>
> thanks
> Mario
>
> From: Ryan Williams
> To: Reynold Xin, Mario Ds Briggs/India/IBM@IBMIN
> Cc: "dev@spark.apache.org"
> Date: 15/09/2016 11:19 pm
> Subject: Re: Compatibility of 1.6 spark.eventLog with a 2.0 History Server
>
> What is meant by:
>
> """
> (This is because clicking the refresh button in the browser updates the UI with the latest events, whereas in the 1.6 code base this does not happen)
> """
>
> Hasn't refreshing the page updated all the information in the UI through the 1.x line?
Re: Spark 2.0.1 release?
There are a few blockers for 2.0.1; just two, actually. For example, https://issues.apache.org/jira/browse/SPARK-17418 must be resolved before another release.

On Fri, Sep 16, 2016 at 7:23 PM, Reynold Xin wrote:
> 2.0.1 is definitely coming soon. I was going to tag an RC yesterday but ran into some issues. I will try to do it early next week for the RC.
>
> On Fri, Sep 16, 2016 at 11:16 AM, Ewan Leith wrote:
>> Hi all,
>>
>> Apologies if I've missed anything, but are we likely to see a 2.0.1 bug fix release, or does a jump to 2.1.0 with additional features seem more probable?
>>
>> The issues for 2.0.1 seem pretty much done here: https://issues.apache.org/jira/browse/SPARK/fixforversion/12336857/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel
>>
>> And there are a lot of good bug fixes in there that I'd love to be able to use without deploying our own builds.
>>
>> Thanks,
>> Ewan

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
Re: Spark 2.0.1 release?
That's great news. Since it's that close, I'll get started on building and testing the branch myself.

Thanks,
Ewan

On 16 Sep 2016 19:23, Reynold Xin wrote:
> 2.0.1 is definitely coming soon. I was going to tag an RC yesterday but ran into some issues. I will try to do it early next week for the RC.
>
> On Fri, Sep 16, 2016 at 11:16 AM, Ewan Leith <ewan.le...@realitymine.com> wrote:
>> Hi all,
>>
>> Apologies if I've missed anything, but are we likely to see a 2.0.1 bug fix release, or does a jump to 2.1.0 with additional features seem more probable?
>>
>> The issues for 2.0.1 seem pretty much done here: https://issues.apache.org/jira/browse/SPARK/fixforversion/12336857/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel
>>
>> And there are a lot of good bug fixes in there that I'd love to be able to use without deploying our own builds.
>>
>> Thanks,
>> Ewan

This email and any attachments to it may contain confidential information and are intended solely for the addressee. If you are not the intended recipient of this email or if you believe you have received this email in error, please contact the sender and remove it from your system. Do not use, copy or disclose the information contained in this email or in any attachment. RealityMine Limited may monitor email traffic data including the content of email for the purposes of security. RealityMine Limited is a company registered in England and Wales. Registered number: 07920936 Registered office: Warren Bruce Court, Warren Bruce Road, Trafford Park, Manchester M17 1LB
Re: Spark 1.x/2.x qualifiers in downstream artifact names
On Wed, Aug 24, 2016 at 12:12 PM, Sean Owen wrote:
> If you're just varying versions (or things that can be controlled by a profile, which is most everything including dependencies), you don't need and probably don't want multiple POM files. Even that wouldn't mean you can't use classifiers.

It is worse (or better) than that: profiles didn't work for us in combination with Scala 2.10/2.11, so we modify the POM in place as part of CI and the release process.

> I have seen it used for HBase, core Hadoop. I am not sure I've seen it used for Spark 2 vs 1 but no reason it couldn't be. Frequently projects would instead declare that as of some version, Spark 2 is required, rather than support both. Or shim over an API difference with reflection if that's all there was to it. Spark does both of those sorts of things itself to avoid having to publish multiple variants at all. (Well, except for Scala 2.10 vs 2.11!)

We shim over Hadoop changes where necessary, but the Spark changes between 1.x and 2.x are too much. We have since resolved to deploy separate Spark 1.x and 2.x artifactIds as described below. Relevant pull requests:

https://github.com/bigdatagenomics/adam/pull/1123
https://github.com/bigdatagenomics/utils/pull/78

Thanks!

   michael

> On Wed, Aug 24, 2016 at 6:02 PM, Michael Heuer wrote:
> > Have you seen any successful applications of this for Spark 1.x/2.x?
> >
> > From the doc: "The classifier allows to distinguish artifacts that were built from the same POM but differ in their content."
> >
> > We'd be building from different POMs, since we'd be modifying the Spark dependency version (and presumably any other dependencies that needed the same Spark 1.x/2.x distinction).
> >
> > On Wed, Aug 24, 2016 at 11:49 AM, Sean Owen wrote:
> >> This is also what "classifiers" are for in Maven, to have variations on one artifact and version. https://maven.apache.org/pom.html
> >>
> >> It has been used to ship code for Hadoop 1 vs 2 APIs.
> >>
> >> In a way it's the same idea as Scala's "_2.xx" naming convention, with a less unfortunate implementation.
> >>
> >> On Wed, Aug 24, 2016 at 5:41 PM, Michael Heuer wrote:
> >> > Hello,
> >> >
> >> > We're a project downstream of Spark and need to provide separate artifacts for Spark 1.x and Spark 2.x. Has any convention been established or even proposed for artifact names and/or qualifiers?
> >> >
> >> > We are currently thinking
> >> >
> >> > org.bdgenomics.adam:adam-{core,apis,cli}_2.1[0,1] for Spark 1.x and Scala 2.10 & 2.11
> >> >
> >> > and
> >> >
> >> > org.bdgenomics.adam:adam-{core,apis,cli}-spark2_2.1[0,1] for Spark 2.x and Scala 2.10 & 2.11
> >> >
> >> > https://github.com/bigdatagenomics/adam/issues/1093
> >> >
> >> > Thanks in advance,
> >> >
> >> >    michael
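As an aside, a minimal sketch of the reflection-shim idea Sean mentions (the helper name is hypothetical and this is not ADAM's actual shim code): probe the classpath at runtime for a Spark 2.x-only class instead of compiling against version-specific APIs.

    import scala.util.Try

    // Hypothetical helper: detect the Spark major version on the classpath
    // without a compile-time dependency on version-specific classes.
    object SparkVersionShim {
      // org.apache.spark.sql.SparkSession only exists in Spark 2.x
      lazy val isSpark2: Boolean =
        Try(Class.forName("org.apache.spark.sql.SparkSession")).isSuccess

      def describe(): String =
        if (isSpark2) "Spark 2.x API available" else "falling back to the Spark 1.x code path"
    }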
Re: What's the meaning when the partitions is zero?
If so, we will get an exception when numPartitions is 0:

  def getPartition(key: Any): Int = key match {
    case null => 0
    // case None => 0
    case _ => Utils.nonNegativeMod(key.hashCode, numPartitions)
  }

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/What-s-the-meaning-when-the-partitions-is-zero-tp18957p18967.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
Re: What's the meaning when the partitions is zero?
When numPartitions is 0, there is no data in the RDD, so getPartition is never invoked.

- Mridul

On Friday, September 16, 2016, WangJianfei wrote:
> If so, we will get an exception when numPartitions is 0:
>
>   def getPartition(key: Any): Int = key match {
>     case null => 0
>     // case None => 0
>     case _ => Utils.nonNegativeMod(key.hashCode, numPartitions)
>   }
>
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/What-s-the-meaning-when-the-partitions-is-zero-tp18957p18967.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
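For what it's worth, a quick standalone check of the exception being discussed (the choice of key is arbitrary): the null case never reaches the modulo, while a non-null key on a 0-partition HashPartitioner does throw, which is exactly why it matters that getPartition is never called on an empty RDD.

    import org.apache.spark.HashPartitioner

    object ZeroPartitionSketch extends App {
      val p = new HashPartitioner(0)
      // The null short-circuit returns 0 without touching numPartitions.
      println(p.getPartition(null)) // 0
      try {
        // Any other key hits key.hashCode % 0, i.e. division by zero.
        p.getPartition("key")
      } catch {
        case e: ArithmeticException => println(s"as expected: $e")
      }
    }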
Doubt about ExternalSorter.spillMemoryIteratorToDisk
We can see that when the number of written objects equals serializerBatchSize, flush() is called. But if the objects written exceed the default buffer size, what will happen? In that situation, will flush() be called automatically?

  private[this] def spillMemoryIteratorToDisk(inMemoryIterator: WritablePartitionedIterator)
      : SpilledFile = {
    // ignore some code here
    try {
      while (inMemoryIterator.hasNext) {
        val partitionId = inMemoryIterator.nextPartition()
        require(partitionId >= 0 && partitionId < numPartitions,
          s"partition Id: ${partitionId} should be in the range [0, ${numPartitions})")
        inMemoryIterator.writeNext(writer)
        elementsPerPartition(partitionId) += 1
        objectsWritten += 1

        if (objectsWritten == serializerBatchSize) {
          flush()
        }
      }
      // ignore some code here
    }

    SpilledFile(file, blockId, batchSizes.toArray, elementsPerPartition)
  }

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Doubt-about-ExternalSorter-spillMemoryIteratorToDisk-tp18969.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
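A standalone sketch of the buffering behavior the question touches on, using plain java.io rather than Spark's DiskBlockObjectWriter (the tiny 16-byte buffer is just for the demo): a buffered stream writes its contents through to the underlying stream on its own once the buffer fills, independently of any explicit flush() call.

    import java.io.{BufferedOutputStream, ByteArrayOutputStream}

    object BufferFillSketch extends App {
      val sink = new ByteArrayOutputStream()
      val out  = new BufferedOutputStream(sink, 16) // deliberately tiny buffer
      (1 to 100).foreach(_ => out.write('x'))       // write far more than the buffer holds
      println(sink.size()) // > 0: earlier bytes were already written through automatically
      out.flush()
      println(sink.size()) // 100 once the remainder is flushed explicitly
      out.close()
    }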