The nice thing about putting the discussion on JIRA is that everything about the bug is in one place. People trying to understand the discussion a few years from now only have to look at the JIRA ticket, rather than also searching the mailing list archives and hoping that every commenter put the string "SPARK-1855" into their messages.

On Sun, May 18, 2014 at 10:34 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> I'm curious if it's a common approach to have discussions in JIRA not here.
> I don't think it's the ASF way.
>
> Regards,
> Jacek Laskowski
> http://blog.japila.pl
> On 17 May 2014 at 23:55, "Matei Zaharia" <matei.zaha...@gmail.com> wrote:
>
> > We do actually have replicated StorageLevels in Spark. You can use
> > MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom
> > replication factor.
> >
> > BTW you guys should probably have this discussion on the JIRA rather than
> > the dev list; I think the replies somehow ended up on the dev list.
> >
> > Matei
> >
> > On May 17, 2014, at 1:36 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
> >
> > > We don't have 3x replication in spark :-)
> > > And if we use replicated storagelevel, while decreasing odds of failure, it
> > > does not eliminate it (since we are not doing a great job with replication
> > > anyway from fault tolerance point of view).
> > > Also it does take a nontrivial performance hit with replicated levels.
> > >
> > > Regards,
> > > Mridul
> > > On 17-May-2014 8:16 am, "Xiangrui Meng" <men...@gmail.com> wrote:
> > >
> > >> With 3x replication, we should be able to achieve fault tolerance.
> > >> This checkPointed RDD can be cleared if we have another in-memory
> > >> checkPointed RDD down the line. It can avoid hitting disk if we have
> > >> enough memory to use. We need to investigate more to find a good
> > >> solution. -Xiangrui
> > >>
> > >> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan <mri...@gmail.com>
> > >> wrote:
> > >>> Effectively this is persist without fault tolerance.
> > >>> Failure of any node means complete lack of fault tolerance.
> > >>> I would be very skeptical of truncating lineage if it is not reliable.
> > >>> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)" <j...@apache.org> wrote:
> > >>>
> > >>>> Xiangrui Meng created SPARK-1855:
> > >>>> ------------------------------------
> > >>>>
> > >>>>              Summary: Provide memory-and-local-disk RDD checkpointing
> > >>>>                  Key: SPARK-1855
> > >>>>                  URL: https://issues.apache.org/jira/browse/SPARK-1855
> > >>>>              Project: Spark
> > >>>>           Issue Type: New Feature
> > >>>>           Components: MLlib, Spark Core
> > >>>>     Affects Versions: 1.0.0
> > >>>>             Reporter: Xiangrui Meng
> > >>>>
> > >>>>
> > >>>> Checkpointing is used to cut long lineage while maintaining fault
> > >>>> tolerance. The current implementation is HDFS-based. Using the BlockRDD we
> > >>>> can create in-memory-and-local-disk (with replication) checkpoints that are
> > >>>> not as reliable as HDFS-based solution but faster.
> > >>>>
> > >>>> It can help applications that require many iterations.
> > >>>>
> > >>>> --
> > >>>> This message was sent by Atlassian JIRA
> > >>>> (v6.2#6252)