Hi Benjamin,

SAMZA-968 <https://issues.apache.org/jira/browse/SAMZA-968> is already assigned to you.
Thanks,
Jagadish

On Thu, Jun 16, 2016 at 10:51 AM, Benjamin Smith <ben.sm...@ranksoftwareinc.com> wrote:

> Sure, looks like a straightforward enough change.
>
> I've created: https://issues.apache.org/jira/browse/SAMZA-968
>
> I don't see any way to assign it to myself, though.
>
> ________________________________
> From: Yi Pan <nickpa...@gmail.com>
> Sent: Thursday, June 16, 2016 1:02:59 PM
> To: dev@samza.apache.org
> Subject: Re: Bug in SequenceFileHdfsFileWriter
>
> Hi, Benjamin,
>
> Thanks a lot for reporting this! It makes sense from reading the posts.
> Could you open a JIRA? Are you interested in assigning it to yourself and
> contributing the fix?
>
> Thanks a lot again!
>
> -Yi
>
> On Thu, Jun 16, 2016 at 9:52 AM, Benjamin Smith <
> ben.sm...@ranksoftwareinc.com> wrote:
>
> > Hello,
> >
> > I am working on a project where we are integrating Samza and Hive. As part
> > of this project, we ran into an issue where sequence files written from
> > Samza were taking a long time (hours) to completely sync with HDFS.
> >
> > After some Googling and digging into the code, it appears that the issue
> > is here:
> >
> > https://github.com/apache/samza/blob/master/samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala#L111
> >
> > Writer.stream(dfs.create(path)) implies that the caller of
> > dfs.create(path) is responsible for closing the created stream explicitly.
> > This doesn't happen, and the SequenceFileHdfsWriter call to close will only
> > flush the stream.
> >
> > I believe the correct line should be:
> >
> > Writer.file(path)
> >
> > Or, SequenceFileHdfsWriter should explicitly track and close the stream.
> >
> > Thanks!
> >
> > Ben
> >
> > Reference material:
> >
> > http://stackoverflow.com/questions/27916872/why-the-sequencefile-is-truncated
> >
> > https://apache.googlesource.com/hadoop-common/+/HADOOP-6685/src/java/org/apache/hadoop/io/SequenceFile.java#1238

--
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
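The stream-ownership rule Ben describes can be illustrated without HDFS. The sketch below is a hypothetical, self-contained Java model. It is not Hadoop or Samza code, and every class and method name in it is made up for illustration. `OwnershipAwareWriter.opening(...)` stands in for `Writer.file(path)`: the writer opens the stream itself, so it closes it. `OwnershipAwareWriter.wrapping(...)` stands in for `Writer.stream(dfs.create(path))`: the caller owns the stream, so `close()` only flushes it. `TrackedStreamWriter` sketches the second suggested fix, where the wrapper keeps a handle to the stream it created and closes that stream explicitly.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

/** Helper: an OutputStream that records whether flush() and close() were called. */
class TrackingOutputStream extends FilterOutputStream {
    boolean flushCalled = false;
    boolean closeCalled = false;

    TrackingOutputStream(OutputStream out) { super(out); }

    @Override
    public void flush() throws IOException { flushCalled = true; super.flush(); }

    @Override
    public void close() throws IOException {
        closeCalled = true;
        super.close(); // flushes, then closes the wrapped stream
    }
}

/**
 * Hypothetical writer modeling the ownership rule from the thread: if it opened
 * the stream it closes it; if it was handed a stream it only flushes on close().
 */
class OwnershipAwareWriter implements AutoCloseable {
    private final OutputStream out;
    private final boolean ownsStream;

    private OwnershipAwareWriter(OutputStream out, boolean ownsStream) {
        this.out = out;
        this.ownsStream = ownsStream;
    }

    /** Analogous to Writer.file(path): the writer opens the stream, so it owns it. */
    static OwnershipAwareWriter opening(Supplier<? extends OutputStream> opener) {
        return new OwnershipAwareWriter(opener.get(), true);
    }

    /** Analogous to Writer.stream(dfs.create(path)): the caller owns the stream. */
    static OwnershipAwareWriter wrapping(OutputStream existing) {
        return new OwnershipAwareWriter(existing, false);
    }

    void append(String record) {
        try {
            out.write(record.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            if (ownsStream) {
                out.close(); // owned: release the stream
            } else {
                out.flush(); // borrowed: flush only; the caller must still close it
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Hypothetical fix sketch: remember the stream we created and close it ourselves. */
class TrackedStreamWriter implements AutoCloseable {
    private final OutputStream stream;
    private final OwnershipAwareWriter writer;

    TrackedStreamWriter(OutputStream stream) {
        this.stream = stream;
        this.writer = OwnershipAwareWriter.wrapping(stream);
    }

    void append(String record) { writer.append(record); }

    @Override
    public void close() {
        // try-with-resources closes s after writer.close() returns (or throws)
        try (OutputStream s = stream) {
            writer.close(); // only flushes the borrowed stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this model, a writer built with `wrapping(...)` leaves its stream open after `close()`, which matches the symptom reported above. Either the owning form or the tracked-stream wrapper ends with the stream actually closed. The real `SequenceFileHdfsWriter` would be dealing with the stream returned by `dfs.create(path)`, which this sketch does not model.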