> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote: > > Thanks for the contribution! I added a few comments below. Thanks
You're welcome. Thanks for Samza. > On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote: > > gradle.properties, line 18 > > <https://reviews.apache.org/r/43732/diff/1/?file=1254930#file1254930line18> > > > > Please keep the -SNAPSHOT suffix here. Fixed in new patch. > On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote: > > gradle/dependency-versions.gradle, line 32 > > <https://reviews.apache.org/r/43732/diff/1/?file=1254931#file1254931line32> > > > > Please remove -MINE. I had (Snappy linking) issues with the standard RocksDb and had to compile my own. See SAMZA-870. Fixed in new patch. > On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote: > > gradle/dependency-versions.gradle, line 23 > > <https://reviews.apache.org/r/43732/diff/1/?file=1254931#file1254931line23> > > > > There is SAMZA-878 opened for this jackson library upgrade. You can > > associate the patch w/ that ticket as well. Attached the whole patch submitted here as I wasn't sure. Seems overkill. > On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote: > > gradle/dependency-versions.gradle, line 33 > > <https://reviews.apache.org/r/43732/diff/1/?file=1254931#file1254931line33> > > > > The minimum supported YARN version is still 2.6.1 in Samza and we have > > not deprecated the support yet. Hence, I am curious why you need 2.7.1 here? Had trouble getting jobs to work on my 2.7.1 cluster but I'm no longer certain that changing this was what cured it. Dropping. > On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote: > > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala, > > line 50 > > <https://reviews.apache.org/r/43732/diff/1/?file=1254932#file1254932line50> > > > > Adding configuration variables also requires adding documentation to > > docs/learn/documentation/versioned/jobs/configuration-table.html. Fixed in new patch. > On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote: > > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala, > > line 30 > > <https://reviews.apache.org/r/43732/diff/1/?file=1254933#file1254933line30> > > > > Please remove the auto-generated author info. Apache does not allow > > this. Fixed in new patch. > On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote: > > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala, > > line 33 > > <https://reviews.apache.org/r/43732/diff/1/?file=1254933#file1254933line33> > > > > How is this class used? It would be nice to create some javadoc here to > > explain it. > > > > A set of unit tests for the new class is also needed here. Fixed in new patch. > On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote: > > samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java, > > line 19 > > <https://reviews.apache.org/r/43732/diff/1/?file=1254934#file1254934line19> > > > > This does not seem to be relevant to the AvroDataFileWriter? Can we > > move it to another patch? Dropped from new patch. - Edi ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/43732/#review120357 ----------------------------------------------------------- On Feb. 18, 2016, 7:46 p.m., Edi Bice wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/43732/ > ----------------------------------------------------------- > > (Updated Feb. 18, 2016, 7:46 p.m.) > > > Review request for samza. > > > Repository: samza > > > Description > ------- > > https://issues.apache.org/jira/browse/SAMZA-876 > > Implemented AvroDataFileHdfsWriter fashioned loosely after > BinarySequenceFileHDFSWriter. > > Exposed several RocksDb configuration options (recommended in RocksDb tuning > guide): > > rocksdb.log.level > rocksdb.log.keepfilenum > rocksdb.log.timetoroll > rocksdb.log.maxfilesize > > rocksdb.bloomfilter.bits > rocksdb.max.background.compactions > rocksdb.max.background.flushes > rocksdb.num.write.buffers > rocksdb.target.file.size.base > rocksdb.max.bytes.level.base > > > Diffs > ----- > > gradle.properties 16e1f5d > gradle/dependency-versions.gradle 52e25aa > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala > 7993119 > > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala > PRE-CREATION > > samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java > d4f765c > > Diff: https://reviews.apache.org/r/43732/diff/ > > > Testing > ------- > > I am using AvroDataFileHdfsWriter at the end of my pipeline. I feed the > generated avro files to Apache Samoa. Have processed millions of records > successfully. The RocksDb config changes are older and were used and verified > to be working when originally implemented. > > > Thanks, > > Edi Bice > >