Re: How to Use samza-hdfs

2016-12-20 Thread Rui Tang
The problem is resolved. I changed systems.hdfs.producer.hdfs.write.batch.size.bytes to a very small value, and the sequence file was then filled with my data. Thank you! On Wed, Dec 21, 2016 at 9:37 AM Hai Lu wrote: > "close" as in the file is closed - either the producer moves on to the > next file or
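
A minimal sketch of the change described in this message, written as a properties fragment. The key is the one quoted above; the value 1024 is only an illustrative assumption, since the message does not say which value was actually used.

```properties
# Assumed value for illustration only; the message just says a very small value was used.
systems.hdfs.producer.hdfs.write.batch.size.bytes=1024
```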

Re: How to Use samza-hdfs

2016-12-20 Thread Hai Lu
"close" as in the file is closed - either the producer moves on to the next file or the job gets shut down. I meant to say that this is like writing to disk: the data doesn't actually reach the disk until a flush happens. So while you are still producing into HDFS, the content doesn't get committed/

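The disk-write analogy in the message above can be sketched in plain Java. This is not Samza or HDFS code: the class and method names are hypothetical, a `ByteArrayOutputStream` stands in for the file, and the buffer size of a `BufferedOutputStream` is only a rough stand-in for the producer's batch size. It shows that a small record stays invisible in the sink until the stream is flushed or closed, while a buffer smaller than the record pushes the data through immediately.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BufferedWriteDemo {

    /**
     * Writes one record through a buffer and reports how many bytes are
     * visible in the sink before and after the stream is closed.
     */
    public static int[] visibleBytesBeforeAndAfterClose(String record, int bufferSize) {
        // The in-memory sink stands in for the file on disk (an assumption of this sketch).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink, bufferSize);
        try {
            out.write(record.getBytes(StandardCharsets.UTF_8));
            int before = sink.size(); // bytes that have reached the sink so far
            out.close();              // close() flushes whatever is still buffered
            int after = sink.size();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Buffer larger than the record: nothing is visible until close().
        int[] large = visibleBytesBeforeAndAfterClose("hello", 8192);
        System.out.println("large buffer: before=" + large[0] + ", after=" + large[1]);

        // Buffer smaller than the record: the write bypasses the buffer.
        int[] small = visibleBytesBeforeAndAfterClose("hello", 2);
        System.out.println("small buffer: before=" + small[0] + ", after=" + small[1]);
    }
}
```

Running `main` prints `large buffer: before=0, after=5` and then `small buffer: before=5, after=5`. The second case loosely mirrors the fix reported later in the thread, where a very small batch size made the data appear in the output file.
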
Re: How to Use samza-hdfs

2016-12-20 Thread Rui Tang
By the way, what do you mean by "close"? Close what? And what should the stream parameter be, something like the "default" one below? It seems to have nothing to do with the result. private final SystemStream OUTPUT_STREAM = new SystemStream("hdfs", "default"); On Wed, Dec 21, 2016 at 8:23 AM Rui Tang wrote

Re: How to Use samza-hdfs

2016-12-20 Thread Rui Tang
Thank you, I'll try it out! On Wed, Dec 21, 2016 at 1:45 AM Hai Lu wrote: > Hi Rui, > > I've tried out the HDFS producer, too. In my experience, you won't be able > to see changes written into HDFS in real time. The content of the files > becomes visible only after they get closed. > > You can pro

Re: How to Use samza-hdfs

2016-12-20 Thread Hai Lu
Hi Rui, I've tried out the HDFS producer, too. In my experience, you won't be able to see changes written into HDFS in real time. The content of the files becomes visible only after they get closed. You can probably play with the "producer.hdfs.write.batch.size.bytes" config to force rolling over t

How to Use samza-hdfs

2016-12-19 Thread Rui Tang
I'm using samza-hdfs to write Kafka streams to HDFS, but I can't make it work. Here is my Samza job's properties file:

# Job
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
job.name=kafka2hdfs

# YARN
yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.