The problem is resolved.
I changed systems.hdfs.producer.hdfs.write.batch.size.bytes to a very small
value, and now the sequence file is filled with my data.
Thank you!
On Wed, Dec 21, 2016 at 9:37 AM Hai Lu wrote:
"close" as in the file is closed - either the producer moves on to the next
file or the job gets shut down. I meant to say that this is like writing to
disk: the data don't actually reach the disk until a flush happens. So while
you are still producing into HDFS, the content doesn't get committed/flushed.
By the way, what do you mean by "close"? Close what?
And what should the stream parameter be, like the following "default" one?
It seems to have nothing to do with the result.
private final SystemStream OUTPUT_STREAM = new SystemStream("hdfs", "default");
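For context, here is a minimal sketch of how that SystemStream is typically
used from a Samza StreamTask (the class name and the pass-through logic are
hypothetical, just to show where the system and stream names come into play):

import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

public class Kafka2HdfsStreamTask implements StreamTask {
  // "hdfs" must match the system name defined in the job properties;
  // the second argument is the output stream name.
  private static final SystemStream OUTPUT_STREAM =
      new SystemStream("hdfs", "default");

  @Override
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
      TaskCoordinator coordinator) {
    // Forward each incoming Kafka message to the HDFS producer unchanged.
    collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, envelope.getMessage()));
  }
}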
On Wed, Dec 21, 2016 at 8:23 AM Rui Tang wrote:
Thank you, I'll try it out!
On Wed, Dec 21, 2016 at 1:45 AM Hai Lu wrote:
Hi Rui,

I've tried out the HDFS producer, too. In my experience, you won't be able
to see changes written into HDFS in real time. The content of the files
becomes visible only after they get closed.

You can probably play with the "producer.hdfs.write.batch.size.bytes"
config to force rolling over to a new file sooner.
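For example, something like the following in the job properties (1024 is only
an illustrative small value to force frequent roll-over, not a recommendation):

systems.hdfs.producer.hdfs.write.batch.size.bytes=1024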
I'm using samza-hdfs to write Kafka streams to HDFS, but I can't make it
work.
Here is my samza job's properties file:
# Job
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
job.name=kafka2hdfs
# YARN
yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.
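(The properties file above is cut off in the archive. For reference, the HDFS
producer side of such a job is usually configured with entries roughly like
the following - the key and class names follow the samza-hdfs docs of that era
and the values are only illustrative, so double-check them against your Samza
version:)

# HDFS system ("hdfs" must match the system name used by the task's SystemStream)
systems.hdfs.samza.factory=org.apache.samza.system.hdfs.HdfsSystemFactory
systems.hdfs.producer.hdfs.writer.class=org.apache.samza.system.hdfs.writer.BinarySequenceFileHdfsWriter
systems.hdfs.producer.hdfs.base.output.dir=/user/someuser/kafka2hdfs-output
systems.hdfs.producer.hdfs.write.batch.size.bytes=1024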