To: Chen Kevin
Cc: German Schiavon, fanxin, User
Subject: Re: Stream-static join: Refreshing subset of static data / Connection pooling
The real question is twofold:
1) We had to do collect() on each micro-batch. In high-velocity streams this could
result in millions of records on the driver, causing memory pressure.
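For illustration, a minimal sketch of that anti-pattern and one way around it, assuming Structured Streaming's foreachBatch; the broker, topic, and paths below are placeholders, not from the original thread:

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("stream-static-join").getOrCreate()

// Streaming source (broker and topic are placeholders).
val stream = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()

stream.writeStream.foreachBatch { (batch: DataFrame, batchId: Long) =>
  // Anti-pattern: collect() pulls the whole micro-batch onto the driver,
  // so millions of records per batch risk driver OOM.
  // val rows = batch.collect()

  // Safer: keep the work distributed -- join against the static data and
  // write the result without ever materializing it on the driver.
  val static = spark.read.parquet("s3a://bucket/static/")  // placeholder path
  batch.join(static, "key").write.mode("append").parquet("s3a://bucket/out/")
}.start()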
Hi,
you can use Debezium to capture the row-level changes in PostgreSQL in real
time, stream them to Kafka, and finally ETL and write the data into HBase with
Flink/Spark Streaming. Then you can join against the data in HBase directly. For
a particularly big table, the scan performance in
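A hedged sketch of the Kafka-to-HBase leg of that pipeline in Spark Structured Streaming; the Debezium topic, HBase table, and column-family names are assumptions, and production code would pool connections and handle Debezium delete events:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.{Dataset, SparkSession}

val spark = SparkSession.builder().appName("cdc-to-hbase").getOrCreate()
import spark.implicits._

// Debezium publishes one topic per captured table; the name is a placeholder.
val changes = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "dbserver1.public.customers")
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]

changes.writeStream.foreachBatch { (batch: Dataset[(String, String)], _: Long) =>
  batch.foreachPartition { rows: Iterator[(String, String)] =>
    // One HBase connection per partition; reuse/pool it in real code.
    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("customers"))
    rows.foreach { case (key, json) =>
      val put = new Put(Bytes.toBytes(key))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), Bytes.toBytes(json))
      table.put(put)
    }
    table.close(); conn.close()
  }
}.start()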
1. The issue of Kerberos ticket expiry:
* Usually you don't need to worry about this; you can use the local keytab on
every node in the Hadoop cluster.
* If the keytab is not present on the nodes of your Hadoop cluster, you will
need to renew the ticket from your keytab in every executor periodically (see
the spark-submit sketch after this list).
2. bes
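As a hedged illustration of the renewal point above: for long-running jobs on YARN, spark-submit's --principal and --keytab options let Spark handle ticket renewal itself. The principal, keytab path, and class below are placeholders:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal etl_user@EXAMPLE.COM \
  --keytab /etc/security/keytabs/etl_user.keytab \
  --class com.example.StreamJob \
  stream-job.jar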
Kevin
From: Steve Loughran <ste...@hortonworks.com>
Date: Friday, September 16, 2016 at 3:46 AM
To: Chen Kevin <kevin.c...@neustar.biz>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Missing output partition file in S3
Hi,
Has anyone encountered an issue of a missing output partition file in S3? My
Spark job writes its output to an S3 location. Occasionally, I notice one
partition file is missing, so one chunk of data was lost. If I rerun the same
job, the problem usually goes away. This has been happening
What is the behaviour?
On Tue, Jan 27, 2015 at 6:21 AM, Chen, Kevin <kevin.c...@neustar.biz> wrote:
Does anyone know if I can save an RDD as a text file to a pre-created directory
in an S3 bucket?
I have a directory created in the S3 bucket: //nexgen-software/dev
When I tried to save an RDD as a text file in this directory:
rdd.saveAsTextFile("s3n://nexgen-software/dev/output");
I got the following exception:
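The exception text is lost above; if it is the usual "Output directory already exists" error from Hadoop's FileOutputFormat, the common workaround is to write each run to a path that does not yet exist. A minimal sketch, with the timestamp-suffix scheme as an assumption:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("s3-save").getOrCreate()
val rdd = spark.sparkContext.parallelize(Seq("a", "b", "c"))

// saveAsTextFile creates the final path component itself and fails if that
// component already exists; the pre-created prefix (dev/) is fine, but the
// leaf directory must be new on every run.
rdd.saveAsTextFile("s3n://nexgen-software/dev/output-" + System.currentTimeMillis)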