Hi Chris,
I have to make sure my DB has the latest value for every record at any given
point in time.
Say the following is the data; I have to take the 4th row for EmpId 2.
Also, if any Emp details are already there in Oracle, I have to update them
with the latest value from the stream.
EmpId, salary, timestamp
1, 1000 ,
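For reference, a minimal sketch of keeping only the newest row per EmpId with
a window function (the column names come from the example above; the
SparkSession setup and sample rows are assumptions). Note that Spark's JDBC
DataFrameWriter only appends or overwrites, so the actual upsert into Oracle
would still need a MERGE, e.g. via foreachPartition with plain JDBC:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

val spark = SparkSession.builder().appName("latest-per-emp").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the incoming records.
val updates = Seq(
  (1, 1000, 1L),
  (2, 2000, 1L),
  (2, 2100, 2L)
).toDF("EmpId", "salary", "timestamp")

// Newest record per EmpId: rank by timestamp descending, keep rank 1.
val byEmp = Window.partitionBy($"EmpId").orderBy($"timestamp".desc)
val latest = updates
  .withColumn("rn", row_number().over(byEmp))
  .filter($"rn" === 1)
  .drop("rn")

latest.show()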
Just thinking about this: if your needs can be addressed using batch instead
of streaming, that is a viable option. A lambda architecture (a batch layer
for the complete history plus a streaming layer for recent updates) also
seems like a possible approach.
On Sun., 30 Jun. 2019, 9:54 am Chris Teoh, wrote:
> Not sure what your needs are here.
>
> If you can af
We're trying to set up a system that includes Spark. The rest of the
services have good Docker containers and Helm charts to start from.
Spark, on the other hand, is proving difficult. We forked a container and
have tried to create our own chart, but we are having several problems with
this.
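For what it's worth, one commonly used starting point is the Bitnami Spark
chart rather than a hand-rolled one; a minimal install sketch (the release
name here is made up, adjust to your setup):

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-spark bitnami/spark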
So back to
You can implement a custom partitioner to do the bucketing.
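A minimal sketch of such a partitioner (the class name and bucket count are
made up; the safe modulo avoids negative partition ids for keys whose
hashCode is negative):

import org.apache.spark.Partitioner

// Hash-based bucketing: equal keys always land in the same bucket,
// which is the property Hive relies on for bucketed join optimisation.
class BucketPartitioner(buckets: Int) extends Partitioner {
  override def numPartitions: Int = buckets
  override def getPartition(key: Any): Int = {
    val m = key.hashCode % buckets
    if (m < 0) m + buckets else m
  }
}

// Usage on a pair RDD keyed by the bucketing column:
// rdd.partitionBy(new BucketPartitioner(16))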
On Sun, Jun 30, 2019 at 5:15 AM Chris Teoh wrote:
> The closest thing I can think of here is if you have both dataframes
> written out using buckets. Hive uses this technique for join optimisation
> such that both datasets of the same buc
Does something like the code below make any sense, or would there be a more
efficient way to do it?
val wordsOnOnePartition = input
  .map { word => Math.abs(word.id.hashCode) % numPartitions -> word }
  .partitionBy(new PartitionIdPassthrough(numPartitions))
val indices = wo
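PartitionIdPassthrough isn't shown above; assuming the key produced by the
map step is meant to be the target partition id itself, a minimal definition
could be:

import org.apache.spark.Partitioner

// Passes the precomputed partition id (the key) straight through.
class PartitionIdPassthrough(override val numPartitions: Int)
    extends Partitioner {
  override def getPartition(key: Any): Int = key.asInstanceOf[Int]
}

One caveat with Math.abs(hashCode): Math.abs(Int.MinValue) is still negative,
so a safe modulo as in the bucketing example above is more robust.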