Jorn,
Thanks for the response. My downstream database is Kudu.
1. Yes. As you suggested, I have been using a central caching mechanism
that caches the RDD results and compares them with the next batch to
check for the latest timestamps and ignore the old ones. But, I see
handlin
What DB do you have?
You have some options, such as
1) use a key-value store (they can be accessed very efficiently) to see if
a newer record for the key has already been processed: if yes, ignore the
value; if no, insert it into the database
2) redesign the key to include the timestamp and find out the la
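
Option 1 above (check whether a newer record for the key was already
processed, and only write when it was not) could be sketched roughly as
follows. This is a minimal illustration in plain Python, not Spark or Kudu
code: the in-memory dict stands in for the key-value store, and the names
`select_updates`, `latest_seen`, and the (key, timestamp, value) record
layout are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (key, timestamp, value).
Record = Tuple[str, int, str]


def select_updates(batch: Iterable[Record],
                   latest_seen: Dict[str, int]) -> List[Record]:
    """Return the records worth writing downstream.

    latest_seen plays the role of the key-value store: it maps each key
    to the newest timestamp already processed, and is updated in place.
    """
    # First collapse the batch to the newest record per key, since
    # records inside one batch may arrive out of timestamp order.
    newest: Dict[str, Record] = {}
    for key, ts, value in batch:
        if key not in newest or ts > newest[key][1]:
            newest[key] = (key, ts, value)

    updates: List[Record] = []
    for key, (_, ts, value) in newest.items():
        previous = latest_seen.get(key)
        # Ignore the record unless it is newer than what was processed.
        if previous is None or ts > previous:
            latest_seen[key] = ts
            updates.append((key, ts, value))
    return updates
```

In a real streaming job the lookup and the update of the store would need
to be shared across executors and made safe against concurrent writers,
which this sketch does not attempt.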
Hi All,
I am using Spark 2.2.0 and have the following use case:
*Reading from Kafka using Spark Streaming and updating (not just inserting)
the records in the downstream database*
I understand that Spark does not read messages from Kafka in
timestamp order as stored in Kafka partitions rath
- this column was added in later partitions and not present in earlier
ones.
- I assume partition pruning should load only from the particular
partition I specify when using Spark SQL?
- (Spark version 2.2)
On Fri, May 11, 2018 at 2:24 PM, ARAVIND ARUMUGHAM Sethurath
Hmm yeah that does look wrong. Would be great if someone opened a PR to
correct the docs :)
On Thu, May 10, 2018 at 5:13 PM Yuta Morisawa
wrote:
> The problem is solved.
> The actual schema of Kafka message is different from documentation.
>
>
> https://spark.apache.org/docs/latest/structured-s
I have a Hive table created on top of S3 data in Parquet format and
partitioned by one column named eventdate.
1) When using a Hive query, it returns data for a column named "headertime",
which is in the schema of both the table and the file.
select headertime from dbName.test_bug where eventdate=20
Hello,
I would like to know if anyone has tried Oozie with Spark 2.3 actions on
Kubernetes for scheduling Spark jobs.
Thanks,
Purna
Hi,
If I try to register a UDTF using SQLContext (with enableHiveSupport set)
using the code:
I get the following error:
It works OK if I use the deprecated HiveContext.
Is there a way to register a UDTF without using deprecated code?
This is happening in some tests I am writing using
but I do