Hi,
If I try to register a UDTF using SQLContext (with enableHiveSupport set)
using the code:
I get the following error:
It works OK if I use the deprecated HiveContext.
Is there a way to register a UDTF without using deprecated code?
This is happening in some tests I am writing using
but I do
Hello,
I would like to know if anyone has tried Oozie with Spark 2.3 actions on
Kubernetes for scheduling Spark jobs.
Thanks,
Purna
I have a Hive table created on top of S3 data in Parquet format and
partitioned by one column named eventdate.
1) When using a Hive query, it returns data for a column named "headertime"
which is in the schema of both the table and the file.
select headertime from dbName.test_bug where eventdate=20
Hmm yeah that does look wrong. Would be great if someone opened a PR to
correct the docs :)
On Thu, May 10, 2018 at 5:13 PM Yuta Morisawa
wrote:
> The problem is solved.
> The actual schema of Kafka message is different from documentation.
>
>
> https://spark.apache.org/docs/latest/structured-s
- This column was added in later partitions and is not present in earlier
ones.
- I assume partition pruning should load only from the particular
partition I am specifying when using Spark SQL?
- (Spark version 2.2)
On Fri, May 11, 2018 at 2:24 PM, ARAVIND ARUMUGHAM Sethurath
Hi All,
I am using Spark 2.2.0 and I have the use case below:
*Reading from Kafka using Spark Streaming and updating (not just inserting)
the records in the downstream database*
I understand that Spark will not read messages from Kafka in the
order of the timestamps as stored in the Kafka partitions rath
What DB do you have?
You have some options, such as:
1) Use a key-value store (they can be accessed very efficiently) to see if
a newer record for the key has already been processed. If yes, ignore the
value; if no, insert it into the database.
2) Redesign the key to include the timestamp and find out the la
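
For what it's worth, option 1 above could be sketched roughly like this. This is only an illustration in plain Python, not Spark or Kudu code: the dict `seen` stands in for the key-value store, and the `(key, timestamp, value)` record layout and the function name `select_upserts` are assumptions made up for the example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (key, timestamp, value).
Record = Tuple[str, int, str]


def select_upserts(batch: Iterable[Record], seen: Dict[str, int]) -> List[Record]:
    """Return the records from `batch` that should be written downstream.

    `seen` plays the role of the key-value store: it maps each key to the
    newest timestamp already processed for that key, and is updated in place.
    """
    to_write: List[Record] = []
    for key, ts, value in batch:
        # A record with the same or a newer timestamp was already processed
        # for this key, so this one is stale and is ignored.
        if key in seen and seen[key] >= ts:
            continue
        seen[key] = ts
        to_write.append((key, ts, value))
    return to_write
```

Records that arrive out of order are dropped when a newer one for the same key has already gone through, whether that happened in an earlier batch or earlier in the same batch. In a real job the lookup and the update of the store would have to be safe across executors, which this sketch does not attempt.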
Jorn,
Thanks for the response. My downstream database is Kudu.
1. Yes. As you suggested, I have been using a central caching mechanism
that caches the RDD results and compares them with the next batch to
check for the latest timestamps and ignore the old ones. But, I see
handlin