date:20180511

UDTF registration fails for hiveEnabled SQLContext

2018-05-11 Thread Mick Davies

Hi, if I try to register a UDTF using SQLContext (with enableHiveSupport set) using the code: I get the following error: It works OK if I use the deprecated HiveContext. Is there a way to register a UDTF without using deprecated code? This is happening in some tests I am writing using but I do

Oozie with spark 2.3 in Kubernetes

2018-05-11 Thread purna pradeep

Hello, I would like to know if anyone has tried Oozie with Spark 2.3 actions on Kubernetes for scheduling Spark jobs. Thanks, Purna

SPARK SQL: returns null for a column, while HIVE query returns data for the same column

2018-05-11 Thread ARAVIND ARUMUGHAM Sethurathnam

I have a Hive table created on top of S3 data in Parquet format and partitioned by one column named eventdate. 1) When using a Hive query, it returns data for a column named "headertime", which is in the schema of both the table and the file. select headertime from dbName.test_bug where eventdate=20

Re: Spark 2.3.0 Structured Streaming Kafka Timestamp

2018-05-11 Thread Michael Armbrust

Hmm yeah that does look wrong. Would be great if someone opened a PR to correct the docs :) On Thu, May 10, 2018 at 5:13 PM Yuta Morisawa wrote: > The problem is solved. > The actual schema of Kafka message is different from documentation. > > > https://spark.apache.org/docs/latest/structured-s

Re: SPARK SQL: returns null for a column, while HIVE query returns data for the same column

2018-05-11 Thread ARAVIND ARUMUGHAM Sethurathnam

- This column was added in later partitions and is not present in earlier ones. - I assume partition pruning should load only the particular partition I specify when using Spark SQL? - (Spark version 2.2) On Fri, May 11, 2018 at 2:24 PM, ARAVIND ARUMUGHAM Sethurath

ordered ingestion not guaranteed

2018-05-11 Thread ravidspark

Hi all, I am using Spark 2.2.0 and have the following use case: *Reading from Kafka using Spark Streaming and updating (not just inserting) the records in a downstream database.* I understand that Spark does not read messages from Kafka in the order of the timestamps stored in the Kafka partitions, rath

Re: ordered ingestion not guaranteed

2018-05-11 Thread Jörn Franke

What DB do you have? You have some options, such as 1) use a key-value store (they can be accessed very efficiently) to see if a newer record for the same key has already been processed: if yes, ignore the value; if no, insert it into the database 2) redesign the key to include the timestamp and find out the la
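
Option 1 in the reply above can be sketched as follows. This is only a minimal illustration in plain Python, not code from the thread: a dict stands in for the key-value store, a second dict stands in for the downstream database, and the `LatestWinsWriter` class, its `write` method, and the `(key, timestamp, value)` record shape are all hypothetical names chosen for the example.

```python
from typing import Dict, Tuple

# Assumed record shape for this sketch: (key, event timestamp, value).
Record = Tuple[str, int, str]


class LatestWinsWriter:
    """Skip records older than one already processed for the same key."""

    def __init__(self) -> None:
        # Stand-in for the key-value store: key -> newest timestamp seen.
        self.latest_ts: Dict[str, int] = {}
        # Stand-in for the downstream database: key -> (timestamp, value).
        self.table: Dict[str, Tuple[int, str]] = {}

    def write(self, record: Record) -> bool:
        """Apply the record if it is newer; return True if it was written."""
        key, ts, value = record
        seen = self.latest_ts.get(key)
        if seen is not None and seen >= ts:
            # A newer (or equally new) record was already processed: ignore.
            return False
        self.latest_ts[key] = ts
        self.table[key] = (ts, value)
        return True


writer = LatestWinsWriter()
# Records arrive out of timestamp order for key "a".
results = [writer.write(r) for r in [("a", 2, "new"), ("a", 1, "old"), ("b", 1, "x")]]
```

Here the late record `("a", 1, "old")` is dropped, so the stand-in table keeps `(2, "new")` for key `a`. In this sketch a record with an equal timestamp is also ignored; whether ties should overwrite is a design choice. In a real deployment the lookup and the write would have to be atomic per key, which this single-process example does not address.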

Re: ordered ingestion not guaranteed

2018-05-11 Thread ravidspark

Jorn, thanks for the response. My downstream database is Kudu. 1. Yes. As you suggested, I have been using a central caching mechanism that caches the RDD results and compares them with the next batch to check for the latest timestamps and ignore the old ones. But, I see handlin

8 matches
