Re: Spark 2.3.2 "load data inpath /hrds/tablename-*" can't use * for a class of files

2018-11-21 Thread yutaochina
I now find this problem is resolved in Spark 2.4.0; see the JIRA issue https://issues.apache.org/jira/browse/SPARK-23425
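
A minimal sketch of the statement in question, run through a Hive-enabled SparkSession (the table name is guessed from the path in the subject; on 2.3.x the wildcard is not expanded, while per SPARK-23425 it should work from 2.4.0 on):

import org.apache.spark.sql.SparkSession

// Assumes Hive support and an existing table `tablename`; the path is taken from the subject line.
val spark = SparkSession.builder()
  .appName("load-data-wildcard")
  .enableHiveSupport()
  .getOrCreate()

// On Spark 2.4.0+ the * should match the whole family of files; on 2.3.x it does not.
spark.sql("LOAD DATA INPATH '/hrds/tablename-*' INTO TABLE tablename")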

Re: How to Keep Null values in Parquet

2018-11-21 Thread Chetan Khatri
Hello Soumya, thanks for the quick response. I haven't tried that; I am doing it now and will see. On Thu, Nov 22, 2018 at 8:13 AM Soumya D. Sanyal wrote: > Hi Chetan, > > Have you tried casting the null values/columns to a supported type — e.g. > `StringType`, `IntegerType`, etc? > > See also https://issues.a

Re: How to Keep Null values in Parquet

2018-11-21 Thread Soumya D. Sanyal
Hi Chetan, Have you tried casting the null values/columns to a supported type — e.g. `StringType`, `IntegerType`, etc? See also https://issues.apache.org/jira/browse/SPARK-10943. — Soumya > On Nov 21, 2018, at 9:29 PM, Chetan Khatri > wro
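
A minimal sketch of the suggested cast (the column name, sample data, and output path are invented for illustration): an all-null column gets a concrete type via lit(null).cast(StringType) before the Parquet write.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

val spark = SparkSession.builder().appName("null-cast-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Invented example data; without the cast, "maybe_null" would be NullType.
val df = Seq((1, "a"), (2, "b")).toDF("id", "label")
  .withColumn("maybe_null", lit(null).cast(StringType))

// The nulls are preserved in the output, now under a concrete StringType column.
df.write.mode("overwrite").parquet("/tmp/null_cast_sketch")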

Spark 2.3.2 "load data inpath /hrds/tablename-*" can't use * for a class of files

2018-11-21 Thread yutaochina

How to Keep Null values in Parquet

2018-11-21 Thread Chetan Khatri
Hello Spark Users, I have a DataFrame with some null values. When I am writing it to Parquet it fails with the below error: Caused by: java.lang.RuntimeException: Unsupported data type NullType. at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.execution.data
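
A sketch (not from the thread) of one way to make the write go through: walk the schema and cast every NullType column to StringType before writing, so the Parquet writer only sees concrete types.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{NullType, StringType}

// Cast every column Spark inferred as NullType to StringType; the nulls themselves
// are fine in Parquet once the column has a real type.
def castNullTypeColumns(df: DataFrame): DataFrame =
  df.schema.fields.filter(_.dataType == NullType).foldLeft(df) { (acc, field) =>
    acc.withColumn(field.name, col(field.name).cast(StringType))
  }

// Hypothetical usage; the output path is a stand-in.
// castNullTypeColumns(df).write.parquet("/tmp/out")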

Casting nested columns and updating nested struct fields.

2018-11-21 Thread Colin Williams
Hello, I'm currently trying to update the schema for a DataFrame with nested columns. I would like either to update the schema itself or to cast a nested column without having to explicitly select all the columns just to cast one. Regarding updating the schema, it looks like I would probably need to w
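
A sketch under an invented schema (an `info` struct whose `count` field arrives as a string): in Spark 2.x there is no in-place cast for a single nested field, so the usual workaround is to rebuild the struct, casting only the field that needs it and carrying the rest over unchanged.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.LongType

val spark = SparkSession.builder().appName("nested-cast-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Invented nested data: info is a struct with a string "count" and a string "label".
val df = Seq((1L, ("12", "a")), (2L, ("34", "b")))
  .toDF("id", "info")
  .withColumn("info", struct(col("info._1").as("count"), col("info._2").as("label")))

// Rebuild the struct, casting only the one field.
val updated = df.withColumn(
  "info",
  struct(
    col("info.count").cast(LongType).as("count"),   // the field being cast
    col("info.label").as("label")                   // untouched
  )
)

updated.printSchema()   // info.count is now long, info.label is unchanged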

Spark 2.3.0 with HDP completed successfully but status FAILED with error

2018-11-21 Thread Chetan Khatri
Hello Spark Users, I am working with Spark 2.3.0 on an HDP distribution, where my Spark job completed successfully but the final job status is FAILED with the below error. What is the best way to prevent this kind of error? Thanks 18/11/21 17:38:15 INFO ApplicationMaster: Final app status: SUCCEEDED, ex
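
The log line above is cut off, so the actual cause is not visible; one common source of this mismatch (an assumption, not a diagnosis) is the driver terminating abnormally after the job's work is done, for example an uncaught exception in cleanup code or an explicit System.exit with a non-zero code. A sketch of a driver layout that avoids that class of problem:

import org.apache.spark.sql.SparkSession

object MyJob {   // hypothetical driver object
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MyJob").getOrCreate()
    try {
      // ... actual job logic goes here ...
    } finally {
      spark.stop()   // always stop the session so the ApplicationMaster can report the real outcome
    }
  }
}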

Structured Streaming restart results in illegal state exception

2018-11-21 Thread Magnus Nilsson
Hello, I'm evaluating Structured Streaming, trying to understand how resilient the pipeline is to failures. I ran a small test streaming data from an Azure Event Hub using Azure Databricks, saving the data into a parquet file on the Databricks filesystem dbfs:/. I did an unclean shutdown by cancelling the query.
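
A sketch of the shape of query being described, with the Azure Event Hubs source swapped for Spark's built-in rate source so it is self-contained, and local paths standing in for dbfs:/. The point is that the file sink recovers from its checkpointLocation, so the same checkpoint directory has to be reused, and left intact, across restarts:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("restart-test").master("local[*]").getOrCreate()

val stream = spark.readStream
  .format("rate")                  // stand-in for the Event Hubs source
  .option("rowsPerSecond", "10")
  .load()

val query = stream.writeStream
  .format("parquet")
  .option("path", "/tmp/restart-test/data")                 // stand-in for the dbfs:/ output path
  .option("checkpointLocation", "/tmp/restart-test/_chk")   // reused on every restart
  .start()

query.awaitTermination()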

Re: Re: spark-sql force parallel union

2018-11-21 Thread Alessandro Solimando
Hello, maybe I am overlooking the problem, but I would go for something similar: def unionDFs(dfs: List[DataFrame]): DataFrame = { dfs.drop(1).foldRight(dfs.apply(0))((df1: DataFrame, df2: DataFrame) => df1 union df2) } (It would be better to keep dfs as-is and use an empty DF with the co
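
For reference, a more compact version of the same fold (a sketch; it leaves dfs untouched and assumes a non-empty list whose DataFrames all share the same schema). Note that Catalyst's CombineUnions rule flattens nested unions into a single Union node, so the shape of the fold should not by itself change how the union is executed.

import org.apache.spark.sql.DataFrame

// Same idea as the fold above, written as a reduce; union is positional,
// so all DataFrames must have matching column order and types.
def unionDFs(dfs: List[DataFrame]): DataFrame =
  dfs.reduce(_ union _)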

Structured Streaming to file sink results in illegal state exception

2018-11-21 Thread Magnus Nilsson
I'm evaluating Structured Streaming, trying to understand how resilient the pipeline is. I ran a small test streaming data from an Azure Event Hub using Azure Databricks, saving the data into a parquet file on the Databricks filesystem dbfs:/. I did an unclean shutdown by cancelling the query. Then