Any help, please?
On Tue, Aug 7, 2018 at 1:49 PM, Pranav Agrawal wrote:
> I am hitting this issue:
> https://issues.cloudera.org/browse/DISTRO-800 (related to
> https://issues.apache.org/jira/browse/HIVE-11625)
>
> I am unable to write an empty array of type int or string (an array of
> size 0) into Parquet.
The following may help, although it is in Scala. The idea is to first concat
each value with its time, assemble all the time_value pairs into an array,
explode it, and finally split time_value back into time and value (a fuller
sketch follows the fragment below).
val ndf = df.select(col("name"), col("otherName"),
explode(
array(concat_ws(":", col("v1"),
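For reference, a self-contained version of that idea might look like the
following (an untested sketch; the column pairs (t1, v1)..(t3, v3) and the
":" delimiter are my assumptions, not from the original thread):

import org.apache.spark.sql.functions._

// Pair each hypothetical time column with its value column, collect the
// pairs into an array, explode to one row per pair, then split back out.
val ndf = df
  .select(col("name"), col("otherName"),
    explode(array(
      concat_ws(":", col("t1"), col("v1")),
      concat_ws(":", col("t2"), col("v2")),
      concat_ws(":", col("t3"), col("v3"))
    )).as("time_value"))
  .withColumn("time", split(col("time_value"), ":").getItem(0))
  .withColumn("value", split(col("time_value"), ":").getItem(1))
  .drop("time_value")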
Hello Community,
I'm using Spark 2.3 and Spark 1.6.0 in my cluster with Cloudera
distribution 5.13.0.
Both are configured to run on YARN, but I'm unable to see completed
applications in the Spark2 history server, while in Spark 1.6.0 I can.
1) I checked the HDFS permissions for both folders and both
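In case it helps, a sketch of what the Spark2 side needs for the history
server to see applications (the HDFS path below is the typical CDH default
and is an assumption, not taken from this message; the history server's
spark.history.fs.logDirectory must point at the same directory):

import org.apache.spark.sql.SparkSession

// Enable event logging and point it at the Spark2 history directory.
val spark = SparkSession.builder()
  .appName("history-check")
  .config("spark.eventLog.enabled", "true")
  .config("spark.eventLog.dir", "hdfs:///user/spark/spark2ApplicationHistory")
  .getOrCreate()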
+-----+---------+----+----+----+
| name|otherName|val1|val2|val3|
+-----+---------+----+----+----+
|  bob|       b1|   1|   2|   3|
|alive|       c1|   3|   4|   6|
|  eve|       e1|   7|   8|   9|
+-----+---------+----+----+----+
I need this to become
+-----+-----
| name|other
FYI, it works with static partitioning:
spark.sql("insert overwrite table mytable PARTITION(P1=1085, P2=164590861)
select c1, c2,..cn, P1, P2 from updateTable")
On Thu, Aug 2, 2018 at 5:01 PM, Nirav Patel wrote:
> I am trying to insert overwrite multiple partitions into an existing
> partitioned hive/parquet table.
I am using Spark 2.2.1 and Hive 2.1. I am trying to insert overwrite
multiple partitions into an existing partitioned hive/parquet table.
The table was created using sparkSession.
I have a table 'mytable' with partitions P1 and P2.
I have the following set on the sparkSession object:
"hive.exec.dynamic.partition"
Because of some legacy issues I can't immediately upgrade the Spark version,
but I try to filter the data before loading it into Spark, based on the
suggestion by
val df = sparkSession.read.format("jdbc").option(...).option("dbtable",
"(select .. from ... where url <> '') table_name").load()
df
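Filled out, the pushed-down filter might look like this (the connection
URL, driver, and credentials are placeholders for a PostgreSQL source, not
from the original message):

// The subquery alias "t" is required for JDBC dbtable subqueries.
val df = sparkSession.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/mydb")
  .option("driver", "org.postgresql.Driver")
  .option("user", "user")
  .option("password", "password")
  .option("dbtable", "(select id, url from table_a where url <> '') t")
  .load()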
Hi James,
It is always advisable to use the latest Spark version. That said, can you
please give DataFrames and UDFs a try if possible? I think that would be a
much more scalable way to address the issue.
Also, where possible, it is always advisable to apply the filter before
fetching the data.
I am very new to Spark. I just successfully set up Spark SQL connecting to
a PostgreSQL database, and am able to display a table with the code
sparkSession.sql("SELECT id, url from table_a where col_b <> '' ").show()
Now I want to perform filter and map functions on the col_b value. In plain
Scala it would
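For what it's worth, the filter-plus-UDF route suggested above could look
like this (the logic inside the UDF is a made-up placeholder; swap in
whatever mapping col_b actually needs):

import org.apache.spark.sql.functions._

// Hypothetical transformation of col_b; replace with the real logic.
val mapColB = udf((v: String) => v.trim.toLowerCase)

val result = sparkSession.table("table_a")
  .filter(col("col_b") =!= "")
  .select(col("id"), col("url"), mapColB(col("col_b")).as("col_b_mapped"))

result.show()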
Hi guys,
I was investigating the Spark property
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic"). It
works perfectly on the local FS, but on S3 I stumbled into strange behavior.
If I don't have a Hive table, or the table is empty, Spark won't save any
data into that table with SaveMode.Overwrite.
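A minimal sketch of the scenario being described, for anyone trying to
reproduce it (table and DataFrame names are hypothetical):

import org.apache.spark.sql.SaveMode

// Available from Spark 2.3; only the partitions present in df get replaced.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

df.write
  .mode(SaveMode.Overwrite)
  .insertInto("mytable")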
I am hitting this issue:
https://issues.cloudera.org/browse/DISTRO-800 (related to
https://issues.apache.org/jira/browse/HIVE-11625)
I am unable to write an empty array of type int or string (an array of
size 0) into Parquet; please assist or suggest a workaround.
Spark version: 2.2.1
AWS EMR: 5
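One possible workaround (my suggestion, not from the thread): write NULL in
place of an empty array before saving, since Parquet written through the
Hive path rejects zero-length arrays (HIVE-11625). The column name and
output path below are hypothetical:

import org.apache.spark.sql.functions._

// size() returns 0 for an empty array (and -1 for null), so only empty
// arrays are replaced with NULL here.
val safe = df.withColumn("arr",
  when(size(col("arr")) === 0, lit(null)).otherwise(col("arr")))

safe.write.parquet("s3://my-bucket/out")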