Hi all,
We are using Spark 2.3 and we are facing a weird issue. The streaming job
works perfectly fine in one environment, but in another environment it does
not.
We suspect that in the low-performing environment the checkpointing of the
state is not as performant as in the other.
Is there any metric we could check to confirm this?
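
Not a definitive answer, but one place to look: Structured Streaming
reports per-trigger progress, including state-store metrics, that you
could compare between the two environments. A minimal sketch, assuming
a SparkSession named `spark` with a running query:

  import org.apache.spark.sql.streaming.StreamingQueryListener
  import org.apache.spark.sql.streaming.StreamingQueryListener._

  spark.streams.addListener(new StreamingQueryListener {
    override def onQueryStarted(e: QueryStartedEvent): Unit = ()
    override def onQueryTerminated(e: QueryTerminatedEvent): Unit = ()
    override def onQueryProgress(e: QueryProgressEvent): Unit = {
      val p = e.progress
      // Phase timings for the trigger (triggerExecution, getBatch,
      // walCommit, ...) -- compare these across environments.
      println(p.durationMs)
      // State-store size and churn per stateful operator.
      p.stateOperators.foreach { s =>
        println(s"rows=${s.numRowsTotal} updated=${s.numRowsUpdated} " +
          s"mem=${s.memoryUsedBytes}")
      }
    }
  })

The same numbers are available on demand from query.lastProgress and
query.recentProgress.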
I am following up on this question because I have a similar issue.
When is it that we need to control the parallelism manually? Skewed partitions?
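
For what it's worth, those are the usual cases: a skewed key, or a
spark.sql.shuffle.partitions value (200 by default) that doesn't match
the data volume. A rough sketch of both knobs; `df` and the column
names are placeholders:

  import org.apache.spark.sql.functions.{col, rand}

  // Post-shuffle parallelism for joins/aggregations:
  spark.conf.set("spark.sql.shuffle.partitions", "400")

  // Explicitly hash-partition by the grouping/join key:
  val evened = df.repartition(400, col("userId"))

  // For heavy skew, salt the key so one hot value spreads over
  // several partitions:
  val salted = df
    .withColumn("salt", (rand() * 16).cast("int"))
    .repartition(col("userId"), col("salt"))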
> From what I understand, if the transformation is untyped it will return a
> DataFrame, otherwise it will return a Dataset. In the source code you will
> see that the return type is a DataFrame instead of a Dataset, and such
> methods should also be annotated with @group untypedrel. Thus, you could
> check the source for that annotation.
What you suggested works in Spark 2.3, but in the version that I am using
(2.1) it fails to compile with the following type mismatch:
found : org.apache.spark.sql.types.ArrayType
required: org.apache.spark.sql.types.StructType
ds.select(from_json($"news", schema) as "news_parsed").show(false)
Use org.apache.spark.sql.types.ArrayType instead of a Scala Array as the
root type when you define the schema and it will work.
Regards,
Magnus
On Fri, Feb 22, 2019 at 11:15 PM Yeikel wrote:
> I have an "unnamed" json array stored in a *column*.
>
> The format is the following:
>
> column name : news
>