Some things just didn't work as I had first expected. For example,
writing from a Spark collection to an Alluxio destination didn't
persist the files to S3 automatically.
I remember having to use the Alluxio library directly to force the
files to persist to S3 after Spark finished writing to a
Hi,
Here is a little bit of background.
I've been using the stateless streaming APIs for a while, like
JavaDStream and so on, and they worked well. It has come to a point where
we need to do real-time stateful streaming based on event time and other
things, but for now I am just trying to get us
You can't call ".count()" directly on streaming DataFrames. This is because
"count" is an action (remember RDD actions) that executes and returns a
result immediately, which can be done only when the data is bounded (e.g.
batch/interactive queries). For streaming queries, you have to let it run
in the
Hi!
Thanks for the response. It looks like from_json requires the schema ahead
of time. Is there any function I can use to infer the schema from the JSON
messages I am receiving through Kafka? I tried the code below; however,
I get the following exception.
org.apache.spark.sql.AnalysisException: Queries
I understand the confusion. The "json" format is for JSON-encoded files being
written in a directory. For Kafka, use the "kafka" format. Then, to decode the
binary data as JSON, you can use the function "from_json" (Spark 2.1 and
above). Here is our blog post on this.
https://databricks.com/blog/2017/04/
Hi,
I'd like to hear the official statement too.
My take on GraphX and Spark Streaming is that they are long-dead projects,
with GraphFrames and Structured Streaming taking their place, respectively.
Jacek
On 13 May 2017 3:00 p.m., "Sergey Zhemzhitsky" wrote:
> Hello Spark users,
>
> I just wo
Hello Spark users,
I just would like to know whether the GraphX component should be considered
deprecated and no longer actively maintained,
and whether it should be avoided when starting new graph-processing projects
on top of Spark in favour of other
graph-processing frameworks?
I'm asking because
Another error. Does anyone have any idea?
This one happens when I try to convert a Spark DataFrame to pandas:
---Py4JError
Traceback (most recent call
last)/home/ubuntu/spark-2.1.1-bin-hadoop2.7/p
My code runs error-free on my local PC. I just tried running the same code on
an Ubuntu machine on EC2 and got the error below. Any idea where to start
in terms of debugging?
---Py4JError
Tracebac
Hi all,
What is the difference between sparkSession.readStream.format("kafka") and
sparkSession.readStream.format("json")?
I am sending JSON-encoded messages through Kafka and I am not sure which of
the two I should use.
Thanks!