Hello,
I have observed that for long queries (say > 1 hour) run from PySpark with
Spark Connect (3.5.2), the client often gets stuck: even when the query
completes successfully, the client keeps waiting in
_execute_and_fetch_as_iterator (see traceback below).
The client session still shows as active
Hello,
I have a piece of code that looks roughly like this:
df = spark.read.parquet("s3://bucket/data.parquet/name=A",
                        "s3://bucket/data.parquet/name=B")
df_out = df  # do stuff to transform df
df_out.write.partitionBy("name").parquet("s3://bucket/data.parquet")
I specify explicit paths
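One detail worth noting, as a hedged sketch (assuming the intent is to read
only those two partitions while keeping the name column): when explicit
partition directories are listed, the basePath option tells Spark where
partition discovery starts, so the partition column survives into the schema.

df = (
    spark.read
    .option("basePath", "s3://bucket/data.parquet")
    .parquet("s3://bucket/data.parquet/name=A",
             "s3://bucket/data.parquet/name=B")
)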
...running code though, and would like to know if someone can shed more light
on this.

Regards,
Chandan

On Sat, Sep 22, 2018 at 7:43 PM peay wrote:
Hello,
I am trying to use watermarking without aggregation, to filter out records that
are just too late, instead of appending them to the output. My understanding is
that aggregation is required for `withWatermark` to have any effect. Is that
correct?
I am looking for something along the lines of
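For what it's worth, a minimal sketch of the usual workaround (the column
names event_time, id and payload are assumptions, not from the original
message): withWatermark alone drops nothing; it only takes effect through a
stateful operator, so a per-record group-by gives the watermark something to
act on, and too-late records are discarded instead of emitted.

from pyspark.sql import functions as F

filtered = (
    events  # a streaming DataFrame, assumed
    .withWatermark("event_time", "10 minutes")
    .groupBy("event_time", "id")
    .agg(F.first("payload").alias("payload"))
)

Note that in append mode each group is only emitted once the watermark passes
its event time, so this trades latency for the filtering behaviour.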
Hello,
I run a Spark cluster on YARN, and we have a bunch of client-mode applications
we use for interactive work. Whenever we start one of these, an application
master container is started.
My understanding is that this is mostly an empty shell, used to request further
containers or get status
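In case it helps, a sketch of sizing that container explicitly (the values
below are placeholders): in client mode the AM is a lightweight
ExecutorLauncher whose resources are set by the spark.yarn.am.* options,
while the driver itself runs locally and is governed by spark.driver.*.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("yarn")
    .config("spark.yarn.am.memory", "512m")
    .config("spark.yarn.am.cores", "1")
    .getOrCreate()
)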
Hello,
I am trying to use data_frame.write.partitionBy("day").save("dataset.parquet")
to write a dataset while splitting by day.
I would like to run a Spark job to process, e.g., a month:
dataset.parquet/day=2017-01-01/...
...
and then run another Spark job to add another month using the same
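A short sketch of the append pattern this implies (mode and option names below
are standard DataFrameWriter API): mode("append") only adds new day=...
directories under the same base path, leaving existing ones untouched.

(data_frame.write
    .partitionBy("day")
    .mode("append")
    .save("dataset.parquet"))

To re-run a month idempotently, Spark 2.3+ additionally supports
spark.sql.sources.partitionOverwriteMode=dynamic together with
mode("overwrite"), which replaces only the partitions present in the job's
output.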
Hello,
I am running unit tests with Spark DataFrames, and I am looking for
configuration tweaks that would make tests faster. Usually, I use a local[2] or
local[4] master.
Something that has been bothering me is that most of my stages end up using
200 partitions, independently of whether I repartition.
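Presumably the 200 comes from the default of spark.sql.shuffle.partitions,
which every shuffle (joins, aggregations) uses regardless of the input
partitioning. A sketch of a test session with it lowered:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[2]")
    .config("spark.sql.shuffle.partitions", "4")
    .getOrCreate()
)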
Subject: foreachRDD equivalent for pyspark Structured Streaming
Local Time: 3 May 2017 12:05 PM
UTC Time: 3 May 2017 10:05
From: tathagata.das1...@gmail.com
To: peay, user@spark.apache.org
You can apply any kind of aggregation on windows. There are some built-in
aggregations (e.g. sum and count) as well as
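A sketch of one such built-in windowed aggregation (events, event_time and
value are assumed names):

from pyspark.sql import functions as F

counts = (
    events
    .groupBy(F.window("event_time", "10 minutes"))
    .agg(F.count("*").alias("n"), F.sum("value").alias("total"))
)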
Hello,
I would like to get started on Spark Streaming with a simple window.
I've got some existing Spark code that takes a dataframe and outputs a
dataframe. This includes various joins and operations that are not yet
supported by Structured Streaming. I am looking to essentially map/apply this
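One way to do this, as a hedged sketch (Spark 2.4+; transform_batch and
stream_df are assumed names): foreachBatch runs an existing batch DataFrame
function on every micro-batch, including operations Structured Streaming does
not support natively.

def process_batch(batch_df, batch_id):
    # apply the existing batch transformation to this micro-batch
    transform_batch(batch_df).write.mode("append").parquet("out.parquet")

query = (
    stream_df.writeStream
    .foreachBatch(process_batch)
    .start()
)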