Hello-
I am attempting to use SparkR to read in a parquet file from S3.
The exact same operation succeeds using PySpark - but I get a 503 error using
SparkR.
In fact, I get the 503 even if I use a bad endpoint or bad credentials. It's as
if Spark isn't even trying to make the HTTP request. It's
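(For reference, a minimal sketch of how the S3A endpoint and credentials are usually supplied; the same Hadoop keys apply whether the frontend is SparkR, PySpark or Scala. The bucket, path and endpoint below are placeholders, not the poster's values.)

import org.apache.spark.sql.SparkSession

// Requires hadoop-aws (and its AWS SDK dependency) on the classpath.
val spark = SparkSession.builder()
  .appName("s3a-parquet-read")
  .config("spark.hadoop.fs.s3a.endpoint", "s3.us-east-1.amazonaws.com")
  .config("spark.hadoop.fs.s3a.access.key", sys.env.getOrElse("AWS_ACCESS_KEY_ID", ""))
  .config("spark.hadoop.fs.s3a.secret.key", sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", ""))
  .getOrCreate()

val df = spark.read.parquet("s3a://my-bucket/path/to/data.parquet")
df.show(5)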
Sure. My point was that Delta Lake is also a 3rd party library, and there's
no way for Apache Spark to do that. Delta Lake has its own group, and the
request is better directed there.
On Mon, Oct 5, 2020 at 9:54 PM Enrico Minack wrote:
> Though spark.read. refers to "built-in" data sou
Replied inline.
On Tue, Oct 6, 2020 at 6:07 AM Sergey Oboguev wrote:
> Hi Jungtaek,
>
> Thanks for your response.
>
> *> you'd want to dive inside the checkpoint directory and have separate
> numbers per top-subdirectory*
>
> All the checkpoint store numbers are solely for the subdirectory set b
Hi Jungtaek,
Thanks for your response.
*> you'd want to dive inside the checkpoint directory and have separate
numbers per top-subdirectory*
All the checkpoint store numbers are solely for the subdirectory set by
option("checkpointLocation", .. checkpoint dir for writer ... )
Other subdirector
Hi Jainshasha,
I need to read each row from the DataFrame and make some changes to it before
inserting it into ES.
Thanks
Siva
On Mon, Oct 5, 2020 at 8:06 PM jainshasha wrote:
> Hi Siva
>
> To emit data into ES from a Spark Structured Streaming job, you need to use
> the Elasticsearch jar that has sup
Hi Siva
To emit data into ES from a Spark Structured Streaming job, you need to use
the Elasticsearch jar that has support for a Spark Structured Streaming sink.
For this you can use my branch, where we have integrated ES with Spark 3.0,
compatible with Scala 2.12:
https://github.com/Thales
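(For reference, a rough sketch of what a Structured Streaming job writing to Elasticsearch through the elasticsearch-spark connector typically looks like; the broker, topic, ES node, index name and checkpoint path are placeholders, and the exact resource string passed to start() depends on the connector version.)

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-to-es").getOrCreate()

// Kafka source; brokers and topic are placeholders.
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

// Elasticsearch sink from the elasticsearch-spark connector; requires the
// connector jar on the classpath.
val query = input.writeStream
  .format("es")
  .option("checkpointLocation", "/tmp/checkpoints/kafka-to-es")
  .option("es.nodes", "es-host:9200")
  .start("events-index")

query.awaitTermination()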
Hi Team,
I have a Spark streaming job which will read from Kafka and write into
Elastic via HTTP request.
I want to validate each request from Kafka, change the payload as per
business needs, and write it into Elasticsearch.
I have used an ES HTTP request to push the data into Elasticsearch. Can s
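(A sketch of one way to do the per-record validation described above, using foreachBatch so the payload can be reshaped before it is indexed; validateAndReshape and all endpoints are placeholders, not the poster's code.)

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("kafka-validate-es").getOrCreate()

val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")  // placeholder
  .option("subscribe", "events")                     // placeholder topic
  .load()
  .selectExpr("CAST(value AS STRING) AS payload")

// Hypothetical validation / business-rule transformation applied per micro-batch.
def validateAndReshape(df: DataFrame): DataFrame = df

// Transform each micro-batch, then index it; a per-partition HTTP client call
// would also fit inside this function.
val writeBatch: (DataFrame, Long) => Unit = (batch, batchId) => {
  val cleaned = validateAndReshape(batch)
  cleaned.write.format("es").option("es.nodes", "es-host:9200").save("events-index")
}

val query = input.writeStream
  .foreachBatch(writeBatch)
  .option("checkpointLocation", "/tmp/checkpoints/kafka-validate-es")
  .start()

query.awaitTermination()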
Though spark.read. refers to "built-in" data sources, there is
nothing that prevents 3rd party libraries from "extending" spark.read in
Scala or Python. As users know the Spark way to read built-in data
sources, it feels natural to hook 3rd party data sources under the same
scheme, to give users a h
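(To illustrate the point, a minimal sketch of how a 3rd party library can "extend" spark.read in Scala with an implicit class; the method name and data source class below are invented for the example.)

import org.apache.spark.sql.{DataFrame, DataFrameReader}

object ThirdPartyImplicits {
  // Adds a spark.read.myFormat(path) shorthand on top of the built-in
  // spark.read.format(...).load(...); "my.datasource.Provider" is made up.
  implicit class RichDataFrameReader(val reader: DataFrameReader) extends AnyVal {
    def myFormat(path: String): DataFrame =
      reader.format("my.datasource.Provider").load(path)
  }
}

// With the implicit in scope:
//   import ThirdPartyImplicits._
//   val df = spark.read.myFormat("/data/example")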
First of all, you'd want to divide these numbers by the number of
micro-batches, as file creations in the checkpoint directory would occur
similarly per micro-batch.
Second, you'd want to dive inside the checkpoint directory and have
separate numbers per top-subdirectory.
After that we can see whether
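(A rough sketch, not from the thread, of how the file counts could be broken down per top-level subdirectory of the checkpoint location; the path is a placeholder and a spark-shell session is assumed.)

import org.apache.hadoop.fs.Path

val checkpointDir = new Path("/tmp/checkpoints/my-query")
val fs = checkpointDir.getFileSystem(spark.sparkContext.hadoopConfiguration)

// Typical top-level subdirectories: offsets, commits, sources, state, metadata.
fs.listStatus(checkpointDir).filter(_.isDirectory).foreach { dir =>
  val files = fs.listFiles(dir.getPath, true) // recursive listing
  var count = 0L
  while (files.hasNext) { files.next(); count += 1 }
  println(s"${dir.getPath.getName}: $count files")
}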
Hi,
That's not explained in the SS guide doc, but it is explained in the Scala API doc.
http://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/GroupState.html
The statement quoted from the Scala API doc answers your question.
The timeout is reset every time the function is c
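(A minimal sketch of that behaviour: the timeout has to be set again on every invocation of the state function, otherwise no timeout is in effect for the group afterwards. The case classes and duration are illustrative.)

import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

case class Event(key: String, value: Long)
case class MyState(total: Long)

def updateState(key: String, events: Iterator[Event],
                state: GroupState[MyState]): Iterator[MyState] = {
  val current = state.getOption.getOrElse(MyState(0L))
  val updated = MyState(current.total + events.map(_.value).sum)
  state.update(updated)
  // Must be (re)set on every call; if a later batch skips this line,
  // no timeout is set for the group after that batch.
  state.setTimeoutDuration("10 minutes")
  Iterator(updated)
}

// events.groupByKey(_.key).flatMapGroupsWithState(
//   OutputMode.Update(), GroupStateTimeout.ProcessingTimeTimeout)(updateState)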
Hi,
"spark.read." is a "shorthand" for "built-in" data sources, not for
external data sources. spark.read.format() is still an official way to use
it. Delta Lake is not included in Apache Spark so that is indeed not
possible for Spark to refer to.
Starting from Spark 3.0, the concept of "catalog"
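(Concretely, the officially supported route for an external source such as Delta Lake is format(); the table path is a placeholder and the delta-core jar is assumed to be on the classpath.)

val df = spark.read.format("delta").load("/path/to/delta-table")
df.write.format("delta").mode("append").save("/path/to/delta-table")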
Hi there,
I'm just wondering if there is any incentive to implement read/write methods in
the DataFrameReader/DataFrameWriter for delta similar to e.g. parquet?
For example, using PySpark, "spark.read.parquet" is available, but
"spark.read.delta" is not (same for write).
In my opinion, "spark.r
Hi,
I have tested a few JDBC BigQuery providers, like Progress Direct and Simba,
but none of them seem to work properly through Spark.
The only way I can read and write to BigQuery is through the Spark BigQuery API,
using the following scenario:
spark-shell --jars=gs://spark-lib/bigquery/spark-bigquery-l
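(For reference, a sketch of the Spark BigQuery connector usage referred to above; the project, dataset, table and GCS bucket names are placeholders.)

// Read a BigQuery table through the connector.
val df = spark.read
  .format("bigquery")
  .option("table", "my-project.my_dataset.my_table")
  .load()

// Write back to BigQuery; temporaryGcsBucket is needed for the indirect write method.
df.write
  .format("bigquery")
  .option("table", "my-project.my_dataset.my_table_copy")
  .option("temporaryGcsBucket", "my-temp-bucket")
  .mode("append")
  .save()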
Hi all, I have the following question:
What happens to the state (in terms of expiration) if I’m updating the
state without setting timeout?
E.g. in FlatMapGroupsWithStateFunction
1. first batch:
   state.update(myObj)
   state.setTimeoutDuration(timeout)
2. second batch:
   state.update(myObj)