That is what I expected; however, I did a very simple test (using println
just to see when the exception is triggered in the iterator) using local
master, and I saw it fail once and cause the entire operation to fail.
Is this something which may be unique to local master (or some default
configuration)?
A failure in the data reader results in a task failure, and Spark will
retry the task for you (IIRC it retries 3 times before failing the job).
Can you check your Spark log and see if the task fails consistently?
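For what it's worth, local mode is special here: with a plain "local[N]" master each task gets only one attempt, while the "local[N, maxFailures]" form enables retries. A minimal sketch, assuming the standard SparkSession builder:

import org.apache.spark.sql.SparkSession

// "local[4, 3]" = 4 worker threads and up to 3 attempts per task;
// plain "local[4]" allows a single attempt, so one reader exception
// fails the job immediately.
val spark = SparkSession.builder()
  .appName("reader-retry-test")
  .master("local[4, 3]")
  // On a real cluster the equivalent knob is spark.task.maxFailures
  // (default 4):
  // .config("spark.task.maxFailures", "4")
  .getOrCreate()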
On Tue, Jul 3, 2018 at 2:17 PM assaf.mendelson wrote:
Hi All,
I have implemented a data source V2 which integrates with an internal system,
and I need to make it resilient to errors in the internal data source.
The issue is that currently, if there is an exception in the data reader,
the exception seems to fail the entire task. I would prefer instead to
recover from the error rather than fail the whole task.
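One way to achieve that, sketched against the Spark 2.3-era DataSourceV2 interfaces (the reader was renamed InputPartitionReader in later versions), is to wrap the partition's DataReader so that assumed-transient errors are retried inside next() and never escape to the task. ResilientReader and maxAttempts below are made-up names; this is a sketch, not a tested implementation:

import org.apache.spark.sql.Row
import org.apache.spark.sql.sources.v2.reader.DataReader

// Hypothetical wrapper: retry read errors locally instead of letting
// them fail the Spark task.
class ResilientReader(inner: DataReader[Row], maxAttempts: Int = 3)
    extends DataReader[Row] {

  override def next(): Boolean = {
    var attempt = 1
    while (true) {
      try {
        return inner.next()
      } catch {
        case _: Exception if attempt < maxAttempts =>
          attempt += 1 // assumed transient: try again
      }
    }
    throw new IllegalStateException("unreachable")
  }

  override def get(): Row = inner.get()
  override def close(): Unit = inner.close()
}

Note that retrying next() like this is only safe if the underlying read is idempotent; for a non-replayable source the wrapper would have to reopen the connection instead.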
I think telling people that they’re being considered as committers early on is
a good idea, but AFAIK we’ve always had individual committers do that with
contributors who were doing great work in various areas. We don’t have a
centralized process for it though — it’s up to whoever wants to work
That's fair, and it's great to find high quality contributors. But I also
feel the two projects have very different backgrounds and are at different
maturity phases. There are 1300+ contributors to Spark and only 300 to Beam,
with the vast majority of Beam's contributions coming from a single company
(based on my
As someone who floats a bit between both projects (as a contributor), I'd
love to see us adopt some of these techniques to be proactive about
growing our committership (I think perhaps we could do this by also moving
some of the newer committers into the PMC faster so there are more eyes out
looking
Worth, I think, a read and consideration from Spark folks. I'd be
interested in comments; I have a few reactions too.
---------- Forwarded message ---------
From: Kenneth Knowles
Date: Sat, Jun 30, 2018 at 1:15 AM
Subject: Beam's recent community development work
To: Griselda Cuevas <g...@ap
The vote passes. Thanks to all who helped with the release!
I'll start publishing everything tomorrow, and an announcement will
be sent when artifacts have propagated to the mirrors (probably
early next week).
+1 (* = binding):
- Marcelo Vanzin *
- Sean Owen *
- Tom Graves *
- Holden Karau *
- Do
I forgot to post it, I'm +1.
Tom
On Monday, July 2, 2018, 12:19:08 AM CDT, Holden Karau wrote:
Leaving documents aside (I think we should maybe have a thread on dev@ about
how we want to handle doc changes to existing releases), I'm +1: the PySpark
venv checks out.
On Sun, Jul 1, 2018 at 9:40
Maybe this is a bug. The source can be found at:
https://github.com/purijatin/spark-retrain-bug
*Issue:*
The program takes as input a set of documents, where each document is in a
separate file.
The Spark program computes the tf-idf of the terms (Tokenizer ->
StopWordsRemover -> stemming -> TF -> TF-IDF).
Once
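For anyone reading along without cloning the repo, a rough sketch of the pipeline as described, assuming Spark ML's built-in stages and made-up column names (Spark ML has no built-in stemmer, so that step is left out):

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{HashingTF, IDF, StopWordsRemover, Tokenizer}

// Assumed column names ("text", "tokens", ...); the stemming stage
// described in the report is omitted here.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("tokens")
val remover = new StopWordsRemover().setInputCol("tokens").setOutputCol("filtered")
val tf = new HashingTF().setInputCol("filtered").setOutputCol("rawFeatures")
val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")

val pipeline = new Pipeline().setStages(Array(tokenizer, remover, tf, idf))
// val model = pipeline.fit(docs) // docs: DataFrame with one row per document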