+1 on all counts (consensus, time bound, define roles)
I can update the doc in the next few days and share back. Then maybe we can
just officially vote on this. As Tim suggested, we might not get it 100%
right the first time and would need to iterate. But that's fine.
On Thu, Jan 5, 2017 at 3
Hi,
I can't help but wonder if there is any practical reason for keeping
monolithic test modules. These are already pretty large (1500-2200 LOC)
and can only grow. Development aside, I assume that many users use the
tests the same way I do, to check the intended behavior, and largish,
loosely structured modules make that harder.
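To make this concrete, a purely hypothetical layout for splitting one of the
monolithic modules (say, pyspark/tests.py) into a package; every module name
here is invented for illustration:

  pyspark/tests/__init__.py          # shared fixtures, e.g. one reused SparkContext
  pyspark/tests/test_rdd.py          # RDD API behavior
  pyspark/tests/test_serializers.py  # serializer round-trips
  pyspark/tests/test_shuffle.py      # shuffle and aggregation paths

Keeping shared setup in one place would also guard against the runtime
concern raised later in the thread.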
Hello Maciej,
If there's a JIRA available for this, I'd like to help get this moving; let
me know the next steps.
Thanks in advance.
From: Maciej Szymkiewicz
Sent: Wednesday, January 11, 2017 4:18 AM
To: dev@spark.apache.org
Subject: [PYSPARK] Python tests organization
It would be good to break them down a bit more, provided that we don't
increase, for example, the total runtime due to extra setup.
On Wed, Jan 11, 2017 at 9:45 AM Saikat Kanjilal wrote:
> Hello Maciej,
>
> If there's a JIRA available for this, I'd like to help get this moving;
> let me know the next steps.
Is it worth coming up with a proposal for this and floating it to dev?
From: Reynold Xin
Sent: Wednesday, January 11, 2017 9:47 AM
To: Maciej Szymkiewicz; Saikat Kanjilal; dev@spark.apache.org
Subject: Re: [PYSPARK] Python tests organization
It would be good to break them down a bit more, provided that we don't
increase, for example, the total runtime due to extra setup.
Yes, absolutely.
On Wed, Jan 11, 2017 at 9:54 AM Saikat Kanjilal wrote:
> Is it worth coming up with a proposal for this and floating it to dev?
>
> --
> *From:* Reynold Xin
> *Sent:* Wednesday, January 11, 2017 9:47 AM
Maciej/Reynold,
If it's OK with you guys, I can start working on a proposal and create a
JIRA; let me know the next steps.
Thanks in advance.
From: Maciej Szymkiewicz
Sent: Wednesday, January 11, 2017 10:14 AM
To: Saikat Kanjilal
Subject: Re: [PYSPARK] Python tests organization
Hi,
We've been running into ConcurrentModificationExceptions ("KafkaConsumer is
not safe for multi-threaded access") with the CachedKafkaConsumer. I've been
working through debugging this issue, and after looking through some of the
Spark source code I think this is a bug.
Our setup is:
Spark 2.0.2
I think you may be reusing the Kafka DStream (the DStream returned by
createDirectStream). If you need to read from the same Kafka source twice,
you need to create another DStream.
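For what it's worth, a minimal sketch of that suggestion (ssc, topics, and
kafkaParams are assumed to be defined as usual; the second group.id is
illustrative):

import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Two independent direct streams over the same topics: each one gets its own
// cached consumer, so no consumer is shared across threads.
val streamA = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

// A distinct group.id keeps the second stream from competing with the first.
val streamB = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent,
  Subscribe[String, String](topics, kafkaParams + ("group.id" -> "second-reader")))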
On Wed, Jan 11, 2017 at 2:38 PM, Kalvin Chau wrote:
> Hi,
>
> We've been running into ConcurrentModificationExceptions ("KafkaC
Did you change "spark.streaming.concurrentJobs" to more than 1? The Kafka
0.10 connector requires that it stay at 1.
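For anyone checking their own job, this is an ordinary conf entry; a sketch
with an illustrative app name (1 is also the default):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kafka-010-app") // illustrative
  .set("spark.streaming.concurrentJobs", "1") // must stay 1 with the Kafka 0.10 connector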
On Wed, Jan 11, 2017 at 3:25 PM, Kalvin Chau wrote:
> I'm not re-using any InputDStreams, actually; this is one InputDStream
> that has a window applied to it.
> Then when Spark creates and
I'm not re-using any InputDStreams, actually; this is one InputDStream that
has a window applied to it.
Then when Spark creates and assigns tasks to read from the topic, one
executor gets assigned two tasks to read from the same TopicPartition, and
uses the same CachedKafkaConsumer to read from the
Or do you enable "spark.speculation"? If not, Spark Streaming should not
launch two tasks using the same TopicPartition.
On Wed, Jan 11, 2017 at 3:33 PM, Kalvin Chau wrote:
> I have not modified that configuration setting, and that doesn't seem to
> be documented anywhere.
>
> Does the Kafka 0.1
Could you post your code, please?
On Wed, Jan 11, 2017 at 3:53 PM, Kalvin Chau wrote:
> "spark.speculation" is not set, so it would be whatever the default is.
>
>
> On Wed, Jan 11, 2017 at 3:43 PM Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Or do you enable "spark.speculation"?
I have not modified that configuration setting, and that doesn't seem to be
documented anywhere.
Does the Kafka 0.10 connector require the number of cores on an executor to
be set to 1? I didn't see that documented anywhere either.
On Wed, Jan 11, 2017 at 3:27 PM Shixiong(Ryan) Zhu
wrote:
> Do you change "s
"spark.speculation" is not set, so it would be whatever the default is.
On Wed, Jan 11, 2017 at 3:43 PM Shixiong(Ryan) Zhu
wrote:
> Or do you enable "spark.speculation"? If not, Spark Streaming should not
> launch two tasks using the same TopicPartition.
>
> On Wed, Jan 11, 2017 at 3:33 PM, Kal
Here is a minimal code example with which I was able to replicate the issue.
The batch interval is set to 2 to make the exceptions happen more often.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> brokers,
  "key.deserializer" -> classOf[KafkaAvroDeserializer],
  "value.deserializer" -> classOf[KafkaAvroDeserializer]) // archive cuts off here; assumed symmetric with the key
Hi Georg,
It is not strange. As I said before, it depends on how the data is
partitioned. When you try to get the available value from the next
partition like this:
var lastNotNullRow: Option[FooBar] = toCarryBd.value.get(i).get
if (lastNotNullRow == None) {
  lastNotNullRow = toCarryBd.value.get(i + 1).get // truncated in the archive; (i + 1) assumed from "next partition"
}
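For context, a minimal sketch of the surrounding pattern being discussed.
FooBar and toCarryBd are names from the thread; rowsRdd and the fill logic
are assumptions:

case class FooBar(foo: Option[String], bar: Option[String])

// toCarryBd: Broadcast[Map[Int, Option[FooBar]]] holds the last non-null row
// observed in each partition, collected in an earlier pass over rowsRdd.
val filled = rowsRdd.mapPartitionsWithIndex { (i, iter) =>
  var lastNotNullRow: Option[FooBar] = toCarryBd.value.get(i).get
  iter.map { row =>
    if (row.bar.isEmpty) {
      row.copy(bar = lastNotNullRow.flatMap(_.bar)) // forward-fill from the carry
    } else {
      lastNotNullRow = Some(row) // remember the latest non-null row
      row
    }
  }
}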
I've filed an issue here: https://issues.apache.org/jira/browse/SPARK-19185;
let me know if I missed anything!
--Kalvin
On Wed, Jan 11, 2017 at 5:43 PM Shixiong(Ryan) Zhu
wrote:
> Thanks for reporting this. Finally I understood the root cause. Could you
> file a JIRA on https://issues.apache.org
I see that there is room to improve the algorithm and make it more fault
tolerant, as outlined by both of you.
Could you explain a little bit more why
+----------+------+
|       foo|   bar|
+----------+------+
|2016-01-01| first|
|201