+1 on all counts (consensus, time bound, define roles)
I can update the doc in the next few days and share back. Then maybe we can
just officially vote on this. As Tim suggested, we might not get it 100%
right the first time and would need to iterate. But that's fine.
On Thu, Jan 5, 2017 at 3
Hi,
I can't help but wonder if there is any practical reason for keeping
monolithic test modules. These are already pretty large (1500-2200 LOC)
and can only grow. Development aside, I assume that many users use the
tests the same way I do, to check the intended behavior, and largish,
loosely structured modules make that harder.
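To make this concrete, a purely hypothetical layout for splitting one of the
monolithic modules (say, pyspark/tests.py) into a package; every module name
here is invented for illustration:

  pyspark/tests/__init__.py          # shared fixtures, e.g. one reused SparkContext
  pyspark/tests/test_rdd.py          # RDD API behavior
  pyspark/tests/test_serializers.py  # serializer round-trips
  pyspark/tests/test_shuffle.py      # shuffle and aggregation paths

Keeping shared setup in one place would also guard against the runtime
concern raised later in the thread.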
Hello Maciej,
If there's a JIRA available for this, I'd like to help get this moving; let
me know the next steps.
Thanks in advance.
From: Maciej Szymkiewicz
Sent: Wednesday, January 11, 2017 4:18 AM
To: dev@spark.apache.org
Subject: [PYSPARK] Python tests organization
It would be good to break them down a bit more, provided that we don't
increase, for example, the total runtime due to extra setup.
On Wed, Jan 11, 2017 at 9:45 AM Saikat Kanjilal wrote:
> Hello Maciej,
>
> If there's a JIRA available for this, I'd like to help get this moving;
> let me know the next steps.
Is it worth coming up with a proposal for this and floating it to dev?
From: Reynold Xin
Sent: Wednesday, January 11, 2017 9:47 AM
To: Maciej Szymkiewicz; Saikat Kanjilal; dev@spark.apache.org
Subject: Re: [PYSPARK] Python tests organization
It would be good to break them down a bit more, provided that we don't
increase, for example, the total runtime due to extra setup.
Yes, absolutely.
On Wed, Jan 11, 2017 at 9:54 AM Saikat Kanjilal wrote:
> Is it worth coming up with a proposal for this and floating it to dev?
>
> --
> *From:* Reynold Xin
> *Sent:* Wednesday, January 11, 2017 9:47 AM
Maciej/Reynold,
If it's OK with you guys, I can start working on a proposal and create a
JIRA; let me know the next steps.
Thanks in advance.
From: Maciej Szymkiewicz
Sent: Wednesday, January 11, 2017 10:14 AM
To: Saikat Kanjilal
Subject: Re: [PYSPARK] Python tests organization
Hi,
We've been running into ConcurrentModificationExceptions ("KafkaConsumer is
not safe for multi-threaded access") with the CachedKafkaConsumer. I've been
working through debugging this issue, and after looking through some of the
Spark source code I think this is a bug.
Our setup is:
Spark 2.0.2
I think you may be reusing the Kafka DStream (the DStream returned by
createDirectStream). If you need to read from the same Kafka source twice,
you need to create another DStream.
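For what it's worth, a minimal sketch of that suggestion (ssc, topics, and
kafkaParams are assumed to be defined as usual; the second group.id is
illustrative):

import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Two independent direct streams over the same topics: each one gets its own
// cached consumer, so no consumer is shared across threads.
val streamA = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

// A distinct group.id keeps the second stream from competing with the first.
val streamB = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent,
  Subscribe[String, String](topics, kafkaParams + ("group.id" -> "second-reader")))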
On Wed, Jan 11, 2017 at 2:38 PM, Kalvin Chau wrote:
> Hi,
>
> We've been running into ConcurrentModificationExceptions ("KafkaC
Did you change "spark.streaming.concurrentJobs" to more than 1? The Kafka
0.10 connector requires that it stay at 1.
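For anyone checking their own job, this is an ordinary conf entry; a sketch
with an illustrative app name (1 is also the default):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kafka-010-app") // illustrative
  .set("spark.streaming.concurrentJobs", "1") // must stay 1 with the Kafka 0.10 connector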
On Wed, Jan 11, 2017 at 3:25 PM, Kalvin Chau wrote:
> I'm not re-using any InputDStreams, actually; this is one InputDStream
> that has a window applied to it.
> Then when Spark creates and
I'm not re-using any InputDStreams, actually; this is one InputDStream that
has a window applied to it.
Then when Spark creates and assigns tasks to read from the topic, one
executor gets assigned two tasks to read from the same TopicPartition, and
uses the same CachedKafkaConsumer to read from the
Or do you enable "spark.speculation"? If not, Spark Streaming should not
launch two tasks using the same TopicPartition.
On Wed, Jan 11, 2017 at 3:33 PM, Kalvin Chau wrote:
> I have not modified that configuration setting, and that doesn't seem to
> be documented anywhere.
>
> Does the Kafka 0.1
Could you post your code, please?
On Wed, Jan 11, 2017 at 3:53 PM, Kalvin Chau wrote:
> "spark.speculation" is not set, so it would be whatever the default is.
>
>
> On Wed, Jan 11, 2017 at 3:43 PM Shixiong(Ryan) Zhu <
> shixi...@databricks.com> wrote:
>
>> Or do you enable "spark.speculation"?
I have not modified that configuration setting, and that doesn't seem to be
documented anywhere.
Does the Kafka 0.10 connector require the number of cores on an executor to
be set to 1? I didn't see that documented anywhere either.
On Wed, Jan 11, 2017 at 3:27 PM Shixiong(Ryan) Zhu
wrote:
> Do you change "s
"spark.speculation" is not set, so it would be whatever the default is.
On Wed, Jan 11, 2017 at 3:43 PM Shixiong(Ryan) Zhu
wrote:
> Or do you enable "spark.speculation"? If not, Spark Streaming should not
> launch two tasks using the same TopicPartition.
>
> On Wed, Jan 11, 2017 at 3:33 PM, Kal
Here is a minimal code example with which I was able to replicate the issue.
The batch interval is set to 2 to make the exceptions happen more often.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> brokers,
  "key.deserializer" -> classOf[KafkaAvroDeserializer],
  "value.deserializer" -> classOf[KafkaAvroDeserializer]) // archive cuts off here; assumed symmetric with the key
Hi Georg,
It is not strange. As I said before, it depends on how the data is
partitioned. When you try to get the available value from the next
partition like this:
var lastNotNullRow: Option[FooBar] = toCarryBd.value.get(i).get
if (lastNotNullRow == None) {
  lastNotNullRow = toCarryBd.value.get(i + 1).get // truncated in the archive; (i + 1) assumed from "next partition"
}
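For context, a minimal sketch of the surrounding pattern being discussed.
FooBar and toCarryBd are names from the thread; rowsRdd and the fill logic
are assumptions:

case class FooBar(foo: Option[String], bar: Option[String])

// toCarryBd: Broadcast[Map[Int, Option[FooBar]]] holds the last non-null row
// observed in each partition, collected in an earlier pass over rowsRdd.
val filled = rowsRdd.mapPartitionsWithIndex { (i, iter) =>
  var lastNotNullRow: Option[FooBar] = toCarryBd.value.get(i).get
  iter.map { row =>
    if (row.bar.isEmpty) {
      row.copy(bar = lastNotNullRow.flatMap(_.bar)) // forward-fill from the carry
    } else {
      lastNotNullRow = Some(row) // remember the latest non-null row
      row
    }
  }
}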
I've filed an issue here: https://issues.apache.org/jira/browse/SPARK-19185;
let me know if I missed anything!
--Kalvin
On Wed, Jan 11, 2017 at 5:43 PM Shixiong(Ryan) Zhu
wrote:
> Thanks for reporting this. Finally I understood the root cause. Could you
> file a JIRA on https://issues.apache.org
I see that there is room to improve the algorithm and make it more fault
tolerant, as outlined by both of you.
Could you explain a little bit more why
+----------+------+
|       foo|   bar|
+----------+------+
|2016-01-01| first|
|201