Re: When will Spark Streaming support Kafka's simple consumer API?

2015-02-05 Thread Xuelin Cao
…rpose. JIRA - https://issues.apache.org/jira/browse/SPARK-4964 > Can you elaborate on why you have to use SimpleConsumer in your environment? > TD > On Wed, Feb 4, 2015 at 7:44 PM, Xuelin Cao wrote:

When will Spark Streaming support Kafka's simple consumer API?

2015-02-04 Thread Xuelin Cao
Hi, In our environment, Kafka can only be used with the simple consumer API, like the Storm spout does. Also, I found suggestions that the "Kafka connector of Spark should not be used in production because it is based on the high-level

Can Spark provide an option to start the reduce stage early?

2015-02-02 Thread Xuelin Cao
In Hadoop MR, there is an option *mapred.reduce.slowstart.completed.maps* which can be used to start the reduce stage when X% of mappers are completed. By doing this, the data shuffling process can run in parallel with the map process. In a large multi-tenancy cluster, this option is usually turned off.
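As a hedged illustration of the knob being discussed (the property name comes from the message above; the 0.80 value is invented for the example), this is roughly how it would appear in a Hadoop `mapred-site.xml`:

```xml
<!-- mapred-site.xml: launch reducers once 80% of map tasks have finished.
     Setting this to 1.00 effectively turns slowstart off (reducers wait
     for all mappers), which is the multi-tenant tuning described above. -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
```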

Re: Will Spark-SQL support a vectorized query engine someday?

2015-01-20 Thread Xuelin Cao
…compression encoding. For example, one can turn string comparisons into integer comparisons. These will probably give much larger performance improvements in common queries. > On Mon, Jan 19, 2015 at 6:27 PM, Xuelin Cao wrote: >> Hi, >> Cor

Will Spark-SQL support a vectorized query engine someday?

2015-01-19 Thread Xuelin Cao
Hi, Correct me if I'm wrong. It looks like the current version of Spark-SQL uses a *tuple-at-a-time* model. Basically, each time the physical operator produces a tuple by recursively calling child->execute(). There are papers that illustrate the benefits of a vectorized query engine. And Hiv
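To make the contrast concrete, here is a minimal, self-contained Scala sketch of the two execution models over a plain array. All names (`ExecModels`, `tupleAtATime`, `vectorized`) are invented for this illustration; neither function is Spark SQL's actual operator API.

```scala
// Illustrative only: contrasts tuple-at-a-time evaluation with a
// batched ("vectorized") evaluation. Query: filter even values,
// double them, and sum the result.
object ExecModels {
  val data: Array[Int] = Array.tabulate(1000)(identity)

  // Tuple-at-a-time: each loop iteration pulls one value through the
  // whole operator chain (filter, then project), Volcano-iterator style.
  def tupleAtATime(rows: Array[Int]): Int = {
    var sum = 0
    var i = 0
    while (i < rows.length) {
      val row = rows(i)          // "next()" on the child operator
      if (row % 2 == 0)          // filter operator
        sum += row * 2           // project + aggregate
      i += 1
    }
    sum
  }

  // Vectorized: each operator processes a whole batch of values before
  // handing it to its parent, which improves cache locality and lets
  // the compiler emit tight, branch-light inner loops.
  def vectorized(rows: Array[Int], batchSize: Int = 256): Int = {
    rows.grouped(batchSize).map { batch =>
      val filtered = batch.filter(_ % 2 == 0)   // filter on the batch
      filtered.map(_ * 2).sum                   // project + partial sum
    }.sum
  }
}
```

Both functions compute the same answer; the point of vectorization is that the batched inner loops are much cheaper per tuple.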

Re: When will Spark support "push"-style shuffle?

2015-01-07 Thread Xuelin Cao
Got it. The explanation makes sense. Thank you. On Thu, Jan 8, 2015 at 1:06 PM, Patrick Wendell [via Apache Spark Developers List] wrote: > This question is conflating a few different concepts. I think the main question is whether Spark will have a shuffle implementation that streams data rathe

When will Spark SQL support building DB index natively?

2014-12-17 Thread Xuelin Cao
Hi, In the Spark SQL help document, it says "Some of these (such as indexes) are less important due to Spark SQL's in-memory computational model. Others are slotted for future releases of Spark SQL. - Block level bitmap indexes and virtual columns (used to build indexes)" For our
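To illustrate the idea behind the block-level bitmap indexes mentioned in that quote, here is a hypothetical, self-contained Scala sketch: for each distinct column value, record which *blocks* (not which rows) contain it, so a point query can skip whole blocks. This is not Spark SQL code; all names here are invented for the illustration.

```scala
import scala.collection.immutable.BitSet

object BlockBitmapIndex {
  val blockSize = 4
  // A column of values, conceptually split into fixed-size blocks.
  val column = Vector("a", "a", "b", "b",   // block 0
                      "c", "c", "c", "c",   // block 1
                      "a", "d", "d", "d")   // block 2

  // value -> bitset of block ids containing at least one match
  val index: Map[String, BitSet] =
    column.zipWithIndex
      .groupBy(_._1)
      .map { case (v, hits) =>
        v -> BitSet(hits.map(_._2 / blockSize): _*)
      }

  // Scan only the blocks whose bit is set; all other blocks are skipped.
  def lookup(v: String): Seq[Int] =
    index.getOrElse(v, BitSet.empty).toSeq.flatMap { blk =>
      (blk * blockSize until math.min((blk + 1) * blockSize, column.size))
        .filter(column(_) == v)
    }
}
```

For example, `lookup("c")` touches only block 1 and never reads blocks 0 or 2, which is the I/O-skipping benefit the index exists to provide.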

Re: Why does Executor Deserialize Time take more than 300ms?

2014-11-22 Thread Xuelin Cao
Thanks Imran, The problem is, *every time* I run the same task, the deserialization time is around 300~500ms. I don't know if this is a normal case.

Why does Executor Deserialize Time take more than 300ms?

2014-11-22 Thread Xuelin Cao
In our experimental cluster (1 driver, 5 workers), we tried the simplest example: sc.parallelize(Range(0, 100), 2).count. In the event log, we found the executor takes too much time on deserialization, about 300~500ms, while the execution time is only 1ms. Our servers have 2.3G Hz CPU *
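As a rough, self-contained way to sanity-check deserialization cost outside Spark, one can time plain JDK serialization of a tiny task-like object. This is only in the spirit of the "Executor Deserialize Time" metric, not Spark's actual task deserialization path; `FakeTask` and `roundTrip` are invented for the sketch, and absolute numbers will vary by machine (JIT warmup and classloading typically dominate the first measurement).

```scala
import java.io._

// A tiny stand-in for a serialized task description.
case class FakeTask(partition: Int, data: Array[Int])

object DeserTiming {
  // Serialize the task, then time only the deserialization step,
  // returning the restored object and the elapsed nanoseconds.
  def roundTrip(task: FakeTask): (FakeTask, Long) = {
    val bytes = {
      val bos = new ByteArrayOutputStream()
      val oos = new ObjectOutputStream(bos)
      oos.writeObject(task)
      oos.close()
      bos.toByteArray
    }
    val start = System.nanoTime()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    val restored = in.readObject().asInstanceOf[FakeTask]
    (restored, System.nanoTime() - start)
  }
}
```

Running `roundTrip` repeatedly in one JVM shows whether a few hundred milliseconds is intrinsic to the payload or a per-JVM warmup effect.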

Why does Executor Deserialize Time take more than 300ms?

2014-11-21 Thread Xuelin Cao
In our experimental cluster (1 driver, 5 workers), we tried the simplest example: sc.parallelize(Range(0, 100), 2).count. In the event log, we found the executor takes too much time on deserialization, about 300~500ms, while the execution time is only 1ms. Our servers have 2.3G Hz CPU * 24