Hi,
The spark-hive module has a dependency on the hive-exec module (a custom-built
module from the "Hive on Spark" project). Can someone point me to the source code
repo of the hive-exec module? Thanks.
Here is the Maven repo link:
https://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/1.
Ah, in that case the programming guide's text is still talking about the
deprecated accumulator API despite having an updated code sample (the way
it suggests creating an accumulator is also deprecated). I think the fix is
updating the programming guide rather than adding += to the API.
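For what it's worth, a minimal sketch of the non-deprecated Scala path, assuming
a local SparkSession (names here are illustrative, not taken from the guide):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch of the Spark 2.0 accumulator API (illustrative names).
    val spark = SparkSession.builder().master("local[*]").appName("acc-sketch").getOrCreate()
    val sc = spark.sparkContext

    // sc.longAccumulator replaces the deprecated sc.accumulator(0)
    val acc = sc.longAccumulator("counter")

    // Tasks add to it with add(); the new Scala API does not expose +=
    sc.parallelize(1 to 100).foreach(_ => acc.add(1))
    println(acc.value) // 100, read on the driver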
On Wednesday,
Are you using KafkaUtils.createDirectStream?
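Asking because the direct stream creates one RDD partition per Kafka partition.
A rough sketch, assuming the spark-streaming-kafka-0-8 direct API, with
placeholder broker and topic names:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Rough sketch; broker and topic names are placeholders, not from the job above.
    val conf = new SparkConf().setAppName("direct-stream-sketch").setMaster("local[4]")
    val ssc = new StreamingContext(conf, Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    // One RDD partition per Kafka partition: a 4-partition topic yields
    // 4 tasks per batch unless the stream is repartitioned afterwards.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("input-topic"))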
On Wed, Aug 3, 2016 at 9:42 AM, Soumitra Johri
wrote:
> Hi,
>
> I am running a streaming job with 4 executors and 16 cores so that each
> executor has two cores to work with. The input Kafka topic has 4 partitions.
> With this given configuration I was
No, I was referring to the programming guide section on accumulators; it
says: "Tasks running on a cluster can then add to it using the add method
or the += operator (in Scala and Python)."
On Aug 2, 2016 2:52 PM, "Holden Karau" wrote:
> I believe it was intentional with the idea that it would b
Hi,
I am running a streaming job with 4 executors and 16 cores so that each
executor has two cores to work with. The input Kafka topic has 4 partitions.
With this given configuration I was expecting MapWithStateRDD to be evenly
distributed across all executors; however, I see that it uses only two
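For reference, a minimal sketch of a mapWithState job like the one described,
with a stand-in source and illustrative names; StateSpec.numPartitions is what
sets the partition count of the MapWithStateRDD, independently of the 4 Kafka
input partitions:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

    // Minimal sketch with a stand-in source; the original job reads from Kafka.
    val conf = new SparkConf().setAppName("map-with-state-sketch").setMaster("local[4]")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/checkpoint") // mapWithState requires checkpointing

    val pairs = ssc.socketTextStream("localhost", 9999).map(line => (line, 1))

    def updateCount(key: String, value: Option[Int], state: State[Long]): (String, Long) = {
      val newCount = state.getOption.getOrElse(0L) + value.getOrElse(0)
      state.update(newCount)
      (key, newCount)
    }

    // numPartitions sets how many partitions the MapWithStateRDD gets,
    // independently of the 4 Kafka input partitions.
    val spec = StateSpec.function(updateCount _).numPartitions(16)
    pairs.mapWithState(spec).print()

    ssc.start()
    ssc.awaitTermination()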
Pushing the limit down across mapping would be great. If you're used to SQL
or work frequently with lazy collections, this is a behavior you learn to
expect.
On 08/02/2016 02:12 PM, Sun Rui wrote:
> Spark does optimise subsequent limits, for example:
>
> scala> df1.limit(3).limit(1).explain
> == Phys
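For reference, a hedged sketch of the quoted experiment (df1 here is just an
illustrative DataFrame, and the exact plan text depends on the Spark version):

    import org.apache.spark.sql.SparkSession

    // Hedged sketch; df1 is illustrative, not the DataFrame from the thread.
    val spark = SparkSession.builder().master("local[*]").appName("limit-sketch").getOrCreate()
    val df1 = spark.range(100).toDF("id")

    // The optimizer collapses adjacent limits (CombineLimits), so the physical
    // plan uses the smaller limit rather than materializing limit(3) first.
    df1.limit(3).limit(1).explain()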
Hi everyone, I'm currently trying to use Spark 2.0.0 and make DataFrames work
with kryo.registrationRequired=true. Is it even possible at all considering
the codegen?
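A minimal sketch of the configuration I mean (the registered class below is
just an example, not a complete list):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch of the configuration in question; the registered class is
    // only an example of an explicit registration, not a complete list.
    val spark = SparkSession.builder()
      .appName("kryo-registration")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.kryo.registrationRequired", "true")
      // Known classes can be listed up front; the codegen-generated classes are
      // the part in question, since their names are not known in advance.
      .config("spark.kryo.classesToRegister", "scala.collection.mutable.WrappedArray$ofRef")
      .getOrCreate()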
Regards,
Olivier Girardot | Partner
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
Using the dominant resource calculator instead of the default resource
calculator will get you the expected vcores. Basically, by default YARN does
not honor CPU cores as a resource, so you will always see the vcore count as 1
no matter how many cores you set in Spark.
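For reference, a sketch of the relevant setting, assuming the capacity
scheduler is in use (it goes in capacity-scheduler.xml):

    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>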
On Wed, Aug 3, 2016 at 12:11 PM, sa