Hi,
The spark-hive module has a dependency on the hive-exec module (a custom-built
module from the "Hive on Spark" project). Can someone point me to the source code
repo of the hive-exec module? Thanks.
Here is the Maven repo link:
https://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/1.
Ah, in that case the programming guide's text is still talking about the
deprecated accumulator API despite having an updated code sample (the way
it suggests creating an accumulator is also deprecated). I think the fix is
updating the programming guide rather than adding += to the API.
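For what it's worth, a minimal sketch of the non-deprecated Scala path, assuming
a local SparkSession (names here are illustrative, not taken from the guide):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch of the Spark 2.0 accumulator API (illustrative names).
    val spark = SparkSession.builder().master("local[*]").appName("acc-sketch").getOrCreate()
    val sc = spark.sparkContext

    // sc.longAccumulator replaces the deprecated sc.accumulator(0)
    val acc = sc.longAccumulator("counter")

    // Tasks add to it with add(); the new Scala API does not expose +=
    sc.parallelize(1 to 100).foreach(_ => acc.add(1))
    println(acc.value) // 100, read on the driver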
On Wednesday,
Are you using KafkaUtils.createDirectStream?
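Asking because the direct stream creates one RDD partition per Kafka partition.
A rough sketch, assuming the spark-streaming-kafka-0-8 direct API, with
placeholder broker and topic names:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Rough sketch; broker and topic names are placeholders, not from the job above.
    val conf = new SparkConf().setAppName("direct-stream-sketch").setMaster("local[4]")
    val ssc = new StreamingContext(conf, Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    // One RDD partition per Kafka partition: a 4-partition topic yields
    // 4 tasks per batch unless the stream is repartitioned afterwards.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("input-topic"))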
On Wed, Aug 3, 2016 at 9:42 AM, Soumitra Johri
wrote:
> Hi,
>
> I am running a streaming job with 4 executors and 16 cores so that each
> executor has two cores to work with. The input Kafka topic has 4 partitions.
> With this given configuration I was
No, I was referring to the programming guide section on accumulators; it
says: "Tasks running on a cluster can then add to it using the add method
or the += operator (in Scala and Python)."
On Aug 2, 2016 2:52 PM, "Holden Karau" wrote:
> I believe it was intentional with the idea that it would b
Hi,
I am running a streaming job with 4 executors and 16 cores so that each
executor has two cores to work with. The input Kafka topic has 4 partitions.
With this given configuration I was expecting MapWithStateRDD to be evenly
distributed across all executors; however, I see that it uses only two
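For reference, a minimal sketch of a mapWithState job like the one described,
with a stand-in source and illustrative names; StateSpec.numPartitions is what
sets the partition count of the MapWithStateRDD, independently of the 4 Kafka
input partitions:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

    // Minimal sketch with a stand-in source; the original job reads from Kafka.
    val conf = new SparkConf().setAppName("map-with-state-sketch").setMaster("local[4]")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/checkpoint") // mapWithState requires checkpointing

    val pairs = ssc.socketTextStream("localhost", 9999).map(line => (line, 1))

    def updateCount(key: String, value: Option[Int], state: State[Long]): (String, Long) = {
      val newCount = state.getOption.getOrElse(0L) + value.getOrElse(0)
      state.update(newCount)
      (key, newCount)
    }

    // numPartitions sets how many partitions the MapWithStateRDD gets,
    // independently of the 4 Kafka input partitions.
    val spec = StateSpec.function(updateCount _).numPartitions(16)
    pairs.mapWithState(spec).print()

    ssc.start()
    ssc.awaitTermination()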
Pushing the limit down across mapping would be great. If you're used to SQL
or work frequently with lazy collections, this is a behavior you learn to
expect.
On 08/02/2016 02:12 PM, Sun Rui wrote:
> Spark does optimise subsequent limits, for example:
>
> scala> df1.limit(3).limit(1).explain
> == Phys
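For reference, a hedged sketch of the quoted experiment (df1 here is just an
illustrative DataFrame, and the exact plan text depends on the Spark version):

    import org.apache.spark.sql.SparkSession

    // Hedged sketch; df1 is illustrative, not the DataFrame from the thread.
    val spark = SparkSession.builder().master("local[*]").appName("limit-sketch").getOrCreate()
    val df1 = spark.range(100).toDF("id")

    // The optimizer collapses adjacent limits (CombineLimits), so the physical
    // plan uses the smaller limit rather than materializing limit(3) first.
    df1.limit(3).limit(1).explain()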
Hi everyone, I'm currently trying to use Spark 2.0.0 and make DataFrames work
with kryo.registrationRequired=true. Is it even possible at all considering
the codegen?
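A minimal sketch of the configuration I mean (the registered class below is
just an example, not a complete list):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch of the configuration in question; the registered class is
    // only an example of an explicit registration, not a complete list.
    val spark = SparkSession.builder()
      .appName("kryo-registration")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.kryo.registrationRequired", "true")
      // Known classes can be listed up front; the codegen-generated classes are
      // the part in question, since their names are not known in advance.
      .config("spark.kryo.classesToRegister", "scala.collection.mutable.WrappedArray$ofRef")
      .getOrCreate()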
Regards,
Olivier Girardot | Partner
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94
Using the dominant resource calculator instead of the default resource
calculator will get you the expected vcores. Basically, by default YARN does
not honor CPU cores as a resource, so you will always see the vcore count as 1
no matter how many cores you set in Spark.
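For reference, a sketch of the relevant setting, assuming the capacity
scheduler is in use (it goes in capacity-scheduler.xml):

    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>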
On Wed, Aug 3, 2016 at 12:11 PM, sa