Re: Large number of conf broadcasts

2015-12-18 Thread Anders Arpteg
to read about 120 thousand Avro files into a single data frame. >> >> Is your patch part of a pull request from the master branch in GitHub? >> >> Thanks, >> Prasad. >> >> From: Anders Arpteg >> Date: Thursday, October 22, 2

Re: Large number of conf broadcasts

2015-12-17 Thread Prasad Ravilla
Thanks, Koert. Regards, Prasad. From: Koert Kuipers Date: Thursday, December 17, 2015 at 1:06 PM To: Prasad Ravilla Cc: Anders Arpteg, user Subject: Re: Large number of conf broadcasts https://github.com/databricks/spark-avro/pull/95

Re: Large number of conf broadcasts

2015-12-17 Thread Koert Kuipers
pull request from the master branch in GitHub? > > Thanks, > Prasad. > > From: Anders Arpteg > Date: Thursday, October 22, 2015 at 10:37 AM > To: Koert Kuipers > Cc: user > Subject: Re: Large number of conf broadcasts > > Yes, seems unnecessary. I actually tried patc

Re: Large number of conf broadcasts

2015-12-17 Thread Prasad Ravilla
Kuipers Cc: user Subject: Re: Large number of conf broadcasts Yes, seems unnecessary. I actually tried patching the com.databricks.spark.avro reader to only broadcast once per dataset, instead of once for every single file/partition. It seems to work just as well, and there are significantly fewer

Re: Large number of conf broadcasts

2015-10-26 Thread Anders Arpteg
Nice, Koert, let's hope it gets merged soon. /Anders On Fri, Oct 23, 2015 at 6:32 PM Koert Kuipers wrote: > https://github.com/databricks/spark-avro/pull/95 > > On Fri, Oct 23, 2015 at 5:01 AM, Koert Kuipers wrote: > >> Oh, no wonder... it undoes the glob (I was reading from /some/path/*), >> cre

Re: Large number of conf broadcasts

2015-10-23 Thread Koert Kuipers
https://github.com/databricks/spark-avro/pull/95 On Fri, Oct 23, 2015 at 5:01 AM, Koert Kuipers wrote: > Oh, no wonder... it undoes the glob (I was reading from /some/path/*), > creates a hadoopRdd for every path, and then creates a union of them using > UnionRDD. > > That's not what I want... no

Re: Large number of conf broadcasts

2015-10-23 Thread Koert Kuipers
Oh, no wonder... it undoes the glob (I was reading from /some/path/*), creates a hadoopRdd for every path, and then creates a union of them using UnionRDD. That's not what I want... no need to do a union. AvroInputFormat already has the ability to handle globs (or multiple comma-separated paths) very e
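A toy model may make Koert's point concrete: expanding the glob into one Hadoop-backed input per path and unioning them costs one configuration broadcast per path, while handing the whole glob (or a comma-separated path list) to a single input costs one. The sketch assumes, rather than demonstrates, that each Hadoop-backed RDD broadcasts its configuration once when it is created; `ToyContext`, `read_union_per_path` and `read_single_input` are hypothetical stand-ins, not Spark or spark-avro APIs.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Dict, List


@dataclass
class ToyContext:
    """Hypothetical stand-in for a SparkContext (not Spark's API).

    It only counts how many times the Hadoop configuration is broadcast.
    """
    hadoop_conf: Dict[str, str]
    broadcasts: int = 0

    def broadcast_conf(self) -> Dict[str, str]:
        self.broadcasts += 1
        return dict(self.hadoop_conf)

    def file_input(self, paths: List[str]) -> List[str]:
        # Assumption: one Hadoop-backed input broadcasts its configuration
        # once, no matter how many paths it covers.
        self.broadcast_conf()
        return list(paths)


def expand_glob(pattern: str, files: List[str]) -> List[str]:
    # Toy glob expansion over a known file listing.
    return [f for f in files if fnmatch(f, pattern)]


def read_union_per_path(ctx: ToyContext, pattern: str, files: List[str]) -> List[str]:
    # Undo the glob: one input per matching file, then concatenate the
    # results (the role UnionRDD plays in the thread). One broadcast per file.
    rows: List[str] = []
    for path in expand_glob(pattern, files):
        rows.extend(ctx.file_input([path]))
    return rows


def read_single_input(ctx: ToyContext, pattern: str, files: List[str]) -> List[str]:
    # Pass every matching path to a single input. One broadcast in total.
    return ctx.file_input(expand_glob(pattern, files))
```

With 1,000 files under `/some/path/`, both functions return the same paths, but the per-path union records 1,000 broadcasts and the single input records one.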

Re: Large number of conf broadcasts

2015-10-22 Thread Anders Arpteg
Yes, seems unnecessary. I actually tried patching the com.databricks.spark.avro reader to only broadcast once per dataset, instead of once for every single file/partition. It seems to work just as well, and there are significantly fewer broadcasts and I'm not seeing out-of-memory issues any more. Strange that mo
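Anders doesn't describe how his patch works. One way to get "broadcast once per dataset" is to create the broadcast lazily the first time a dataset needs it and hand the same handle to every file or partition of that dataset. The sketch below models that idea with a counting stand-in for `sc.broadcast`; `BroadcastCounter`, `PerDatasetConf` and `plan_reads` are hypothetical names, and this is not the patch or the code in the pull request.

```python
from typing import Callable, Dict, Hashable, List, Tuple


class BroadcastCounter:
    """Hypothetical stand-in for sc.broadcast that records each call."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, value: object) -> object:
        self.calls += 1
        # A real broadcast returns a handle; the toy returns the value itself.
        return value


class PerDatasetConf:
    """Hypothetical helper: broadcast the configuration the first time a
    dataset needs it, then reuse that handle for all of its files."""

    def __init__(self, broadcast: Callable[[object], object], conf: Dict[str, str]) -> None:
        self._broadcast = broadcast
        self._conf = conf
        self._handles: Dict[Hashable, object] = {}

    def handle_for(self, dataset_id: Hashable) -> object:
        if dataset_id not in self._handles:
            self._handles[dataset_id] = self._broadcast(self._conf)
        return self._handles[dataset_id]


def plan_reads(files: List[str], dataset_id: Hashable, confs: PerDatasetConf) -> List[Tuple[str, object]]:
    # Pair every file (or partition) with the configuration handle it would
    # be read with; all files of one dataset share a single handle.
    return [(path, confs.handle_for(dataset_id)) for path in files]
```

Planning reads for the 120 thousand files mentioned earlier in the thread triggers a single broadcast in this model; planning a second dataset triggers exactly one more.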

Re: Large number of conf broadcasts

2015-10-22 Thread Koert Kuipers
I am seeing the same thing. It's gone completely crazy creating broadcasts for the last 15 minutes or so. Killing it... On Thu, Sep 24, 2015 at 1:24 PM, Anders Arpteg wrote: > Hi, > > Running Spark 1.5.0 in yarn-client mode, and am curious why there are so > many broadcasts being done when loading