Re: Catalyst dependency on Spark Core

2014-07-15 Thread Baofeng Zhang
I see. So how about I take on this simple piece of work as my contribution? :) It is cl.

Re: Catalyst dependency on Spark Core

2014-07-15 Thread Matei Zaharia
Yeah, that seems like something we can inline :). On Jul 15, 2014, at 7:30 PM, Baofeng Zhang wrote: > Is Matei following this? > > Catalyst uses the Utils to get the ClassLoader which loaded Spark. > > Can Catalyst directly do "getClass.getClassLoader" to avoid the dependency > on core? > >

Re: Catalyst dependency on Spark Core

2014-07-15 Thread Baofeng Zhang
Is Matei following this? Catalyst uses the Utils to get the ClassLoader which loaded Spark. Can Catalyst directly do "getClass.getClassLoader" to avoid the dependency on core?
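For readers following the thread, here is a rough sketch of what "inlining" the class loader lookup could look like. It is written in Java rather than Scala, and the class and method names (`ClassLoaderLookup`, `ownLoader`, `contextOrOwnLoader`) are hypothetical; this is not the code in Spark's Utils or in Catalyst, only an illustration of asking a class for its own loader instead of calling a helper in another module.

```java
public class ClassLoaderLookup {

    /** Returns the loader that loaded this class (the Java form of getClass.getClassLoader). */
    public static ClassLoader ownLoader() {
        return ClassLoaderLookup.class.getClassLoader();
    }

    /**
     * Prefers the thread's context class loader when one is set and
     * otherwise falls back to the loader of this class.
     */
    public static ClassLoader contextOrOwnLoader() {
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        return ctx != null ? ctx : ownLoader();
    }

    public static void main(String[] args) throws Exception {
        // Resolve a class by name through the chosen loader.
        ClassLoader loader = contextOrOwnLoader();
        Class<?> cls = Class.forName("java.util.ArrayList", true, loader);
        System.out.println(cls.getName());
    }
}
```

Whether the context class loader fallback is wanted depends on how the code is deployed; the message above only proposes the plain `getClass.getClassLoader` form.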

Re: [brainsotrming] Generalization of DStream, a ContinuousRDD ?

2014-07-15 Thread Tathagata Das
Very interesting ideas Andy! Conceptually I think it makes sense. In fact, it is true that dealing with time series data, windowing over application time, windowing over number of events, are things that DStream does not natively support. The real challenge is actually mapping the conceptual windo

Re: Hadoop's Configuration object isn't threadsafe

2014-07-15 Thread Patrick Wendell
Hey Andrew, Cloning the conf might be a good/simple fix for this particular problem. It's definitely worth looking into. There are a few things we can probably do in Spark to deal with non-thread-safety inside of the Hadoop FileSystem and Configuration classes. One thing we can do in general
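As an illustration of the "clone the conf" idea, the sketch below gives each task a private copy of a shared, non-thread-safe configuration object. `SimpleConf` and `runTask` are hypothetical stand-ins so the example runs without Hadoop on the classpath. Hadoop's `Configuration` does provide a copy constructor, but that it is what the proposed fix would use is an assumption, not something stated in the thread.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCloneSketch {

    /** Hypothetical stand-in for a mutable configuration object that is not thread-safe. */
    public static final class SimpleConf {
        private final Map<String, String> props = new HashMap<>();

        public SimpleConf() {}

        /** Copy constructor: takes a snapshot of the other conf's entries. */
        public SimpleConf(SimpleConf other) {
            props.putAll(other.props);
        }

        public void set(String key, String value) { props.put(key, value); }

        public String get(String key) { return props.get(key); }
    }

    /** Each task mutates only its own copy, so tasks never write to the shared conf. */
    public static String runTask(SimpleConf shared, int taskId) {
        SimpleConf local;
        // Copying reads the shared map, which is only safe if nothing mutates it
        // at the same time. Any writer would have to hold this same lock.
        synchronized (shared) {
            local = new SimpleConf(shared);
        }
        local.set("task.id", Integer.toString(taskId));
        return local.get("fs.default.name") + "#" + local.get("task.id");
    }

    public static void main(String[] args) throws Exception {
        SimpleConf shared = new SimpleConf();
        shared.set("fs.default.name", "file:///");

        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            final int id = i;
            threads[i] = new Thread(() -> System.out.println(runTask(shared, id)));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // The shared conf never received a "task.id" entry.
        System.out.println(shared.get("task.id"));
    }
}
```

The trade-off is an extra copy per task in exchange for not depending on the shared object being safe to mutate concurrently.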

Re: Hadoop's Configuration object isn't threadsafe

2014-07-15 Thread Andrew Ash
Hi Shengzhe, Even if we did make Configuration threadsafe, it'd take quite some time for that to trickle down to a Hadoop release that we could actually rely on Spark users having installed. I agree we should consider whether making Configuration threadsafe is something that Hadoop should do, but

Re: Hadoop's Configuration object isn't threadsafe

2014-07-15 Thread yao
Good catch Andrew. In addition to your proposed solution, is it possible to fix the Configuration class and make it thread-safe? I think the fix should be trivial, just use a ConcurrentHashMap, but I am not sure if we can push this change upstream (will the Hadoop guys accept this change? for them, it

Re: Reproducible deadlock in 1.0.1, possibly related to Spark-1097

2014-07-15 Thread Cody Koeninger
We tested that patch from aarondav's branch, and are no longer seeing that deadlock. Seems to have solved the problem, at least for us. On Mon, Jul 14, 2014 at 7:22 PM, Patrick Wendell wrote: > Andrew and Gary, > > Would you guys be able to test > https://github.com/apache/spark/pull/1409/file

Re: traveling next week

2014-07-15 Thread Cody Koeninger
No, sorry for the mixup, it was a "helpful" autocomplete similarity between an internal work list and the spark dev list :( Switched my spark mailing list subscription back to my personal email so you guys won't be subjected to further unwanted email. On Tue, Jul 15, 2014 at 12:36 PM, Patrick We

Re: traveling next week

2014-07-15 Thread Patrick Wendell
Cody - did you mean to send this to the spark dev list? On Tue, Jul 15, 2014 at 7:15 AM, Cody Koeninger wrote: > I'm going to be on a plane wed 23, return flight monday 28, so will miss > daily call those days. I'll be pushing forward on projects as I can, but > skype availability may be limited

[brainsotrming] Generalization of DStream, a ContinuousRDD ?

2014-07-15 Thread andy petrella
Dear Sparkers, [sorry for the lengthy email... => head to the gist for a preview :-p] I would like to share some thinking I had due to a use case I faced. Basically, as the subject announces, it's a generalization of the DStream c

traveling next week

2014-07-15 Thread Cody Koeninger
I'm going to be on a plane wed 23, return flight monday 28, so will miss daily call those days. I'll be pushing forward on projects as I can, but skype availability may be limited, so email if you need something from me.

Re: Catalyst dependency on Spark Core

2014-07-15 Thread Sean Owen
Agree. You end up with a "core" and a "corer core" to distinguish between and it ends up just being more complicated. This sounds like something that doesn't need a module. On Tue, Jul 15, 2014 at 5:59 AM, Patrick Wendell wrote: > Adding new build modules is pretty high overhead, so if this is a

Re: better compression codecs for shuffle blocks?

2014-07-15 Thread Reynold Xin
FYI dev, I submitted a PR making Snappy the default compression codec: https://github.com/apache/spark/pull/1415 Also submitted a separate PR to add lz4 support: https://github.com/apache/spark/pull/1416 On Mon, Jul 14, 2014 at 5:06 PM, Aaron Davidson wrote: > One of the core problems here

Spark-summingbird

2014-07-15 Thread Nick Pentreath
Seems Twitter has made a bit of progress here: https://github.com/twitter/summingbird/tree/develop/summingbird-spark May be of interest and perhaps some devs with experience in both may be able to help out. N