Re: Spark on Mesos 0.20

2014-10-09 Thread Gurvinder Singh
On 10/10/2014 06:11 AM, Fairiz Azizi wrote: > Hello, > > Sorry for the late reply. > > When I tried the LogQuery example this time, things now seem to be fine! > > ... > > 14/10/10 04:01:21 INFO scheduler.DAGScheduler: Stage 0 (collect at > LogQuery.scala:80) finished in 0.429 s > > 14/10/10 0

Re: Spark on Mesos 0.20

2014-10-09 Thread Fairiz Azizi
Hello, Sorry for the late reply. When I tried the LogQuery example this time, things now seem to be fine! ... 14/10/10 04:01:21 INFO scheduler.DAGScheduler: Stage 0 (collect at LogQuery.scala:80) finished in 0.429 s 14/10/10 04:01:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose

[Spark SQL Continue] Sorry, it is not only limited in SQL, may due to network

2014-10-09 Thread Trident
Dear Community, Please ignore my last post about Spark SQL. When I run: val file = sc.textFile("./README.md") val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_) count.collect() it happens too. Is there any possible reason f

[Spark SQL] Strange NPE in Spark SQL with Hive

2014-10-09 Thread Trident
Hi Community, I use Spark 1.0.2, using Spark SQL to run Hive SQL. When I run the following code in the Spark shell: val file = sc.textFile("./README.md") val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_) count.collect() Correct and no error
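For reference, the snippet above spelled out as a complete Spark shell session (a minimal sketch; it assumes a README.md in the current working directory and only prints the first few results):

    val file = sc.textFile("./README.md")
    val counts = file.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)                         // word count over the file's tokens
    counts.collect().take(10).foreach(println)    // print a few (word, count) pairs on the driver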

spark-prs and mesos/spark-ec2

2014-10-09 Thread Nicholas Chammas
Does it make sense to point the Spark PR review board to read from mesos/spark-ec2 as well? PRs submitted against that repo may reference Spark JIRAs and need review just like any other Spark PR. Nick

Re: will/when Spark/SparkSQL will support ORCFile format

2014-10-09 Thread James Yu
Sounds great, thanks! On Thu, Oct 9, 2014 at 2:22 PM, Michael Armbrust wrote: > Yes, the foreign sources work is only about exposing a stable set of APIs > for external libraries to link against (to avoid the spark assembly > becoming a dependency mess). The code path these APIs use will be t

Re: TorrentBroadcast slow performance

2014-10-09 Thread Matei Zaharia
Oops, I forgot to add: for 2, maybe we can add a flag to use DISK_ONLY for TorrentBroadcast, or use it automatically if the broadcasts are bigger than some size. Matei On Oct 9, 2014, at 3:04 PM, Matei Zaharia wrote: > Thanks for the feedback. For 1, there is an open patch: > https://github.com/apache/spark/pull/2

Re: TorrentBroadcast slow performance

2014-10-09 Thread Matei Zaharia
Thanks for the feedback. For 1, there is an open patch: https://github.com/apache/spark/pull/2659. For 2, broadcast blocks actually use MEMORY_AND_DISK storage, so they will spill to disk if you have low memory, but they're faster to access otherwise. Matei On Oct 9, 2014, at 12:11 PM, Guillau
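For context, a minimal sketch of creating and reading a broadcast variable with the Spark 1.x API (the lookup table below is made up purely for illustration):

    // Build a lookup table once on the driver and broadcast it to the executors.
    val lookup: Map[String, Int] = (1 to 100000).map(i => i.toString -> i).toMap
    val bc = sc.broadcast(lookup)
    // Tasks read bc.value from the locally cached broadcast block
    // (stored MEMORY_AND_DISK, per the note above) instead of shipping the map with every task.
    val hits = sc.parallelize(Seq("1", "42", "notthere"))
      .map(key => bc.value.getOrElse(key, -1))
      .collect()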

Re: Trouble running tests

2014-10-09 Thread Michael Armbrust
Also, in general for SQL-only changes it is sufficient to run "sbt/sbt catalyst/test sql/test hive/test". The "hive/test" part takes the longest, so I usually leave that out until just before submitting unless my changes are Hive-specific. On Thu, Oct 9, 2014 at 11:40 AM, Nicholas Chammas < nich

Re: Fwd: Accumulator question

2014-10-09 Thread Josh Rosen
Hi Nathan, You’re right, it looks like we don’t currently provide a method to unregister accumulators.  I’ve opened a JIRA to discuss a fix:  https://issues.apache.org/jira/browse/SPARK-3885 In the meantime, here’s a workaround that might work:  Accumulators have a public setValue() method that
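A rough sketch of that workaround against the Spark 1.x accumulator API (the counting logic here is hypothetical):

    // Reuse a single accumulator across jobs by resetting it, rather than unregistering it.
    val acc = sc.accumulator(0)
    sc.parallelize(1 to 100).foreach(i => if (i % 10 == 0) acc += 1)
    println(acc.value)   // 10, read back on the driver
    acc.setValue(0)      // reset before the next job instead of unregistering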

Re: will/when Spark/SparkSQL will support ORCFile format

2014-10-09 Thread Michael Armbrust
Yes, the foreign sources work is only about exposing a stable set of APIs for external libraries to link against (to avoid the spark assembly becoming a dependency mess). The code path these APIs use will be the same as that for datasources included in the core spark sql library. Michael On Thu,

Re: will/when Spark/SparkSQL will support ORCFile format

2014-10-09 Thread James Yu
For performance, will foreign data formats be supported the same as native ones? Thanks, James On Wed, Oct 8, 2014 at 11:03 PM, Cheng Lian wrote: > The foreign data source API PR also matters here: > https://www.github.com/apache/spark/pull/2475 > > Foreign data sources like ORC can be added more easily

Re: TorrentBroadcast slow performance

2014-10-09 Thread Guillaume Pitel
Hi, Thanks to your answer, we've found the problem. It was reverse IP resolution on the drivers we used (a wrong configuration of the local bind9). Apparently, not being able to reverse-resolve the IP addresses of the nodes was the culprit behind the 10s delay. We've hit two other secondary probl

Re: Trouble running tests

2014-10-09 Thread Nicholas Chammas
_RUN_SQL_TESTS needs to be true as well. Those two _... variables get set correctly when tests are run on Jenkins. They’re not meant to be manipulated directly by testers. Did you want to run only the SQL tests locally? You can try faking being Jenkins by setting AMPLAB_JENKINS=true before calling run

Introduction to Spark Blog

2014-10-09 Thread devl.development
Hi Spark community, Having spent some time getting up to speed with the various Spark components in the core package, I've written a blog post to help other newcomers and contributors. By no means am I a Spark expert, so I would be grateful for any advice, comments, or edit suggestions. Thanks very much

Trouble running tests

2014-10-09 Thread Yana
Hi, apologies if I missed a FAQ somewhere. I am trying to submit a bug fix for the very first time. Reading instructions, I forked the git repo (at c9ae79fba25cd49ca70ca398bc75434202d26a97) and am trying to run tests. I run this: ./dev/run-tests _SQL_TESTS_ONLY=true and after a while get the fo

Re: [MLlib] LogisticRegressionWithSGD and LogisticRegressionWithLBFGS converge with different weights.

2014-10-09 Thread DB Tsai
Nice to hear that your experiment is consistent with my assumption. The current L1/L2 regularization will penalize the intercept as well, which is not ideal. I'm working on GLMNET in Spark using OWLQN, and I can get exactly the same solution as R, but with scalability in the number of rows and columns. Stay tuned! Sincerely,
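For anyone reproducing the comparison, a minimal sketch of training both optimizers on the same data with the Spark 1.x MLlib API (the file path and iteration count are placeholders):

    import org.apache.spark.mllib.classification.{LogisticRegressionWithSGD, LogisticRegressionWithLBFGS}
    import org.apache.spark.mllib.util.MLUtils

    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt").cache()
    val sgdModel   = LogisticRegressionWithSGD.train(data, 100)     // 100 SGD iterations
    val lbfgsModel = new LogisticRegressionWithLBFGS().run(data)
    // The two weight vectors can differ noticeably; L-BFGS usually ends up closer to the optimum.
    println(sgdModel.weights)
    println(lbfgsModel.weights)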