Hello,
Sorry for the late reply.
When I tried the LogQuery example this time, things now seem to be fine!
...
14/10/10 04:01:21 INFO scheduler.DAGScheduler: Stage 0 (collect at
LogQuery.scala:80) finished in 0.429 s
14/10/10 04:01:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose
Dear Community,
Please ignore my last post about Spark SQL.
When I run:
val file = sc.textFile("./README.md")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
count.collect()
it happens too.
Is there any possible reason for this?
Hi Community,
I use Spark 1.0.2, using Spark SQL to do Hive SQL.
When I run the following code in Spark Shell:
val file = sc.textFile("./README.md")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
count.collect()
It runs correctly with no errors.
Does it make sense to point the Spark PR review board to read from
mesos/spark-ec2 as well? PRs submitted against that repo may reference
Spark JIRAs and need review just like any other Spark PR.
Nick
Sounds great, thanks!
On Thu, Oct 9, 2014 at 2:22 PM, Michael Armbrust wrote:
Oops, I forgot to add: for 2, maybe we can add a flag to use DISK_ONLY for
TorrentBroadcast, or switch to it automatically when broadcasts exceed some size.
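Something along these lines, perhaps (a rough sketch; the size threshold is a
hypothetical flag, only StorageLevel itself is real Spark API):

import org.apache.spark.storage.StorageLevel

// Hypothetical: choose the storage level for TorrentBroadcast blocks by
// size; the threshold would come from an (invented) config flag.
def broadcastStorageLevel(blockSizeBytes: Long, diskOnlyThreshold: Long): StorageLevel =
  if (blockSizeBytes >= diskOnlyThreshold) StorageLevel.DISK_ONLY
  else StorageLevel.MEMORY_AND_DISK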
Matei
Thanks for the feedback. For 1, there is an open patch:
https://github.com/apache/spark/pull/2659. For 2, broadcast blocks actually use
MEMORY_AND_DISK storage, so they will spill to disk if you have low memory, but
they're faster to access otherwise.
Matei
On Oct 9, 2014, at 12:11 PM, Guillau
Also, in general for SQL-only changes it is sufficient to run "sbt/sbt
catalyst/test sql/test hive/test". The "hive/test" part takes the
longest, so I usually leave it out until just before submitting, unless my
changes are Hive-specific.
On Thu, Oct 9, 2014 at 11:40 AM, Nicholas Chammas wrote:
Hi Nathan,
You’re right, it looks like we don’t currently provide a method to unregister
accumulators. I’ve opened a JIRA to discuss a fix:
https://issues.apache.org/jira/browse/SPARK-3885
In the meantime, here’s a workaround that might work: Accumulators have a
public setValue() method that
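Presumably something along these lines (a rough sketch, assuming the Spark 1.x
accumulator API and an existing SparkContext sc):

// Reset and reuse an accumulator instead of unregistering it.
val acc = sc.accumulator(0)
sc.parallelize(1 to 100).foreach(x => acc += 1)
println(acc.value)  // 100
acc.setValue(0)     // driver-side reset before the next job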
Yes, the foreign sources work is only about exposing a stable set of APIs
for external libraries to link against (to avoid the spark assembly
becoming a dependency mess). The code path these APIs use will be the same
as that for datasources included in the core spark sql library.
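For anyone curious what linking against such a stable API looks like, here is
a bare skeleton (names modeled on the data sources API as it eventually
landed; purely illustrative, not the final interface):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.StructType

// Entry point Spark SQL instantiates by name for this data source.
class DefaultSource extends RelationProvider {
  override def createRelation(sqlContext: SQLContext,
                              parameters: Map[String, String]): BaseRelation =
    new EmptyRelation(sqlContext)
}

// A trivial relation with an empty schema and no rows.
class EmptyRelation(val sqlContext: SQLContext)
    extends BaseRelation with TableScan {
  override def schema: StructType = StructType(Nil)
  override def buildScan(): RDD[Row] = sqlContext.sparkContext.emptyRDD[Row]
}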
Michael
On Thu,
In terms of performance, will foreign data formats be supported as well as
native ones?
Thanks,
James
On Wed, Oct 8, 2014 at 11:03 PM, Cheng Lian wrote:
> The foreign data source API PR also matters here
> https://www.github.com/apache/spark/pull/2475
>
> Foreign data sources like ORC can be added more easily
Hi,
Thanks to your answer, we've found the problem. It was on reverse IP
resolution on the drivers we used (wrong configuration of the local
bind9). Apparently, not being able to reverse-resolve the IP address of
the nodes was the culprit of the 10s delay.
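For anyone else chasing this, a quick way to spot it from the Scala REPL
(the IP is a placeholder for one of your workers):

import java.net.InetAddress
val addr = InetAddress.getByName("10.0.0.12")  // placeholder worker IP
// Should print a hostname quickly; a multi-second stall points at a
// broken PTR record / reverse-lookup configuration.
println(addr.getCanonicalHostName)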
We've hit two other secondary probl
_RUN_SQL_TESTS needs to be true as well. Those two _... variables get set
correctly when tests are run on Jenkins. They're not meant to be manipulated
directly by testers.
Did you want to run only the SQL tests locally? You can try faking being
Jenkins by setting AMPLAB_JENKINS=true before calling run-tests.
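That is, something like this (untested sketch):

AMPLAB_JENKINS=true ./dev/run-tests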
Hi Spark community
Having spent some time getting up to speed with the various Spark components
in the core package, I've written a blog post to help other newcomers and
contributors.
By no means am I a Spark expert, so I would be grateful for any advice,
comments, or edit suggestions.
Thanks very much
Hi, apologies if I missed a FAQ somewhere.
I am trying to submit a bug fix for the very first time. Following the
instructions, I forked the Git repo (at
c9ae79fba25cd49ca70ca398bc75434202d26a97) and am trying to run the tests.
I run this: ./dev/run-tests _SQL_TESTS_ONLY=true
and after a while get the fo
Nice to hear that your experiment is consistent with my assumption. The
current L1/L2 regularization penalizes the intercept as well, which is not
ideal. I'm working on GLMNET in Spark using OWLQN, and I can get exactly the
same solution as R, but with scalability in # of rows and columns. Stay
tuned!
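To make the point concrete, here is a toy sketch (illustrative only, not
Spark's actual updater) of a squared-loss objective where the L2 penalty
skips the intercept:

// L2-regularized least squares: only the weights are penalized;
// the intercept stays out of the penalty term.
def objective(data: Seq[(Double, Array[Double])],
              w: Array[Double], intercept: Double, lambda: Double): Double = {
  val loss = data.map { case (y, x) =>
    val pred = x.zip(w).map { case (xi, wi) => xi * wi }.sum + intercept
    0.5 * (pred - y) * (pred - y)
  }.sum / data.size
  loss + 0.5 * lambda * w.map(wi => wi * wi).sum  // intercept excluded
}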
Sincerely,