Please find my answers on the JIRA page.
Muhammad-Ali
On Thursday, January 15, 2015 3:25 AM, Xiangrui Meng
wrote:
Please find my comments on the JIRA page. -Xiangrui
On Tue, Jan 13, 2015 at 1:49 PM, Muhammad Ali A'råby
wrote:
> I have to say, I have created a Jira task for it:
> [SPARK
Yes, I am running on a local file system.
Is there a bug open for this? Mingyu Kim reported the problem last April:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-reads-partitions-in-a-wrong-order-td4818.html
-Ewan
On 01/16/2015 07:41 PM, Reynold Xin wrote:
You are running on a local
Code updated. Sorry, the wrong branch was uploaded before.
On Fri, Jan 16, 2015 at 2:13 PM, Kushal Datta
wrote:
> The source code is under a new module named 'graphx'. Let me double check.
>
> On Fri, Jan 16, 2015 at 2:11 PM, Kyle Ellrott
> wrote:
>
>> Looking at https://github.com/kdatta/tinkerpop3/co
Hi Alex,
Can you attach the output of sql("explain extended <your query>").collect.foreach(println)?
Thanks,
Yin
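For anyone following along, a minimal sketch of that command in the shell, assuming a SQLContext bound to sqlContext and a hypothetical registered table t:

  // Hypothetical example: print the extended plan for a query on table "t".
  sqlContext.sql("explain extended select * from t").collect.foreach(println)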
On Fri, Jan 16, 2015 at 1:54 PM, Alessandro Baretta
wrote:
> Reynold,
>
> The source file you are directing me to is a little too terse for me to
> understand what exactly is going on. Let me
That's a good idea. We didn't intentionally break the doc generation. The
doc generation for Catalyst is broken because we use Scala macros, and we
haven't had time to investigate how to fix it yet.
If you have a minute and want to investigate a fix, I can merge it in as
soon as possible.
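For reference, a hedged way to reproduce the breakage locally, assuming the sbt launcher script at the repo root:

  sbt/sbt doc   # runs scaladoc generation; fails in catalyst where macros are used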
On Fri, Ja
The source code is under a new module named 'graphx'. Let me double check.
On Fri, Jan 16, 2015 at 2:11 PM, Kyle Ellrott wrote:
> Looking at https://github.com/kdatta/tinkerpop3/compare/graphx-gremlin I
> only see a maven build file. Do you have some source code some place else?
>
> I've worked
Reynold,
Your clarification is much appreciated. One issue, though, that I would
strongly encourage you to work on is making sure that the Scaladoc CAN be
generated manually if needed (a "Use at your own risk" clause would be
perfectly legitimate here). The reason I say this is that currently eve
Looking at https://github.com/kdatta/tinkerpop3/compare/graphx-gremlin I
only see a maven build file. Do you have some source code some place else?
I've worked on a Spark-based implementation (
https://github.com/kellrott/spark-gremlin ), but it's not done and I've been
tied up on other projects.
I
Reynold,
The source file you are directing me to is a little too terse for me to
understand what exactly is going on. Let me tell you what I'm trying to do
and what problems I'm encountering, so that you might be able to better
direct my investigation of the SparkSQL codebase.
I am computing the
Hi David,
Yes, we are still headed in that direction.
Please take a look at the repo I sent earlier.
I think that's a good starting point.
Thanks,
-Kushal.
On Thu, Jan 15, 2015 at 8:31 AM, David Robinson
wrote:
> I am new to Spark and GraphX, however, I use Tinkerpop backed graphs and
> think
Hi, I'm thinking of picking up this JIRA ticket:
https://issues.apache.org/jira/browse/SPARK-4259
Anyone done any work on this to date? Any thoughts on it before we go too
far in?
Thanks!
Best
Andrew
+1 to adding such an optimization to Parquet. The bytes are specially tagged
as UTF8 in the Parquet schema, so it seems like it would be possible to add
this.
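For reference, that annotation appears in a Parquet schema along these lines (message and field names are illustrative):

  message example {
    required binary name (UTF8);
  }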
On Fri, Jan 16, 2015 at 8:17 AM, Mick Davies
wrote:
> Hi,
>
> It seems that a reasonably large proportion of query time using Spark SQL
>
You are running on a local file system, right? HDFS orders the files based
on names, but local file systems often don't. I think that's why you see the
difference. We might be able to sort and order the partitions when we create
an RDD to make this universal, though.
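A minimal sketch of that idea as a user-side workaround, assuming the output lives in /tmp/sorted-out on a local file system (the path is illustrative): list the part files, sort them by name, and hand them to textFile in that order.

  // Hypothetical workaround: feed textFile the part files in sorted name order.
  val dir = new java.io.File("/tmp/sorted-out")
  val parts = dir.listFiles
    .map(_.getPath)
    .filter(_.contains("part-"))
    .sorted
  val rdd = sc.textFile(parts.mkString(","))  // textFile accepts comma-separated paths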
On Fri, Jan 16, 2015 at 8:26 AM, Ewan
On Fri, Jan 16, 2015 at 10:07 AM, Michel Dufresne
wrote:
> Thanks for your reply. I should have mentioned that spark-env.sh is the
> only option I found because:
>
>- I'm creating the SparkConf/SparkContext from a Play Application
>(therefore I'm not using the spark-submit script)
Then you
Thanks for your reply. I should have mentioned that spark-env.sh is the
only option I found because:
- I'm passing the public IP address of the slave (which is determined in
the shell script)
- I'm creating the SparkConf/SparkContext from a Play Application
(therefore I'm not using s
You can try to add it in conf/spark-defaults.conf:
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value
-Dnumbers="one two three"
Thanks.
Zhan Zhang
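Uncommented and adapted to the JMX flags discussed in this thread, the entry might look like this (the port value is illustrative):

  spark.executor.extraJavaOptions -Dcom.sun.management.jmxremote -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=9999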
On Jan 16, 2015, at 9:56 AM, Michel Dufresne
wrote:
> Hi All,
>
> I'm trying to set some JVM options for the executor process
Hi All,
I'm trying to set some JVM options for the executor processes in a
standalone cluster. Here's what I have in *spark-env.sh*:
jmx_opt="-Dcom.sun.management.jmxremote"
jmx_opt="${jmx_opt} -Djava.net.preferIPv4Stack=true"
jmx_opt="${jmx_opt} -Dcom.sun.management.jmxremote.port="
jmx
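Since the SparkContext is built inside the Play application, a hedged alternative is to set the options on the SparkConf directly instead of spark-env.sh (master URL, app name, and port below are illustrative):

  import org.apache.spark.{SparkConf, SparkContext}

  // Hypothetical sketch: pass executor JVM options programmatically.
  val conf = new SparkConf()
    .setMaster("spark://master-host:7077")  // assumed standalone master URL
    .setAppName("play-app")                 // assumed app name
    .set("spark.executor.extraJavaOptions",
      "-Dcom.sun.management.jmxremote " +
      "-Djava.net.preferIPv4Stack=true " +
      "-Dcom.sun.management.jmxremote.port=9999")  // port is illustrative
  val sc = new SparkContext(conf)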
Hi all,
Quick one: when reading files, is the order of partitions guaranteed
to be preserved? I am finding some weird behaviour where I run
sortByKey() on an RDD (which has 16-byte keys) and write it to disk. If
I open a python shell and run the following:
for part in range(29):
print
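The snippet above is truncated; a rough Scala equivalent of the check, with an assumed output path and part-file naming, would be:

  // Hypothetical check: print the first key of each part file in name order.
  val parts = (0 until 29).map(i => f"/tmp/out/part-$i%05d")
  for (p <- parts) {
    val firstLine = scala.io.Source.fromFile(p).getLines().next()
    println(p + " starts with: " + firstLine.take(16))
  }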
Hi,
It seems that a reasonably large proportion of query time using Spark SQL
is spent decoding Parquet Binary objects to produce Java Strings.
Has anyone considered trying to optimize these conversions, as many are
duplicated?
Details are outlined in the conversation in the user mailing
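A hypothetical sketch of such an optimization, assuming the parquet-mr Binary type and that repeated (e.g. dictionary-encoded) values benefit from memoized decoding:

  import java.util.concurrent.ConcurrentHashMap
  import parquet.io.api.Binary  // parquet-mr package name as of early 2015

  // Hypothetical memoization: decode each distinct Binary to a String once.
  object StringDecodeCache {
    private val cache = new ConcurrentHashMap[Binary, String]()

    def decode(b: Binary): String = {
      val cached = cache.get(b)
      if (cached != null) cached
      else {
        val s = b.toStringUsingUTF8()
        cache.put(b, s)
        s
      }
    }
  }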
Sent from my iPhone
Begin forwarded message:
> From: Robin East
> Date: 16 January 2015 11:35:23 GMT
> To: Joseph Bradley
> Cc: Yana Kadiyska , Devl Devel
>
> Subject: Re: LinearRegressionWithSGD accuracy
>
> Yes with scaled data intercept would be 5000 but the code as it stands is
> runn