Can you try running the spark-shell in yarn-client mode?
./bin/spark-shell --master yarn-client
Read more over here http://spark.apache.org/docs/1.0.0/running-on-yarn.html
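For reference, a minimal sketch of the two launch modes (the memory setting, class name, and jar are just placeholders); note that the shell itself only runs in yarn-client mode, while yarn-cluster mode is meant for spark-submit:

    # interactive shell against YARN (client mode)
    ./bin/spark-shell --master yarn-client --executor-memory 2g

    # packaged applications can use cluster mode via spark-submit
    ./bin/spark-submit --master yarn-cluster --class com.example.MyApp myapp.jar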
Thanks
Best Regards
On Sun, Sep 28, 2014 at 7:08 AM, jamborta wrote:
> hi all,
>
> I have a job that works ok in yarn-cl
This workaround looks good to me. This way, all queries are still
executed lazily within a single DAG, and Spark SQL is able to
optimize the query plan as a whole.
On 9/29/14 11:26 AM, twinkle sachdeva wrote:
Thanks Cheng.
For the time being, as a workaround, I applied the schema
to Queryresult1 and then registered the result as a temp table. Although
that works, I was not sure of the performance impact, as it might block
some optimisations in some scenarios.
This flow (on Spark 1.1) works:
r
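The quoted code is cut off above, so here is a rough illustration of that workaround on Spark 1.1; queryResult1, the schema, and the field names are made up for the example:

    import org.apache.spark.sql._

    // queryResult1 is the SchemaRDD produced by the first query (hypothetical)
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true)))

    // re-apply the schema and register the result so later queries can build on it
    val withSchema = sqlContext.applySchema(queryResult1, schema)
    withSchema.registerTempTable("query_result1")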
Chris,
I thought I would check back with you to see if you have made progress on this
issue. Any good news so far? Thanks. Once again, I really appreciate your
looking into this issue.
Thanks,
Wei
On Thu, Aug 28, 2014 at 4:44 PM, Chris Fregly wrote:
> great question, wei. this is very important to understa
Figured this out... documented it here and hope it can help others:
http://koobehub.wordpress.com/2014/09/29/spark-the-standalone-cluster-deployment/
On Sun, Sep 28, 2014 at 12:36 AM, codeoedoc wrote:
> Hi guys,
>
> This is a spark fresh user...
>
> I'm trying to setup a spark cluster with multiple no
You might consider instead storing the data using saveAsParquetFile and
then querying that after running
sqlContext.parquetFile(...).registerTempTable(...).
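A rough sketch of that approach (the path, table name, and schemaRdd are made up):

    // write the data out as Parquet, then read it back and register it for SQL
    schemaRdd.saveAsParquetFile("/tmp/my_table.parquet")
    sqlContext.parquetFile("/tmp/my_table.parquet").registerTempTable("my_table")
    sqlContext.sql("SELECT COUNT(*) FROM my_table").collect()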
On Sun, Sep 28, 2014 at 6:43 PM, Michael Armbrust
wrote:
> This is not possible until https://github.com/apache/spark/pull/2501 is
> merged
This is not possible until https://github.com/apache/spark/pull/2501 is
merged.
On Sun, Sep 28, 2014 at 6:39 PM, Haopu Wang wrote:
> Thanks for the response. From Spark Web-UI's Storage tab, I do see
> cached RDD there.
>
>
>
> But the storage level is "Memory Deserialized 1x Replicated". How
Thanks for the response. In the Spark Web UI's Storage tab, I do see the cached RDD
there.
But the storage level is "Memory Deserialized 1x Replicated". How can I change
the storage level? I ask because I have a big table there.
Thanks!
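As a side note: for a plain RDD you can pick a serialized level yourself when persisting; for tables cached through Spark SQL's cacheTable the level could not be chosen at the time, which is what the pull request mentioned above addresses. A small sketch for the plain-RDD case (bigRdd is hypothetical):

    import org.apache.spark.storage.StorageLevel

    // keep the RDD serialized in memory instead of the default deserialized form
    bigRdd.persist(StorageLevel.MEMORY_ONLY_SER)
    // or allow spilling to disk when it does not fit in memory
    // bigRdd.persist(StorageLevel.MEMORY_AND_DISK_SER)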
From: Cheng Lian [mailto:l
The storage fraction only limits the amount of memory used for storage. It
doesn't actually limit anything else; i.e., you can use all the memory in
collect() if you want.
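For concreteness, a sketch of the setting involved (the 0.3 value is just an example; the default in Spark 1.1 is 0.6):

    import org.apache.spark.{SparkConf, SparkContext}

    // lower the fraction of executor heap reserved for cached blocks
    val conf = new SparkConf().set("spark.storage.memoryFraction", "0.3")
    val sc = new SparkContext(conf)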
On Sunday, September 28, 2014, Brad Miller
wrote:
> Hi All,
>
> I am interested to collect() a large RDD so that I can run a lea
Hi Spark users and developers,
Some of the most active Spark developers (including Matei Zaharia, Michael
Armbrust, Joseph Bradley, TD, Paco Nathan, and me) will be in NYC for
Strata NYC. We are working with the Spark NYC meetup group and Bloomberg to
host a meetup event. This might be the event w
Thanks, Michael, for your quick response.
Views are critical for my project, which is migrating from Shark to Spark SQL. I
have implemented and tested everything else. It would be perfect if views could
be implemented soon.
Du
From: Michael Armbrust <mich...@databricks.com>
Date: Sunday, Se
Views are not supported yet. It's not currently on the near-term roadmap,
but that can change if there is sufficient demand or if someone in the
community is interested in implementing them. I do not think it would be
very hard.
Michael
On Sun, Sep 28, 2014 at 11:59 AM, Du Li wrote:
>
> Can anyb
Can anybody confirm whether or not views are currently supported in Spark? I
found “create view translate” in the blacklist of HiveCompatibilitySuite.scala,
and the following scenario also threw a NullPointerException on
beeline/thriftserver (1.1.0). Any plan to support it soon?
> create table src
Hi All,
I would like to collect() a large RDD so that I can run a learning
algorithm on it. I've noticed that when I don't increase
SPARK_DRIVER_MEMORY I can run out of memory. I've also noticed that it
looks like the same fraction of memory is reserved for storage on the
driver as on the workers.
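In case it helps, the driver heap has to be set before the driver JVM starts, so it goes on the launch command or in spark-env.sh rather than inside the application; the class, jar, and 8g value below are just examples:

    # via spark-submit
    ./bin/spark-submit --driver-memory 8g --class com.example.Train myapp.jar

    # or via the environment variable mentioned above, e.g. in conf/spark-env.sh
    export SPARK_DRIVER_MEMORY=8g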
It turned out to be a bug in my code. In the select clause, the list of fields was
misaligned with the schema of the target table. As a consequence, the map
data could not be cast to the type expected at that position in the schema.
Thanks anyway.
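For illustration, the kind of mismatch involved, with made-up table and column names; the SELECT list has to line up positionally with the target table's columns:

    -- target table: CREATE TABLE target (id INT, props MAP<STRING, STRING>)
    -- wrong: columns swapped, so the map lands in the INT slot and the cast fails
    INSERT OVERWRITE TABLE target SELECT props, id FROM source;
    -- right: field order matches the target schema
    INSERT OVERWRITE TABLE target SELECT id, props FROM source;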
On 9/26/14, 8:08 PM, "Cheng Lian" wrote:
>Would you mind to provide the DDL
Yes, it looks like this can only be controlled by the
parameter spark.sql.autoBroadcastJoinThreshold, which is a little bit weird
to me.
How am I supposed to know the exact size of a table in bytes? Letting me
specify which join algorithm is preferred would be better, I think.
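For reference, the threshold can be set per SQLContext; the 100 MB figure below is just an example:

    // tables smaller than this many bytes are broadcast to all nodes for joins
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (100 * 1024 * 1024).toString)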
Jianshi
On Sun, Sep 28, 2014 at 11:57 PM, Ted Yu wrote
Have you looked at SPARK-1800?
For example, see sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
Cheers
On Sun, Sep 28, 2014 at 1:55 AM, Jianshi Huang
wrote:
> I cannot find it in the documentation. And I have a dozen dimension tables
> to (left) join...
>
>
> Cheers,
> --
> Jianshi Huang
All
Sorry, this is Spark related, but I thought some of you in San Francisco
might be interested in this talk. We announced it recently; it will be
at the end of next month (October).
http://www.meetup.com/sfmachinelearning/events/208078582/
Prof CJ Lin is famous for his work on libsvm an
Hi
We have used LogisticRegression with two different optimization methods, SGD
and LBFGS, in MLlib.
With the same dataset and the same training and test split, we get
different weight vectors.
For example, we use
spark-1.1.0/data/mllib/sample_binary_classification_data.txt
as our training and test data.
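A minimal sketch of the comparison being described, runnable in spark-shell (where sc is predefined); the iteration count and the 60/40 split are arbitrary:

    import org.apache.spark.mllib.classification.{LogisticRegressionWithLBFGS, LogisticRegressionWithSGD}
    import org.apache.spark.mllib.util.MLUtils

    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_binary_classification_data.txt")
    val Array(training, test) = data.randomSplit(Array(0.6, 0.4), seed = 11L)
    training.cache()

    // same data, two optimizers; the learned weight vectors will generally differ
    val sgdModel = LogisticRegressionWithSGD.train(training, 100)
    val lbfgsModel = new LogisticRegressionWithLBFGS().run(training)
    println(sgdModel.weights)
    println(lbfgsModel.weights)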
Hi
If you want IntelliJ IDEA to compile your Spark project (version 1.0.0 and above), you
should do it with the following steps:
1. Clone the Spark project.
2. Use mvn to compile the Spark project (because you need the generated Avro
source files in the flume-sink module); see the command sketch after this list.
3. Open spark/pom.xml with IDEA.
4. Check profiles
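For step 2, something like the following should work (the profile and Hadoop version are just examples; pick the ones matching your cluster):

    mvn -Pyarn -Dhadoop.version=2.4.0 -DskipTests clean package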
I cannot find it in the documentation. And I have a dozen dimension tables
to (left) join...
Cheers,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
BTW, I'm using standalone deployment. (The name "standalone deployment" for a
cluster is kind of misleading... I think the doc needs to be updated.
It's not really standalone, but a plain Spark-only deployment.)
Thx,
cody
On Sun, Sep 28, 2014 at 12:36 AM, codeoedoc wrote:
> Hi guys,
>
> This is a spa
Thank you for your reply. Your tips on code refactoring are helpful; after a
second look at the code, the casts and null checks really are unnecessary.
qinwei
From: Sean Owen
Date: 2014-09-28 15:03
To: qinwei
CC: user
Subject: Re: problem with partitioning
(Most of this code is not relevant t
Hi guys,
I'm a fresh Spark user...
I'm trying to set up a Spark cluster with multiple nodes, starting with 2.
With one node, it is working fine. When I add a slave node, the slave is able
to register with the master node. However, when I launch a spark shell, and
when the executor is launched on the
(Most of this code is not relevant to the question and can be refactored
too. The casts and null checks look unnecessary.)
You are unioning RDDs, so the result has the sum of their
partitions. The number of partitions is really only a hint to Hadoop, so it
is not even necessarily 3 x 1920.
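A small sketch of the behaviour being described; the paths are made up and the numbers are illustrative:

    // three inputs read with a requested minimum of 1920 partitions each
    val a = sc.textFile("hdfs:///data/a", 1920)
    val b = sc.textFile("hdfs:///data/b", 1920)
    val c = sc.textFile("hdfs:///data/c", 1920)

    // the union simply concatenates the partition lists of its inputs
    val u = a.union(b).union(c)

    // each input may end up with more than 1920 partitions (the argument is only
    // a hint to the Hadoop input format), so this is not necessarily 3 * 1920
    println(u.partitions.length)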