date:20140928

Re: yarn does not accept job in cluster mode

2014-09-28 Thread Akhil Das

Can you try running the spark-shell in yarn-client mode? ./bin/spark-shell --master yarn-client Read more here: http://spark.apache.org/docs/1.0.0/running-on-yarn.html Thanks Best Regards On Sun, Sep 28, 2014 at 7:08 AM, jamborta wrote: > hi all, > > I have a job that works ok in yarn-cl

Re: Using one sql query's result inside another sql query

2014-09-28 Thread Cheng Lian

This workaround looks good to me. In this way, all queries are still executed lazily within a single DAG, and Spark SQL is able to optimize the query plan as a whole. On 9/29/14 11:26 AM, twinkle sachdeva wrote: Thanks Cheng. For the time being, as a workaround, I had applied the schema

Re: Using one sql query's result inside another sql query

2014-09-28 Thread twinkle sachdeva

Thanks Cheng. For the time being, as a workaround, I applied the schema to Queryresult1 and then registered the result as a temp table. Although that works, I was not sure of the performance impact, as that might block some optimisation in some scenarios. This flow (on Spark 1.1) works: r

Re: Kinesis receiver & spark streaming partition

2014-09-28 Thread Wei Liu

Chris, I thought I would check back with you to see if you have made progress on this issue. Any good news so far? Thanks. Once again, I really appreciate you looking into this issue. Thanks, Wei On Thu, Aug 28, 2014 at 4:44 PM, Chris Fregly wrote: > great question, wei. this is very important to understa

Re: spark multi-node cluster

2014-09-28 Thread codeoedoc

Figured this out... documented here in the hope it can help others: http://koobehub.wordpress.com/2014/09/29/spark-the-standalone-cluster-deployment/ On Sun, Sep 28, 2014 at 12:36 AM, codeoedoc wrote: > Hi guys, > > I'm a new Spark user... > > I'm trying to set up a spark cluster with multiple no

Re: Spark SQL question: how to control the storage level of cached SchemaRDD?

2014-09-28 Thread Michael Armbrust

You might consider instead storing the data using saveAsParquetFile and then querying that after running sqlContext.parquetFile(...).registerTempTable(...). On Sun, Sep 28, 2014 at 6:43 PM, Michael Armbrust wrote: > This is not possible until https://github.com/apache/spark/pull/2501 is > merged

Re: Spark SQL question: how to control the storage level of cached SchemaRDD?

2014-09-28 Thread Michael Armbrust

This is not possible until https://github.com/apache/spark/pull/2501 is merged. On Sun, Sep 28, 2014 at 6:39 PM, Haopu Wang wrote: > Thanks for the response. From Spark Web-UI's Storage tab, I do see > cached RDD there. > > > > But the storage level is "Memory Deserialized 1x Replicated". How

Spark SQL question: how to control the storage level of cached SchemaRDD?

2014-09-28 Thread Haopu Wang

Thanks for the response. From Spark Web-UI's Storage tab, I do see cached RDD there. But the storage level is "Memory Deserialized 1x Replicated". How can I change the storage level? Because I have a big table there. Thanks! From: Cheng Lian [mailto:l

Re: driver memory management

2014-09-28 Thread Reynold Xin

The storage fraction only limits the amount of memory used for storage. It doesn't actually limit anything else. I.e., you can use all the memory in collect if you want. On Sunday, September 28, 2014, Brad Miller wrote: > Hi All, > > I am interested to collect() a large RDD so that I can run a lea
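To make the point about the storage fraction concrete, here is a rough arithmetic sketch in plain Python (a toy model, not Spark code). It assumes the Spark 1.x settings spark.storage.memoryFraction and spark.storage.safetyFraction with values 0.6 and 0.9; treat those numbers as assumptions and check them against your own configuration. The function names are hypothetical.

```python
# Toy model of the storage cap on a JVM heap. This is NOT Spark code.
# Assumed values: memory_fraction=0.6 and safety_fraction=0.9 stand in for
# spark.storage.memoryFraction and spark.storage.safetyFraction.

def storage_cap_bytes(heap_bytes, memory_fraction=0.6, safety_fraction=0.9):
    """Upper bound on the bytes that cached blocks may occupy."""
    return round(heap_bytes * memory_fraction * safety_fraction)


def can_cache(block_bytes, cached_bytes, heap_bytes):
    """True if one more cached block still fits under the storage cap."""
    return cached_bytes + block_bytes <= storage_cap_bytes(heap_bytes)


heap = 4 * 1024 ** 3  # a hypothetical 4 GiB driver heap
print("storage cap:", storage_cap_bytes(heap), "bytes")
# The cap applies to cached blocks only. The result of collect() is an
# ordinary object on the heap, so it is bounded by the heap size itself.
print("heap available to collect() if nothing is cached:", heap, "bytes")
```

In this model the cap only decides whether another block can be cached; it says nothing about how much memory a collected result may take.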

Spark meetup on Oct 15 in NYC

2014-09-28 Thread Reynold Xin

Hi Spark users and developers, Some of the most active Spark developers (including Matei Zaharia, Michael Armbrust, Joseph Bradley, TD, Paco Nathan, and me) will be in NYC for Strata NYC. We are working with the Spark NYC meetup group and Bloomberg to host a meetup event. This might be the event w

Re: view not supported in spark thrift server?

2014-09-28 Thread Du Li

Thanks, Michael, for your quick response. Views are critical for my project, which is migrating from Shark to Spark SQL. I have implemented and tested everything else. It would be perfect if views could be implemented soon. Du From: Michael Armbrust <mich...@databricks.com> Date: Sunday, Se

Re: view not supported in spark thrift server?

2014-09-28 Thread Michael Armbrust

Views are not supported yet. It's not currently on the near-term roadmap, but that can change if there is sufficient demand or someone in the community is interested in implementing them. I do not think it would be very hard. Michael On Sun, Sep 28, 2014 at 11:59 AM, Du Li wrote: > > Can anyb

view not supported in spark thrift server?

2014-09-28 Thread Du Li

Can anybody confirm whether or not views are currently supported in Spark? I found “create view translate” in the blacklist of HiveCompatibilitySuite.scala, and the following scenario also threw a NullPointerException on beeline/thriftserver (1.1.0). Any plan to support it soon? > create table src

driver memory management

2014-09-28 Thread Brad Miller

Hi All, I am interested to collect() a large RDD so that I can run a learning algorithm on it. I've noticed that when I don't increase SPARK_DRIVER_MEMORY I can run out of memory. I've also noticed that it looks like the same fraction of memory is reserved for storage on the driver as on the work

Re: SparkSQL: map type MatchError when inserting into Hive table

2014-09-28 Thread Du Li

It turned out to be a bug in my code. In the select clause, the list of fields was misaligned with the schema of the target table. As a consequence, the map data couldn’t be cast to some other type in the schema. Thanks anyway. On 9/26/14, 8:08 PM, "Cheng Lian" wrote: >Would you mind to provide the DDL

Re: How to do broadcast join in SparkSQL

2014-09-28 Thread Jianshi Huang

Yes, it looks like it can only be controlled by the parameter spark.sql.autoBroadcastJoinThreshold, which is a little bit weird to me. How am I supposed to know the exact size of a table in bytes? Letting me specify the join algorithm would be preferable, I think. Jianshi On Sun, Sep 28, 2014 at 11:57 PM, Ted Yu wrote
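As a rough illustration of what a size threshold such as spark.sql.autoBroadcastJoinThreshold implies, here is a simplified decision rule in plain Python. It is a toy model, not the Spark planner: the function name, the table names, and the byte values are hypothetical, and the real planner works from its own size statistics.

```python
# Toy model of a size-based join choice. This is NOT the Spark planner.
# threshold_bytes stands in for spark.sql.autoBroadcastJoinThreshold.

def choose_join_strategy(estimated_size_bytes, threshold_bytes):
    """Pick 'broadcast' when the smaller side is estimated to fit under the threshold."""
    if threshold_bytes > 0 and estimated_size_bytes <= threshold_bytes:
        return "broadcast"
    return "shuffle"


# Hypothetical dimension tables with made-up size estimates.
dimension_tables = {"dim_country": 4_000, "dim_product": 250_000_000}
threshold = 10_000  # illustrative value only

for name, size in dimension_tables.items():
    print(name, "->", choose_join_strategy(size, threshold))
```

The sketch shows why the complaint above matters: the outcome depends entirely on the size estimate, so a table whose size is unknown or overestimated never qualifies for a broadcast join.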

Re: How to do broadcast join in SparkSQL

2014-09-28 Thread Ted Yu

Have you looked at SPARK-1800 ? e.g. see sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala Cheers On Sun, Sep 28, 2014 at 1:55 AM, Jianshi Huang wrote: > I cannot find it in the documentation. And I have a dozen dimension tables > to (left) join... > > > Cheers, > -- > Jianshi Huang

[SF Machine Learning meetup] talk by Prof. C J Lin, large-scale linear classification: status and challenges

2014-09-28 Thread Chester At Work

All, Sorry this is not Spark related, but I thought some of you in San Francisco might be interested in this talk. We announced this talk recently; it will be at the end of next month (Oct): http://www.meetup.com/sfmachinelearning/events/208078582/ Prof CJ Lin is famous for his work on libsvm an

[MLlib] LogisticRegressionWithSGD and LogisticRegressionWithLBFGS converge with different weights.

2014-09-28 Thread Yanbo Liang

Hi, We have used LogisticRegression with two different optimization methods, SGD and LBFGS, in MLlib. With the same dataset and the same training and test split, we get different weight vectors. For example, we use spark-1.1.0/data/mllib/sample_binary_classification_data.txt as our training and test

Re: Build spark with Intellij IDEA 13

2014-09-28 Thread Yi Tian

Hi, If you want IDEA to compile your Spark project (version 1.0.0 and above), follow these steps. 1. Clone the Spark project 2. Use mvn to compile your Spark project (because you need the generated Avro source files in the flume-sink module) 3. Open spark/pom.xml with IDEA 4. Check profiles

How to do broadcast join in SparkSQL

2014-09-28 Thread Jianshi Huang

I cannot find it in the documentation. And I have a dozen dimension tables to (left) join... Cheers, -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github & Blog: http://huangjs.github.com/

Re: spark multi-node cluster

2014-09-28 Thread codeoedoc

BTW, I'm using standalone deployment. (The name "standalone deployment" for a cluster is kind of misleading... I think the doc needs to be updated. It's not really standalone, but a plain Spark-only deployment.) Thx, cody On Sun, Sep 28, 2014 at 12:36 AM, codeoedoc wrote: > Hi guys, > > I'm a new Spa

Re: Re: problem with partitioning

2014-09-28 Thread qinwei

Thank you for your reply. Your tips on code refactoring are helpful; after a second look at the code, the casts and null checks really are unnecessary. qinwei From: Sean Owen Date: 2014-09-28 15:03 To: qinwei CC: user Subject: Re: problem with partitioning (Most of this code is not relevant t

spark multi-node cluster

2014-09-28 Thread codeoedoc

Hi guys, I'm a new Spark user... I'm trying to set up a Spark cluster with multiple nodes, starting with 2. With one node, it is working fine. When I add a slave node, the slave is able to register with the master node. However when I launch a spark shell, and when the executor is launched on the

Re: problem with partitioning

2014-09-28 Thread Sean Owen

(Most of this code is not relevant to the question and can be refactored too. The casts and null checks look unnecessary.) You are unioning RDDs so you have a result with the sum of their partitions. The number of partitions is really a hint to Hadoop only so it is not even necessarily 3 x 1920.
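The point about union and partition counts can be shown with a toy model in plain Python. This is not Spark code: each "RDD" here is just a list of partitions and the helper name is hypothetical. It only illustrates that a union keeps every input partition, so the partition counts add up.

```python
# Toy model: an "RDD" is a list of partitions, each partition a list of records.
# This is NOT Spark code; it only mirrors how union concatenates partitions.
from itertools import chain


def union_partitions(*rdds):
    """Concatenate the partition lists of all inputs without merging any partitions."""
    return list(chain.from_iterable(rdds))


a = [[1, 2], [3]]      # 2 partitions
b = [[4], [5], [6]]    # 3 partitions
c = [[7, 8, 9]]        # 1 partition

merged = union_partitions(a, b, c)
print(len(merged))  # 6
```

The result has 2 + 3 + 1 partitions because no data is moved or combined, which is why unioning several inputs multiplies the partition count.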

25 matches
