> JN = ssc.sql("select t.name, t.age, t.other from tab t inner join
>     (select name, max(age) age from tab group by name) t1
>     on t.name = t1.name and t.age = t1.age")
> for i in JN.collect():
>     print i
>
> Result:
> Row(name=u'A', age=30, other=...)
--
Wenlei Xie (谢文磊)
Ph.D. Candidate
Department of Computer Science
456 Gates Hall, Cornell University
Ithaca, NY 14853, USA
Email: wenlei@gmail.com
Hi,
I am trying to answer a simple query with SparkSQL over a Parquet file.
When I execute the query several times, the first run takes about 2 s
while the later runs take less than 0.1 s.
Looking at the log file, it seems the later runs don't load the data
from disk. However, I didn't enable any caching.
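For reference, Spark SQL only keeps a table in memory when caching is requested explicitly, so when later runs skip the disk without it, the usual explanation is the OS page cache serving the Parquet file after the first read. A minimal sketch of what explicit caching looks like, assuming Spark 1.2's Java API; the path and table name are illustrative:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;

public class CacheParquetTable {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local[4]", "parquet-cache-demo");
    JavaSQLContext sqlCtx = new JavaSQLContext(sc);

    // "/data/tab.parquet" is an illustrative path.
    JavaSchemaRDD tab = sqlCtx.parquetFile("/data/tab.parquet");
    tab.registerTempTable("tab");

    // Explicit in-memory caching. Without this, repeated queries
    // re-read the Parquet file (often served quickly from the OS page
    // cache rather than from disk, which looks like a free speedup).
    sqlCtx.sql("CACHE TABLE tab");

    sqlCtx.sql("SELECT count(*) FROM tab").collect();
    sc.stop();
  }
}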
Hi,
I am wondering how we should understand the running time of SparkSQL
queries? For example, the physical query plan and the running time of
each stage? Is there any guide covering this?
Thank you!
Best,
Wenlei
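A minimal sketch of one way to look at the plan, assuming Spark 1.2's Java API, a table tab already registered, and that the SQL dialect in use accepts EXPLAIN EXTENDED; the query itself is illustrative:

import java.util.List;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.Row;

public class ExplainQuery {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local[4]", "explain-demo");
    JavaSQLContext sqlCtx = new JavaSQLContext(sc);

    // EXPLAIN EXTENDED returns the logical and physical plans as rows
    // of text instead of running the query. Assumes "tab" was
    // registered earlier (e.g. from a Parquet file).
    List<Row> plan = sqlCtx.sql(
        "EXPLAIN EXTENDED SELECT name, max(age) FROM tab GROUP BY name")
        .collect();
    for (Row line : plan) {
      System.out.println(line.getString(0));
    }
    sc.stop();
  }
}

The per-stage running times themselves are easiest to read off the application web UI (http://localhost:4040 by default), which lists every stage of a job with its duration.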
Hi,
I would like to answer the following customized aggregation query with
Spark SQL:
1. Group the table by the value of Name.
2. For each group, choose the tuple with the max value of Age (the ages
are distinct for every name).
I am wondering what's the best way to do this in Spark SQL? Should I use
UDAFs?
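This is the question the self-join at the top of this page answers: join the table against a per-name max-age subquery instead of writing a custom aggregate. A minimal runnable sketch of that approach, assuming Spark 1.2's Java API; the Person bean, sample rows, and table name tab are illustrative:

import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;

public class MaxAgePerName {
  // JavaBean used to infer the schema (name, age, other).
  public static class Person implements Serializable {
    private String name; private int age; private String other;
    public Person(String name, int age, String other) {
      this.name = name; this.age = age; this.other = other;
    }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public String getOther() { return other; }
    public void setOther(String other) { this.other = other; }
  }

  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local[4]", "max-age-demo");
    JavaSQLContext sqlCtx = new JavaSQLContext(sc);

    JavaRDD<Person> people = sc.parallelize(Arrays.asList(
        new Person("A", 30, "x"), new Person("A", 20, "y"),
        new Person("B", 25, "z")));
    JavaSchemaRDD tab = sqlCtx.applySchema(people, Person.class);
    tab.registerTempTable("tab");

    // For each name, keep the tuple whose age equals that name's max.
    JavaSchemaRDD result = sqlCtx.sql(
        "SELECT t.name, t.age, t.other FROM tab t INNER JOIN " +
        "(SELECT name, max(age) age FROM tab GROUP BY name) t1 " +
        "ON t.name = t1.name AND t.age = t1.age");
    for (Row r : result.collect()) {
      System.out.println(r.getString(0) + " " + r.getInt(1) + " " + r.getString(2));
    }
    sc.stop();
  }
}

Because the ages are distinct for every name, the join keeps exactly one tuple per group.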
> Regards,
> Archit Thakur.
>
> On Sat, Apr 18, 2015 at 6:20 PM, Wenlei Xie wrote:
>
>> Hi,
>>
>> I am wondering what mechanism determines the number of partitions
>> created by SparkContext.sequenceFile?
>>
>> For example, although my file has only 4 splits, Spark creates 16
>> partitions for it.
Use Object[] in Java just works :).
On Fri, Apr 24, 2015 at 4:56 PM, Wenlei Xie wrote:
> Hi,
>
> I am wondering if there is any way to create a Row in SparkSQL 1.2 in
> Java by using a List? It looks like
>
> ArrayList something;
> Row.create(something)
>
> will create a row with a single column (and that single column
> contains the array).
Hi,
I am wondering if there is any way to create a Row in SparkSQL 1.2 in Java
by using a List? It looks like
ArrayList something;
Row.create(something)
will create a row with a single column (and that single column contains
the array).
Best,
Wenlei
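A minimal sketch of the Object[] suggestion above, assuming Spark 1.2's Java API; the list contents are illustrative. Row.create takes varargs, so a List arrives as one argument, while an Object[] is spread into one column per element:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.api.java.Row;

public class RowFromList {
  public static void main(String[] args) {
    List<Object> values = new ArrayList<Object>();
    values.add("A");
    values.add(30);

    // A List is a single varargs argument: one column holding the list.
    Row oneColumn = Row.create(values);

    // An Object[] is spread by varargs: one column per list element.
    Row manyColumns = Row.create(values.toArray());

    System.out.println(oneColumn.length());   // 1
    System.out.println(manyColumns.length()); // 2
  }
}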
Hi,
I am wondering what mechanism determines the number of partitions
created by SparkContext.sequenceFile?
For example, although my file has only 4 splits, Spark creates 16
partitions for it. Is it determined by the file size? Is there any way to
control it? (Looks like I can only tune the minimum number of partitions.)
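A minimal sketch of the one knob sequenceFile exposes, assuming Spark 1.2's Java API and a Text/IntWritable sequence file at an illustrative path. The last argument is minPartitions, which is only a lower bound: it is handed to the Hadoop InputFormat when computing splits, and the format may still produce more splits than there are HDFS blocks (based on the file size and the requested split count), which would explain 16 partitions for a 4-split file:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SequenceFilePartitions {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local[4]", "seqfile-demo");

    // "/data/input.seq" is an illustrative path. The fourth argument
    // is minPartitions: a minimum, not an exact count, passed down to
    // the Hadoop InputFormat when it computes splits.
    JavaPairRDD<Text, IntWritable> rdd = sc.sequenceFile(
        "/data/input.seq", Text.class, IntWritable.class, 4);

    System.out.println("partitions: " + rdd.partitions().size());
    sc.stop();
  }
}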
Hi,
I am currently testing my application with Spark in local mode, with the
master set to local[4]. One thing I notice is that when a
groupBy/reduceBy operation is involved, the CPU usage can sometimes reach
600% to 800%. I am wondering if this is expected? (As there are only 4
worker threads, I would expect at most 400%.)