> JN = ssc.sql("select t.name, t.age, t.other from tab t inner join
>     (select name, max(age) age from tab group by name) t1
>     on t.name = t1.name and t.age = t1.age")
> for i in JN.collect():
>     print i
>
> Result:
> Row(name=u'A', age=30, other=...)
--
Wenlei Xie (谢文磊)
Ph.D. Candidate
Department of Computer Science
456 Gates Hall, Cornell University
Ithaca, NY 14853, USA
Email: wenlei@gmail.com
Hi,
I am trying to answer a simple query with SparkSQL over a Parquet file.
When I execute the query several times, the first run takes about 2 s
while the later runs take less than 0.1 s.
Looking at the log file, it seems the later runs don't load the data
from disk. However, I didn't enable any caching.
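For reference, Spark SQL only keeps a table in memory when caching is requested explicitly, so when later runs skip the disk without it, the usual explanation is the OS page cache serving the Parquet file after the first read. A minimal sketch of what explicit caching looks like, assuming Spark 1.2's Java API; the path and table name are illustrative:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;

public class CacheParquetTable {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local[4]", "parquet-cache-demo");
    JavaSQLContext sqlCtx = new JavaSQLContext(sc);

    // "/data/tab.parquet" is an illustrative path.
    JavaSchemaRDD tab = sqlCtx.parquetFile("/data/tab.parquet");
    tab.registerTempTable("tab");

    // Explicit in-memory caching. Without this, repeated queries
    // re-read the Parquet file (often served quickly from the OS page
    // cache rather than from disk, which looks like a free speedup).
    sqlCtx.sql("CACHE TABLE tab");

    sqlCtx.sql("SELECT count(*) FROM tab").collect();
    sc.stop();
  }
}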
Hi,
I am wondering how we should understand the running time of SparkSQL
queries? For example, the physical query plan and the running time of
each stage? Is there any guide covering this?
Thank you!
Best,
Wenlei
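A minimal sketch of one way to look at the plan, assuming Spark 1.2's Java API, a table tab already registered, and that the SQL dialect in use accepts EXPLAIN EXTENDED; the query itself is illustrative:

import java.util.List;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.Row;

public class ExplainQuery {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local[4]", "explain-demo");
    JavaSQLContext sqlCtx = new JavaSQLContext(sc);

    // EXPLAIN EXTENDED returns the logical and physical plans as rows
    // of text instead of running the query. Assumes "tab" was
    // registered earlier (e.g. from a Parquet file).
    List<Row> plan = sqlCtx.sql(
        "EXPLAIN EXTENDED SELECT name, max(age) FROM tab GROUP BY name")
        .collect();
    for (Row line : plan) {
      System.out.println(line.getString(0));
    }
    sc.stop();
  }
}

The per-stage running times themselves are easiest to read off the application web UI (http://localhost:4040 by default), which lists every stage of a job with its duration.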
Hi,
I would like to answer the following customized aggregation query with
Spark SQL:
1. Group the table by the value of Name.
2. For each group, choose the tuple with the max value of Age (the ages
are distinct for every name).
I am wondering what's the best way to do this in Spark SQL? Should I use
UDAFs?
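This is the question the self-join at the top of this page answers: join the table against a per-name max-age subquery instead of writing a custom aggregate. A minimal runnable sketch of that approach, assuming Spark 1.2's Java API; the Person bean, sample rows, and table name tab are illustrative:

import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;

public class MaxAgePerName {
  // JavaBean used to infer the schema (name, age, other).
  public static class Person implements Serializable {
    private String name; private int age; private String other;
    public Person(String name, int age, String other) {
      this.name = name; this.age = age; this.other = other;
    }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public String getOther() { return other; }
    public void setOther(String other) { this.other = other; }
  }

  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local[4]", "max-age-demo");
    JavaSQLContext sqlCtx = new JavaSQLContext(sc);

    JavaRDD<Person> people = sc.parallelize(Arrays.asList(
        new Person("A", 30, "x"), new Person("A", 20, "y"),
        new Person("B", 25, "z")));
    JavaSchemaRDD tab = sqlCtx.applySchema(people, Person.class);
    tab.registerTempTable("tab");

    // For each name, keep the tuple whose age equals that name's max.
    JavaSchemaRDD result = sqlCtx.sql(
        "SELECT t.name, t.age, t.other FROM tab t INNER JOIN " +
        "(SELECT name, max(age) age FROM tab GROUP BY name) t1 " +
        "ON t.name = t1.name AND t.age = t1.age");
    for (Row r : result.collect()) {
      System.out.println(r.getString(0) + " " + r.getInt(1) + " " + r.getString(2));
    }
    sc.stop();
  }
}

Because the ages are distinct for every name, the join keeps exactly one tuple per group.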
> Regards,
> Archit Thakur.
>
> On Sat, Apr 18, 2015 at 6:20 PM, Wenlei Xie wrote:
>
>> Hi,
>>
>> I am wondering what mechanism determines the number of partitions
>> created by SparkContext.sequenceFile?
>>
>> For example, although my file has only 4 splits, Spark creates 16
>> partitions for it.
Use Object[] in Java just works :).
On Fri, Apr 24, 2015 at 4:56 PM, Wenlei Xie wrote:
> Hi,
>
> I am wondering if there is any way to create a Row in SparkSQL 1.2 in
> Java by using a List? It looks like
>
> ArrayList something;
> Row.create(something)
>
> will create a row with a single column (and that single column
> contains the array).
Hi,
I am wondering if there is any way to create a Row in SparkSQL 1.2 in Java
by using a List? It looks like
ArrayList something;
Row.create(something)
will create a row with a single column (and that single column contains
the array).
Best,
Wenlei
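A minimal sketch of the Object[] suggestion above, assuming Spark 1.2's Java API; the list contents are illustrative. Row.create takes varargs, so a List arrives as one argument, while an Object[] is spread into one column per element:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.api.java.Row;

public class RowFromList {
  public static void main(String[] args) {
    List<Object> values = new ArrayList<Object>();
    values.add("A");
    values.add(30);

    // A List is a single varargs argument: one column holding the list.
    Row oneColumn = Row.create(values);

    // An Object[] is spread by varargs: one column per list element.
    Row manyColumns = Row.create(values.toArray());

    System.out.println(oneColumn.length());   // 1
    System.out.println(manyColumns.length()); // 2
  }
}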
Hi,
I am wondering what mechanism determines the number of partitions
created by SparkContext.sequenceFile?
For example, although my file has only 4 splits, Spark creates 16
partitions for it. Is it determined by the file size? Is there any way to
control it? (Looks like I can only tune the minimum number of partitions.)
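A minimal sketch of the one knob sequenceFile exposes, assuming Spark 1.2's Java API and a Text/IntWritable sequence file at an illustrative path. The last argument is minPartitions, which is only a lower bound: it is handed to the Hadoop InputFormat when computing splits, and the format may still produce more splits than there are HDFS blocks (based on the file size and the requested split count), which would explain 16 partitions for a 4-split file:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SequenceFilePartitions {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext("local[4]", "seqfile-demo");

    // "/data/input.seq" is an illustrative path. The fourth argument
    // is minPartitions: a minimum, not an exact count, passed down to
    // the Hadoop InputFormat when it computes splits.
    JavaPairRDD<Text, IntWritable> rdd = sc.sequenceFile(
        "/data/input.seq", Text.class, IntWritable.class, 4);

    System.out.println("partitions: " + rdd.partitions().size());
    sc.stop();
  }
}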
Hi,
I am currently testing my application with Spark in local mode, with the
master set to local[4]. One thing I notice is that when a
groupBy/reduceBy operation is involved, the CPU usage can sometimes reach
600% to 800%. I am wondering if this is expected? (As there are only 4
worker threads, I would expect at most 400%.)