Re: Re: Re: Package Release Announcement: Spark SQL on HBase "Astro"

2015-08-11 Thread Yan Zhou.sc
We are using MR-based bulk loading on Spark. For filter pushdown, Astro does partition pruning, scan-range pruning, and uses Gets as much as possible. Thanks, From: Ted Malaska [mailto:ted.mala...@cloudera.com] Sent: Wednesday, August 12, 2015 9:14 AM To: Yan Zhou.sc Cc: dev@spark.apache.org; Bing Xiao (Bing
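Astro's own pruning code isn't quoted in the thread; the following is a minimal Scala sketch of the Scan-vs-Get distinction being described, on the HBase 1.x client API (the table and key names are made-up examples, not Astro's):

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Scan}
    import org.apache.hadoop.hbase.util.Bytes

    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("astro_demo"))

    // Row key fully determined by the predicate: a point Get, no scan at all.
    val row = table.get(new Get(Bytes.toBytes("key-00042")))

    // Predicate only bounds the key range: prune the scan to [start, stop).
    val scan = new Scan()
    scan.setStartRow(Bytes.toBytes("key-00040"))
    scan.setStopRow(Bytes.toBytes("key-00050"))
    val scanner = table.getScanner(scan)
    scanner.close()
    conn.close()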

RE: Re: Re: Package Release Announcement: Spark SQL on HBase "Astro"

2015-08-11 Thread Yan Zhou.sc
No, Astro's bulk loader does not use its own shuffle. But the map/reduce-side processing is somewhat different from HBase's bulk loader that is used by many HBase apps, I believe. From: Ted Malaska [mailto:ted.mala...@cloudera.com] Sent: Wednesday, August 12, 2015 8:56 AM To: Yan Zhou.sc Cc: dev
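The Astro loader itself isn't shown here; below is a hedged sketch of the generic HFile-based bulk-load pattern from Spark that the thread contrasts with (assumes a running SparkContext sc; the table name, column family, and key layout are made up):

    import org.apache.hadoop.hbase.client.ConnectionFactory
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue, TableName}
    import org.apache.hadoop.mapreduce.Job

    val hbaseConf = HBaseConfiguration.create()
    val conn = ConnectionFactory.createConnection(hbaseConf)
    val tableName = TableName.valueOf("astro_demo")
    val job = Job.getInstance(hbaseConf)
    // Sets up total-order partitioning so HFiles line up with region boundaries.
    HFileOutputFormat2.configureIncrementalLoad(job,
      conn.getTable(tableName), conn.getRegionLocator(tableName))

    // Rows must arrive sorted by key; cells are written as HFiles, not Puts.
    implicit val keyOrder: Ordering[ImmutableBytesWritable] =
      new Ordering[ImmutableBytesWritable] {
        def compare(a: ImmutableBytesWritable, b: ImmutableBytesWritable): Int =
          Bytes.compareTo(a.get(), b.get())
      }
    val kvs = sc.parallelize(1 to 1000).map { i =>
      val key = Bytes.toBytes(f"key-$i%05d")
      (new ImmutableBytesWritable(key),
       new KeyValue(key, Bytes.toBytes("cf"), Bytes.toBytes("c"), Bytes.toBytes(i)))
    }
    kvs.sortByKey().saveAsNewAPIHadoopFile("/tmp/hfiles",
      classOf[ImmutableBytesWritable], classOf[KeyValue],
      classOf[HFileOutputFormat2], job.getConfiguration)
    // A completebulkload step (LoadIncrementalHFiles) then moves the HFiles in.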

RE: Re: Re: Package Release Announcement: Spark SQL on HBase "Astro"

2015-08-11 Thread Yan Zhou.sc
To: Yan Zhou.sc Cc: user; dev@spark.apache.org; Bing Xiao (Bing); Ted Yu Subject: RE: Re: Re: Package Release Announcement: Spark SQL on HBase "Astro" Hey Yan, I've been the one building out this Spark functionality in HBase so maybe I can help clarify. The hbase-spark module is

RE: Re: Re: Package Release Announcement: Spark SQL on HBase "Astro"

2015-08-11 Thread Yan Zhou.sc
in some coprocessor/custom filter combos), and add support for querying string columns in HBase as integers from Astro. Thanks, From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, August 12, 2015 7:02 AM To: Yan Zhou.sc Cc: Bing Xiao (Bing); dev@spark.apache.org; u...@spark.apache.org

Re: Re: Package Release Announcement: Spark SQL on HBase "Astro"

2015-08-11 Thread Yan Zhou.sc
integration with Spark. It will be interesting to see performance comparisons when HBASE-14181 is ready. Thanks, From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Tuesday, August 11, 2015 3:28 PM To: Yan Zhou.sc Cc: Bing Xiao (Bing); dev@spark.apache.org; u...@spark.apache.org Subject: Re: Re: Package

Re: Re: Package Release Announcement: Spark SQL on HBase "Astro"

2015-08-11 Thread Yan Zhou.sc
[mailto:yuzhih...@gmail.com] Sent: Tuesday, August 11, 2015 3:28 PM To: Yan Zhou.sc Cc: Bing Xiao (Bing); dev@spark.apache.org; u...@spark.apache.org Subject: Re: Re: Package Release Announcement: Spark SQL on HBase "Astro" HBase will not have a query engine. It will provide better support to query engines. Cheers On Au

Re: Package Release Announcement: Spark SQL on HBase "Astro"

2015-08-10 Thread Yan Zhou.sc
om] Sent: Tuesday, August 11, 2015 8:54 AM To: Bing Xiao (Bing) Cc: dev@spark.apache.org; u...@spark.apache.org; Yan Zhou.sc Subject: Re: Package Release Announcement: Spark SQL on HBase "Astro" Yan / Bing: Mind taking a look at HBASE-14181<https://issues.apache.org/jira/browse/HBASE-14181> 'Add

RE: Package Release Announcement: Spark SQL on HBase "Astro"

2015-08-03 Thread Yan Zhou.sc
...@spark.apache.org; Yan Zhou.sc Subject: Re: Package Release Announcement: Spark SQL on HBase "Astro" When I tried to compile against HBase 1.1.1, I got: [ERROR] /home/hbase/ssoh/src/main/scala/org/apache/spark/sql/hbase/SparkSqlRegionObserver.scala:124: overloaded method next needs result t

RE: Package Release Announcement: Spark SQL on HBase "Astro"

2015-07-27 Thread Yan Zhou.sc
imizer -> HBase Scans/Gets -> … -> HBase Region server Regards, Yan From: Debasish Das [mailto:debasish.da...@gmail.com] Sent: Monday, July 27, 2015 10:02 PM To: Yan Zhou.sc Cc: Bing Xiao (Bing); dev; user Subject: RE: Package Release Announcement: Spark SQL on HBase "Astro" H

RE: Package Release Announcement: Spark SQL on HBase "Astro"

2015-07-22 Thread Yan Zhou.sc
Yes, but not all SQL-standard insert variants. From: Debasish Das [mailto:debasish.da...@gmail.com] Sent: Wednesday, July 22, 2015 7:36 PM To: Bing Xiao (Bing) Cc: user; dev; Yan Zhou.sc Subject: Re: Package Release Announcement: Spark SQL on HBase "Astro" Does it also support insert
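Astro's exact INSERT coverage isn't spelled out in the snippet; as a general illustration of the distinction in Spark 1.x SQL (the table names are hypothetical), the query-based form was the commonly supported one:

    // Query-based insert, supported by Spark 1.x SQL / HiveQL:
    sqlContext.sql("INSERT INTO TABLE hbase_sales SELECT * FROM staged_sales")

    // A literal-row, SQL-standard variant like the following is the kind of
    // insert that may fall outside what is supported:
    //   INSERT INTO hbase_sales VALUES (42, 'widget', 9.99)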

RE: python converter in HBaseConverter.scala (spark/examples)

2015-01-05 Thread Yan Zhou.sc
We are planning to support HBase as a "native" data source for Spark SQL in 1.3 (SPARK-3880). More details will come soon. -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Monday, January 05, 2015 7:37 AM To: tgbaggio Cc: dev@spark.apache.org Subject: Re: python conver
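SPARK-3880's eventual API isn't quoted here; as a hedged sketch of how a Spark 1.3-style external data source is typically wired up (the provider class and option names below are hypothetical, not Astro's actual ones):

    // Register a table backed by an external source via the data source API.
    sqlContext.sql(
      """CREATE TEMPORARY TABLE sales
        |USING org.example.sql.hbase
        |OPTIONS (hbaseTableName 'sales', keyCols 'id')""".stripMargin)
    val df = sqlContext.sql("SELECT id FROM sales WHERE id = 42")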

RE: NullWritable not serializable

2014-09-16 Thread Yan Zhou.sc
There appears to be a newly added Boolean in DAGScheduler defaulting to false: private val localExecutionEnabled = sc.getConf.getBoolean("spark.localExecution.enabled", false) Then val shouldRunLocally = localExecutionEnabled && allowLocal && finalStage.parents.isEmpty && partitions.len
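Given the flag quoted above, the user-facing switch is just a SparkConf entry; a minimal sketch of toggling it:

    import org.apache.spark.{SparkConf, SparkContext}

    // Opt in to driver-local execution of small first-stage jobs (e.g. take());
    // per the DAGScheduler snippet above, it defaults to false.
    val conf = new SparkConf()
      .setAppName("local-exec-demo")
      .set("spark.localExecution.enabled", "true")
    val sc = new SparkContext(conf)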

RE: Spark SQL Query and join different data sources.

2014-08-21 Thread Yan Zhou.sc
I doubt it will work as expected. Note that hiveContext.hql("select ...").regAsTable("a") will create a SchemaRDD before registering the SchemaRDD with the (Hive) catalog, while sqlContext.jsonFile("xxx").regAsTable("b") will create a SchemaRDD before registering the SchemaRDD with the SparkSQL catalo
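A common way around the two-catalog split is to register both tables through the same context; a minimal sketch using the Spark 1.0/1.1-era names from the snippet (hql and registerAsTable; the paths, table, and column names are made up):

    import org.apache.spark.sql.hive.HiveContext

    // HiveContext extends SQLContext, so registering both tables through it
    // puts them in one catalog and the join can resolve both sides.
    val hive = new HiveContext(sc)
    hive.hql("SELECT key, value FROM some_hive_table").registerAsTable("a")
    hive.jsonFile("/data/events.json").registerAsTable("b")
    val joined = hive.hql("SELECT a.key, b.payload FROM a JOIN b ON a.key = b.key")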

RE: pre-filtered hadoop RDD use case

2014-07-29 Thread Yan Zhou.sc
is something we'd want to do in a hurry, because there is a clear workaround right now (subclass RDD) and it is very hard to change that once the project is committed to that API. On Tue, Jul 29, 2014 at 11:35 AM, Yan Zhou.sc wrote: > PartitionPruningRDD.scala still only handles, as sai

RE: pre-filtered hadoop RDD use case

2014-07-29 Thread Yan Zhou.sc
PartitionPruningRDD.scala still only handles, as said, the partition portion of the issue. On the "record pruning" portion, although cheap fixes could be available for this issue as reported, I believe the fundamental issue is the lack of a mechanism for processing merging/pushdown. Given the pop
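For reference, the partition-side mechanism the thread refers to is a developer API; a minimal sketch (the parent RDD and predicate are made up):

    import org.apache.spark.rdd.PartitionPruningRDD

    // Keep only partitions whose index passes the filter; pruned partitions
    // are never scheduled. Records inside surviving partitions still need a
    // separate filter() — the "record pruning" gap discussed above.
    val parent = sc.parallelize(1 to 1000, numSlices = 10)
    val pruned = PartitionPruningRDD.create(parent, partitionIdx => partitionIdx < 3)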

RE: IntelliJ IDEA cannot compile TreeNode.scala

2014-06-27 Thread Yan Zhou.sc
One question, then, is what to use to debug Spark if IntelliJ can only be used for code browsing because of the unresolved symbols mentioned by Ron? More specifically, if one builds from the command line but would like to debug a running Spark from an IDE such as IntelliJ, what could one do? Anot
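One common answer (not from this thread) is JVM remote debugging: build on the command line, start the JVM with a JDWP agent, and attach an IntelliJ "Remote" run configuration; a sketch, with the port an arbitrary choice:

    import org.apache.spark.{SparkConf, SparkContext}

    // JDWP agent string: the JVM listens on port 5005 and waits (suspend=y)
    // until IntelliJ's Remote debug configuration attaches.
    val jdwp = "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"

    // Works for executors (and for the driver in cluster mode); in client
    // mode the driver JVM is already running, so pass the same string on
    // the command line via --driver-java-options at launch instead.
    val conf = new SparkConf()
      .setAppName("debug-demo")
      .set("spark.executor.extraJavaOptions", jdwp)
    val sc = new SparkContext(conf)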

Optiq for SparkSQL?

2014-06-06 Thread Yan Zhou.sc
Can anybody share your thoughts/comments/interest on the applicability of the "Optiq" framework to Spark, and SparkSQL in particular? Thanks,