We are using MR-based bulk loading on Spark.
For filter pushdown, Astro does partition pruning, scan-range pruning, and uses
Gets as much as possible.
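Not the Astro code itself, but a minimal sketch of what scan-range pruning vs. Gets looks like at the HBase client level, assuming a predicate on the row key (the table name "t1" and the helper names are illustrative):

    import org.apache.hadoop.hbase.TableName
    import org.apache.hadoop.hbase.client.{Connection, Get, Result, Scan}
    import scala.collection.JavaConverters._

    // equality predicate on the row key -> a single point Get
    def pointLookup(conn: Connection, key: Array[Byte]): Result = {
      val table = conn.getTable(TableName.valueOf("t1"))
      try table.get(new Get(key)) finally table.close()
    }

    // range predicate on the row key -> a Scan bounded to [start, stop)
    def rangeScan(conn: Connection, start: Array[Byte], stop: Array[Byte]): Seq[Result] = {
      val table = conn.getTable(TableName.valueOf("t1"))
      try {
        val scanner = table.getScanner(new Scan().setStartRow(start).setStopRow(stop))
        try scanner.iterator().asScala.toVector finally scanner.close()
      } finally table.close()
    }

Partition pruning happens one level up: roughly, only the regions whose key range can overlap the predicate get a Spark task at all.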
Thanks,
From: Ted Malaska [mailto:ted.mala...@cloudera.com]
Sent: August 12, 2015 9:14 AM
To: Yan Zhou.sc
Cc: dev@spark.apache.org; Bing Xiao (Bing)
No, the Astro bulk loader does not use its own shuffle. But its map/reduce-side
processing is somewhat different from the HBase bulk loader that is used by many
HBase apps, I believe.
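For comparison, a minimal sketch of the "classic" HBase bulk load that many HBase apps drive from Spark (write HFiles via HFileOutputFormat2, then hand them off with LoadIncrementalHFiles); this is not the Astro bulk loader, and the table name "t1", column family "cf", and output path are illustrative:

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue, TableName}
    import org.apache.hadoop.hbase.client.ConnectionFactory
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.{HFileOutputFormat2, LoadIncrementalHFiles}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.hadoop.mapreduce.Job
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("bulkload-sketch"))
    val hconf = HBaseConfiguration.create()
    val conn = ConnectionFactory.createConnection(hconf)
    val tableName = TableName.valueOf("t1")
    val table = conn.getTable(tableName)
    val locator = conn.getRegionLocator(tableName)

    // aligns the HFile output (partitioning, compression, encoding) with the table's regions
    val job = Job.getInstance(hconf)
    HFileOutputFormat2.configureIncrementalLoad(job, table, locator)

    // KeyValues must be written in row-key order, hence the sortByKey
    val kvs = sc.parallelize(Seq("row1" -> "v1", "row2" -> "v2"))
      .sortByKey()
      .map { case (k, v) =>
        val row = Bytes.toBytes(k)
        (new ImmutableBytesWritable(row),
          new KeyValue(row, Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(v)))
      }

    kvs.saveAsNewAPIHadoopFile("/tmp/hfiles",
      classOf[ImmutableBytesWritable], classOf[KeyValue],
      classOf[HFileOutputFormat2], job.getConfiguration)

    // region servers adopt the finished HFiles directly, bypassing the normal write path
    new LoadIncrementalHFiles(hconf).doBulkLoad(new Path("/tmp/hfiles"), conn.getAdmin, table, locator)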
From: Ted Malaska [mailto:ted.mala...@cloudera.com]
Sent: Wednesday, August 12, 2015 8:56 AM
To: Yan Zhou.sc
Cc: dev
To: Yan Zhou.sc
Cc: user; dev@spark.apache.org; Bing Xiao (Bing); Ted Yu
Subject: RE: Re: Re: Package Release Announcement: Spark SQL on HBase "Astro"
Hey Yan,
I've been the one building out this Spark functionality in HBase, so maybe I can
help clarify.
The hbase-spark module is
in some coprocessor/custom filter combos), and add support for querying
string columns in HBase as integers from Astro.
Thanks,
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, August 12, 2015 7:02 AM
To: Yan Zhou.sc
Cc: Bing Xiao (Bing); dev@spark.apache.org; u...@spark.apache.org
integration with Spark.
It will be interesting to see performance comparisons when HBASE-14181 is ready.
Thanks,
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, August 11, 2015 3:28 PM
To: Yan Zhou.sc
Cc: Bing Xiao (Bing); dev@spark.apache.org; u...@spark.apache.org
Subject: Re: Re: Package
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: August 11, 2015 3:28 PM
To: Yan Zhou.sc
Cc: Bing Xiao (Bing); dev@spark.apache.org; u...@spark.apache.org
Subject: Re: Re: Package Release Announcement: Spark SQL on HBase "Astro"
HBase will not have a query engine.
It will provide better support to query engines.
Cheers
On Au
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: August 11, 2015 8:54 AM
To: Bing Xiao (Bing)
Cc: dev@spark.apache.org; u...@spark.apache.org; Yan Zhou.sc
Subject: Re: Package Release Announcement: Spark SQL on HBase "Astro"
Yan / Bing:
Mind taking a look at
HBASE-14181<https://issues.apache.org/jira/browse/HBASE-14181> 'Add
...@spark.apache.org; Yan Zhou.sc
Subject: Re: Package Release Announcement: Spark SQL on HBase "Astro"
When I tried to compile against HBase 1.1.1, I got:
[ERROR]
/home/hbase/ssoh/src/main/scala/org/apache/spark/sql/hbase/SparkSqlRegionObserver.scala:124:
overloaded method next needs result type
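That error class is easy to reproduce with plain Scala, independent of the Astro sources: when one overload's body refers to another overload, scalac wants an explicit result type on the definition. A toy example:

    object OverloadDemo {
      def next(i: Int): Int = i + 1
      // without the ": Int" annotation this line fails with
      // "overloaded method next needs result type"
      def next(s: String): Int = next(s.length)
    }

Whether that is exactly what bites SparkSqlRegionObserver.scala:124 depends on the HBase 1.1.1 coprocessor API it overrides.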
…optimizer -> HBase Scans/Gets -> … -> HBase Region server
Regards,
Yan
From: Debasish Das [mailto:debasish.da...@gmail.com]
Sent: Monday, July 27, 2015 10:02 PM
To: Yan Zhou.sc
Cc: Bing Xiao (Bing); dev; user
Subject: RE: Package Release Announcement: Spark SQL on HBase "Astro"
H
Yes, but not all SQL-standard insert variants.
From: Debasish Das [mailto:debasish.da...@gmail.com]
Sent: Wednesday, July 22, 2015 7:36 PM
To: Bing Xiao (Bing)
Cc: user; dev; Yan Zhou.sc
Subject: Re: Package Release Announcement: Spark SQL on HBase "Astro"
Does it also support insert
We are planning to support HBase as a "native" data source to Spark SQL in 1.3
(SPARK-3880).
More details will come soon.
-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Monday, January 05, 2015 7:37 AM
To: tgbaggio
Cc: dev@spark.apache.org
Subject: Re: python conver
There appears to be a newly added Boolean in DAGScheduler, defaulting to false:
private val localExecutionEnabled =
sc.getConf.getBoolean("spark.localExecution.enabled", false)
Then
val shouldRunLocally =
localExecutionEnabled && allowLocal && finalStage.parents.isEmpty &&
partitions.length == 1
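For what it's worth, a quick way to exercise that flag in a 1.x-era build (where actions such as take(1) pass allowLocal = true) might look like this; the app name is illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("local-exec-demo")
      .set("spark.localExecution.enabled", "true")   // defaults to false
    val sc = new SparkContext(conf)

    // a single-partition job with no shuffle parents is eligible
    // to run directly on the driver when the flag is on
    sc.parallelize(1 to 10).take(1)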
I doubt it will work as expected.
Note that hiveContext.hql("select ...").regAsTable("a") will create a SchemaRDD
and then register that SchemaRDD with the (Hive) catalog,
while sqlContext.jsonFile("xxx").regAsTable("b") will create a SchemaRDD and then
register that SchemaRDD with the Spark SQL catalog.
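A minimal sketch of the two registration paths being contrasted, using the 1.0/1.1-era API this thread refers to (registerAsTable is the method abbreviated as regAsTable above; the table names and JSON path are illustrative):

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new SQLContext(sc)
    val hiveContext = new HiveContext(sc)

    // SchemaRDD produced from HiveQL, then registered with the Hive-aware catalog
    val fromHive = hiveContext.hql("SELECT key, value FROM src")
    fromHive.registerAsTable("a")

    // SchemaRDD with a schema inferred from JSON, registered with the SQLContext catalog
    val fromJson = sqlContext.jsonFile("/tmp/xxx.json")
    fromJson.registerAsTable("b")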
is something we'd want to do
in a hurry, because there is a clear workaround right now (subclass RDD) and it
is very hard to change that once the project is committed to that API.
On Tue, Jul 29, 2014 at 11:35 AM, Yan Zhou.sc
wrote:
> PartitionPruningRDD.scala still only handles, as said, the partition portion of the issue.
PartitionPruningRDD.scala still only handles, as said, the partition portion of
the issue.
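A small sketch of that distinction (PartitionPruningRDD is a DeveloperApi; the predicates below are illustrative):

    import org.apache.spark.rdd.PartitionPruningRDD

    val base = sc.parallelize(1 to 100, 10)

    // "partition portion": only partitions whose index passes the predicate are computed at all
    val partitionPruned = PartitionPruningRDD.create(base, partitionIndex => partitionIndex < 3)

    // "record pruning" still has to happen inside the surviving partitions, e.g. via filter
    val recordPruned = partitionPruned.filter(_ % 2 == 0)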
On the "record pruning" portion, although cheap fixes could be available for
this issue as reported, but I believe a
fundamental issue is lack of a mechanism of processing merging/pushdown. Given
the pop
One question, then, is what to use to debug Spark if IntelliJ can only be used
for code browsing because of unresolved symbols, as mentioned by Ron.
More specifically, if one builds from the command line but would like to debug a
running Spark from an IDE such as IntelliJ, what could they do?
Anot
Can anybody share your thoughts/comments on the applicability of the
"optiq" framework to Spark, and to Spark SQL in particular?
Thanks,