Hi All,
I am trying to run a series of transformations over 3 DataFrames. After each
transformation, I want to persist the DataFrame and save it to a text file. The
steps I am following are as follows.
*Step0:*
Create DF1
Create DF2
Create DF3
Create DF4
(no persist no save yet)
*Step1:*
Create RESULT-DF1 by joini
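(Sketch, not from the original mail: one way to do persist-then-save after each
step, assuming Spark 1.6-era DataFrames; the join key, helper name and output
path are made up for illustration.)

import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

// Hypothetical helper: persist a DataFrame, then dump it as plain text.
def persistAndSave(df: DataFrame, path: String): DataFrame = {
  val cached = df.persist(StorageLevel.MEMORY_AND_DISK)
  // One line per row, comma-separated, written out as a text file.
  cached.rdd.map(_.mkString(",")).saveAsTextFile(path)
  cached
}

// Step1 (sketch): RESULT-DF1 = DF1 joined with DF2, then persisted and saved.
// val resultDf1 = persistAndSave(df1.join(df2, Seq("id")), "/tmp/result-df1")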
fileStream has a parameter "newFilesOnly". By default it is true, which means
only new files are processed and existing files in the directory are ignored. So
you need to ***move*** the files into the directory; otherwise the existing
files will be ignored.
You can also set "newFilesOnly" to false. Then in the fi
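(A minimal sketch of the newFilesOnly overload, assuming a text input format and
a hypothetical HDFS directory; adjust types and paths to your setup.)

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileStreamWordCount {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("FileStreamWordCount"), Seconds(10))

    // newFilesOnly = false also picks up files already sitting in the directory
    // (subject to the remember window), not only files added after the stream starts.
    val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "hdfs:///tmp/stream-in",   // hypothetical monitored directory
      (_: Path) => true,         // accept every file
      newFilesOnly = false
    ).map(_._2.toString)

    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}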
Hi All,
I am trying to run the HdfsWordCount example from GitHub:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
I am using Ubuntu to run the program, but I don't see any data getting printed
after ,
--
Hi,
I have only encountered 'code too large' errors when changing grammars. I
am using SBT/IDEA, no Eclipse.
The size of an ANTLR parser/lexer depends on the rules inside the
source grammar and the rules it depends on. So we should take a look at
IdentifiersParser.g/ExpressionParser.g; t
Thanks for the pointer. It seems to be a really pathological case, since
the file that's in error is part of the splinter file (the smaller one,
IdentifiersParser). I'll see if I can work around it by splitting it some more.
iulian
On Thu, Jan 28, 2016 at 4:43 PM, Ted Yu wrote:
> After this change
After this change:
[SPARK-12681] [SQL] split IdentifiersParser.g into two files
the biggest file under
sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser is
SparkSqlParser.g
Maybe split SparkSqlParser.g up as well?
On Thu, Jan 28, 2016 at 5:21 AM, Iulian Dragoș
wrote:
> Hi,
Thanks Ted, I will try it on this version.
-- Original Message --
From: "Ted Yu";
Sent: Thursday, January 28, 2016, 11:35 PM
To: "开心延年";
Cc: "Jörn Franke"; "Julio Antonio Soto de Vicente"; "Maciej Bryński"; "dev";
Subject: Re: Re: Spark 1.6.0 + Hive + HBase
Under sql/hive/src/main/scala/org/apache/spark/sql/hive/execution, I only
see HiveTableScan and HiveNativeCommand.
At the beginning of HiveTableScan:
* The Hive table scan operator. Column and partition pruning are both
handled.
Looks like filter pushdown hasn't been implemented.
As far as I
Hey spark-devs,
I'm in the process of writing a DataSource for what is essentially a Java
web service. Each relation we create will consist of a series of
queries to this web service, which returns a pretty much known amount of data
(e.g. 2000 rows, 5 string columns or similar, which we can calc
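(For reference, a minimal sketch of the Spark 1.6 sources API such a relation
could implement; the web-service client is left out and the class names and
"endpoint" parameter are hypothetical.)

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical relation backed by a web service; fetching is stubbed out.
class WebServiceRelation(endpoint: String)(@transient val sqlContext: SQLContext)
  extends BaseRelation with TableScan {

  // The shape of the result is known up front (e.g. 5 string columns).
  override def schema: StructType = StructType(
    (1 to 5).map(i => StructField(s"col$i", StringType, nullable = true)))

  override def buildScan(): RDD[Row] = {
    // A real implementation would issue the web-service queries here;
    // this just returns two dummy rows matching the declared schema.
    val rows = Seq(Row("a", "b", "c", "d", "e"), Row("f", "g", "h", "i", "j"))
    sqlContext.sparkContext.parallelize(rows)
  }
}

class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new WebServiceRelation(parameters.getOrElse("endpoint", ""))(sqlContext)
}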
Hi,
Has anyone seen this error?
The code of method specialStateTransition(int, IntStream) is exceeding
the 65535 bytes limit (SparkSqlParser_IdentifiersParser.java:39907)
The error is in ANTLR-generated files and it's (according to Stack
Overflow) due to state explosion in the parser (or lexer). Th
This is not Hive's bug. I tested it with Hive on my storage and it is OK,
but when I test it on Spark SQL, the TableScanDesc.FILTER_EXPR_CONF_STR
parameter is not passed;
that is the reason for the full scan.
The source code in HiveHBaseTableInputFormat is as follows; that is where the
full scan comes from.
private Sca
Probably a newer Hive version makes a lot of sense here - at least 1.2.1. What
storage format are you using?
I think the old Hive version had a bug where it always scanned all partitions
unless you limited it in the on clause of the query to a certain partition (e.g.
on date=20201119).
> On 28 Jan 2
If we support TableScanDesc.FILTER_EXPR_CONF_STR like Hive does,
we may write SQL like this:
select ydb_sex from ydb_example_shu where ydbpartion='20151110' limit 10
select ydb_sex from ydb_example_shu where ydbpartion='20151110' and
(ydb_sex='??' or ydb_province='' or ydb_day>='20151217') limit
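(Not the actual HiveHBaseTableInputFormat source -- just a sketch of the lookup
being discussed: a storage handler reads the serialized predicate that Hive puts
under TableScanDesc.FILTER_EXPR_CONF_STR; if Spark SQL never sets that property,
there is nothing to prune on and the scan stays full. Hive 1.x-era API assumed.)

import org.apache.hadoop.hive.ql.exec.Utilities
import org.apache.hadoop.hive.ql.plan.{ExprNodeGenericFuncDesc, TableScanDesc}
import org.apache.hadoop.mapred.JobConf

object PushedHiveFilter {
  // Returns the pushed-down filter expression, if the engine serialized one.
  def fromConf(jobConf: JobConf): Option[ExprNodeGenericFuncDesc] =
    Option(jobConf.get(TableScanDesc.FILTER_EXPR_CONF_STR))
      .map(s => Utilities.deserializeExpression(s))
}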
Is there anybody who can solve Problem 4)? Thanks.
Problem 4)
Spark doesn't push down predicates for HiveTableScan, which means that every
query is a full scan.
-- Original Message --
From: "Julio Antonio Soto de Vicente";
Sent: January 28, 2016 (Thursday), 8:09
To: "
We always use SQL like the one below:
select count(*) from ydb_example_shu where ydbpartion='20151110' and
(ydb_sex='' or ydb_province='LIAONING' or ydb_day>='20151217') limit 10
Spark doesn't push down predicates for TableScanDesc.FILTER_EXPR_CONF_STR, which
means that every query is a full scan and can't us
Hi,
Indeed, Hive is not able to perform predicate pushdown through an HBase table.
Neither Hive nor Impala can.
Broadly speaking, if you need to query your HBase table through a field other
than the rowkey:
A) Try to "encode" as much info as possible in the rowkey field and use it as
your predicate
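(A small sketch of option A, assuming a made-up composite rowkey
"<province>|<day>|<id>" and the HBase client API of that era.)

import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.util.Bytes

// Filters on province and day become a bounded range scan over the rowkey
// prefix instead of a full table scan; the field layout is illustrative only.
def scanForProvinceAndDay(province: String, day: String): Scan = {
  val prefix = s"$province|$day|"
  new Scan()
    .setStartRow(Bytes.toBytes(prefix))
    .setStopRow(Bytes.toBytes(prefix + '\uffff'))  // crude upper bound for the prefix
}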
Ted,
You're right.
hbase-site.xml resolved problems 2 and 3, but...
Problem 4)
Spark doesn't push down predicates for HiveTableScan, which means that every
query is a full scan.
== Physical Plan ==
TungstenAggregate(key=[],
functions=[(count(1),mode=Final,isDistinct=false)],
output=[count#144L])
+- T
Dear Spark,
I am testing StorageHandler on Spark SQL,
but I find that TableScanDesc.FILTER_EXPR_CONF_STR is missing, and I need it. Is
there anywhere I could find it?
I really want to get some filter information from Spark SQL, so that I could
do a pre-filter with my index;
so where is the
TableScanD
Hi all,
Could anyone provide pointers on how to extend the Spark FPGrowth
implementation with either of the following stopping criteria:
* maximum number of generated itemsets,
* maximum length of generated itemsets (i.e. number of items in itemset).
The second criterion is e.g. available in th
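(Not a built-in stopping criterion, but a common workaround is to post-filter
the mined itemsets; this does not prune the search itself, only the output.
The minSupport and maxLen values below are placeholders.)

import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

def frequentItemsetsUpToLength(transactions: RDD[Array[String]], maxLen: Int) = {
  val model = new FPGrowth()
    .setMinSupport(0.2)      // placeholder support threshold
    .setNumPartitions(10)
    .run(transactions)

  // Keep only itemsets with at most maxLen items; capping the *number* of
  // itemsets would likewise be a post-processing step (e.g. sort by freq, take N).
  model.freqItemsets.filter(_.items.length <= maxLen)
}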
For the last two problems, hbase-site.xml seems not to be on the classpath.
Once hbase-site.xml is put on the classpath, you should be able to make progress.
Cheers
> On Jan 28, 2016, at 1:14 AM, Maciej Bryński wrote:
>
> Hi,
> I'm trying to run SQL query on Hive table which is stored on HBase.
> I'
Hi,
I'm trying to run an SQL query on a Hive table which is stored in HBase.
I'm using:
- Spark 1.6.0
- HDP 2.2
- Hive 0.14.0
- HBase 0.98.4
I managed to configure a working classpath, but I have the following problems:
1) I have a UDF defined in the Hive Metastore (FUNCS table).
Spark cannot use it.
File "/op