Hi All,
I am trying to run a series of transformations over 3 DataFrames. After each
transformation, I want to persist the DataFrame and save it to a text file. The
steps I am following are as follows.
*Step0:*
Create DF1
Create DF2
Create DF3
Create DF4
(no persist no save yet)
*Step1:*
Create RESULT-DF1 by joini
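(Sketch, not from the original mail: one way to do persist-then-save after each
step, assuming Spark 1.6-era DataFrames; the join key, helper name and output
path are made up for illustration.)

import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

// Hypothetical helper: persist a DataFrame, then dump it as plain text.
def persistAndSave(df: DataFrame, path: String): DataFrame = {
  val cached = df.persist(StorageLevel.MEMORY_AND_DISK)
  // One line per row, comma-separated, written out as a text file.
  cached.rdd.map(_.mkString(",")).saveAsTextFile(path)
  cached
}

// Step1 (sketch): RESULT-DF1 = DF1 joined with DF2, then persisted and saved.
// val resultDf1 = persistAndSave(df1.join(df2, Seq("id")), "/tmp/result-df1")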
fileStream has a parameter "newFilesOnly". By default it is true, which means
only new files are processed and existing files in the directory are ignored. So
you need to ***move*** the files into the directory; otherwise the existing
files will be ignored.
You can also set "newFilesOnly" to false. Then in the fi
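(A minimal sketch of the newFilesOnly overload, assuming a text input format and
a hypothetical HDFS directory; adjust types and paths to your setup.)

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileStreamWordCount {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("FileStreamWordCount"), Seconds(10))

    // newFilesOnly = false also picks up files already sitting in the directory
    // (subject to the remember window), not only files added after the stream starts.
    val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "hdfs:///tmp/stream-in",   // hypothetical monitored directory
      (_: Path) => true,         // accept every file
      newFilesOnly = false
    ).map(_._2.toString)

    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}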
Hi All,
I am trying to run the HdfsWordCount example from GitHub:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
I am using Ubuntu to run the program, but I don't see any data getting printed
after ,
--
Hi,
I have only encountered 'code too large' errors when changing grammars. I
am using SBT/IDEA, no Eclipse.
The size of an ANTLR parser/lexer depends on the rules inside the
source grammar and the rules it depends on. So we should take a look at
IdentifiersParser.g/ExpressionParser.g; t
Thanks for the pointer. It seems to be a really pathological case, since
the file that's in error is part of the splinter file (the smaller one,
IdentifiersParser). I'll see if I can work around it by splitting it some more.
iulian
On Thu, Jan 28, 2016 at 4:43 PM, Ted Yu wrote:
> After this change
After this change:
[SPARK-12681] [SQL] split IdentifiersParser.g into two files
the biggest file under
sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser is
SparkSqlParser.g
Maybe split SparkSqlParser.g up as well?
On Thu, Jan 28, 2016 at 5:21 AM, Iulian Dragoș
wrote:
> Hi,
Thanks Ted, I will try it on this version.
-- Original Message --
From: "Ted Yu";
Sent: Thursday, January 28, 2016, 11:35 PM
To: "开心延年";
Cc: "Jörn Franke"; "Julio Antonio Soto de Vicente"; "Maciej Bryński"; "dev";
Subject: Re: Re: Spark 1.6.0 + Hive + HBase
Under sql/hive/src/main/scala/org/apache/spark/sql/hive/execution, I only
see HiveTableScan and HiveNativeCommand.
At the beginning of HiveTableScan:
* The Hive table scan operator. Column and partition pruning are both
handled.
Looks like filter pushdown hasn't been implemented.
As far as I
Hey spark-devs,
I'm in the process of writing a DataSource for what is essentially a Java
web service. Each relation we create will consist of a series of
queries to this web service, which returns a pretty much known amount of data
(e.g. 2000 rows, 5 string columns or similar, which we can calc
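(For reference, a minimal sketch of the Spark 1.6 sources API such a relation
could implement; the web-service client is left out and the class names and
"endpoint" parameter are hypothetical.)

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical relation backed by a web service; fetching is stubbed out.
class WebServiceRelation(endpoint: String)(@transient val sqlContext: SQLContext)
  extends BaseRelation with TableScan {

  // The shape of the result is known up front (e.g. 5 string columns).
  override def schema: StructType = StructType(
    (1 to 5).map(i => StructField(s"col$i", StringType, nullable = true)))

  override def buildScan(): RDD[Row] = {
    // A real implementation would issue the web-service queries here;
    // this just returns two dummy rows matching the declared schema.
    val rows = Seq(Row("a", "b", "c", "d", "e"), Row("f", "g", "h", "i", "j"))
    sqlContext.sparkContext.parallelize(rows)
  }
}

class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new WebServiceRelation(parameters.getOrElse("endpoint", ""))(sqlContext)
}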
Hi,
Has anyone seen this error?
The code of method specialStateTransition(int, IntStream) is exceeding
the 65535 bytes limit (SparkSqlParser_IdentifiersParser.java:39907)
The error is in ANTLR-generated files and it's (according to Stack
Overflow) due to state explosion in the parser (or lexer). Th
This is not Hive's bug. I tested it with Hive on my storage and it is OK,
but when I test it on Spark SQL, the TableScanDesc.FILTER_EXPR_CONF_STR
parameter is not passed;
that is the reason for the full scan.
The source code in HiveHBaseTableInputFormat is as follows; that is where the
full scan comes from.
private Sca
Probably a newer Hive version makes a lot of sense here - at least 1.2.1. What
storage format are you using?
I think the old Hive version had a bug where it always scanned all partitions
unless you limited it in the on clause of the query to a certain partition (e.g.
on date=20201119).
> On 28 Jan 2
If we support TableScanDesc.FILTER_EXPR_CONF_STR like Hive does,
we may write SQL like this:
select ydb_sex from ydb_example_shu where ydbpartion='20151110' limit 10
select ydb_sex from ydb_example_shu where ydbpartion='20151110' and
(ydb_sex='??' or ydb_province='' or ydb_day>='20151217') limit
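(Not the actual HiveHBaseTableInputFormat source -- just a sketch of the lookup
being discussed: a storage handler reads the serialized predicate that Hive puts
under TableScanDesc.FILTER_EXPR_CONF_STR; if Spark SQL never sets that property,
there is nothing to prune on and the scan stays full. Hive 1.x-era API assumed.)

import org.apache.hadoop.hive.ql.exec.Utilities
import org.apache.hadoop.hive.ql.plan.{ExprNodeGenericFuncDesc, TableScanDesc}
import org.apache.hadoop.mapred.JobConf

object PushedHiveFilter {
  // Returns the pushed-down filter expression, if the engine serialized one.
  def fromConf(jobConf: JobConf): Option[ExprNodeGenericFuncDesc] =
    Option(jobConf.get(TableScanDesc.FILTER_EXPR_CONF_STR))
      .map(s => Utilities.deserializeExpression(s))
}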
Is there anybody who can solve Problem 4)? Thanks.
Problem 4)
Spark doesn't push down predicates for HiveTableScan, which means that every
query is a full scan.
-- Original Message --
From: "Julio Antonio Soto de Vicente";
Sent: January 28, 2016 (Thursday), 8:09
To: "
We always use SQL like the one below:
select count(*) from ydb_example_shu where ydbpartion='20151110' and
(ydb_sex='' or ydb_province='LIAONING' or ydb_day>='20151217') limit 10
Spark doesn't push down predicates for TableScanDesc.FILTER_EXPR_CONF_STR, which
means that every query is a full scan and can't us
Hi,
Indeed, Hive is not able to perform predicate pushdown through an HBase table.
Neither Hive nor Impala can.
Broadly speaking, if you need to query your HBase table through a field other
than the rowkey:
A) Try to "encode" as much info as possible in the rowkey field and use it as
your predicate
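(A small sketch of option A, assuming a made-up composite rowkey
"<province>|<day>|<id>" and the HBase client API of that era.)

import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.util.Bytes

// Filters on province and day become a bounded range scan over the rowkey
// prefix instead of a full table scan; the field layout is illustrative only.
def scanForProvinceAndDay(province: String, day: String): Scan = {
  val prefix = s"$province|$day|"
  new Scan()
    .setStartRow(Bytes.toBytes(prefix))
    .setStopRow(Bytes.toBytes(prefix + '\uffff'))  // crude upper bound for the prefix
}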
Ted,
You're right.
hbase-site.xml resolved problems 2 and 3, but...
Problem 4)
Spark doesn't push down predicates for HiveTableScan, which means that every
query is a full scan.
== Physical Plan ==
TungstenAggregate(key=[],
functions=[(count(1),mode=Final,isDistinct=false)],
output=[count#144L])
+- T
Dear Spark,
I am testing StorageHandler on Spark SQL,
but I find that TableScanDesc.FILTER_EXPR_CONF_STR is missing, and I need it. Is
there anywhere I could find it?
I really want to get some filter information from Spark SQL, so that I could
do a pre-filter with my index;
so where is the
TableScanD
Hi all,
Could anyone provide pointers on how to extend the Spark FPGrowth
implementation with either of the following stopping criteria:
* maximum number of generated itemsets,
* maximum length of generated itemsets (i.e. number of items in itemset).
The second criterion is e.g. available in th
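(Not a built-in stopping criterion, but a common workaround is to post-filter
the mined itemsets; this does not prune the search itself, only the output.
The minSupport and maxLen values below are placeholders.)

import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

def frequentItemsetsUpToLength(transactions: RDD[Array[String]], maxLen: Int) = {
  val model = new FPGrowth()
    .setMinSupport(0.2)      // placeholder support threshold
    .setNumPartitions(10)
    .run(transactions)

  // Keep only itemsets with at most maxLen items; capping the *number* of
  // itemsets would likewise be a post-processing step (e.g. sort by freq, take N).
  model.freqItemsets.filter(_.items.length <= maxLen)
}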
For the last two problems, hbase-site.xml seems not to be on the classpath.
Once hbase-site.xml is put on the classpath, you should be able to make progress.
Cheers
> On Jan 28, 2016, at 1:14 AM, Maciej Bryński wrote:
>
> Hi,
> I'm trying to run SQL query on Hive table which is stored on HBase.
> I'
Hi,
I'm trying to run an SQL query on a Hive table which is stored in HBase.
I'm using:
- Spark 1.6.0
- HDP 2.2
- Hive 0.14.0
- HBase 0.98.4
I managed to configure a working classpath, but I have the following problems:
1) I have a UDF defined in the Hive Metastore (FUNCS table).
Spark cannot use it.
File "/op