Re: HiveContext is Serialized?

2016-10-26 Thread Mich Talebzadeh
Thanks Sean. I believe you are referring to below statement "You can't use the HiveContext or SparkContext in a distribution operation. It has nothing to do with for loops. The fact that they're serializable is misleading. It's there, I believe, because these objects may be inadvertently referen

Re: HiveContext is Serialized?

2016-10-26 Thread Sean Owen
Yes, but the question here is why the context objects are marked serializable when they are not meant to be sent somewhere as bytes. I tried to answer that apparent inconsistency below. On Wed, Oct 26, 2016, 10:21 Mich Talebzadeh wrote: > Hi, > > Sorry for asking this rather naïve question. > >

Re: HiveContext is Serialized?

2016-10-26 Thread Mich Talebzadeh
Hi, Sorry for asking this rather naïve question about the notion of serialisation in Spark and what can or cannot be serialised. Does this generally refer to the concept of serialisation in the context of data storage? In this context, for example with reference to RDD operations, is it the process of tra

Re: HiveContext is Serialized?

2016-10-26 Thread Sean Owen
It is the driver that has the info needed to schedule and manage distributed jobs and that is by design. This is narrowly about using the HiveContext or SparkContext directly. Of course SQL operations are distributed. On Wed, Oct 26, 2016, 10:03 Mich Talebzadeh wrote: > Hi Sean, > > Your point:

Re: HiveContext is Serialized?

2016-10-26 Thread ayan guha
In your use case, your deDF need not be a data frame. You could use sc.textFile().collect(). Even better, you can just read off a local file, as your file is very small, unless you are planning to use yarn cluster mode. On 26 Oct 2016 16:43, "Ajay Chander" wrote: > Sean, thank you for making it c
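A minimal sketch of the collect-then-loop approach described here, assuming a hypothetical attribute file and hypothetical Hive table names (none of these come from the original thread):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("per-attribute-hive-queries"))
    val hiveContext = new HiveContext(sc)

    // The attribute list is tiny (~66 values), so it is cheap to bring to the driver
    val attributes: Array[String] = sc.textFile("/path/to/attributes.txt").collect()

    attributes.foreach { attr =>
      // This loop runs on the driver; each SQL statement is still executed as a distributed job
      hiveContext.sql(
        s"INSERT INTO TABLE target_table SELECT * FROM source_table WHERE attribute = '$attr'")
    }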

Re: HiveContext is Serialized?

2016-10-26 Thread Mich Talebzadeh
Hi Sean, Your point: "You can't use the HiveContext or SparkContext in a distribution operation..." Is this because of a design issue? Case in point: if I created a DF from an RDD and registered it as a tempTable, does this imply that any sql calls on that table are localised and not distributed among ex

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Sean, thank you for making it clear. It was helpful. Regards, Ajay On Wednesday, October 26, 2016, Sean Owen wrote: > This usage is fine, because you are only using the HiveContext locally on > the driver. It's applied in a function that's used on a Scala collection. > > You can't use the HiveC

Re: HiveContext is Serialized?

2016-10-25 Thread Sunita Arvind
Thanks for the response Sean. I have seen the NPE on similar issues very consistently and assumed that could be the reason :) Thanks for clarifying. regards Sunita On Tue, Oct 25, 2016 at 10:11 PM, Sean Owen wrote: > This usage is fine, because you are only using the HiveContext locally on > the

Re: HiveContext is Serialized?

2016-10-25 Thread Sean Owen
This usage is fine, because you are only using the HiveContext locally on the driver. It's applied in a function that's used on a Scala collection. You can't use the HiveContext or SparkContext in a distribution operation. It has nothing to do with for loops. The fact that they're serializable is
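A hedged sketch of the distinction drawn here, assuming an existing sc and hiveContext and made-up table and column names: the context is usable inside a plain Scala collection's foreach (driver side), but not inside RDD.foreach (executor side).

    // OK: a plain Scala List, so the closure runs on the driver
    List("2016-10-01", "2016-10-02").foreach { day =>
      hiveContext.sql(s"SELECT count(*) FROM events WHERE dt = '$day'").show()
    }

    // NOT OK: RDD.foreach ships the closure to executors, where the context
    // cannot actually be used even though it happens to be marked Serializable
    sc.parallelize(Seq("2016-10-01", "2016-10-02")).foreach { day =>
      hiveContext.sql(s"SELECT count(*) FROM events WHERE dt = '$day'") // fails at runtime
    }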

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Sunita, Thanks for your time. In my scenario, based on each attribute from deDF (1 column with just 66 rows), I have to query a Hive table and insert into another table. Thanks, Ajay On Wed, Oct 26, 2016 at 12:21 AM, Sunita Arvind wrote: > Ajay, > > Afaik Generally these contexts cannot be acces

Re: HiveContext is Serialized?

2016-10-25 Thread Sunita Arvind
Ajay, AFAIK these contexts generally cannot be accessed within loops. The sql query itself would run on distributed datasets, so it's a parallel execution. Putting them in a foreach would nest one parallel execution inside another, so serialization would become hard. Not sure I could explain it right. If you can crea

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Jeff, Thanks for your response. I see below error in the logs. You think it has to do anything with hiveContext ? Do I have to serialize it before using inside foreach ? 16/10/19 15:16:23 ERROR scheduler.LiveListenerBus: Listener SQLListener threw an exception java.lang.NullPointerException

Re: HiveContext is Serialized?

2016-10-25 Thread Jeff Zhang
In your sample code, you can use hiveContext in the foreach because it is a Scala List foreach operation which runs on the driver side. But you cannot use hiveContext in RDD.foreach. Ajay Chander wrote on Wednesday, 26 October 2016 at 11:28 AM: > Hi Everyone, > > I was thinking if I can use hiveContext inside foreach like below,

Re: HiveContext

2016-07-01 Thread Mich Talebzadeh
hi, In general if your ORC table is not bucketed it is not going to do much. The idea is that using predicate pushdown you will only get the data from the partition concerned and avoid expensive table scans! ORC provides what is known as a storage index at file, stripe and rowset levels (default 10
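On the Spark SQL side, the predicate pushdown that exploits those ORC indexes is gated by a configuration flag (off by default in the 1.x releases); a small illustration, with a made-up table name and assuming an existing hiveContext:

    hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
    // With pushdown enabled, stripes/row groups whose min-max statistics exclude id = 42 can be skipped
    hiveContext.sql("SELECT * FROM orc_table WHERE id = 42").show()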

Re: hivecontext error

2016-06-14 Thread Ted Yu
Which release of Spark are you using ? Can you show the full error trace ? Thanks On Tue, Jun 14, 2016 at 6:33 PM, Tejaswini Buche < tejaswini.buche0...@gmail.com> wrote: > I am trying to use hivecontext in spark. The following statements are > running fine : > > from pyspark.sql import HiveCon

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-10 Thread Daniel Haviv
I'm using EC2 instances Thank you. Daniel > On 9 Jun 2016, at 16:49, Gourav Sengupta wrote: > > Hi, > > are you using EC2 instances or local cluster behind firewall. > > > Regards, > Gourav Sengupta > >> On Wed, Jun 8, 2016 at 4:34 PM, Daniel Haviv >> wrote: >> Hi, >> I'm trying to creat

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-09 Thread Gourav Sengupta
Hi, are you using EC2 instances or a local cluster behind a firewall? Regards, Gourav Sengupta On Wed, Jun 8, 2016 at 4:34 PM, Daniel Haviv < daniel.ha...@veracity-group.com> wrote: > Hi, > > I'm trying to create a table on s3a but I keep hitting the following error: > > Exception in thread "main"

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-09 Thread Steve Loughran
On 9 Jun 2016, at 06:17, Daniel Haviv <daniel.ha...@veracity-group.com> wrote: Hi, I've set these properties both in core-site.xml and hdfs-site.xml with no luck. Thank you. Daniel That's not good. I'm afraid I don't know what version of s3a is in the cloudera release —I can see that

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-08 Thread Daniel Haviv
Hi, I've set these properties both in core-site.xml and hdfs-site.xml with no luck. Thank you. Daniel > On 9 Jun 2016, at 01:11, Steve Loughran wrote: > > >> On 8 Jun 2016, at 16:34, Daniel Haviv >> wrote: >> >> Hi, >> I'm trying to create a table on s3a but I keep hitting the following err

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-08 Thread Steve Loughran
On 8 Jun 2016, at 16:34, Daniel Haviv <daniel.ha...@veracity-group.com> wrote: Hi, I'm trying to create a table on s3a but I keep hitting the following error: Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:com.cloudera.com.amazonaws.Ama
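For reference, a common way to hand the s3a connector its credentials from Spark code, using the standard hadoop-aws property names (the values are placeholders, and these can also live in core-site.xml; on EC2 an IAM instance role is usually the cleaner option):

    sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")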

Re: hivecontext and date format

2016-06-01 Thread Mich Talebzadeh
Try this: SELECT TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(paymentdate,'dd/MM/yyyy'),'yyyy-MM-dd')) AS paymentdate FROM HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
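A worked example of that expression with an illustrative literal value, assuming the source column holds dates like 25/12/2015 and an existing hiveContext:

    hiveContext.sql(
      "SELECT TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP('25/12/2015','dd/MM/yyyy'),'yyyy-MM-dd'))"
    ).show()
    // '25/12/2015' is parsed with 'dd/MM/yyyy' and re-rendered as 2015-12-25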

Re: HiveContext standalone => without a Hive metastore

2016-05-30 Thread Michael Segel
Going from memory… Derby is/was Cloudscape which IBM acquired from Informix who bought the company way back when. (Since IBM released it under Apache licensing, Sun Microsystems took it and created JavaDB…) I believe that there is a networking function so that you can either bring it up in st

Re: HiveContext standalone => without a Hive metastore

2016-05-30 Thread Gerard Maas
Michael, Mitch, Silvio, Thanks! The own directory is the issue. We are running the Spark Notebook, which uses the same dir per server (i.e. for all notebooks). So this issue prevents us from running 2 notebooks using HiveContext. I'll look into a proper Hive installation and I'm glad to know that t

Re: HiveContext standalone => without a Hive metastore

2016-05-26 Thread Michael Armbrust
You can also just make sure that each user is using their own directory. A rough example can be found in TestHive. Note: in Spark 2.0 there should be no need to use HiveContext unless you need to talk to a metastore. On Thu, May 26, 2016 at 1:36 PM, Mich Talebzadeh wrote: > Well make sure than

Re: HiveContext standalone => without a Hive metastore

2016-05-26 Thread Gerard Maas
Thanks a lot for the advice! I found out why the standalone hiveContext would not work: it was trying to deploy a derby db and the user had no rights to create the dir where the db is stored: Caused by: java.sql.SQLException: Failed to create database 'metastore_db', see the next exception fo

Re: HiveContext standalone => without a Hive metastore

2016-05-26 Thread Mich Talebzadeh
To use HiveContext, which is basically an SQL API within Spark, without a proper Hive set-up does not make sense. It is a superset of Spark's SQLContext. In addition, simple things like registerTempTable may not work. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAA

Re: HiveContext standalone => without a Hive metastore

2016-05-26 Thread Silvio Fiorito
Hi Gerard, I’ve never had an issue using the HiveContext without a hive-site.xml configured. However, one issue you may have is if multiple users are starting the HiveContext from the same path, they’ll all be trying to store the default Derby metastore in the same location. Also, if you want t

Re: HiveContext standalone => without a Hive metastore

2016-05-26 Thread Mich Talebzadeh
Hi Gerard, I am not sure the so-called independence is achievable. I gather you want to use HiveContext for your SQL queries and sqlContext only provides a subset of HiveContext. Try this: val sc = new SparkContext(conf) // Create sqlContext based on HiveContext val sqlContext = new HiveContext(sc)

Re: HiveContext unable to recognize the delimiter of Hive table in textfile partitioned by date

2016-04-11 Thread Shiva Achari
Hi All, In the above scenario, if the field delimiter is the default of Hive then Spark is able to parse the data as expected, hence I believe this is a bug. Regards, Shiva Achari On Tue, Apr 5, 2016 at 8:15 PM, Shiva Achari wrote: > Hi, > > I have created a hive external table stored as textfi

Re: HiveContext Self join not reading from cache

2015-12-18 Thread Ted Yu
The picture is a bit hard to read. I did a brief search but haven't found JIRA for this issue. Consider logging a SPARK JIRA. Cheers On Fri, Dec 18, 2015 at 4:37 AM, Gourav Sengupta wrote: > Hi, > > the attached DAG shows that for the same table (self join) SPARK is > unnecessarily getting da

Re: HiveContext Self join not reading from cache

2015-12-18 Thread Gourav Sengupta
Hi, the attached DAG shows that for the same table (self join) SPARK is unnecessarily getting data from S3 for one side of the join, whereas it's able to use the cache for the other side. Regards, Gourav On Fri, Dec 18, 2015 at 10:29 AM, Gourav Sengupta wrote: > Hi, > > I have a table which is dir

Re: HiveContext Self join not reading from cache

2015-12-18 Thread Gourav Sengupta
Hi, I have a table which is directly from S3 location and even a self join on that cached table is causing the data to be read from S3 again. The query plan in mentioned below: == Parsed Logical Plan == Aggregate [count(1) AS count#1804L] Project [user#0,programme_key#515] Join Inner, Some((p

Re: HiveContext Self join not reading from cache

2015-12-18 Thread Gourav Sengupta
hi, I think that people have reported the same issue elsewhere, and this should be registered as a bug in SPARK https://forums.databricks.com/questions/2142/self-join-in-spark-sql.html Regards, Gourav On Thu, Dec 17, 2015 at 10:52 AM, Gourav Sengupta wrote: > Hi Ted, > > The self join works

Re: HiveContext Self join not reading from cache

2015-12-17 Thread Gourav Sengupta
Hi Ted, The self join works fine on tables where the hivecontext tables are direct hive tables, therefore table1 = hiveContext.sql("select columnA, columnB from hivetable1") table1.registerTempTable("table1") table1.cache() table1.count() and if I do a self join on table1 things are quite fine

Re: HiveContext Self join not reading from cache

2015-12-16 Thread Ted Yu
I did the following exercise in spark-shell ("c" is cached table): scala> sqlContext.sql("select x.b from c x join c y on x.a = y.a").explain == Physical Plan == Project [b#4] +- BroadcastHashJoin [a#3], [a#125], BuildRight :- InMemoryColumnarTableScan [b#4,a#3], InMemoryRelation [a#3,b#4,c#5],

Re: hiveContext: storing lookup of partitions

2015-12-16 Thread Gourav Sengupta
Hi Jeff, sadly that does not resolve the issue. I am sure that the memory mapping to physical files locations can be saved and recovered in SPARK. Regards, Gourav Sengupta On Wed, Dec 16, 2015 at 12:13 PM, Jeff Zhang wrote: > oh, you are using S3. As I remember, S3 has performance issue when

Re: hiveContext: storing lookup of partitions

2015-12-16 Thread Jeff Zhang
oh, you are using S3. As I remember, S3 has performance issues when processing a large number of files. On Wed, Dec 16, 2015 at 7:58 PM, Gourav Sengupta wrote: > The HIVE table has very large number of partitions around 365 * 5 * 10 and > when I say hivemetastore to start running queries on it (

Re: hiveContext: storing lookup of partitions

2015-12-16 Thread Gourav Sengupta
The HIVE table has a very large number of partitions (around 365 * 5 * 10), and when I tell the hive metastore to start running queries on it (the one with .count() or .show()) it takes around 2 hours before the job starts in SPARK. On the pyspark screen I can see that it is parsing the S3 locations for

Re: hiveContext: storing lookup of partitions

2015-12-15 Thread Jeff Zhang
>>> Currently it takes around 1.5 hours for me just to cache in the partition information and after that I can see that the job gets queued in the SPARK UI. I guess you mean the stage of getting the split info. I suspect it might be your cluster issue (or metadata store); usually it won't take su

Re: HiveContext creation failed with Kerberos

2015-12-09 Thread Neal Yin
:09 AM To: "user@spark.apache.org<mailto:user@spark.apache.org>" mailto:user@spark.apache.org>> Subject: Re: HiveContext creation failed with Kerberos On 8 Dec 2015, at 06:52, Neal Yin mailto:neal@workday.com>> wrote: 15/12/08 04:12:28 ERROR transport.TSaslT

Re: HiveContext creation failed with Kerberos

2015-12-08 Thread Steve Loughran
On 8 Dec 2015, at 06:52, Neal Yin <neal@workday.com> wrote: 15/12/08 04:12:28 ERROR transport.TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any

Re: HiveContext ignores ("skip.header.line.count"="1")

2015-10-26 Thread Daniel Haviv
I will. Thank you. > On 27 Oct 2015, at 4:54, Felix Cheung wrote: > > Please open a JIRA? > > > Date: Mon, 26 Oct 2015 15:32:42 +0200 > Subject: HiveContext ignores ("skip.header.line.count"="1") > From: daniel.ha...@veracity-group.com > To: user@spark.apache.org > > Hi, > I have a csv tab

RE: HiveContext ignores ("skip.header.line.count"="1")

2015-10-26 Thread Cheng, Hao
I am not sure if we really want to support that with HiveContext, but a workaround is to use the Spark package at https://github.com/databricks/spark-csv From: Felix Cheung [mailto:felixcheun...@hotmail.com] Sent: Tuesday, October 27, 2015 10:54 AM To: Daniel Haviv; user Subject: RE: HiveContext
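A sketch of that workaround, reading the file with a header option instead of relying on skip.header.line.count; the package coordinates and option names follow the spark-csv package's documented usage, and the path and table name are illustrative:

    // Launched with e.g.: spark-shell --packages com.databricks:spark-csv_2.10:1.2.0
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")        // the header row is consumed rather than returned as data
      .option("inferSchema", "true")
      .load("hdfs:///data/my_table.csv")
    df.registerTempTable("my_table")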

RE: HiveContext ignores ("skip.header.line.count"="1")

2015-10-26 Thread Felix Cheung
Please open a JIRA? Date: Mon, 26 Oct 2015 15:32:42 +0200 Subject: HiveContext ignores ("skip.header.line.count"="1") From: daniel.ha...@veracity-group.com To: user@spark.apache.org Hi, I have a csv table in Hive which is configured to skip the header row using TBLPROPERTIES("skip.header.line.c

Re: hiveContext sql number of tasks

2015-10-07 Thread Deng Ching-Mallete
Hi, You can do coalesce(N), where N is the number of partitions you want it reduced to, after loading the data into an RDD. HTH, Deng On Wed, Oct 7, 2015 at 6:34 PM, patcharee wrote: > Hi, > > I do a sql query on about 10,000 partitioned orc files. Because of the > partition schema the files c
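A minimal sketch of the coalesce suggestion, with a made-up table name, filter and partition count, assuming an existing hiveContext:

    val df = hiveContext.sql("SELECT * FROM partitioned_orc_table WHERE date_key = 20151007")
    // Collapse the many small input splits into ~50 partitions for the downstream stages
    df.coalesce(50).registerTempTable("reduced")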

RE: HiveContext throws org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

2015-07-07 Thread Cheng, Hao
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.derby.jdbc.EmbeddedDriver It will be included in the assembly jar usually, not sure what's wrong. But can you try adding the derby jar to the driver classpath and trying again? -Original Message- From: bdev [m

Re: HiveContext saveAsTable create wrong partition

2015-06-18 Thread Yin Huai
If you are writing to an existing hive table, our insert into operator follows hive's requirement, which is "the dynamic partition columns must be specified last among the columns in the SELECT statement and in the same order in which they appear in the PARTITION() clause." You can find requir
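An illustration of that requirement with made-up table and column names, assuming an existing hiveContext: the dynamic partition column dt is listed last in the SELECT, matching the PARTITION() clause.

    hiveContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    hiveContext.sql(
      """INSERT INTO TABLE target_table PARTITION (dt)
        |SELECT id, value, dt FROM source_table""".stripMargin)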

Re: HiveContext saveAsTable create wrong partition

2015-06-18 Thread Yin Huai
Are you writing to an existing hive orc table? On Wed, Jun 17, 2015 at 3:25 PM, Cheng Lian wrote: > Thanks for reporting this. Would you mind to help creating a JIRA for this? > > > On 6/16/15 2:25 AM, patcharee wrote: > >> I found if I move the partitioned columns in schemaString and in Row to

Re: HiveContext saveAsTable create wrong partition

2015-06-17 Thread Cheng Lian
Thanks for reporting this. Would you mind helping to create a JIRA for this? On 6/16/15 2:25 AM, patcharee wrote: I found if I move the partitioned columns in schemaString and in Row to the end of the sequence, then it works correctly... On 16. juni 2015 11:14, patcharee wrote: Hi, I am using

Re: HiveContext saveAsTable create wrong partition

2015-06-16 Thread patcharee
I found if I move the partitioned columns in schemaString and in Row to the end of the sequence, then it works correctly... On 16. juni 2015 11:14, patcharee wrote: Hi, I am using spark 1.4 and HiveContext to append data into a partitioned hive table. I found that the data insert into the tab

Re: HiveContext test, "Spark Context did not initialize after waiting 10000ms"

2015-05-26 Thread Nitin kak
That is a much better solution than how I resolved it. I got around it by placing comma separated jar paths for all the hive related jars in --jars clause. I will try your solution. Thanks for sharing it. On Tue, May 26, 2015 at 4:14 AM, Mohammad Islam wrote: > I got a similar problem. > I'm no

Re: HiveContext test, "Spark Context did not initialize after waiting 10000ms"

2015-05-26 Thread Mohammad Islam
I got a similar problem. I'm not sure if your problem is already resolved. For the record, I solved this type of error by calling sc.setMaster("yarn-cluster"). If you find the solution, please let us know. Regards, Mohammad On Friday, March 6, 2015 2:47 PM, nitinkak001 wrote: I am

Re: HiveContext fails when querying large external Parquet tables

2015-05-22 Thread Andrew Otto
What is also strange is that this seems to work on external JSON data, but not Parquet. I’ll try to do more verification of that next week. > On May 22, 2015, at 16:24, yana wrote: > > There is an open Jira on Spark not pushing predicates to metastore. I have a > large dataset with many part

RE: HiveContext fails when querying large external Parquet tables

2015-05-22 Thread yana
There is an open Jira on Spark not pushing predicates to the metastore. I have a large dataset with many partitions but doing anything with it is very slow... But I am surprised Spark 1.2 worked for you: it has this problem... Original message From: Andrew Otto Date:05/22/2015 3:5

Re: Re: HiveContext setConf seems not stable

2015-04-23 Thread guoqing0...@yahoo.com.hk
r.setConf(key, value) runSqlHive(s"SET $key=$value") } From: madhu phatak Date: 2015-04-23 02:17 To: Michael Armbrust CC: Ophir Cohen; Hao Ren; user Subject: Re: HiveContext setConf seems not stable Hi, calling getConf don't solve the issue. Even many hive specific queries are broken. Se

Re: HiveContext setConf seems not stable

2015-04-22 Thread madhu phatak
Hi, calling getConf doesn't solve the issue. Even many hive-specific queries are broken. Seems like no hive configurations are getting passed properly. Regards, Madhukara Phatak http://datamantra.io/ On Wed, Apr 22, 2015 at 2:19 AM, Michael Armbrust wrote: > As a workaround, can you call getCo

Re: HiveContext setConf seems not stable

2015-04-21 Thread Michael Armbrust
As a workaround, can you call getConf first before any setConf? On Tue, Apr 21, 2015 at 1:58 AM, Ophir Cohen wrote: > I think I encounter the same problem, I'm trying to turn on the > compression of Hive. > I have the following lines: > def initHiveContext(sc: SparkContext): HiveContext = { >
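The suggested workaround as a short sketch; the property is the one from Ophir's snippet, and the default passed to getConf is arbitrary:

    val hc = new HiveContext(sc)
    // Read the property once so the configuration is initialized...
    hc.getConf("hive.exec.compress.output", "false")
    // ...then apply the override
    hc.setConf("hive.exec.compress.output", "true")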

Re: HiveContext setConf seems not stable

2015-04-21 Thread Ophir Cohen
I think I encounter the same problem, I'm trying to turn on the compression of Hive. I have the following lines: def initHiveContext(sc: SparkContext): HiveContext = { val hc: HiveContext = new HiveContext(sc) hc.setConf("hive.exec.compress.output", "true") hc.setConf("mapreduce.output.

Re: HiveContext vs SQLContext

2015-04-20 Thread Himanshu Parashar
The default dialect of HiveContext is hiveql, which has been in use in Hive for many years, whereas it is sql for SQLContext, which uses a simple SQL parser provided by Spark SQL. Since the hiveql parser is more robust and complete than the sql parser, HiveContext is preferred over SQLContext. On Tue,

Re: HiveContext vs SQLContext

2015-04-20 Thread Lan Jiang
Daniel, HiveContext is a subclass of SQLContext, and thus offers a superset of features, some of which are not available in SQLContext, such as access to Hive UDFs, Hive tables, Hive SerDes, etc. This does not change in 1.3.1. Quote from the 1.3.1 documentation: "… using HiveContext is recommended for the 1.3 release of Spa

Re: HiveContext setConf seems not stable

2015-04-02 Thread Hao Ren
Hi, Jira created: https://issues.apache.org/jira/browse/SPARK-6675 Thank you. On Wed, Apr 1, 2015 at 7:50 PM, Michael Armbrust wrote: > Can you open a JIRA please? > > On Wed, Apr 1, 2015 at 9:38 AM, Hao Ren wrote: > >> Hi, >> >> I find HiveContext.setConf does not work correctly. Here are s

Re: HiveContext setConf seems not stable

2015-04-01 Thread Michael Armbrust
Can you open a JIRA please? On Wed, Apr 1, 2015 at 9:38 AM, Hao Ren wrote: > Hi, > > I find HiveContext.setConf does not work correctly. Here are some code > snippets showing the problem: > > snippet 1: > > -

Re: HiveContext can't find registered function

2015-03-17 Thread Ophir Cohen
Very helpful! Thank you On Mar 17, 2015 9:24 PM, "Yin Huai" wrote: > Initially, an attribute reference (column reference), like selecting a > column from a table, is not resolved since we do not know if the reference > is valid or not (if this column exists in the underlying table). In the > quer

Re: HiveContext can't find registered function

2015-03-17 Thread Yin Huai
Initially, an attribute reference (column reference), like selecting a column from a table, is not resolved since we do not know if the reference is valid or not (if this column exists in the underlying table). In the query compilation process, we will first analyze this query and resolved those at

Re: HiveContext can't find registered function

2015-03-17 Thread Ophir Cohen
Thank you for the answer, and one more question: what does 'resolved attribute' mean? On Mar 17, 2015 8:14 PM, "Yin Huai" wrote: > The number is an id we used internally to identify a resolved Attribute. > Looks like basic_null_diluted_d was not resolved since there is no id > associated with

Re: HiveContext can't find registered function

2015-03-17 Thread Yin Huai
The number is an id we used internally to identify a resolved Attribute. Looks like basic_null_diluted_d was not resolved since there is no id associated with it. On Tue, Mar 17, 2015 at 2:08 PM, Ophir Cohen wrote: > Interesting, I thought the problem is with the method itself. > I will check i

Re: HiveContext can't find registered function

2015-03-17 Thread Ophir Cohen
Interesting, I thought the problem is with the method itself. I will check it soon and update. Can you elaborate on what the # and the number mean? Is that a reference to the field in the rdd? Thank you, Ophir On Mar 17, 2015 7:06 PM, "Yin Huai" wrote: > Seems "basic_null_diluted_d" was not

Re: HiveContext can't find registered function

2015-03-17 Thread Yin Huai
Seems "basic_null_diluted_d" was not resolved? Can you check if basic_null_diluted_d is in you table? On Tue, Mar 17, 2015 at 9:34 AM, Ophir Cohen wrote: > Hi Guys, > I'm registering a function using: > sqlc.registerFunction("makeEstEntry",ReutersDataFunctions.makeEstEntry _) > > Then I register

Re: HiveContext test, "Spark Context did not initialize after waiting 10000ms"

2015-03-06 Thread Marcelo Vanzin
On Fri, Mar 6, 2015 at 2:47 PM, nitinkak001 wrote: > I am trying to run a Hive query from Spark using HiveContext. Here is the > code > > / val conf = new SparkConf().setAppName("HiveSparkIntegrationTest") > > > conf.set("spark.executor.extraClassPath", > "/opt/cloudera/parcels/CDH-5.2.0-1.cdh

Re: HiveContext in SparkSQL - concurrency issues

2015-02-24 Thread Harika
Hi Sreeharsha, My data is in HDFS. I am trying to use Spark HiveContext (instead of SQLContext) to fire queries on my data just because HiveContext supports more operations. Sreeharsha wrote > Change derby to mysql and check once; I too faced the same issue I am pretty new to Spark and H

Re: HiveContext created SchemaRDD's saveAsTable is not working on 1.2.0

2015-02-15 Thread matroyd
It works now using 1.2.1. Thanks for all the help. Spark rocks !! - Thanks, Roy -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Re-HiveContext-created-SchemaRDD-s-saveAsTable-is-not-working-on-1-2-0-tp21442p21664.html Sent from the Apache Spark User

Re: HiveContext in SparkSQL - concurrency issues

2015-02-12 Thread Felix C
-client mode against the hive metastore service. That should give you the ability to run multiple concurrently. Be sure to copy hive-site.xml to SPARK_HOME/conf --- Original Message --- From: "Harika" Sent: February 12, 2015 8:22 PM To: user@spark.apache.org Subject: Re: HiveContext i

Re: HiveContext in SparkSQL - concurrency issues

2015-02-12 Thread Harika
Hi, I've been reading about Spark SQL and people suggest that using HiveContext is better. So can anyone please suggest a solution to the above problem. This is stopping me from moving forward with HiveContext. Thanks Harika -- View this message in context: http://apache-spark-user-list.10015

Re: HiveContext created SchemaRDD's saveAsTable is not working on 1.2.0

2015-01-30 Thread Cheng Lian

Re: HiveContext created SchemaRDD's saveAsTable is not working on 1.2.0

2015-01-30 Thread Ayoub

Re: HiveContext created SchemaRDD's saveAsTable is not working on 1.2.0

2015-01-29 Thread Zhan Zhang
I think it is expected. Refer to the comments in saveAsTable: "Note that this currently only works with SchemaRDDs that are created from a HiveContext". If I understand correctly, here the SchemaRDD means those generated by HiveContext.sql, instead of applySchema. Thanks. Zhan Zhang On Jan 29

Re: HiveContext: cache table not supported for partitioned table?

2014-10-03 Thread Du Li
"user@spark.apache.org<mailto:user@spark.apache.org>" mailto:user@spark.apache.org>> Subject: Re: HiveContext: cache table not supported for partitioned table? Cache table works with partitioned table. I guess you’re experimenting with a default local metastore and the metastor

Re: HiveContext: cache table not supported for partitioned table?

2014-10-02 Thread Cheng Lian
Cache table works with partitioned table. I guess you’re experimenting with a default local metastore and the metastore_db directory doesn’t exist in the first place. In this case, all metastore tables/views don’t exist at first and will throw the error message you saw when the PARTITIONS me

Re: HiveContext ouput log file

2014-08-26 Thread S Malligarjunan
Hello Michael, I get the following error if I execute the query with the collect method: Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found. at org.apache.hadoop.io.compress.CompressionCod Thanks and Regards, Sankar S. On Tuesday, 26 Au

Re: HiveContext ouput log file

2014-08-25 Thread Michael Armbrust
Just like with normal Spark Jobs, that command returns an RDD that contains the lineage for computing the answer but does not actually compute the answer. You'll need to run collect() on the RDD in order to get the result. On Mon, Aug 25, 2014 at 11:46 AM, S Malligarjunan < smalligarju...@yahoo.
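In code, the distinction looks like this (the query text is illustrative, assuming an existing hiveContext):

    val result = hiveContext.hql("SELECT count(*) FROM my_table") // lazy: only builds the lineage
    result.collect().foreach(println)                             // action: the job actually runs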

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-08-01 Thread chenjie
I used the web UI of spark and could see the conf directory is in the CLASSPATH. An abnormal thing is that when I start spark-shell I always get the following info: WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable At first, I th

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-31 Thread Andrew Lee
Could you enable the HistoryServer and provide the properties and CLASSPATH for the spark-shell? And run the 'env' command to list your environment variables? By the way, what do the spark logs say? Enable debug mode to see what's going on in spark-shell when it tries to interact with and init HiveContext.

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-31 Thread chenjie
Hi, Yin and Andrew, thank you for your reply. When I create a table in the hive cli, it works correctly and the table can be found in hdfs. I had forgotten to start hiveserver2 before, and I started it today. Then I ran the command below: spark-shell --master spark://192.168.40.164:7077 --driver-class-path co

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-31 Thread Yin Huai
Another way is to set "hive.metastore.warehouse.dir" explicitly to the HDFS dir storing Hive tables by using SET command. For example: hiveContext.hql("SET hive.metastore.warehouse.dir=hdfs://localhost:54310/user/hive/warehouse") On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee wrote: > Hi All, >

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-31 Thread Andrew Lee
Hi All, It has been awhile, but what I did to make it work is to make sure the followings: 1. Hive is working when you run Hive CLI and JDBC via Hiveserver2 2. Make sure you have the hive-site.xml from above Hive configuration. The problem here is that you want the hive-site.xml from the Hive

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-30 Thread chenjie
Hi, Michael. I Have the same problem. My warehouse directory is always created locally. I copied the default hive-site.xml into the $SPARK_HOME/conf directory on each node. After I executed the code below, val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc) hiveContext.hql("CREA

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-29 Thread Michael Armbrust
The warehouse and the metastore directories are two different things. The metastore holds the schema information about the tables and will by default be a local directory. With javax.jdo.option.ConnectionURL you can configure it to be something like mysql. The warehouse directory is the default
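The two settings being distinguished here, with placeholder values; these normally live in hive-site.xml on Spark's classpath rather than in code, so this is only a reference sketch of the property names:

    val hiveSettings = Map(
      // metastore: where table *schemas* live (embedded Derby in the working dir by default)
      "javax.jdo.option.ConnectionURL" -> "jdbc:mysql://metastore-host:3306/hive",
      // warehouse: where table *data* is written by default
      "hive.metastore.warehouse.dir"   -> "hdfs://namenode:8020/user/hive/warehouse"
    )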

RE: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-29 Thread nikroy16
Thanks for the response... hive-site.xml is in the classpath so that doesn't seem to be the issue. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-is-creating-metastore-warehouse-locally-instead-of-in-hdfs-tp10838p10871.html Sent from the Apache

RE: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-28 Thread Cheng, Hao
I ran this before; actually the hive-site.xml works in this way for me (the trick happens in the new HiveConf(classOf[SessionState])). Can you double check if hive-site.xml can be loaded in the class path? It is supposed to appear in the root of the class path. -Original Message- From: nik