Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Sean, thank you for making it clear. It was helpful. Regards, Ajay On Wednesday, October 26, 2016, Sean Owen wrote: > This usage is fine, because you are only using the HiveContext locally on > the driver. It's applied in a function that's used on a Scala collection. > > You can't use the HiveC

Re: HiveContext is Serialized?

2016-10-25 Thread Sunita Arvind
Thanks for the response Sean. I have seen the NPE on similar issues very consistently and assumed that could be the reason :) Thanks for clarifying. regards Sunita On Tue, Oct 25, 2016 at 10:11 PM, Sean Owen wrote: > This usage is fine, because you are only using the HiveContext locally on > the

Re: HiveContext is Serialized?

2016-10-25 Thread Sean Owen
This usage is fine, because you are only using the HiveContext locally on the driver. It's applied in a function that's used on a Scala collection. You can't use the HiveContext or SparkContext in a distributed operation. It has nothing to do with for loops. The fact that they're serializable is
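For reference, a minimal sketch of the distinction Sean describes, using the Spark 1.x API. The names deList, someRdd and some_table are illustrative placeholders, not code from the thread.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object DriverVsDistributed {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("driver-vs-distributed"))
        val hiveContext = new HiveContext(sc)

        // Fine: a plain Scala collection's foreach runs entirely on the driver,
        // so the HiveContext is only ever used locally.
        val deList = Seq("a", "b", "c")
        deList.foreach { de =>
          hiveContext.sql(s"SELECT count(*) FROM some_table WHERE de = '$de'").show()
        }

        // Not fine: RDD.foreach runs on the executors, so the closure (and the
        // HiveContext captured in it) would have to be serialized and shipped.
        val someRdd = sc.parallelize(deList)
        // someRdd.foreach { de =>
        //   hiveContext.sql("...")   // don't use the context in a distributed operation
        // }

        sc.stop()
      }
    }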

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Sunita, Thanks for your time. In my scenario, based on each attribute from deDF (1 column with just 66 rows), I have to query a Hive table and insert the results into another table. Thanks, Ajay On Wed, Oct 26, 2016 at 12:21 AM, Sunita Arvind wrote: > Ajay, > > Afaik Generally these contexts cannot be acces

Re: HiveContext is Serialized?

2016-10-25 Thread Sunita Arvind
Ajay, AFAIK these contexts generally cannot be accessed within loops. The SQL query itself runs on distributed datasets, so it is already a parallel execution; putting it inside a foreach would nest one parallel operation inside another, which makes serialization hard. Not sure I could explain it right. If you can crea

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Jeff, Thanks for your response. I see the below error in the logs. Do you think it has anything to do with hiveContext? Do I have to serialize it before using it inside foreach? 16/10/19 15:16:23 ERROR scheduler.LiveListenerBus: Listener SQLListener threw an exception java.lang.NullPointerException

Re: HiveContext is Serialized?

2016-10-25 Thread Jeff Zhang
In your sample code, you can use hiveContext in the foreach because it is a Scala List foreach operation, which runs on the driver side. But you cannot use hiveContext in RDD.foreach. Ajay Chander wrote on Wednesday, October 26, 2016 at 11:28 AM: > Hi Everyone, > > I was thinking if I can use hiveContext inside foreach like below,

HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Hi Everyone, I was thinking if I can use hiveContext inside foreach like below, object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf() val sc = new SparkContext(conf) val hiveContext = new HiveContext(sc) val dataElementsFile = args(0) val deDF =

Re: [Spark 2.0.1] Error in generated code, possible regression?

2016-10-25 Thread Efe Selcuk
I'd like to do that, though are there any guidelines for tracking down the context of the generated code? On Mon, Oct 24, 2016 at 11:44 PM Kazuaki Ishizaki wrote: Can you have a smaller program that can reproduce the same error? If you also create a JIRA entry, it would be great. Kazuaki Ishizak

Re: Operator push down through JDBC driver

2016-10-25 Thread AnilKumar B
I thought we could use the sqlContext.sql("some join query") API with JDBC; that's why I asked the above question. But since we can only use sqlContext.read().format("jdbc").options(options).load(), and there we can pass the actual join query to the Oracle source, this question is not valid. Please ignore
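For illustration, a sketch of what passing the join query to Oracle looks like: the JDBC source's "dbtable" option accepts a parenthesized subquery with an alias, so the join runs on the database side and Spark only receives the joined rows. Connection details, table and column names below are placeholders.

    val joinQuery =
      """(SELECT o.order_id, c.customer_name
        |   FROM orders o
        |   JOIN customers c ON o.customer_id = c.customer_id) pushed_join""".stripMargin

    val joinedDF = sqlContext.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/SERVICE")
      .option("dbtable", joinQuery)      // Oracle treats this as an inline view
      .option("user", "scott")
      .option("password", "tiger")
      .option("driver", "oracle.jdbc.OracleDriver")
      .load()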

Re: Zero Data Loss in Spark with Kafka

2016-10-25 Thread Sunita Arvind
The error in the file I just shared is here: val partitionOffsetPath:String = topicDirs.consumerOffsetDir + "/" + partition._2(0); --> this was previously just partition, and hence there was an error fetching the offset. Still testing. Somehow, Cody, your code never led to a "file already exists" sort of error

Operator push down through JDBC driver

2016-10-25 Thread AnilKumar B
Hi, I am using Spark SQL to transform data. My source is Oracle. In general, I am extracting multiple tables, joining them, and then doing some other transformations in Spark. Is there any possibility of pushing the join operator down to Oracle using Spark SQL, instead of fetching and joining in S

Re: Zero Data Loss in Spark with Kafka

2016-10-25 Thread Sunita Arvind
Attached is the edited code. Am I heading in the right direction? Also, I am missing something: it seems to work well as long as the application is running and the files are created correctly, but as soon as I restart the application, it goes back to a fromOffset of 0. Any thoughts? regards Sun

Re: Zero Data Loss in Spark with Kafka

2016-10-25 Thread Sunita Arvind
Thanks for confirming, Cody. To use the library, I had to do: val offsetsStore = new ZooKeeperOffsetsStore(conf.getString("zkHosts"), "/consumers/topics/" + topics + "/0") It worked well. However, I had to specify the partitionId in the zkPath. If I want the library to pick all the partition

Re: Zero Data Loss in Spark with Kafka

2016-10-25 Thread Cody Koeninger
You are correct that you shouldn't have to worry about broker id. I'm honestly not sure specifically what else you are asking at this point. On Tue, Oct 25, 2016 at 1:39 PM, Sunita Arvind wrote: > Just re-read the kafka architecture. Something that slipped my mind is, it > is leader based. So to

Re: Zero Data Loss in Spark with Kafka

2016-10-25 Thread Sunita Arvind
Just re-read the Kafka architecture. Something that slipped my mind is that it is leader based, so the topic/partitionId pair will be the same on all the brokers, and we do not need to consider the broker id while storing offsets. Still exploring the rest of the items. regards Sunita On Tue, Oct 25, 2016 at 11:09 AM, S

Re: Getting the IP address of Spark Driver in yarn-cluster mode

2016-10-25 Thread Masood Krohy
Thanks Steve. Here is the Python pseudo code that got it working for me: import time; import urllib2 nodes= ({'worker1_hostname':'worker1_ip', ... }) YARN_app_queue = 'default' YARN_address = 'http://YARN_IP:8088' YARN_app_startedTimeBegin = str(int(time.time() - 3600)) # We allow

Re: Zero Data Loss in Spark with Kafka

2016-10-25 Thread Sunita Arvind
Hello Experts, I am trying to use the saving-to-ZK design. I just saw Sudhir's comments that it is an old approach. Any reasons for that? Any issues observed with saving to ZK? The way we are planning to use it is: 1. Following http://aseigneurin.github.io/2016/05/07/spark-kafka-achieving-zero-data-lo
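A rough sketch of the per-partition offset store being discussed, assuming Apache Curator as the ZooKeeper client; this is not the blog's exact code, and the path layout and types are assumptions.

    import org.apache.curator.framework.CuratorFrameworkFactory
    import org.apache.curator.retry.ExponentialBackoffRetry

    class ZkOffsetsStore(zkHosts: String, basePath: String) {
      private val client =
        CuratorFrameworkFactory.newClient(zkHosts, new ExponentialBackoffRetry(1000, 3))
      client.start()

      // Store one offset per partition under basePath/<partitionId>
      def saveOffset(partition: Int, offset: Long): Unit = {
        val path = s"$basePath/$partition"
        val bytes = offset.toString.getBytes("UTF-8")
        if (client.checkExists().forPath(path) == null)
          client.create().creatingParentsIfNeeded().forPath(path, bytes)
        else
          client.setData().forPath(path, bytes)
      }

      // Returns None if nothing has been stored yet for this partition
      def readOffset(partition: Int): Option[Long] = {
        val path = s"$basePath/$partition"
        Option(client.checkExists().forPath(path))
          .map(_ => new String(client.getData.forPath(path), "UTF-8").toLong)
      }
    }

On restart, the stored offsets would be read for each partition and passed as the fromOffsets map to KafkaUtils.createDirectStream, so the stream resumes where it left off instead of falling back to 0.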

Getting only results out of Spark Shell

2016-10-25 Thread Mich Talebzadeh
Is it possible, using the Spark shell, to print out only the actual output without the commands passed through? Below, all I am interested in are those three numbers in red. Spark context Web UI available at http://50.140.197.217:5 Spark context available as 'sc' (master = local, app id = local-1477410914051). Sp

Spark Sql - "broadcast-exchange-1" java.lang.OutOfMemoryError: Java heap space

2016-10-25 Thread Selvam Raman
Hi, I need help figuring out and solving a heap space problem. I have a query which involves 15+ tables, and when I try to print out the result (just 23 rows) it throws a heap space error. The following is the command I tried in standalone mode (my Mac machine has 8 cores and 15GB RAM): spark.conf().set("spark
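An out-of-memory error in a "broadcast-exchange" thread usually points at a broadcast hash join being built on the driver. Two things worth trying, as a sketch rather than a guaranteed fix:

    // Stop Spark from broadcasting join sides automatically, so no large
    // relation is materialized on the driver for broadcast.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // And give the driver more headroom when launching, e.g.:
    //   spark-submit --driver-memory 6g ...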

Transforming Spark SQL AST with extraOptimizations

2016-10-25 Thread Michael David Pedersen
Hi, I want to take a SQL string as user input, then transform it before execution. In particular, I want to modify the top-level projection (select clause), injecting additional columns to be retrieved by the query. I was hoping to achieve this by hooking into Catalyst using sparkSession.e
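A minimal sketch of hooking a custom rule into Catalyst via experimental.extraOptimizations; note these rules run in the optimizer, i.e. on the already-analyzed plan. The rule below only inspects the top-level projection; injecting extra columns would go in the Project branch. Names and the query are placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}
    import org.apache.spark.sql.catalyst.rules.Rule

    // A rule that matches only the top-level node of the plan.
    object InspectTopProjection extends Rule[LogicalPlan] {
      override def apply(plan: LogicalPlan): LogicalPlan = plan match {
        case p @ Project(projectList, _) =>
          println(s"Top-level projection: ${projectList.map(_.name).mkString(", ")}")
          p
        case other => other
      }
    }

    val spark = SparkSession.builder().appName("ast-transform").getOrCreate()
    spark.experimental.extraOptimizations ++= Seq(InspectTopProjection)
    spark.sql("SELECT a, b FROM some_table").show()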

How can I log the moment an action is called on a DataFrame?

2016-10-25 Thread coldhyll
Hello, I'm building an ML Pipeline which extracts features from a DataFrame, and I'd like it to behave like the following: Log "Extracting feature 1" Extract feature 1 Log "Extracting feature 2" Extract feature 2 ... Log "Extracting feature n" Extract feature n The thing is, transformations bein
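One option, sketched below under the assumption that logging per materialization is acceptable: Spark 2.x's QueryExecutionListener fires once per action on a Dataset/DataFrame (after it completes), which at least makes visible when each stage of the pipeline is actually executed.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.execution.QueryExecution
    import org.apache.spark.sql.util.QueryExecutionListener

    val spark = SparkSession.builder().appName("action-logging").getOrCreate()

    spark.listenerManager.register(new QueryExecutionListener {
      // Called after each action (collect, count, show, write, ...) succeeds
      override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
        println(s"Action '$funcName' finished in ${durationNs / 1e6} ms")

      override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
        println(s"Action '$funcName' failed: ${exception.getMessage}")
    })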

Re: Need help with SVM

2016-10-25 Thread Aseem Bansal
Is there any labeled point with label 0 in your dataset? On Tue, Oct 25, 2016 at 2:13 AM, aditya1702 wrote: > Hello, > I am using linear SVM to train my model and generate a line through my > data. > However my model always predicts 1 for all the feature examples. Here is my > code: > > print da
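A quick way to check this, shown in Scala for illustration (the thread's code is Python): count how many examples carry each label before blaming the model.

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    // Returns the number of examples per label, e.g. Map(0.0 -> 120, 1.0 -> 5000)
    def labelCounts(data: RDD[LabeledPoint]): Map[Double, Long] =
      data.map(_.label).countByValue().toMap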

Re: Spark streaming communication with different versions of kafka

2016-10-25 Thread Cody Koeninger
Kafka consumers should be backwards compatible with Kafka brokers, so at the very least you should be able to use the spark-streaming-kafka-0-10 connector to do what you're talking about. On Tue, Oct 25, 2016 at 4:30 AM, Prabhu GS wrote: > Hi, > > I would like to know if the same spark streaming job can co
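For context, a sketch of the spark-streaming-kafka-0-10 direct stream Cody refers to, assuming an existing SparkContext sc (e.g. in the shell); broker addresses, group id and topic names are placeholders, and the final write to the target cluster is left as a comment.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "replication-job",
      "auto.offset.reset" -> "earliest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("source-topic"), kafkaParams)
    )

    // Replace the println with a producer write to the destination cluster.
    stream.foreachRDD(rdd => rdd.map(_.value()).foreach(println))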

I get the error Py4JJavaError: An error occurred while calling o177.showString while running the code below

2016-10-25 Thread muhammet pakyürek
I used Spark 2.0.1 and work with a pyspark.sql DataFrame. lower = arguments["lower"] lower_udf = udf(lambda x: lower if x

Re: [Spark ML] Using GBTClassifier in OneVsRest

2016-10-25 Thread eliasah
Well, as of now, the GBTClassifier is considered a Predictor and not a Classifier; that's why you get that error. Unless you want to rewrite your own GBTClassifier that extends Classifier, there is no solution for now to use the OneVsRest estimator on it. Nevertheless, there is an associated JI

Re: Spark 1.2

2016-10-25 Thread ayan guha
Thank you both. On Tue, Oct 25, 2016 at 11:30 PM, Sean Owen wrote: > archive.apache.org will always have all the releases: > http://archive.apache.org/dist/spark/ > > On Tue, Oct 25, 2016 at 1:17 PM ayan guha wrote: > >> Just in case, anyone knows how I can download Spark 1.2? It is not >> show

Re: Making more features in Logistic Regression

2016-10-25 Thread eliasah
Your question isn't clear. Would you care to elaborate?

Re: Proper saving/loading of MatrixFactorizationModel

2016-10-25 Thread eliasah
I know that this hasn't been accepted yet, but is there any news on it? How can we cache the product and user factors?

Re: Spark 1.2

2016-10-25 Thread Luciano Resende
All previous releases are available on the Release Archives http://archive.apache.org/dist/spark/ On Tue, Oct 25, 2016 at 2:17 PM, ayan guha wrote: > Just in case, anyone knows how I can download Spark 1.2? It is not showing > up in Spark download page drop down > > -- > Best Regards, > Ayan Gu

Re: Spark 1.2

2016-10-25 Thread Sean Owen
archive.apache.org will always have all the releases: http://archive.apache.org/dist/spark/ On Tue, Oct 25, 2016 at 1:17 PM ayan guha wrote: > Just in case, anyone knows how I can download Spark 1.2? It is not showing > up in Spark download page drop down > > > -- > Best Regards, > Ayan Guha >

Spark 1.2

2016-10-25 Thread ayan guha
Just in case: does anyone know how I can download Spark 1.2? It is not showing up in the Spark download page drop-down -- Best Regards, Ayan Guha

Re: Passing command line arguments to Spark-shell in Spark 2.0.1

2016-10-25 Thread Mich Talebzadeh
Hi, The correct way of doing it for a String argument is using echo '...', passing the string directly as below: spark-shell -i <(echo 'val ticker = "tsco"' ; cat stock.scala) Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Passing command line arguments to Spark-shell in Spark 2.0.1

2016-10-25 Thread Mich Talebzadeh
Hi guys, Besides using shell parameters, is there any way of passing a parameter to spark-shell, like in Zeppelin: val ticker = z.input("Ticker to analyze? default MSFT", "msft").toString I gather this can be done in the Spark shell: export TICKER="msft" spark-shell -i <(echo val ticker = $TICKER ; cat
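Since an environment variable is already being exported here, one alternative sketch: read it from inside the script passed via spark-shell -i, which avoids the echo quoting dance. The variable name TICKER and the default are taken from the message above.

    // Inside stock.scala (the script passed via spark-shell -i)
    val ticker: String = sys.env.getOrElse("TICKER", "msft")
    println(s"Analyzing ticker: $ticker")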

Spark streaming communication with InfluxDB

2016-10-25 Thread Gioacchino
Hi, I would like to know if there is a code example for writing data to InfluxDB from Spark Streaming in Scala / Python. Thanks in advance, Gioacchino
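One possible approach, sketched without any dedicated connector: post InfluxDB line-protocol points to the InfluxDB 1.x HTTP /write endpoint from each partition. The URL, database name, measurement and the (host, value) stream shape are all assumptions for illustration.

    import java.net.{HttpURLConnection, URL}
    import org.apache.spark.streaming.dstream.DStream

    // POST a batch of line-protocol lines to InfluxDB; 204 is the success code.
    def writeToInflux(lines: Iterator[String], influxUrl: String): Unit = {
      val conn = new URL(influxUrl).openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("POST")
      conn.setDoOutput(true)
      conn.getOutputStream.write(lines.mkString("\n").getBytes("UTF-8"))
      conn.getOutputStream.close()
      conn.getResponseCode
      conn.disconnect()
    }

    def saveStream(metrics: DStream[(String, Double)]): Unit =
      metrics.foreachRDD { rdd =>
        rdd.foreachPartition { part =>
          val lines = part.map { case (host, value) => s"cpu,host=$host value=$value" }
          if (lines.nonEmpty) writeToInflux(lines, "http://influxhost:8086/write?db=metrics")
        }
      }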

Spark streaming communication with different versions of kafka

2016-10-25 Thread Prabhu GS
Hi, I would like to know if the same Spark Streaming job can consume from Kafka 0.8.1 and write the data to Kafka 0.9. I am just trying to replicate the Kafka server. Yes, Kafka's MirrorMaker can be used to replicate, but I was curious to know whether that can be achieved with Spark Streaming. Please share yo

Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-25 Thread Chin Wei Low
Hi Kazuaki, I print a debug log right before I call the collect, and use that to compare against the job start log (it is available when turning on debug logging). Anyway, I tested that in Spark 2.0.1 and never saw it happen. But the query on the cached DataFrame is still slightly slower than the one witho

Re: java.lang.NoSuchMethodError - GraphX

2016-10-25 Thread Brian Wilson
I have discovered that this Dijkstra's function was written for scala 1.6. The remainder of my code is 2.11. I have checked the functions within the dijkstra function and can’t see any that are illegal. For example, `mapVertices`, `aggregateMessages` and `outerJoinVertices` are all being used co
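A NoSuchMethodError at runtime usually comes from a binary version mismatch (Scala and/or Spark module versions) rather than from calling an illegal API, so one thing to check is that the build pins a single Scala binary version and one Spark version for every module. A sketch of an aligned build.sbt; the versions below are placeholders and should match the cluster.

    // build.sbt
    scalaVersion := "2.11.8"

    val sparkVersion = "2.0.1"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"   % sparkVersion % "provided",
      "org.apache.spark" %% "spark-graphx" % sparkVersion % "provided"
    )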