read only specific jsons

2016-07-26 Thread vr spark
i am reading data from kafka using spark streaming. I am reading json and creating dataframe. kvs = KafkaUtils.createDirectStream(ssc, kafkaTopic1, kafkaParams) lines = kvs.map(lambda x: x[1]) lines.foreachRDD(mReport) def mReport(clickRDD): clickDF = sqlContext.jsonRDD(clickRDD) clickDF.reg

read only specific jsons

2016-07-26 Thread vr spark
i am reading data from kafka using spark streaming. I am reading json and creating dataframe. I am using pyspark kvs = KafkaUtils.createDirectStream(ssc, kafkaTopic1, kafkaParams) lines = kvs.map(lambda x: x[1]) lines.foreachRDD(mReport) def mReport(clickRDD): clickDF = sqlContext.jsonRDD(

Re: read only specific jsons

2016-07-27 Thread vr spark
On Tue, Jul 26, 2016 at 12:05 PM, Cody Koeninger wrote: > Have you tried filtering out corrupt records with something along the > lines of > > df.filter(df("_corrupt_record").isNull) > > On Tue, Jul 26, 2016 at 1:53 PM, vr spark wrote: > > i am readi
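
The filter quoted above is written in Scala syntax; in PySpark the same column would be reached as df["_corrupt_record"].isNull(). A different option is to drop malformed messages before the DataFrame is built. The sketch below is plain Python and is not taken from the thread: the names is_json_object and sample are hypothetical, and it assumes each Kafka message value is meant to be a single JSON object string.

```python
import json


def is_json_object(line):
    """Return True if `line` parses as a JSON object (hypothetical helper)."""
    try:
        return isinstance(json.loads(line), dict)
    except (TypeError, ValueError):
        # TypeError: value is not a string (e.g. None)
        # ValueError: malformed JSON (json.JSONDecodeError subclasses it)
        return False


# Stand-in for the message values produced by kvs.map(lambda x: x[1])
sample = ['{"id": 1, "page": "home"}', 'not json', '{"id": 2', '[1, 2]', None]

# Keep only the entries that are well-formed JSON objects
valid = [s for s in sample if is_json_object(s)]
```

In the streaming job the same predicate could be applied as lines.filter(is_json_object) ahead of foreachRDD, so that only parseable records reach sqlContext.jsonRDD.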

dataframe row list question

2016-08-11 Thread vr spark
I have data which is json in this format myList: array |||-- elem: struct ||||-- nm: string (nullable = true) ||||-- vList: array (nullable = true) |||||-- element: string (containsNull = true) from my kafka stream, i created a dataframe usin

Re: dataframe row list question

2016-08-12 Thread vr spark
Hi Experts, Please suggest On Thu, Aug 11, 2016 at 7:54 AM, vr spark wrote: > > I have data which is json in this format > > myList: array > |||-- elem: struct > ||||-- nm: string (nullable = true) > ||||-- vList: a

Undefined function json_array_to_map

2016-08-17 Thread vr spark
Hi, I am getting error on below scenario. Please suggest. i have a virtual view in hive view name log_data it has 2 columns query_map map parti_date int Here is my snippet for the spark data frame my dataframe res=sqlcont.sql("select parti_date FROM log_data WHERE par

Attempting to accept an unknown offer

2016-08-17 Thread vr spark
W0816 23:17:01.984846 16360 sched.cpp:1195] Attempting to accept an unknown offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O162910492 W0816 23:17:01.984987 16360 sched.cpp:1195] Attempting to accept an unknown offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O162910493 W0816 23:17:01.985124 16360 sched.cpp

Re: Attempting to accept an unknown offer

2016-08-17 Thread vr spark
pting to accept an unknown offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O168676558. and many more lines like this on the screen with similar message On Wed, Aug 17, 2016 at 9:08 AM, Ted Yu wrote: > Please include user@ in your reply. > > Can you reveal the snippet of hive sql

Re: Undefined function json_array_to_map

2016-08-17 Thread vr spark
o raise AnalysisException(s.split(': ', 1)[1], stackTrace) AnalysisException: u'undefined function json_array_to_map; line 28 pos 73' On Wed, Aug 17, 2016 at 8:59 AM, vr spark wrote: > spark 1.6.1 > python > > I0817 08:51:59.099356 15189 detector.cpp:48

spark-submit failing but job running from scala ide

2016-09-24 Thread vr spark
Hi, I have this simple Scala app which works fine when I run it as a Scala application from the Scala IDE for Eclipse. But when I export it as a jar and run it with spark-submit, I get the error below. Please suggest. *bin/spark-submit --class com.x.y.vr.spark.first.SimpleApp test.jar* 16/09/24 23:1

Re: spark-submit failing but job running from scala ide

2016-09-25 Thread vr spark
> > You've got two Spark runtimes up that may or may not contribute to the > issue. > > Pozdrawiam, > Jacek Laskowski > > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/j

Running jobs against remote cluster from scala eclipse ide

2016-09-26 Thread vr spark
Hi, I use scala IDE for eclipse. I usually run job against my local spark installed on my mac and then export the jars and copy it to spark cluster of my company and run spark submit on it. This works fine. But i want to run the jobs from scala ide directly using the spark cluster of my company. t

Re: spark-submit failing but job running from scala ide

2016-09-26 Thread vr spark
Apache Spark 2.0 http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Sun, Sep 25, 2016 at 4:32 PM, vr spark wrote: > > yes, i have both spark 1.6 and spark 2.0. > > I unset the spark home environment variable and pointed spark submit

receiving stream data options

2016-10-12 Thread vr spark
Hi, I have a continuous rest api stream which keeps spitting out data in form of json. I access the stream using python requests.get(url, stream=True, headers=headers). I want to receive them using spark and do further processing. I am not sure which is best way to receive it in spark. What are
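
One part of this question can be sketched independently of Spark: turning the HTTP stream into discrete records. The example below is only an illustration under assumptions the post does not confirm, namely that the API emits one JSON document per line. parse_json_lines and fake_stream are hypothetical names; the function accepts any iterable of byte lines, such as the one returned by response.iter_lines() in requests.

```python
import json


def parse_json_lines(byte_lines):
    """Yield one dict per non-empty, well-formed JSON line; skip the rest."""
    for raw in byte_lines:
        if not raw:
            # Streaming responses can contain empty keep-alive lines
            continue
        try:
            record = json.loads(raw.decode("utf-8"))
        except ValueError:
            # Covers both bad UTF-8 and malformed JSON
            continue
        if isinstance(record, dict):
            yield record


# Stand-in for response.iter_lines(); the third entry is deliberately broken
fake_stream = [b'{"id": 1}', b'', b'{"id": 2', b'{"id": 3}']
records = list(parse_json_lines(fake_stream))
```

Records produced this way could then be forwarded to a source Spark already consumes, for example a Kafka topic or a socket; which receiver fits best depends on the deployment.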

covert local tsv file to orc file on distributed cloud storage(openstack).

2016-11-19 Thread vr spark
Hi, I am looking for Scala or Python code samples to convert a local TSV file to an ORC file and store it on distributed cloud storage (OpenStack). So I need these 3 samples. Please suggest. 1. read tsv 2. convert to orc 3. store on distributed cloud storage thanks VR

Re: covert local tsv file to orc file on distributed cloud storage(openstack).

2016-11-24 Thread vr spark
Hi, The source file i have is on local machine and its pretty huge like 150 gb. How to go about it? On Sun, Nov 20, 2016 at 8:52 AM, Steve Loughran wrote: > > On 19 Nov 2016, at 17:21, vr spark wrote: > > Hi, > I am looking for scala or python code samples to covert local ts

spark streaming kafka not displaying data in local eclipse

2018-01-16 Thread vr spark
Hi, I have a simple Java program to read data from Kafka using Spark Streaming. When I run it from Eclipse on my Mac, it connects to the ZooKeeper and bootstrap nodes, but it does not display any data. It does not give any error; it just shows 18/01/16 20:49:15 INFO Executor: Finished task 96.