I am reading JSON data from Kafka using Spark Streaming and creating a DataFrame. I am using PySpark.

kvs = KafkaUtils.createDirectStream(ssc, kafkaTopic1, kafkaParams)
lines = kvs.map(lambda x: x[1])
lines.foreachRDD(mReport)

def mReport(clickRDD):
    clickDF = sqlContext.jsonRDD(clickRDD)
    clickDF.reg
On Tue, Jul 26, 2016 at 12:05 PM, Cody Koeninger wrote:
> Have you tried filtering out corrupt records with something along the
> lines of
>
> df.filter(df("_corrupt_record").isNull)
>
> On Tue, Jul 26, 2016 at 1:53 PM, vr spark wrote:
> > i am readi
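Cody's snippet above is Scala; the PySpark equivalent is roughly df.filter(df["_corrupt_record"].isNull()). What that filter achieves can be sketched without a cluster: keep only the lines that parse as JSON, which is essentially how Spark decides what ends up in _corrupt_record. A minimal standard-library sketch (the sample records are invented):

```python
import json

def split_corrupt(lines):
    """Partition raw JSON lines into parsed records and corrupt lines,
    mimicking what filtering on _corrupt_record achieves in Spark."""
    good, corrupt = [], []
    for line in lines:
        try:
            good.append(json.loads(line))
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            corrupt.append(line)
    return good, corrupt

# Hypothetical sample: two valid click events and one truncated record.
raw = ['{"user": "a", "clicks": 3}',
       '{"user": "b", "clicks": 5}',
       '{"user": "c", "cli']
records, bad = split_corrupt(raw)
```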
I have data which is JSON in this format:

myList: array
 |-- elem: struct
 |    |-- nm: string (nullable = true)
 |    |-- vList: array (nullable = true)
 |    |    |-- element: string (containsNull = true)

From my Kafka stream, I created a DataFrame usin
Hi Experts,
Please suggest.
On Thu, Aug 11, 2016 at 7:54 AM, vr spark wrote:
>
> I have data which is json in this format
>
> myList: array
>  |-- elem: struct
>  |    |-- nm: string (nullable = true)
>  |    |-- vList: a
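A common way to work with a schema like the one above in Spark SQL is to apply pyspark.sql.functions.explode twice, once on myList and once on vList, yielding one row per (nm, value) pair. The flattening itself can be sketched with plain Python on a dict shaped like that schema; the sample data is made up:

```python
# Sample record shaped like the schema: myList is an array of structs,
# each with a name (nm) and a nullable array of strings (vList).
record = {
    "myList": [
        {"nm": "color", "vList": ["red", "blue"]},
        {"nm": "size", "vList": ["L"]},
    ]
}

def flatten(rec):
    """Emit one (nm, value) row per element, like two nested explodes."""
    rows = []
    for elem in rec.get("myList", []):
        for v in elem.get("vList") or []:  # vList may be null
            rows.append((elem["nm"], v))
    return rows

pairs = flatten(record)
```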
Hi,
I am getting an error in the scenario below. Please suggest.
I have a virtual view in Hive:
view name: log_data
It has 2 columns:
  query_map map
  parti_date int
Here is my snippet for the Spark DataFrame:
res = sqlcont.sql("select parti_date FROM log_data WHERE par
W0816 23:17:01.984846 16360 sched.cpp:1195] Attempting to accept an unknown
offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O162910492
W0816 23:17:01.984987 16360 sched.cpp:1195] Attempting to accept an unknown
offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O162910493
W0816 23:17:01.985124 16360 sched.cpp:1195] Attempting to accept an unknown
offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O168676558.
and many more lines like this on the screen with similar message
On Wed, Aug 17, 2016 at 9:08 AM, Ted Yu wrote:
> Please include user@ in your reply.
>
> Can you reveal the snippet of hive sql
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
AnalysisException: u'undefined function json_array_to_map; line 28 pos 73'
On Wed, Aug 17, 2016 at 8:59 AM, vr spark wrote:
> spark 1.6.1
> python
>
> I0817 08:51:59.099356 15189 detector.cpp:48
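The AnalysisException above means json_array_to_map is a Hive UDF that is not registered in this SQLContext (it presumably exists in the environment where the view was defined). One workaround is to register a Python UDF with the same name. The input format below is an assumption (a JSON array of {"key": ..., "value": ...} objects); adjust it to match whatever the original Hive UDF actually expects:

```python
import json

def json_array_to_map(s):
    """Hypothetical reimplementation: parse a JSON array of
    {"key": ..., "value": ...} objects into a dict. The exact input
    shape of the original Hive UDF is a guess here."""
    if s is None:
        return {}
    return {d["key"]: d["value"] for d in json.loads(s)}

result = json_array_to_map(
    '[{"key": "q", "value": "spark"}, {"key": "page", "value": "1"}]')
```

In Spark 1.6 / PySpark you would then register it with sqlContext.registerFunction("json_array_to_map", json_array_to_map, MapType(StringType(), StringType())) before running the query against the view.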
Hi,
I have this simple Scala app which works fine when I run it as a Scala
application from the Scala IDE for Eclipse.
But when I export it as a jar and run it from spark-submit, I get the error
below. Please suggest.
*bin/spark-submit --class com.x.y.vr.spark.first.SimpleApp test.jar*
16/09/24 23:1
>
> You've got two Spark runtimes up that may or may not contribute to the
> issue.
>
> Pozdrawiam,
> Jacek Laskowski
>
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/j
Hi,
I use the Scala IDE for Eclipse. I usually run jobs against my local Spark
installed on my Mac, then export the jars, copy them to my company's Spark
cluster, and run spark-submit there.
This works fine.
But I want to run the jobs from the Scala IDE directly against my company's
Spark cluster.
Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Sun, Sep 25, 2016 at 4:32 PM, vr spark wrote:
> > yes, i have both spark 1.6 and spark 2.0.
> > I unset the spark home environment variable and pointed spark submit
Hi,
I have a continuous REST API stream which keeps spitting out data in the
form of JSON.
I access the stream using Python: requests.get(url, stream=True,
headers=headers).
I want to receive the data in Spark and do further processing. I am not sure
of the best way to receive it in Spark.
What are
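One common pattern (not the only one) is to read the requests stream on the driver, parse each line, and feed the records into Spark, either by pushing batches through ssc.queueStream or by writing lines to a local socket and using ssc.socketTextStream. The parsing/feeding side can be sketched with the standard library only; the event format is invented, and the list below stands in for requests.get(url, stream=True).iter_lines():

```python
import json
from queue import Queue

def feed(stream_lines, q):
    """Parse each line of a line-delimited JSON stream and enqueue it.
    In the real app, stream_lines would be the iter_lines() of a
    streaming requests response."""
    for line in stream_lines:
        if not line:  # skip keep-alive blank lines
            continue
        q.put(json.loads(line))

q = Queue()
# Stand-in for the live HTTP stream: three made-up events.
feed(['{"id": 1}', '', '{"id": 2}', '{"id": 3}'], q)
batch = [q.get() for _ in range(q.qsize())]
```

Each drained batch could then be turned into an RDD and appended to the queue passed to ssc.queueStream.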
Hi,
I am looking for Scala or Python code samples to convert a local TSV file to
an ORC file and store it on distributed cloud storage (OpenStack).
So, I need these 3 samples. Please suggest.
1. read TSV
2. convert to ORC
3. store on distributed cloud storage
thanks
VR
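In Spark 2.x the three steps map roughly to spark.read.option("sep", "\t").option("header", True).csv(path) for step 1, and df.write.orc(target) for steps 2 and 3, where target can be a swift:// URL if the Hadoop OpenStack connector is configured. Step 1 on its own, reading TSV, sketched with only the standard library (the sample rows are made up; in practice you would open the real file):

```python
import csv
import io

# Stand-in for a local TSV file; in practice:
#   open("data.tsv", newline="")
sample = io.StringIO("name\tcount\nalpha\t3\nbeta\t7\n")

# Step 1: read TSV rows as dicts keyed by the header line.
reader = csv.DictReader(sample, delimiter="\t")
rows = list(reader)
```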
Hi, the source file I have is on my local machine and it is pretty huge,
around 150 GB. How do I go about it?
On Sun, Nov 20, 2016 at 8:52 AM, Steve Loughran
wrote:
>
> On 19 Nov 2016, at 17:21, vr spark wrote:
>
> Hi,
> I am looking for scala or python code samples to covert local ts
Hi,
I have a simple Java program to read data from Kafka using Spark Streaming.
When I run it from Eclipse on my Mac, it connects to ZooKeeper and the
bootstrap nodes, but it does not display any data and does not give any
error.
It just shows:
18/01/16 20:49:15 INFO Executor: Finished task 96.