Some things just didn't work as I had first expected. For example,
writing from a Spark collection to an Alluxio destination didn't
persist the files to S3 automatically.
I remember having to use the Alluxio library directly to force the
files to persist to S3 after Spark finished writing to a
Hi,
Here is a little bit of background.
I've been using the stateless streaming APIs for a while, like
JavaDStream and so on, and they worked well. It has come to a point where
we need to do real-time stateful streaming based on event time and other
things, but for now I am just trying to get us
You can't call ".count()" directly on streaming DataFrames. This is because
"count" is an action (remember RDD actions) that executes and returns a
result immediately, which can be done only when the data is bounded (e.g.
batch/interactive queries). For streaming queries, you have to let it run
in the
Hi!
Thanks for the response. It looks like from_json requires the schema ahead
of time. Is there any function I can use to infer the schema from the JSON
messages I am receiving through Kafka? I tried the code below; however,
I get the following exception.
org.apache.spark.sql.AnalysisException: Queries
I understand the confusion. The "json" format is for JSON-encoded files being
written in a directory. For Kafka, use the "kafka" format. Then, to decode the
binary data as JSON, you can use the function "from_json" (Spark 2.1 and
above). Here is our blog post on this.
https://databricks.com/blog/2017/04/
Hi,
I'd like to hear the official statement too.
My take on GraphX and Spark Streaming is that they are long-dead projects,
with GraphFrames and Structured Streaming taking their place, respectively.
Jacek
On 13 May 2017 3:00 p.m., "Sergey Zhemzhitsky" wrote:
> Hello Spark users,
>
> I just wo
Hello Spark users,
I just would like to know whether the GraphX component should be considered
deprecated and no longer actively maintained,
and whether it should be avoided when starting new graph-processing projects
on top of Spark in favour of other
graph-processing frameworks?
I'm asking because
Another error. Does anyone have any idea?
This one happens when I try to convert a Spark DataFrame to pandas:
---Py4JError
Traceback (most recent call
last)/home/ubuntu/spark-2.1.1-bin-hadoop2.7/p
My code runs error-free on my local PC. I just tried running the same code on
an Ubuntu machine on EC2 and got the error below. Any idea where to start
in terms of debugging?
---Py4JError
Tracebac
Hi all,
What is the difference between sparkSession.readStream.format("kafka") and
sparkSession.readStream.format("json")?
I am sending JSON-encoded messages through Kafka and I am not sure which of
the two I should use.
Thanks!