Re: Spark Language / Data Base Question

2015-06-25 Thread pandees waran
There’s no single best answer for these questions. The question can be refined with a specific use case, and from there which data store is the best fit. > On Jun 25, 2015, at 12:02 AM, Sinha, Ujjawal (SFO-MAP) wrote: > Hi Guys > I am very new to Spark, I have 2 questions: > 1) which lan

identifying newly arrived files in s3 in spark streaming

2016-06-06 Thread pandees waran
I am fairly new to Spark Streaming and I have a basic question about how Spark Streaming works on an S3 bucket that periodically receives new files, once every 10 minutes. When I use Spark Streaming to process the files in this S3 path, will it process all the files in the path (old + new files) every batch
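
For illustration, a minimal sketch (not from the original thread; the bucket path and batch interval are assumptions) of monitoring an S3 prefix with Spark Streaming's file stream:

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  object S3FileStreamSketch {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("s3-file-stream")
      // Batch interval of 10 minutes, matching the arrival cadence described above.
      val ssc = new StreamingContext(conf, Seconds(600))

      // textFileStream only picks up files that appear after the stream starts
      // (based on modification time); files already in the prefix are not re-read.
      val lines = ssc.textFileStream("s3a://my-bucket/incoming/")   // hypothetical path
      lines.count().print()

      ssc.start()
      ssc.awaitTermination()
    }
  }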

spark streaming questions

2016-06-22 Thread pandees waran
Hello all, I have a few questions regarding Spark Streaming: * I am wondering whether anyone uses Spark Streaming with workflow orchestrators such as Data Pipeline/SWF/any other framework. Are there any advantages/drawbacks to using a workflow orchestrator for Spark Streaming? * How do you guys manage the

Re: spark streaming questions

2016-06-22 Thread pandees waran
> Cheers, > Dr Mich Talebzadeh > LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > http://talebzadehmich.wordpress.com >> On 22 June 2016 at 15:54, pandees waran wrote: >> Hi

Re: spark streaming questions

2016-06-22 Thread pandees waran
For my question (2), from my understanding checkpointing ensures recovery from failures. Sent from my iPhone > On Jun 22, 2016, at 10:27 AM, pandees waran wrote: > In general, if you have multiple steps in a workflow: > For every batch: > 1. stream data from s3 > 2. wri
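
As an illustration of that recovery path, a minimal sketch (the checkpoint location is a hypothetical s3a path, and the DStream graph is elided) of enabling checkpointing so a restarted driver can rebuild the streaming context from the checkpoint:

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  object CheckpointSketch {
    val checkpointDir = "s3a://my-bucket/checkpoints/my-app"   // hypothetical path

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("checkpointed-stream")
      val ssc = new StreamingContext(conf, Seconds(60))
      ssc.checkpoint(checkpointDir)   // enable metadata checkpointing
      // ... define sources and transformations here ...
      ssc
    }

    def main(args: Array[String]): Unit = {
      // On restart, the DStream graph and pending batches are recovered from the
      // checkpoint instead of calling createContext again.
      val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
      ssc.start()
      ssc.awaitTermination()
    }
  }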

Processing ion formatted messages in spark

2016-07-11 Thread pandees waran
All, has anyone ever worked on processing Ion-formatted messages in Spark? The Ion format is a superset of JSON: all JSON documents are valid Ion, but the reverse is not true. For more details on Ion: http://amznlabs.github.io/ion-docs/ Thanks.
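
For illustration, a hedged sketch of parsing Ion text records in Spark with the ion-java library (the input path and the "userId" field are hypothetical; the library's package name has varied across ion-java releases, e.g. software.amazon.ion vs com.amazon.ion, so check the version on your classpath):

  import com.amazon.ion.{IonStruct, IonText}
  import com.amazon.ion.system.IonSystemBuilder
  import org.apache.spark.{SparkConf, SparkContext}

  object IonSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("ion-sketch"))

      // Assume one Ion value per line, e.g. {userId:"u1", city:"Seattle"}
      val userIds = sc.textFile("s3a://my-bucket/ion-messages/")
        .mapPartitions { lines =>
          // Build the (non-serializable) IonSystem once per partition.
          val ion = IonSystemBuilder.standard().build()
          lines.map { line =>
            val struct = ion.singleValue(line).asInstanceOf[IonStruct]
            struct.get("userId").asInstanceOf[IonText].stringValue()
          }
        }

      userIds.take(10).foreach(println)
    }
  }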

Re: Is "spark streaming" streaming or mini-batch?

2016-08-23 Thread pandees waran
It's based on a "micro-batching" model. Sent from my iPhone > On Aug 23, 2016, at 8:41 AM, Aseem Bansal wrote: > I was reading this article https://www.inovex.de/blog/storm-in-a-teacup/ and it mentioned that Spark Streaming is actually mini-batch, not actual streaming. > I have not used stre
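
A small illustration of what "micro-batching" means in the DStream API (the socket source and 5-second interval are assumptions): records are grouped into one RDD per batch interval rather than handled one record at a time:

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  object MicroBatchSketch {
    def main(args: Array[String]): Unit = {
      val ssc = new StreamingContext(new SparkConf().setAppName("micro-batch"), Seconds(5))

      // Every 5 seconds, whatever arrived on the socket becomes one RDD (one micro-batch).
      val lines = ssc.socketTextStream("localhost", 9999)
      lines.foreachRDD { (rdd, time) =>
        println(s"Batch at $time contains ${rdd.count()} records")
      }

      ssc.start()
      ssc.awaitTermination()
    }
  }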

Recommended way to run spark streaming in production in EMR

2016-10-11 Thread pandees waran
All, we have a use case with 2 Spark Streaming jobs in the same EMR cluster. I am thinking of allowing multiple streaming contexts and running them as 2 separate spark-submit invocations with wait-for-app-completion set to false. With this, failure detection and monitoring seem obscure and don't seem to
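
For illustration, the kind of submission this describes (jar names, class names, and paths are hypothetical; spark.yarn.submit.waitAppCompletion=false is the YARN cluster-mode setting that makes spark-submit return without waiting for the application to finish):

  spark-submit --master yarn --deploy-mode cluster \
    --conf spark.yarn.submit.waitAppCompletion=false \
    --class com.example.StreamJobOne job-one.jar

  spark-submit --master yarn --deploy-mode cluster \
    --conf spark.yarn.submit.waitAppCompletion=false \
    --class com.example.StreamJobTwo job-two.jar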

Re: How do I access the nested field in a dataframe, spark Streaming app... Please help.

2016-11-20 Thread pandees waran
Have you tried using the "." access method? e.g.: ds1.select("name", "addresses[0].element.city") On Sun, Nov 20, 2016 at 9:59 AM, shyla deshpande wrote: > The following is my dataframe schema: > root > |-- name: string (nullable = true) > |-- addresses: array (nullable = true) > |
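
For illustration, a hedged sketch based on the schema quoted above (addresses as an array of structs; the "city" field comes from the select in this reply, everything else is assumed). The "element" wrapper shown by printSchema is not part of the access path; selectExpr, or getItem/getField on a Column, handles the nesting:

  import org.apache.spark.sql.functions.col

  val firstCity = ds1.selectExpr("name", "addresses[0].city")
  // Equivalent without SQL expression strings:
  val firstCity2 = ds1.select(col("name"), col("addresses").getItem(0).getField("city"))
  // To get every address's city (one row per array element), explode the array:
  //   ds1.select(col("name"), explode(col("addresses")).as("addr")).select("name", "addr.city")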

Re: Spark parquet file read problem !

2017-07-30 Thread pandees waran
I have encountered a similar error when the schemas/datatypes conflict between those 2 parquet files. Are you sure that the 2 individual files have the same structure with the same datatypes? If not, you have to fix this by enforcing default values for the missing values to make the str
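
For illustration, a hedged sketch of two common fixes (the paths, column name, and its type are assumptions; spark is the SparkSession): either let Spark merge compatible schemas, or impose the missing column with a default before combining:

  import org.apache.spark.sql.functions.{col, lit}

  // Option 1: schema merging (works when the datatypes are compatible across files).
  val merged = spark.read.option("mergeSchema", "true")
    .parquet("s3a://bucket/part-old/", "s3a://bucket/part-new/")

  // Option 2: default the missing column in the older file, then union.
  val newer = spark.read.parquet("s3a://bucket/part-new/")           // has extra_col
  val older = spark.read.parquet("s3a://bucket/part-old/")           // missing extra_col
    .withColumn("extra_col", lit(null).cast("string"))               // hypothetical name/type
  val combined = newer.select(older.columns.map(col): _*).union(older)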

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread pandees waran
All, may I know what exactly changed in 2.1.1 that solved this problem? Sent from my iPhone > On Sep 17, 2017, at 11:08 PM, Anastasios Zouzias wrote: > Hi, > I had a similar issue using 2.1.0 but not with Kafka. Updating to 2.1.1 solved my issue. Can you try with 2.1.1 as well and repo

Read all the columns from a file in spark sql

2014-07-16 Thread pandees waran
Hi, I am a newbie to Spark SQL and I would like to know how to read all the columns from a file in Spark SQL. I have referred to the programming guide here: http://people.apache.org/~tdas/spark-1.0-docs/sql-programming-guide.html The example says: val people = sc.textFile("examples/src/main/re
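
For reference, a sketch in the style of the Spark 1.0 guide linked above (the path and the Person fields follow that guide's people.txt example): register the RDD as a table and SELECT * to pull every column without listing them:

  case class Person(name: String, age: Int)

  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  import sqlContext.createSchemaRDD

  val people = sc.textFile("examples/src/main/resources/people.txt")
    .map(_.split(","))
    .map(p => Person(p(0), p(1).trim.toInt))

  people.registerAsTable("people")
  val allColumns = sqlContext.sql("SELECT * FROM people")   // every column, no explicit list
  allColumns.collect().foreach(println)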

Equivalent functions for NVL() and CASE expressions in Spark SQL

2014-07-17 Thread pandees waran
Do we have any equivalent Scala functions available for NVL() and CASE expressions to use in Spark SQL?
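
For illustration, a hedged sketch using the DataFrame functions available in later Spark releases (the dataframe and column names are hypothetical): coalesce() plays the role of NVL(), and when().otherwise() plays the role of a CASE expression:

  import org.apache.spark.sql.functions.{coalesce, col, when}

  val out = df.select(
    coalesce(col("nickname"), col("name")).as("display_name"),          // NVL(nickname, name)
    when(col("age") < 18, "minor").otherwise("adult").as("age_group")   // CASE WHEN age < 18 ...
  )
  // The SQL dialect accepts the same expressions directly:
  //   SELECT coalesce(nickname, name), CASE WHEN age < 18 THEN 'minor' ELSE 'adult' END FROM people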