Re: Best practises to storing data in Parquet files

2016-08-29 Thread Mich Talebzadeh
Hi Kevin, when you say Kafka is interacting with the Oracle database (if I understand you correctly), are you using GoldenGate with a Kafka interface to push data from Oracle to Kafka? HTH, Dr Mich Talebzadeh LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Best practises to storing data in Parquet files

2016-08-28 Thread Chanh Le
> Does a Parquet file have a size limit (1TB)? I didn’t see any problem, but 1TB is too big to operate on; you need to divide it into smaller pieces. > Should we use SaveMode.Append for a long-running streaming app? Yes, but you need to partition it by time so it is easy to maintain, e.g. to update or delete a spec
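Partitioning by time usually means laying the Parquet output out in Hive-style `column=value` directories, so one day or hour can be rewritten or dropped without touching the rest. In Spark that layout would typically come from something like `df.write.mode(SaveMode.Append).partitionBy("dt", "hour").parquet(path)`. The stdlib-only Python sketch below does not use Spark; it only illustrates the resulting directory layout and why deleting a day becomes a directory-level operation. The `dt`/`hour` column names and the `/data/events` base path are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

# Hypothetical partition columns: dt (day) and hour, both derived in UTC.
def partition_path(base: str, ts: datetime) -> str:
    """Hive-style partition directory for one record timestamp (expects a tz-aware ts)."""
    ts = ts.astimezone(timezone.utc)
    return f"{base}/dt={ts:%Y-%m-%d}/hour={ts:%H}"

def group_by_partition(base: str, records: Iterable[Tuple[datetime, str]]) -> Dict[str, List[str]]:
    """Group (timestamp, payload) records by the directory they would be appended to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ts, payload in records:
        groups[partition_path(base, ts)].append(payload)
    return dict(groups)

def partitions_for_day(paths: Iterable[str], day: str) -> List[str]:
    """All partition directories of one day; rewriting or deleting these touches only that day."""
    return sorted(p for p in paths if f"/dt={day}/" in p)
```

Because every append lands in a directory named after its time bucket, a late correction for one day means replacing that day's directories rather than rewriting a single huge file.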

Re: Best practises to storing data in Parquet files

2016-08-28 Thread Kevin Tran
Hi Mich,
My stack is as follows:
Data sources:
* IBM MQ
* Oracle database
Kafka stores all messages from the data sources.
Spark Streaming fetches messages from Kafka, applies a light transformation, and writes Parquet files to HDFS.
Hive / SparkSQL / Impala query the Parquet files.
Do you have any re
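In a stack like this, Spark Streaming groups the incoming Kafka records into micro-batches, one per batch interval, and each batch is then transformed and written out. The plain-Python sketch below mimics only that grouping step; it is not Spark or Kafka client code, and the message shape, the 10-second interval used in the example, and the upper-casing "transformation" are all hypothetical stand-ins.

```python
from typing import Iterable, List, Optional, Tuple

# (arrival time in seconds, payload); the message shape is hypothetical.
Message = Tuple[float, str]

def micro_batches(messages: Iterable[Message], interval: float) -> List[List[str]]:
    """Split time-ordered messages into consecutive fixed-length batches.

    An interval with no messages yields an empty batch, much as a streaming
    job still runs once per batch interval even when nothing arrived.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    batch_end: Optional[float] = None
    for ts, payload in messages:
        if batch_end is None:
            # The first batch ends at the interval boundary after the first message.
            batch_end = (ts // interval + 1) * interval
        while ts >= batch_end:
            batches.append(current)
            current = []
            batch_end += interval
        current.append(payload)
    if current:
        batches.append(current)
    return batches

def run_pipeline(messages: Iterable[Message], interval: float) -> List[List[str]]:
    """Transform each batch record by record (upper-casing stands in for the real transform)."""
    return [[p.upper() for p in batch] for batch in micro_batches(messages, interval)]
```

Each returned batch corresponds to one unit of work that would be written to HDFS as a set of Parquet files.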

Re: Best practises to storing data in Parquet files

2016-08-28 Thread Mich Talebzadeh
Hi, can you explain your particular stack? For example, what is the source of the streaming data, and what role does Spark play? Are you dealing with real time and batch, and why Parquet rather than something like HBase to ingest data in real time? HTH Dr Mich Talebzadeh LinkedIn * https://www.linked

Re: Best practises around spark-scala

2016-08-08 Thread Deepak Sharma
Thanks Vaquar. My intention is to find something that can help stress-test the code in Spark, measure its performance, and suggest some improvements. Is there any such framework or tool I can use here? Thanks, Deepak On 8 Aug 2016 9:14 pm, "vaquar khan" wrote: > I found following links are goo

Re: Best practises around spark-scala

2016-08-08 Thread vaquar khan
I found the following links good, as I am using the same ones. http://spark.apache.org/docs/latest/tuning.html https://spark-summit.org/2014/testing-spark-best-practices/ Regards, Vaquar khan On 8 Aug 2016 10:11, "Deepak Sharma" wrote: > Hi All, > Can anyone please give any documents that may be there

Re: Best practises of share Spark cluster over few applications

2016-02-14 Thread Alex Kozlov
Praveen, the mode in which you run Spark (standalone, YARN, Mesos) is determined when you create the SparkContext. You are right that spark-submit and spark-shell create different SparkContexts. In general, resour
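Concretely, the cluster manager is selected by the master URL the SparkContext is created with (the `--master` option of spark-submit and spark-shell, or `SparkConf.setMaster` in code). The Python helper below is only an illustration of that URL convention for the common forms; it is not part of Spark, and `cluster_manager` is a hypothetical name.

```python
import re

def cluster_manager(master: str) -> str:
    """Map a Spark master URL to the cluster manager it selects (illustrative only)."""
    # local, local[N], local[*], and local[N,F] / local[*,F] (F = max task failures)
    if re.fullmatch(r"local(\[(\*|\d+)(,\d+)?\])?", master):
        return "local"
    if master.startswith("spark://"):
        return "standalone"
    if master.startswith("mesos://"):
        return "mesos"
    # "yarn-client" and "yarn-cluster" are the older Spark 1.x spellings.
    if master in ("yarn", "yarn-client", "yarn-cluster"):
        return "yarn"
    if master.startswith("k8s://"):
        return "kubernetes"
    raise ValueError(f"unrecognised master URL: {master!r}")
```

Whichever entry point builds the context, the same master value leads to the same cluster manager; that value is what decides "the mode in which you run Spark".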

Re: Best practises of share Spark cluster over few applications

2016-02-14 Thread praveen S
I was also trying to launch Spark jobs from a web service, but I thought you could run Spark jobs in YARN mode only through spark-submit. Is my understanding incorrect? Regards, Praveen On 15 Feb 2016 08:29, "Sabarish Sasidharan" wrote: > Yes you can look at using the capacity scheduler or the

Re: Best practises of share Spark cluster over few applications

2016-02-14 Thread Sabarish Sasidharan
Yes, you can look at using the Capacity Scheduler or the Fair Scheduler with YARN. Both allow an application to use the full cluster when it is otherwise idle, and both can take CPU as well as memory into account when allocating resources, which is more or less necessary with Spark. Regards Sab On 13-Feb-2016 10:11 pm, "Eugene Morozov" wrote: > Hi
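Allocating by CPU and memory together is commonly done with Dominant Resource Fairness (DRF); in YARN, the Fair Scheduler offers a `drf` scheduling policy and the Capacity Scheduler can be configured with `DominantResourceCalculator`. The sketch below is a toy version of the DRF idea (progressive filling), not YARN's implementation; the cluster size and per-task demands in the example are made-up numbers.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Resources = Tuple[int, int]  # (vcores, memory in GB); illustrative units

def drf_allocate(capacity: Resources, demands: Dict[str, Resources]) -> Dict[str, int]:
    """Toy Dominant Resource Fairness via progressive filling.

    Repeatedly grants one more task to the user whose dominant share (its
    largest fraction of any single resource) is smallest, until no user's
    next task fits in what is left.
    """
    if any(cpu <= 0 and mem <= 0 for cpu, mem in demands.values()):
        raise ValueError("every task must demand some CPU or memory")
    free: List[int] = list(capacity)
    used: Dict[str, List[int]] = {u: [0, 0] for u in demands}
    tasks: Dict[str, int] = {u: 0 for u in demands}
    active = set(demands)

    def dominant_share(user: str) -> Fraction:
        return max(Fraction(used[user][i], capacity[i]) for i in range(2))

    while active:
        # Lowest dominant share goes first; ties broken by name for determinism.
        user = min(active, key=lambda u: (dominant_share(u), u))
        need = demands[user]
        if all(need[i] <= free[i] for i in range(2)):
            for i in range(2):
                free[i] -= need[i]
                used[user][i] += need[i]
            tasks[user] += 1
        else:
            active.remove(user)  # its next task no longer fits; stop considering it
    return tasks
```

With 9 vcores and 18 GB, a memory-heavy user A (1 vcore, 4 GB per task) and a CPU-heavy user B (3 vcores, 1 GB per task) end up with 3 and 2 tasks respectively, each holding two thirds of its dominant resource. A memory-only scheduler would not see B's CPU pressure at all.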

Re: Best practises of share Spark cluster over few applications

2016-02-13 Thread Jörn Franke
This is possible with YARN. You also need to think about preemption, in case one web service starts a job and, a while later, another web service also wants to run one. > On 13 Feb 2016, at 17:40, Eugene Morozov wrote: > > Hi, > > I have several instances of the same web-service
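YARN's Fair Scheduler can preempt containers from applications running above their fair share (enabled with `yarn.scheduler.fair.preemption`). The toy calculation below only illustrates the arithmetic of the situation described above, one web service holding the whole cluster when a second one arrives. It assumes equal weights and that every service wants at least an equal share, and it is not how YARN computes preemption internally; the service names are hypothetical.

```python
from typing import Dict

def preemption_plan(total: int, usage: Dict[str, int]) -> Dict[str, int]:
    """Containers to reclaim from each over-share app (simplified model).

    Assumes every app in `usage` is active and wants at least an equal
    share of the cluster, so each app's fair share is total // len(usage).
    """
    if not usage:
        return {}
    fair = total // len(usage)
    free = total - sum(usage.values())
    deficit = sum(max(0, fair - used) for used in usage.values())
    still_needed = max(0, deficit - free)  # idle containers are handed out first
    plan: Dict[str, int] = {}
    # Reclaim from the most over-share apps first.
    for app, used in sorted(usage.items(), key=lambda kv: kv[1] - fair, reverse=True):
        surplus = used - fair
        if still_needed == 0 or surplus <= 0:
            break
        take = min(surplus, still_needed)
        plan[app] = take
        still_needed -= take
    return plan
```

Without preemption, the second service would simply wait until the first one released containers on its own.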

Re: Best practises

2015-11-02 Thread Sushrut Ikhar
This presentation may answer many of your questions. https://www.youtube.com/watch?v=7ooZ4S7Ay6Y Regards, Sushrut Ikhar about.me/sushrutikhar On Mon, Nov 2, 2015 at 7:15 PM, Denny Lee wrote: > In addition, you may want to check ou

Re: Best practises

2015-11-02 Thread Denny Lee
In addition, you may want to check out Tuning and Debugging in Apache Spark (https://sparkhub.databricks.com/video/tuning-and-debugging-apache-spark/) On Mon, Nov 2, 2015 at 05:27 Stefano Baghino wrote: > There is this interesting book from Databricks: > https://www.gitbook.com/book/databricks/d

Re: Best practises

2015-11-02 Thread Stefano Baghino
There is this interesting book from Databricks: https://www.gitbook.com/book/databricks/databricks-spark-knowledge-base/details What do you think? Does it contain the info you're looking for? :) On Mon, Nov 2, 2015 at 2:18 PM, satish chandra j wrote: > HI All, > Yes, any such doc will be a grea

Re: Best practises

2015-11-02 Thread satish chandra j
Hi All, Yes, any such doc would be a great help! On Fri, Oct 30, 2015 at 4:35 PM, huangzheng <1106944...@qq.com> wrote: > I have the same question. Can anyone help us? > > > -- Original message -- > *From:* "Deepak Sharma"; > *Sent:* Friday, 30 October 2015, 7:23 PM > *To:* "user"; > *Subject