Re: Spark job server pros and cons

2016-12-09 Thread Shak S
eed to have some learning curve and troubleshooting. On Fri, Dec 9, 2016 at 4:31 PM, Cassa L wrote: > Hi, > So far, I ran spark jobs directly using spark-submit options. I have a > use case to use Spark Job server to run the job. I wanted to find out PROS > and CONs of using this

Spark job server pros and cons

2016-12-09 Thread Cassa L
Hi, So far, I have run Spark jobs directly using spark-submit options. I have a use case to use Spark Job Server to run the job. I wanted to find out the pros and cons of using this job server. If anyone can share them, that would be great. My jobs usually connect to multiple data sources like Kafka, Custom

Pros and cons of using different persistence layers for Spark

2016-10-03 Thread Ashok Kumar
What are the pros and cons of using different persistence layers for Spark, such as S3, Cassandra, and HDFS? Thanks

Re: Pros and Cons

2016-05-27 Thread Teng Qiu
yes, only for engine, but maybe newer version has more optimization from tungsten project? at least since spark 1.6? > -- Forwarded message -- > From: Mich Talebzadeh > Date: 27 May 2016 at 17:09 > Subject: Re: Pros and Cons > To: Teng Qiu > Cc: Ted Yu , Ko

Fwd: Pros and Cons

2016-05-27 Thread Mich Talebzadeh
: Mich Talebzadeh Date: 27 May 2016 at 17:09 Subject: Re: Pros and Cons To: Teng Qiu Cc: Ted Yu , Koert Kuipers , Jörn Franke , user , Aakash Basu < raj2coo...@gmail.com>, Reynold Xin not worth spending time really. The only version that works is Spark 1.3.1 with Hive 2 To be perfectly hone

Re: Pros and Cons

2016-05-27 Thread Teng Qiu
tried spark 2.0.0 preview, but no assembly jar there... then just gave up... :p 2016-05-27 17:39 GMT+02:00 Ted Yu : > Teng: > Why not try out the 2.0 SNAPSHOT build? > > Thanks > >> On May 27, 2016, at 7:44 AM, Teng Qiu wrote: >> >> ah, yes, the version is another mess!... no vendor's product >>

Re: Pros and Cons

2016-05-27 Thread Mich Talebzadeh
Hi Ted, do you mean Hive 2 with spark 2 snapshot build as the execution engine just binaries for snapshot (all ok)?

Re: Pros and Cons

2016-05-27 Thread Ted Yu
Teng: Why not try out the 2.0 SNAPSHOT build? Thanks > On May 27, 2016, at 7:44 AM, Teng Qiu wrote: > > ah, yes, the version is another mess!... no vendor's product > > i tried hadoop 2.6.2, hive 1.2.1 with spark 1.6.1, doesn't work. > > hadoop 2.6.2, hive 2.0.1 with spark 1.6.1, works, but

Re: Pros and Cons

2016-05-27 Thread Teng Qiu
ah, yes, the version is another mess!... no vendor's product i tried hadoop 2.6.2, hive 1.2.1 with spark 1.6.1, doesn't work. hadoop 2.6.2, hive 2.0.1 with spark 1.6.1, works, but need to fix this from hive side https://issues.apache.org/jira/browse/HIVE-13301 the jackson-databind lib from calci

Re: Pros and Cons

2016-05-27 Thread Mich Talebzadeh
Hi Teng, what version of Spark are you using as the execution engine? Are you using a vendor's product here? Thanks

Re: Pros and Cons

2016-05-27 Thread Teng Qiu
I agree with Koert and Reynold, spark works well with large datasets now. back to the original discussion, comparing SparkSQL vs Hive in Spark vs Spark API. SparkSQL vs Spark API: you can simply imagine you are in the RDBMS world, SparkSQL is pure SQL, and Spark API is the language for writing stored procedu

Re: Pros and Cons

2016-05-26 Thread Koert Kuipers
We do disk-to-disk iterative algorithms in spark all the time, on datasets that do not fit in memory, and it works well for us. I usually have to do some tuning of number of partitions for a new dataset but that's about it in terms of inconveniences. On May 26, 2016 2:07 AM, "Jörn Franke" wrote:

Re: Pros and Cons

2016-05-25 Thread Jörn Franke
Spark can handle this, true, but it is optimized for the idea that it works on the same full dataset in-memory due to the underlying nature of machine learning algorithms (iterative). Of course, you can spill over, but that is something you should avoid. That being said you should have read my fina

Re: Pros and Cons

2016-05-25 Thread Reynold Xin
On Wed, May 25, 2016 at 9:52 AM, Jörn Franke wrote: > Spark is more for machine learning working iteratively over the whole same > dataset in memory. Additionally it has streaming and graph processing > capabilities that can be used together. > Hi Jörn, The first part is actually not true. Spark c

Re: Pros and Cons

2016-05-25 Thread Jörn Franke
lebzadeh > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > > http://talebzadehmich.wordpress.com > > >> On 25 May 2016 at 16:34, Aakash Basu wrote: >> Hi, >> >> >> >> I’m new to th

Re: Pros and Cons

2016-05-25 Thread Mich Talebzadeh
I’m new to the Spark Ecosystem, need to understand the *Pros and Cons* of > fetching data using *SparkSQL vs Hive in Spark vs Spark API.* > > > > *PLEASE HELP!* > > > > Thanks, > > Aakash Basu. >

Pros and Cons

2016-05-25 Thread Aakash Basu
Hi, I’m new to the Spark Ecosystem, need to understand the *Pros and Cons* of fetching data using *SparkSQL vs Hive in Spark vs Spark API.* *PLEASE HELP!* Thanks, Aakash Basu.

Re: Pros and cons -Saving spark data in hive

2015-12-15 Thread Sabarish Sasidharan
I am exploring the options and their pros and cons to see which > one will work best in the Spark and Hive context. My dataset inputs are CSV > files; I am using Spark to process my data and saving it in Hive using > HiveContext > > 1) Process the CSV file using the spark-csv package and create temptable and

Pros and cons -Saving spark data in hive

2015-12-15 Thread Divya Gehlot
Hi, I am a newbie to Spark and I am exploring the options and their pros and cons to see which one will work best in the Spark and Hive context. My dataset inputs are CSV files; I am using Spark to process my data and saving it in Hive using HiveContext. 1) Process the CSV file using the spark-csv package and create