Ah yes, the versioning is another mess!... And no, I'm not using any
vendor's product. What I tried:

Hadoop 2.6.2, Hive 1.2.1 with Spark 1.6.1: doesn't work.

Hadoop 2.6.2, Hive 2.0.1 with Spark 1.6.1: works, but needs a fix on the
Hive side, see https://issues.apache.org/jira/browse/HIVE-13301 (the
jackson-databind lib bundled in calcite-avatica.jar is too old).
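For anyone hitting the same jackson conflict: if the stale classes come in
as a normal transitive dependency, pinning a newer jackson-databind in the
build can be enough. A minimal sbt sketch (the version number is only an
example, and this cannot touch the copy bundled inside calcite-avatica.jar
that HIVE-13301 describes; that one still needs the Hive-side fix):

    // build.sbt: force one jackson-databind version across the whole
    // dependency tree (plain transitive deps only, not classes that
    // are already bundled inside another jar)
    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.4.4"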
I will try Hadoop 2.7, Hive 2.0.1 and Spark 2.0.0 once Spark 2.0.0 is
released.

2016-05-27 16:16 GMT+02:00 Mich Talebzadeh <mich.talebza...@gmail.com>:
> Hi Teng,
>
> What version of Spark are you using as the execution engine? Are you
> using a vendor's product here?
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 27 May 2016 at 13:05, Teng Qiu <teng...@gmail.com> wrote:
>>
>> I agree with Koert and Reynold, Spark works well with large datasets
>> now.
>>
>> Back to the original discussion: comparing SparkSQL vs Hive on Spark
>> vs the Spark API.
>>
>> For SparkSQL vs the Spark API, you can simply imagine you are in the
>> RDBMS world: SparkSQL is pure SQL, and the Spark API is the language
>> for writing stored procedures.
>>
>> Hive on Spark is similar to SparkSQL: it is a pure SQL interface that
>> uses Spark as its execution engine. SparkSQL uses Hive's syntax, so as
>> a language, I would say they are almost the same.
>>
>> But Hive on Spark has much better support for Hive features,
>> especially HiveServer2 and the security features. The Hive features in
>> SparkSQL are really buggy: there is a HiveServer2 implementation in
>> SparkSQL, but in the latest release version (1.6.x) it no longer works
>> with the hivevar and hiveconf arguments, and the username for login
>> via JDBC doesn't work either...
>> see https://issues.apache.org/jira/browse/SPARK-13983
>>
>> I believe Hive support is really very low-priority stuff in the Spark
>> project...
>>
>> Sadly, the Hive on Spark integration is not that easy either; there
>> are a lot of dependency conflicts, such as
>> https://issues.apache.org/jira/browse/HIVE-13301
>>
>> Our requirement is using Spark with HiveServer2 in a secure way (with
>> authentication and authorization). Currently SparkSQL alone cannot
>> provide this, so we are using Ranger/Sentry + Hive on Spark.
>>
>> Hope this helps you get a better idea of which direction to go.
>>
>> Cheers,
>>
>> Teng
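To make Teng's "pure SQL vs stored procedure" analogy concrete, here is
the same query written both ways in Spark 1.6-era Scala (assuming a
spark-shell session, which provides sc; the file, table and column names
are made up):

    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new HiveContext(sc)   // sc comes from spark-shell
    val people = sqlContext.read.json("people.json")
    people.registerTempTable("people")

    // SparkSQL: the "pure SQL" route
    val adultsSql = sqlContext.sql("SELECT name FROM people WHERE age >= 18")

    // Spark API: the "stored procedure" route, same result
    val adultsApi = people.filter(people("age") >= 18).select("name")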
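And on the hivevar/hiveconf point: against a stock HiveServer2, both can
be passed in the JDBC URL itself, whose documented shape is
jdbc:hive2://host:port/db;sess_vars?hive_conf_list#hive_var_list. A sketch
with placeholder host, credentials and variable names; this is the kind of
usage SPARK-13983 reports as broken in SparkSQL's HiveServer2:

    import java.sql.DriverManager

    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val url = "jdbc:hive2://hs2-host:10000/default" +
      "?hive.execution.engine=spark" +    // hiveconf part
      "#batch_date=2016-05-27"            // hivevar part
    val conn = DriverManager.getConnection(url, "etl_user", "secret")
    // the hive var is substituted server-side
    val rs = conn.createStatement().executeQuery(
      "SELECT count(*) FROM events WHERE ds = '${hivevar:batch_date}'")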
>> 2016-05-27 2:36 GMT+02:00 Koert Kuipers <ko...@tresata.com>:
>> > We do disk-to-disk iterative algorithms in Spark all the time, on
>> > datasets that do not fit in memory, and it works well for us. I
>> > usually have to do some tuning of the number of partitions for a new
>> > dataset, but that's about it in terms of inconveniences.
>> >
>> > On May 26, 2016 2:07 AM, "Jörn Franke" <jornfra...@gmail.com> wrote:
>> >
>> > Spark can handle this, true, but it is optimized for the idea that
>> > it works on the same full dataset in-memory, due to the underlying
>> > nature of (iterative) machine learning algorithms. Of course you can
>> > spill over, but that is something you should avoid.
>> >
>> > That being said, you should have read my final sentence about this.
>> > Both systems develop and change.
>> >
>> > On 25 May 2016, at 22:14, Reynold Xin <r...@databricks.com> wrote:
>> >
>> > On Wed, May 25, 2016 at 9:52 AM, Jörn Franke <jornfra...@gmail.com>
>> > wrote:
>> >>
>> >> Spark is more for machine learning, working iteratively over the
>> >> whole of the same dataset in memory. Additionally it has streaming
>> >> and graph processing capabilities that can be used together.
>> >
>> > Hi Jörn,
>> >
>> > The first part is actually not true. Spark can handle data far
>> > greater than the aggregate memory available on a cluster. The more
>> > recent versions (1.3+) of Spark have external operations for almost
>> > all built-in operators, and while things may not be perfect, those
>> > external operators are becoming more and more robust with each
>> > version of Spark.
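Concretely, the partition tuning Koert mentions usually comes down to two
knobs. A Scala sketch (again assuming a spark-shell session with sc and
sqlContext; the path and numbers are arbitrary and need per-dataset
tuning):

    // more, smaller partitions keep each task's working set small, so
    // spilling to disk stays cheap when data exceeds cluster memory
    val bigRdd = sc.textFile("hdfs:///some/large/dataset")
    val tuned = bigRdd.repartition(2048)

    // the equivalent knob for DataFrame/SQL shuffles (default is 200)
    sqlContext.setConf("spark.sql.shuffle.partitions", "2048")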