Re: Is there such thing as cache fusion with the underlying tables/files on HDFS

Mich Talebzadeh Sat, 17 Sep 2016 15:01:06 -0700

Thanks Todd

As I thought, Apache Ignite is a data fabric, much like Oracle Coherence
cache or Hazelcast.


The use cases differ between an in-memory database (IMDB) and a data
fabric. The build that I am dealing with has a 'database-centric' view of
its data (i.e. it accesses its data using Spark SQL and JDBC), so an
in-memory database will be a better fit. On the other hand, if the
application deals solely with Java objects, has no notion of
a 'database', does not need SQL-style queries and really just wants a
distributed, high-performance object storage grid, then I think Ignite would
likely be the preferred choice.
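
To make the distinction concrete, here is a minimal sketch in Python using
only the standard library. It is an illustration under assumptions, not any
product's API: the sqlite3 in-memory database merely stands in for an IMDB,
a plain dict stands in for an object grid, and the `trades` table, the
`Trade` class and the sample values are all made up.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Trade:
    # Hypothetical record type, used only for this illustration.
    ticker: str
    price: float


rows = [Trade("IBM", 150.0), Trade("ORCL", 40.0), Trade("IBM", 152.0)]

# Database-centric view: data lives in tables and is reached through SQL.
conn = sqlite3.connect(":memory:")  # in-memory database, stand-in for an IMDB
conn.execute("CREATE TABLE trades (ticker TEXT, price REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [(t.ticker, t.price) for t in rows],
)
avg_by_ticker = dict(
    conn.execute("SELECT ticker, AVG(price) FROM trades GROUP BY ticker")
)

# Object-centric view: objects are put and fetched by key, with no SQL layer.
grid = {}  # stand-in for a distributed key/value object grid
for key, trade in enumerate(rows):
    grid[key] = trade
second = grid[1]  # direct lookup by key returns the object itself

print(avg_by_ticker)
print(second)
```

The first half answers an aggregate question declaratively; the second half
can only hand back whole objects by key, which is all an application needs
when it never thinks in terms of tables.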

So, if needed, I will likely go for an in-memory database like Alluxio. I have
seen a rather debatable comparison between Spark and Ignite
<http://drcos.boudnik.org/2015/04/apache-ignite-vs-apache-spark.html> that
looks like a one-sided rant.
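
For reference, the caching idea in the original question quoted below (keep
a table in memory, serve reads from it, and refresh it when the underlying
data changes) can be sketched roughly as follows. This is a toy,
single-process sketch in Python and says nothing about how Alluxio or
TimesTen work internally; `CachedTable`, `invalidate` and the two counters
are hypothetical names chosen for the example.

```python
class CachedTable:
    """Read-through cache over a slow backing store (e.g. files on disk)."""

    def __init__(self, load_from_store):
        self._load = load_from_store  # callable doing the expensive physical read
        self._rows = None             # cached copy; None until the first read
        self.physical_reads = 0       # reads that had to go to the store
        self.logical_reads = 0        # reads served from memory

    def read(self):
        if self._rows is None:
            # Cache miss: pay for one physical read and keep the result.
            self._rows = self._load()
            self.physical_reads += 1
        else:
            # Cache hit: served from memory.
            self.logical_reads += 1
        return self._rows

    def invalidate(self):
        """Drop the cached copy so the next read goes back to the store."""
        self._rows = None


store = [("IBM", 150.0)]                  # stand-in for the underlying table
table = CachedTable(lambda: list(store))  # the loader copies the current store

for _ in range(3):
    table.read()                          # 1 physical read, then 2 logical reads

store.append(("ORCL", 40.0))              # underlying data changes
table.invalidate()                        # refresh is triggered explicitly here
latest = table.read()                     # second physical read picks up the new row

print(table.physical_reads, table.logical_reads, latest)
```

In this sketch the refresh is explicit; a real system would drive it from a
timer or a change notification, which is the "refresh frequency" part of the
original description.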

HTH



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 17 September 2016 at 20:53, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Thanks Todd.
>
> I will have a look.
>
> Regards
>
> Dr Mich Talebzadeh
>
>
> On 17 September 2016 at 20:45, Todd Nist <tsind...@gmail.com> wrote:
>
>> Hi Mich,
>>
>> Have you looked at Apache Ignite?  https://apacheignite-fs.readme.io/docs.
>>
>>
>> This looks like something that may be what you're looking for:
>>
>> http://apacheignite.gridgain.org/docs/data-analysis-with-apache-zeppelin
>>
>> HTH.
>>
>> -Todd
>>
>>
>> On Sat, Sep 17, 2016 at 12:53 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am seeing issues similar to those I saw when working on Oracle with
>>> Tableau as the dashboard.
>>>
>>> Currently I have a batch layer that gets streaming data from
>>>
>>> source -> Kafka -> Flume -> HDFS
>>>
>>> It is stored on HDFS as text files, and a cron process syncs a Hive table
>>> with the external table built on that directory. I tried both ORC and
>>> Parquet, but I don't think the query itself is the issue.
>>>
>>> That is, no matter how clever your execution engine is, you still have to
>>> do a considerable amount of Physical IO (PIO), as opposed to Logical IO
>>> (LIO), to get the data to Zeppelin, and that is on the critical path.
>>>
>>> One option is to limit the amount of data in Zeppelin to a certain number
>>> of rows or something similar. However, you cannot tell users that they
>>> cannot see the full data.
>>>
>>> We resolved this with Oracle by using the Oracle TimesTen
>>> <http://www.oracle.com/technetwork/database/database-technologies/timesten/overview/index.html> IMDB
>>> to cache certain tables in memory and get them refreshed (depending on
>>> refresh frequency) from the underlying table in Oracle when data is
>>> updated. That is done through cache fusion.
>>>
>>> I was looking around and came across Alluxio <http://www.alluxio.org/>.
>>> Ideally I would like to use a concept similar to TimesTen. Can one
>>> distribute Hive table data (or any table data) across the nodes and keep it
>>> cached? In that case we would be doing Logical IO, which is at least about
>>> 20 times more lightweight than Physical IO.
>>>
>>> Anyway this is the concept.
>>>
>>> Thanks
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>
>>
>
