Having done some digging on this 4th generation Data Warehouse topic, I believe it is just a misnomer of some sort. However, I came across this note on LinkedIn which, although it had some element of truth about market differentiation etc., showed plain ignorance of why the on-premise Hadoop data lake and its newer cloud counterpart, the Data Lakehouse, were created in the first place. The author takes a simplistic and rejectionist view of these concepts. It pointedly states, and I quote:

"... Like as if data warehousing can’t hack data related to brands? And you’d need Spark, data streaming from social media and the Hadoop ecosphere to make that magic sauce work..."

And I would say yes, we do need it. Anyway, read it for yourself, as I found it interesting in some respects. It is light reading, I guess; reader's discretion is advised, and admittedly it is not to everyone's taste.

Bullshit at the Data Lakehouse | GOOD STRATEGY
<https://goodstrat.com/2020/04/15/bullshit-at-the-data-lakehouse/>

HTH

view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Sun, 18 Apr 2021 at 21:17, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> You must forgive me for this seemingly pseudo technical question. Last
> week I came across a client manager who mentioned developing 4th generation
> data warehousing with Spark. And I was wondering whether the individual
> pointedly made a reference to the new data lakehouse concept and how it was
> different with the current concept of Real time data pipeline, Batch data
> pipeline, Lambda Architecture or just plain Data enrichment. Spark can be
> used for all these. Can anyone throw some light on the notion of 4th
> Generation Data Warehousing with Spark? The D in Spark RDD for Dataset can
> handle structured, semi-structured and equally unstructured data so what is
> new?
>
> Thanks
>
> Mich