Thanks Chris and Mich for replying. Sorry for not explaining my problem clearly. Yes, I am talking about a flexible dashboard when I mention Zeppelin.
Here is the problem I am having:

I am running a commercial website where we sell many products, and we have many branches in many places. We have a lot of real-time transactions and want to analyze them in real time.

We don't want to aggregate every single transaction each time we run analytics (each transaction has BranchID, ProductID, Qty, Price). So we maintain intermediate data which contains: BranchID, ProductID, totalQty, totalDollar.

Ideally, we have 2 tables:

Transaction (BranchID, ProductID, Qty, Price, Timestamp)

and an intermediate table, Stats, which is just the sum of every transaction grouped by BranchID and ProductID (I am using Spark Streaming to calculate this table in real time).

My thinking is that doing statistics (a real-time dashboard) on the Stats table is much easier, and this table is also easy enough to maintain.

I'm just wondering, what's the best way to store the Stats table (a database or a Parquet file)?

What exactly are you trying to do? Zeppelin is for interactive analysis of a dataset. What do you mean "realtime analytics" -- do you mean build a report or dashboard that automatically updates as new data comes in?

--
Chris Miller

On Sat, Mar 12, 2016 at 3:13 PM, trung kien <kient...@gmail.com> wrote:

> Hi all,
>
> I've just viewed some of Zeppelin's videos. The integration between
> Zeppelin and Spark is really amazing and I want to use it for my
> application.
>
> In my app, I will have a Spark Streaming app to do some basic real-time
> aggregation (intermediate data). Then I want to use Zeppelin to do some
> real-time analytics on the intermediate data.
>
> My question is: what's the most efficient storage engine to store real-time
> intermediate data? Is a Parquet file somewhere suitable?
>
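
To make the Stats table from the message above concrete, here is a minimal plain-Python sketch of the running aggregation. It is not Spark Streaming code, and it rests on several assumptions that are not in the thread: the names `Transaction` and `update_stats` are hypothetical, Price is taken to be a unit price (so totalDollar adds Qty * Price), and an in-memory dict stands in for whichever store (database or Parquet) ends up holding Stats.

```python
from collections import defaultdict
from typing import NamedTuple


class Transaction(NamedTuple):
    # Field names mirror the Transaction table in the message (snake_case is an assumption).
    branch_id: str
    product_id: str
    qty: int
    price: float  # ASSUMPTION: unit price, so the dollar amount is qty * price
    timestamp: int


def update_stats(stats, batch):
    """Fold one micro-batch of transactions into the running Stats totals.

    stats maps (branch_id, product_id) -> [total_qty, total_dollar].
    """
    for tx in batch:
        totals = stats[(tx.branch_id, tx.product_id)]
        totals[0] += tx.qty
        totals[1] += tx.qty * tx.price
    return stats


# Hypothetical in-memory stand-in for the Stats table.
stats = defaultdict(lambda: [0, 0.0])

# Two micro-batches arriving one after the other.
update_stats(stats, [
    Transaction("B1", "P1", 2, 10.0, 1),
    Transaction("B1", "P2", 1, 5.0, 2),
])
update_stats(stats, [Transaction("B1", "P1", 3, 10.0, 3)])
```

Each batch only touches the keys it contains, so the dashboard can read one row per (BranchID, ProductID) pair instead of rescanning the Transaction table. That access pattern (frequent small updates by key) is the thing to weigh when choosing between a database and a Parquet file.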