Re: Which [open-source] SQL engine atop Hadoop?

2015-02-02 Thread Saurabh B
This is not open source, but we are using Vertica and it works very nicely for us. There is a 1TB community edition, but above that it costs money. It has really advanced SQL (analytical functions, etc.), works like an RDBMS, has R/Java/C++ SDKs, and scales nicely. There is a similar option of Redshift

Re: Converting from textfile to sequencefile using Hive

2013-09-30 Thread Saurabh B
ividual text documents), but it does get through all the mechanics of exactly what you state you want. The meetup page also has links to video, if the slides don't give enough context. HTH [1]: http://www.meetup.com/Data-Science-MD/events/111081282/

Re: Converting from textfile to sequencefile using Hive

2013-09-30 Thread Saurabh B
Hi Nitin, No offense taken. Thank you for your response. Part of this is also trying to find the right tool for the job. I am doing queries to determine the cuts of tweets that I want, then doing some modest normalization (through a Python script) and then I want to create SequenceFiles from that
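
For readers following the thread, here is a minimal sketch of what a "modest normalization" step could look like as a line-oriented Python filter. It assumes one tweet per line on standard input. The function name `normalize_tweet` and the specific rules (collapse whitespace, lowercase) are illustrative assumptions, not the poster's actual script.

```python
import re
import sys

# Matches any run of whitespace, including tabs and stray newlines.
_WHITESPACE = re.compile(r"\s+")


def normalize_tweet(text):
    """Hypothetical normalization: collapse whitespace runs, trim, lowercase."""
    return _WHITESPACE.sub(" ", text).strip().lower()


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Read one tweet per line and emit one normalized tweet per line,
    # skipping lines that are empty after normalization.
    for line in stdin:
        cleaned = normalize_tweet(line)
        if cleaned:
            stdout.write(cleaned + "\n")


if __name__ == "__main__":
    main()
```

A filter of this shape reads rows on stdin and writes rows on stdout, which is the contract Hive's `TRANSFORM ... USING` clause expects from a streaming script, so the same file could in principle be run from the shell or from a Hive query.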

Converting from textfile to sequencefile using Hive

2013-09-30 Thread Saurabh B
Hi, I have a lot of tweets saved as text. I created an external table on top of it to access it as textfile. I need to convert these to sequencefiles with each tweet as its own record. To do this, I created another table as a sequencefile table like so - CREATE EXTERNAL TABLE tweetseq( tweet ST