This is not open source, but we are using Vertica and it works very nicely
for us. There is a 1 TB community edition, but above that it costs money.
It has really advanced SQL (analytical functions, etc.), works like an
RDBMS, has R/Java/C++ SDKs, and scales nicely. A similar option is
Redshift
ividual text documents), but it does
> get through all the mechanics of exactly what you state you want.
>
> The meetup page also has links to video, if the slides don't give enough
> context.
>
> HTH
>
> [1]: http://www.meetup.com/Data-Science-MD/events/111081282/
Hi Nitin,
No offense taken. Thank you for your response. Part of this is also trying
to find the right tool for the job.
I am running queries to select the cuts of tweets that I want, then doing
some modest normalization (through a Python script), and then I want to
create SequenceFiles from that
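
For illustration, a normalization script of this kind can be quite small. The
following is only a sketch, not the actual script: the names normalize and
normalize_stream are made up, and the rules (collapse whitespace, lowercase)
are assumptions, since the real normalization is not spelled out here. It
assumes one tweet per input line, which is how a script receives rows when it
is used as a streaming filter (for example through Hive's TRANSFORM ... USING).

```python
import re

# Matches any run of whitespace, including tabs.
_WHITESPACE = re.compile(r"\s+")


def normalize(text):
    """Collapse whitespace runs to single spaces, trim, and lowercase.

    These rules are placeholders (an assumption); substitute whatever
    normalization the tweets actually need.
    """
    return _WHITESPACE.sub(" ", text).strip().lower()


def normalize_stream(lines):
    """Yield one normalized, newline-terminated tweet per non-empty input line."""
    for line in lines:
        tweet = normalize(line)
        if tweet:  # skip lines that are empty after normalization
            yield tweet + "\n"


# To run as a stdin/stdout filter, one could add:
#   import sys
#   sys.stdout.writelines(normalize_stream(sys.stdin))
```

Keeping each output record on exactly one line matters here, because the
downstream table treats every line as its own tweet.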
Hi,
I have a lot of tweets saved as text. I created an external table on top of
them so I can access them as a TEXTFILE table. I need to convert these to
SequenceFiles, with each tweet as its own record. To do this, I created
another table as a SequenceFile table, like so:
CREATE EXTERNAL TABLE tweetseq(
tweet ST