Re: Searching for a string off a group by query

2012-07-17 Thread Tharindu Mathew
//cwiki.apache.org/Hive/presentations.data/WhatsNewInHive090HadoopSummit2012BoF.pdf
>
> On the array_contains:
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-CollectionFunctions
>
> boolean array_contains(Array, value): Returns TRUE if the array contains value

Searching for a string off a group by query

2012-07-16 Thread Tharindu Mathew
Hi everyone,

I'd like to do $subject and was approaching it with the following query:

select activityId, count(activityId), *find_in_set("CCC", collect_set(msgBody))* from ActivityStream group by activityId;

But find_in_set doesn't seem to accept arrays. Is there a way to cast this string array
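
For readers landing on this thread: find_in_set takes a comma-separated string as its second argument, while collect_set returns an array, which is why the query above is rejected. A minimal sketch of the same query using array_contains (the function pointed to in the reply on this thread) might look like the following. The table and column names are simply those from the question, and this assumes an exact match on a whole msgBody value is what is wanted.

```sql
-- Sketch only: ActivityStream, activityId and msgBody come from the question above.
-- array_contains(Array, value) returns TRUE when some element of the array
-- equals the value exactly; it does not perform substring matching.
SELECT activityId,
       count(activityId),
       array_contains(collect_set(msgBody), "CCC")
FROM ActivityStream
GROUP BY activityId;
```

The third column comes back as a boolean per activityId rather than the position that find_in_set would have returned.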

Embedding Hive

2012-01-24 Thread Tharindu Mathew
Hi, I was wondering whether it's possible to embed a Hive runtime inside another Java program. Basically, if the hadoop*xml are present in the classpath I expect it to submit the job to the Hadoop cluster, but if not it will default to local mode and execute the job. Thanks in advance. -- Rega

Re: Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS

2011-08-30 Thread Tharindu Mathew
might be helpful:
> http://www.datastax.com/docs/0.8/brisk/about_pig
>
> On Aug 30, 2011, at 1:30 PM, Tharindu Mathew wrote:
>
> > Thanks Jeremy for your response. That gives me some encouragement that I might be on the right track.
> >
> > I think I need to t

Re: Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS

2011-08-30 Thread Tharindu Mathew
s I use. Here's a sample data analysis I do with my language. Maybe, there is no generic way to do what I want to do.

> 2011/8/29 Tharindu Mathew
>
>> Hi,
>>
>> I have an already running system where I define a simple data flow (using

Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS

2011-08-29 Thread Tharindu Mathew
Hi, I have an already running system where I define a simple data flow (using a simple custom data flow language) and configure jobs to run against stored data. I use quartz to schedule and run these jobs and the data exists on various data stores (mainly Cassandra but some data exists in RDBMS li