Your e-mail addresses get caught by spammers

2012-01-19 Thread Martin Kuhn
ses in the body won't be processed and can be crawled by spam spiders: "On Mon, Jan 9, 2012 at 8:47 PM, Martin Kuhn wrote:" This gives me two options: 1) Don't post to any mailing list, 2) Use a one-way address for every post. Or does anyone here have other good ideas? Be

Re: how to avoid scanning the same table multiple times?

2012-01-12 Thread Martin Kuhn
ULL) FROM t WHERE dt in ("2012-1-12-02", "2012-1-12-03") GROUP BY type ORDER BY type;
Good luck :)
Martin Kuhn
P.S. You've got a strange date format there. For sorting purposes it would be more appropriate to use something like "2012-01-12-02".
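To see why the zero-padded form sorts better, here is a small Python sketch. It assumes the `dt` values are year-month-day-hour strings, as the values in the query suggest, and that they are compared as plain strings; `zero_pad` is a hypothetical helper, not part of Hive. With plain string comparison, "2012-10-1-00" (October) lands before "2012-2-3-05" (February), while the padded versions come out in chronological order.

```python
def zero_pad(dt: str) -> str:
    # Hypothetical helper, assuming a 'year-month-day-hour' layout:
    # '2012-1-12-02' -> '2012-01-12-02'
    year, month, day, hour = dt.split("-")
    return f"{int(year):04d}-{int(month):02d}-{int(day):02d}-{int(hour):02d}"

unpadded = ["2012-1-12-02", "2012-10-1-00", "2012-2-3-05"]

# Lexicographic order: October sorts before February
print(sorted(unpadded))
# ['2012-1-12-02', '2012-10-1-00', '2012-2-3-05']

# After padding, string order matches time order
print(sorted(zero_pad(d) for d in unpadded))
# ['2012-01-12-02', '2012-02-03-05', '2012-10-01-00']
```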

Re: need help in MapReduce (urgent)

2012-01-09 Thread Martin Kuhn
> one more i wanna ask like how i can write output in different directories
> according to key values.
It would be good to know your use case, but maybe you can partition your results according to the keys (http://developer.yahoo.com/hadoop/tutorial/module5.html#partitioning) and use a cust
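A partitioner decides which reducer, and therefore which part-file of the job output, receives each key. The Python sketch below only mimics that routing step outside Hadoop; `custom_partition` and `partition_records` are hypothetical names, and the sketch assumes the set of keys is known up front and that the job runs with one reducer per key, so each key ends up in its own file.

```python
from collections import defaultdict

def custom_partition(key, known_keys):
    # Hypothetical custom partitioner: each distinct key gets its own
    # partition index, i.e. its own reducer and its own output file.
    return known_keys.index(key)

def partition_records(records, known_keys):
    # Group (key, value) pairs by the partition the partitioner assigns,
    # mimicking how map output is routed to reducers.
    partitions = defaultdict(list)
    for key, value in records:
        partitions[custom_partition(key, known_keys)].append((key, value))
    return dict(partitions)

keys = ["apple", "banana", "cherry"]
records = [("banana", 2), ("apple", 1), ("cherry", 5), ("apple", 3)]
parts = partition_records(records, keys)

for idx in sorted(parts):
    # Partition idx would correspond to a part-0000<idx>-style output file
    print(f"part-{idx:05d}: {parts[idx]}")
# part-00000: [('apple', 1), ('apple', 3)]
# part-00001: [('banana', 2)]
# part-00002: [('cherry', 5)]
```

In a real job the same idea lives in a Java class extending Hadoop's `Partitioner`, whose `getPartition(key, value, numPartitions)` returns the partition index. The default `HashPartitioner` spreads keys by hash instead, so several keys can share one output file.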

Re: need help in MapReduce (urgent)

2012-01-09 Thread Martin Kuhn
Hi Vikas,
> 1:- How to format output from reduce( like default is tab separator can we
> make it "," separator)
If you want this behaviour for all your Hadoop jobs, you have to put this into your mapred-site.xml:
<property>
  <name>mapred.textoutputformat.separator</name>
  <value>,</value>
</property>
(see https://issue
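With `TextOutputFormat`, each reduce output record becomes one line: the key, the separator (a tab by default), then the value. The Python sketch below imitates that layout so the effect of switching the separator to "," is easy to see. `format_record` is a hypothetical helper; skipping a missing (None) key or value together with the separator mirrors how Hadoop's line writer treats null or `NullWritable` keys and values.

```python
def format_record(key, value, separator="\t"):
    # Sketch of one TextOutputFormat line: key + separator + value + newline.
    # A missing key or value is dropped along with the separator;
    # if both are missing, nothing at all is written.
    if key is None and value is None:
        return ""
    parts = []
    if key is not None:
        parts.append(str(key))
    if key is not None and value is not None:
        parts.append(separator)
    if value is not None:
        parts.append(str(value))
    return "".join(parts) + "\n"

reduce_output = [("apple", 4), ("banana", 2)]

default_lines = "".join(format_record(k, v) for k, v in reduce_output)
csv_lines = "".join(format_record(k, v, separator=",") for k, v in reduce_output)

print(repr(default_lines))  # 'apple\t4\nbanana\t2\n'
print(repr(csv_lines))      # 'apple,4\nbanana,2\n'
```

Putting the property in mapred-site.xml changes the default for every job; it can also be set in a single job's configuration. In Hadoop 2 and later the property is named `mapreduce.output.textoutputformat.separator`.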

Re: What is the best way to load data into Hive tables/Hadoop file system

2011-11-02 Thread Martin Kuhn
You could try to use splittable LZO compression instead: https://github.com/kevinweil/hadoop-lzo (a gz file can't be split)
> We have multiple terabytes of data (currently in gz format approx size 2GB
> per file). What is best way to load that data into Hadoop?
> We have seen that (especially
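Splittability matters for parallelism: Hadoop can give each split of a splittable file to its own map task, but a gzip file must be read from the start by a single mapper. The Python sketch below does that back-of-the-envelope arithmetic for a 2 GB compressed file like the ones described in the question. It assumes a 64 MB HDFS block size and roughly one split per block, which simplifies how `FileInputFormat` really computes splits. It also assumes the LZO files have been indexed, since hadoop-lzo needs an index to split them. LZO usually compresses less tightly than gzip, so an LZO file holding the same data would be larger and the counts are only indicative.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3

def estimate_map_tasks(file_size, splittable, block_size=64 * MB):
    # Simplified model (assumption): one split per HDFS block when the
    # format is splittable (e.g. indexed LZO), exactly one otherwise (gzip).
    if not splittable:
        return 1
    return max(1, math.ceil(file_size / block_size))

gz_tasks = estimate_map_tasks(2 * GB, splittable=False)
lzo_tasks = estimate_map_tasks(2 * GB, splittable=True)

print(gz_tasks)   # 1
print(lzo_tasks)  # 32
```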