Which Pig Version with Hadoop 0.22

2013-07-17 Thread vivek thakre
Hello All, Which Apache Pig release would work with the Hadoop 0.22 release? Thank you

DISTINCT and partitioner

2013-07-17 Thread William Oberman
The docs say DISTINCT can take a custom partitioner. How does that work? What is "K" and "V"? I'm having some doubts the docs are correct. I wrote a test partitioner that does a System.out of K and V. I then wrote simple scripts to do JOIN, GROUP and DISTINCT. For JOIN and GROUP I see my syste

timezone option for timestamp to datetime conversion

2013-07-17 Thread anup ahire
Hello, The ToDate function on a long timestamp always uses the default time zone. Why is there no option to specify a timezone for timestamp-to-datetime conversion, as there is for the customized string-to-datetime conversion? Thanks, Anup

Re: DISTINCT and partitioner

2013-07-17 Thread William Oberman
I forgot to PS my (*).
(*) For JOIN, my test was basically:
    JOIN A by $0, B by $0
and my System.out showed K = $0, and V = A less $0 (or B less $0). E.g. if A = (1,2,3), then K = 1 and V = (2,3).
For GROUP:
    GROUP A by $0
showed K = $0, V = A less $0. E.g. if A = (1,2,3), then K = 1 and V = (2,3).
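The K/V split described above for JOIN and GROUP can be pictured with a small sketch. This is only an illustration of the reported observation, not Pig's actual code: `split_key_value` and `toy_partition` are hypothetical names, and the modulo-hash scheme is an assumption about how a simple partitioner might choose a reducer. It says nothing about what DISTINCT passes to a partitioner.

```python
# Illustrative sketch only: mimics the K/V split reported in the message
# above. It is not how Pig implements partitioning.

def split_key_value(row, key_index=0):
    """Return (K, V): K is the field at key_index, V is the row without it."""
    key = row[key_index]
    value = row[:key_index] + row[key_index + 1:]
    return key, value


def toy_partition(key, num_partitions):
    """Hypothetical hash partitioner: map a key to a reducer index."""
    return hash(key) % num_partitions


# A = (1,2,3) grouped by $0 gives K = 1 and V = (2, 3).
k, v = split_key_value((1, 2, 3))
reducer = toy_partition(k, 4)
```

A custom partitioner only ever sees the (K, V) pair, so any routing decision has to be expressible in terms of those two values.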

Re: Which Pig Version with Hadoop 0.22

2013-07-17 Thread Johnny Zhang
From the release notes, it looks like none of the releases work with 0.22: http://pig.apache.org/releases.html#1+April%2C+2013%3A+release+0.11.1+available Johnny On Wed, Jul 17, 2013 at 10:41 AM, vivek thakre wrote: > Hello All, > > Which Apache Pig Release would work with Hadoop 0.22 release? > > T

Re: timezone option for timestamp to datetime conversion

2013-07-17 Thread Johnny Zhang
I haven't tried it, but it looks like you can use ToDate(String dtStr, String format, String timezone)? Johnny On Wed, Jul 17, 2013 at 11:28 AM, anup ahire wrote: > Hello, > > ToDate function on long timestamp always uses default time zone. > > Why option of specifying timezone for timestamp to datet

Re: timezone option for timestamp to datetime conversion

2013-07-17 Thread anup ahire
Thanks for the reply. What I want to know is why ToDate(Long timestamp, String timezone) is not available, and whether there is a reason behind it. ToDate(Long timestamp) always uses the default time zone on the Pig client, which I know is configurable. Best, Anup On Wed, Jul 17, 2013 at 12:12 PM, Johnny Zhang
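For readers who want to see what a timezone-aware conversion of a long timestamp involves, here is a minimal sketch in plain Python. It is not Pig's ToDate and not a documented workaround: `millis_to_local` is a hypothetical helper, and it assumes a fixed UTC offset in minutes, so it ignores daylight saving rules that a real timezone ID would handle.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # Unix epoch, interpreted as UTC


def millis_to_local(ts_millis, offset_minutes):
    """Convert epoch milliseconds to a naive datetime at a fixed UTC offset.

    offset_minutes is an assumption of this sketch (e.g. -420 for UTC-7);
    it does not account for daylight saving transitions.
    """
    utc_time = EPOCH + timedelta(milliseconds=ts_millis)
    return utc_time + timedelta(minutes=offset_minutes)


# 1374019200000 ms is 2013-07-17 00:00:00 UTC.
utc_value = millis_to_local(1374019200000, 0)
pacific_value = millis_to_local(1374019200000, -420)  # 2013-07-16 17:00:00
```

The point of passing the offset explicitly is that the result no longer depends on the default time zone of the machine running the conversion.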

Re: Which Pig Version with Hadoop 0.22

2013-07-17 Thread Alan Gates
We have never produced a release that works with Hadoop 0.22. There were some patches for it, see https://issues.apache.org/jira/browse/PIG-2277 You might be able to build your own version. Alan. On Jul 17, 2013, at 10:41 AM, vivek thakre wrote: > Hello All, > > Which Apache Pig Release woul

Re: timezone option for timestamp to datetime conversion

2013-07-17 Thread Shahab Yunus
I am not sure, but have you looked at other libraries from LinkedIn or Twitter? Regards, Shahab On Wed, Jul 17, 2013 at 3:26 PM, anup ahire wrote: > Thanks for reply. > What I want to know is why ToDate(Long timestamp , String timezone) is not > available and is there any reason behind it. > > ToDat

python version with Jython/Pig

2013-07-17 Thread Dexin Wang
When I write a Python UDF for Pig, how do I know which version of Python it is using? Is it possible to use a specific version of Python? Specifically, my problem is that my UDF needs math.erf() from the math module, which was newly introduced in Python 2.7. I have Python 2.7 install

Re: python version with Jython/Pig

2013-07-17 Thread Cheolsoo Park
Hi Dexin, Unfortunately, Pig is on Jython 2.5, so you won't be able to use Python 2.7 modules. A while back, someone posted a hack to get Jython 2.7-b1 working with Pig. You might give it a try: http://search-hadoop.com/m/BnZs3MmH5y/jython+2.7&subj=informational+getting+jython+2+7+b1+to+work Tha
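If upgrading Jython is not an option, one possible fallback is to compute erf in pure Python. The sketch below uses the Abramowitz and Stegun approximation 7.1.26, whose absolute error is about 1.5e-7; that is a swapped-in numerical technique, not what math.erf does internally. `erf_approx` is a hypothetical name, and whether this precision is acceptable depends on the UDF. It has not been tested under Pig's Jython.

```python
import math


def erf_approx(x):
    """Approximate erf(x) with Abramowitz and Stegun formula 7.1.26.

    Absolute error is roughly 1.5e-7, which is less precise than math.erf.
    """
    # erf is an odd function, so evaluate at |x| and restore the sign.
    sign = 1.0
    if x < 0:
        sign = -1.0
    x = abs(x)

    t = 1.0 / (1.0 + 0.3275911 * x)
    # Horner evaluation of the degree-5 polynomial in t.
    poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
             - 0.284496736) * t + 0.254829592) * t
    return sign * (1.0 - poly * math.exp(-x * x))


# Use the built-in when the interpreter provides it (Python 2.7 and later),
# otherwise fall back to the approximation.
erf = getattr(math, "erf", erf_approx)
```

Checking for `math.erf` with `getattr` lets the same UDF file run on both older and newer interpreters without a version test.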

Getting dimension values for Facts

2013-07-17 Thread Something Something
There must be a better way to do this in Pig. Here's what my script looks like right now (some snippets omitted to save space, but you will get the idea): FACT_TABLE = LOAD 'XYZ' as (col1 :chararray,………. col30: chararray); FACT_TABLE1 = FOREACH FACT_TABLE GENERATE col1, udf1(col2) as col2,…