Re: How can I access secure HBase in UDF

2012-09-25 Thread Rohini Palaniswamy
Ray, Looking at the EvalFunc interface, I can not see a way or loophole to do it. EvalFunc does not have a reference to Job or JobConf object to add credentials to it. It has getCacheFiles() to add files to DistributedCache, but no method to add credentials. We should probably add one. The not

Re: How can I access secure HBase in UDF

2012-09-25 Thread Rohini Palaniswamy
:09 PM, Alan Gates wrote: > You can use the UDFContext to pass information for the UDF in the JobConf > without writing files. > > Alan. > > On Sep 25, 2012, at 10:48 AM, Rohini Palaniswamy wrote: > > > Ray, > > Looking at the EvalFunc interface, I can not see

Re: How can I access secure HBase in UDF

2012-09-27 Thread Rohini Palaniswamy
On Tue, Sep 25, 2012 at 5:09 PM, Alan Gates wrote: > You can use the UDFContext to pass information for the UDF in the JobConf without writing files. > Alan.

Re: Problems when building the Java project using pig script built using Maven.

2012-10-28 Thread Rohini Palaniswamy
Hi Pankaj, Pig depends on jline-0.9.94.jar and is packaged as part of pig-withouthadoop.jar or pig.jar when you build pig and will be part of an installation of pig. But the pig jar in maven does not include jline. Please add jline (http://repo1.maven.org/maven2/jline/jline/0.9.94/) to your mav

Re: Jar Conflicts

2012-11-01 Thread Rohini Palaniswamy
Hadoop has an option to set the user classpath first - mapreduce.user.classpath.first Regards, Rohini On Wed, Oct 31, 2012 at 11:49 AM, Mohit Anchlia wrote: > Any suggestions on how one can override the jar files in hadoop path to > give preference to jars used in "register" command? > > On Tue,
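A minimal sketch of using that option from inside a script, assuming the cluster's Hadoop version honors the property named above (the jar path is hypothetical; the property can also be passed as -D on the pig command line):

    set mapreduce.user.classpath.first 'true';
    REGISTER /path/to/my-deps.jar;   -- hypothetical jar that shadows one bundled with Hadoop
    A = LOAD 'input' AS (line:chararray);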

Re: How can I access secure HBase in UDF

2012-11-05 Thread Rohini Palaniswamy
Thanks, > Ray > On Fri, Sep 28, 2012 at 1:54 AM, Rohini Palaniswamy <rohini.adi...@gmail.com> wrote: > Ray, In the frontend, you can do a new JobConf(HBaseConfiguration.create()) and pass that to TableMapReduceUtil.initCredentials(

Re: Multiple input file

2012-12-27 Thread Rohini Palaniswamy
You can also use a comma separated list of file paths in the load command. On Sat, Dec 22, 2012 at 6:13 AM, Alan Gates wrote: > Yes. See http://pig.apache.org/docs/r0.10.0/basic.html#load for a > discussion of how to use globs in file paths. > > Alan. > > On Dec 21, 2012, at 10:38 PM, Mohit An
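A small sketch of both forms mentioned in this thread (a comma-separated path list and a glob), with hypothetical paths and schema:

    -- comma-separated list of paths in a single LOAD
    two_days = LOAD '/data/2012-12-21,/data/2012-12-22'
               USING PigStorage(',') AS (id:int, name:chararray);
    -- equivalent glob form, per the doc link above
    same_days = LOAD '/data/2012-12-2{1,2}'
                USING PigStorage(',') AS (id:int, name:chararray);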

Re: Replacing string in input parameter

2012-12-27 Thread Rohini Palaniswamy
Take a look at http://pig.apache.org/docs/r0.10.0/cont.html#Parameter-Sub- Specifying Parameters Using the Declare Statement. You can do this in your case %declare page_input_path `echo $input_path | sed 's/output/output\/page/g'` Or you can use embedded python ( http://pig.apache.org/docs/r0.10.
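A short sketch around the %declare line quoted above, assuming the script is invoked with -param input_path=... and that the path layout is hypothetical:

    -- pig -param input_path=/jobs/run1/output replace_param.pig   (hypothetical invocation)
    %declare page_input_path `echo $input_path | sed 's/output/output\/page/g'`
    pages = LOAD '$page_input_path' AS (url:chararray, views:long);
    DUMP pages;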

Re: pig ship tar files

2012-12-27 Thread Rohini Palaniswamy
You can also use -Dmapred.cache.archives= to ship the tar file using distributed cache. Hadoop will take care of untarring the file and putting it in the current directory if the extension is one of .zip, .tar, .tgz or .tar.gz. This is a feature of hadoop's distributed cache. Regards, Rohini On
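A sketch of how a script might use the shipped archive, assuming (as described above) that Hadoop extracts it into the task's working directory; the archive name, its layout, and the streaming command are hypothetical:

    -- invoked as: pig -Dmapred.cache.archives=hdfs:///libs/mytools.tgz ship_tar.pig
    -- (hypothetical archive; Hadoop untars it into the task's current directory)
    DEFINE run_tool `perl mytools/run.pl`;   -- hypothetical script inside the tgz
    A = LOAD 'input' AS (line:chararray);
    B = STREAM A THROUGH run_tool;
    STORE B INTO 'output';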

Re: tmp directory for Pig

2013-01-25 Thread Rohini Palaniswamy
java.io.tmpdir should work. If you are running a python script, more space is required as the jars will be cached in the java.io.tmpdir directory. Alternatively, you can try a different location for it by using -Dpython.cachedir= or you can skip the caching using -Dpython.cachedir.skip=true. My guess is that /f

Re: Run a job async

2013-01-25 Thread Rohini Palaniswamy
Jon, Those are good areas to check. A few things I have seen regarding those are: 1) JythonScriptEngine - PythonInterpreter is static and is not suitable for multiple runs if the script names are the same (hit this issue in PIG-2433 unit tests). 2) QueryParserDriver - There is a static cache with macr

Re: Pig and DistributedCache

2013-02-06 Thread Rohini Palaniswamy
You should be fine using tmpfiles and that's the way to do it. Else you will have to copy the file to hdfs, and call the DistributedCache.addFileToClassPath yourself (basically what tmpfiles setting is doing). But the problem there as you mentioned is cleaning up the hdfs file after the job compl

Re: Reduce Tasks

2013-02-06 Thread Rohini Palaniswamy
The number of maps depends on the number of input splits. mapred.map.tasks is just a hint and needs to be honored by the InputFormat. With pig, you can try the pig.maxCombinedSplitSize configuration to control the number of maps based on input size. For example, a 1G split size can be specified as -Dpig.maxCo
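As a minimal sketch, the same 1G combined split size (1073741824 bytes) could also be set from within the script instead of on the command line:

    set pig.maxCombinedSplitSize 1073741824;   -- 1G in bytes; fewer, larger splits means fewer maps
    A = LOAD 'input' AS (line:chararray);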

Re: Pig and DistributedCache

2013-02-16 Thread Rohini Palaniswamy
r month or so we probably update our CDH4 to whatever is there. > > Will it still work? Will it be safe for the cluster or for my job? Who > > knows what will be implemented there? > > > > You see, I can understand the code, find such a solution, but I won't be > >

Re: Pig and DistributedCache

2013-02-19 Thread Rohini Palaniswamy
corresponding authentication and cannot access the file, which has been > written with another user. > Any ideas of what to try? > On Sun, Feb 17, 2013 at 8:22 AM, Rohini Palaniswamy <rohini.adi...@gmail.com> wrote: > Hi Eugene,

Re: pig shell script does not check for PIG_HOME being set prior to exporting

2013-03-04 Thread Rohini Palaniswamy
We should make PIG_HOME configurable. Can you create a jira and upload a patch? Thanks, Rohini On Sat, Mar 2, 2013 at 6:14 PM, Robert wrote: > It looks like the pig shell script in v0.11 exports PIG_HOME without first > checking to see if it already exists. > > from line 78 in /bin/pig: > # th

Re: Failure to run pig jobs using HbaseStorage in Oozie

2013-03-14 Thread Rohini Palaniswamy
Hi Praveen, Are you running a secure cluster - secure hadoop and hbase? Can you check what is the stacktrace on the pig launcher job log of Hadoop Oozie? Regards, Rohini On Thu, Mar 14, 2013 at 2:28 AM, Praveen Bysani wrote: > Hi, > > I am trying to run a simple pig script that uses HbaseSto

Re: Loader partitioning on field

2013-03-14 Thread Rohini Palaniswamy
Jeff, 1) It should not. If it does push, then it is a bug in pig. 2) I think it should be fine. 3) Look at PColFilterExtractor and PartitionFilterOptimizer Regards, Rohini On Thu, Mar 14, 2013 at 1:31 PM, Jeff Yuan wrote: > I am writing a loader for a storage format, which partitions by a

Re: Loader partitioning on field

2013-03-14 Thread Rohini Palaniswamy
there a way to get a reference > to the logical query plan? > > Thanks again. > > On Thu, Mar 14, 2013 at 1:51 PM, Rohini Palaniswamy > wrote: > > Jeff, > > > > 1) It should not. If it does push, then it is a bug in pig. > > > > 2) I thin

Re: Failure to run pig jobs using HbaseStorage in Oozie

2013-03-15 Thread Rohini Palaniswamy
hadoop.rpc.protection = authentication, hadoop.security.auth_to_local = DEFAULT > So I guess I may not be using a secure hadoop/hbase. I am not sure what you > meant by the log of the pig launcher job of hadoop oozie. Do you mean the log > in Job Tracker for this job

Re: Failure to run pig jobs using HbaseStorage in Oozie

2013-03-19 Thread Rohini Palaniswamy
n%3Ahbasetable.pig > does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:468) > > > But when i checked the logs manually at > /var/log/hadoop-mapreduce.0.2/userlogs/ for a similar job, the > stderr and stdout are empty and syslog has no exception/errors.

Re: Anybody using custom Serializer/Deserializer in Pig Streaming?

2013-03-20 Thread Rohini Palaniswamy
Nice summarization Koji. Wish we had some object that has byte[] and length instead of byte[] as the return type of serialize() and method param of deserialize(). That would enable reuse and cut down on some of the copy. At least there is one copy we can cut down without any API changes by having

Re: Pig jobs - get stdout and stderr

2013-03-22 Thread Rohini Palaniswamy
Not sure what you are exactly trying to capture, but one workaround I can think of is writing your own log4j appender and capturing the log information. -Rohini On Thu, Mar 21, 2013 at 10:13 AM, Cheolsoo Park wrote: > Hi Jeff, > > You're right that those methods in HJob.java throw a > Unsuppor

Re: Welcome our newest committer Prashant Kommireddi

2013-05-02 Thread Rohini Palaniswamy
Congrats Prashant !! On Thu, May 2, 2013 at 3:58 PM, Ashutosh Chauhan wrote: > Congrats Prashant. Hopefully your contributions to Pig will keep flowing in > :) > > Ashutosh > > > On Thu, May 2, 2013 at 3:41 PM, Mike Sukmanowsky wrote: > > > Congrats! > > > > > > On Thu, May 2, 2013 at 3:56 PM,

Re: Pig 0.11.1 OutOfMemory error

2013-09-06 Thread Rohini Palaniswamy
I think we should fix it in pig if it is a regression from pig 0.10. Shubam, If the script works fine for you in pig 0.10, can you open a jira for the issue with 0.11 ? Regards, Rohini On Fri, Sep 6, 2013 at 1:51 PM, Bill Graham wrote: > The getSignature method basically generates a string

Re: AvroStorage Issue - Possibly version related

2013-09-30 Thread Rohini Palaniswamy
It hits this error when json-simple-1.1.jar is not in classpath. You can get around that by adding it to PIG_CLASSPATH apart from registering the jar. The problem is with java classloading where it fails to load the exception class (ParseException) thrown by a constructor of the class (AvroStorage

Re: AvroStorage Issue - Possibly version related

2013-10-01 Thread Rohini Palaniswamy
PM, j.barrett Strausser < j.barrett.straus...@gmail.com> wrote: > I ended up just using the .23.9 hadoop release without any issue. > > > > > On Mon, Sep 30, 2013 at 8:54 PM, Rohini Palaniswamy < > rohini.adi...@gmail.com > > wrote: > > > It hits

Re: Error running PIG 12 with Hadoop 0.23.1.

2013-12-06 Thread Rohini Palaniswamy
Can you try with Hadoop 0.23.8 or 0.23.9? -Rohini On Mon, Dec 2, 2013 at 11:26 AM, Uttam Kumar wrote: > Hi All, > > I am trying to run PIG 12 with Hadoop 0.23.1 and getting following error > msg, Can someone please help and suggest what I am missing. I can run PIG > in local mode without any

Pig User Group Meetup at LinkedIn on Fri Mar 14

2014-01-14 Thread Rohini Palaniswamy
Please join us for the Pig User Group Meetup this quarter at LinkedIn on Fri Mar 14. We have some interesting talks lined up on the recent developments in Pig. RSVP at http://www.meetup.com/PigUser/events/160604192/ Tentative lineup for this meetup: Pig on Tez, Pig on Storm, Intel Graph Builder, Pig

Re: Welcome to the new Pig PMC member Aniket Mokashi

2014-01-15 Thread Rohini Palaniswamy
Congrats Aniket! On Wed, Jan 15, 2014 at 10:12 AM, Mona Chitnis wrote: > Congrats Aniket! Good work! > > -- > > Mona Chitnis > Software Engineer, Hadoop Team > Yahoo! > > > > On Wednesday, January 15, 2014 9:17 AM, Xuefu Zhang > wrote: > > Congratulations, Aniket! > > --Xuefu > > > > On Tue, J

Re: Congratulations to Cheolsoo Park the new Apache Pig project chair

2014-03-20 Thread Rohini Palaniswamy
Thanks Julien. Great job last year. Congratulations, Cheolsoo!!! Well deserved. Great job past 2 years with awesome number of commits and reviews. On Thu, Mar 20, 2014 at 2:07 AM, Lorand Bendig wrote: > Congratulations, Cheolsoo! > > --Lorand > > > On 03/20/2014 02:03 AM, Julien Le Dem wrote:

Re: Strange CROSS behavior

2014-05-02 Thread Rohini Palaniswamy
This looks like a bug. Can you please file a jira with steps to reproduce? On Fri, Apr 18, 2014 at 2:45 PM, Alex Rasmussen wrote: > I'm using PigStorage(',') for all stores. > > I agree about the expensiveness of CROSS, but I'm still kind of confused as > to why it would lose records in this cas

Re: [ANNOUNCE] Welcome new Pig Committer - Lorand Bendig

2014-06-22 Thread Rohini Palaniswamy
Congratulations Lorand !!! On Sun, Jun 22, 2014 at 2:47 PM, Xuefu Zhang wrote: > Many congrats, Lorand! > > --Xuefu > > > On Sun, Jun 22, 2014 at 12:54 PM, Daniel Dai > wrote: > > > Congratulations! > > > > On Sun, Jun 22, 2014 at 7:00 AM, Jarek Jarcec Cecho > > wrote: > > > Congratulations L

Re: [ANNOUNCE] Apache Pig 0.13.0 released

2014-07-04 Thread Rohini Palaniswamy
Thanks Daniel and Cheolsoo for wrapping up all the issues and making this release possible. On Fri, Jul 4, 2014 at 10:46 PM, Cheolsoo Park wrote: > Thank you Daniel for all your hard work! 0.13 is a very important release > with many new features. > > > On Fri, Jul 4, 2014 at 10:24 PM, Daniel D

Re: [ANNOUNCE] Apache Pig 0.13.0 released

2014-07-04 Thread Rohini Palaniswamy
Oops. Missed Aniket :) On Fri, Jul 4, 2014 at 11:03 PM, Rohini Palaniswamy wrote: > Thanks Daniel and Cheolsoo for wrapping up all the issues and making this > release possible. > > > > On Fri, Jul 4, 2014 at 10:46 PM, Cheolsoo Park > wrote: > >> Thank you Danie

Re: [ANNOUNCE] Apache Pig 0.13.0 released

2014-07-07 Thread Rohini Palaniswamy
forward. I would like to especially call out and thank Achal Soni and Mark Wagner who did major refactoring of the Pig code to support multiple execution engines. Cheolsoo Park, Daniel Dai, Aniket Mokashi, Rohini Palaniswamy, Lorand Bendig, Philip (flip) Kromer, Jarek Jarcec Cecho, Nezih Yigitbasi, Prashant

Re: Re: Error "ERROR 2088: Fetch failed. Couldn't retrieve result" happened during "HCatLoader() then DUMP"

2014-10-14 Thread Rohini Palaniswamy
Lorand, Isn't fetch optimization supposed to be only for DUMP and not STORE? -Rohini On Tue, Oct 14, 2014 at 6:47 PM, lulynn_2008 wrote: > Hi Lorand, > The query runs fine if I disable fetch. Thanks for your help. Could you > tell why we need to disable fetch? > BTW, I was using pig-0.13.0 a

Re: [ANNOUNCE] Apache Pig 0.14.0 released

2014-11-21 Thread Rohini Palaniswamy
Thanks Daniel for the hard work, wrapping up so many loose ends and driving it to a very stable release. Thanks to all the other contributors as well without whom the release would not have been possible. There are lot of new contributors in this release and it is very nice to see the Pig communit

Re: New Contributor

2014-12-16 Thread Rohini Palaniswamy
Welcome Karuna. You can find the required information in https://cwiki.apache.org/confluence/display/PIG/HowToContribute. Regards, Rohini On Fri, Dec 12, 2014 at 2:31 AM, Karuna Devanagavi < karuna.devanag...@gmail.com> wrote: > > Hello, > > > I am new to the contribution team and I would like to

Re: use pig in eclipse

2014-12-19 Thread Rohini Palaniswamy
You don't have to do that. You just need to copy the hdfs-site.xml, mapred-site.xml and yarn-site.xml of the cluster configuration and put them in your eclipse classpath. On Thu, Dec 18, 2014 at 6:09 PM, 李运田 wrote: > Hi all, > I want to use pig in eclipse. My hadoop (yarn) cluster and eclipse are

Re: Welcome our new Pig PMC chair Rohini Palaniswamy

2015-03-23 Thread Rohini Palaniswamy
On Mar 18, 2015, at 7:28 PM, Xuefu Zhang wrote: > Congratulations, Rohini! > --Xuefu > On Wed, Mar 18, 2015 at 6:48 PM, Cheolsoo Park

Re: Using secure HBase from Pig UDF?

2015-06-08 Thread Rohini Palaniswamy
Niels, I plan to have PIG-3038 in next two weeks which should simplify accessing secure hbase, but it will only be in 0.16 and that is at least 3-4 months away. In the meantime, a hacky way to get this done is: When running the pig script from commandline, do ## Makes bin/pig add hbase jars

Re: Pig/Hadoop version mismatch

2015-06-08 Thread Rohini Palaniswamy
Do you have HADOOP_HOME and HADOOP_CONF_DIR exported pointing to your installation of hadoop? On Mon, Jun 8, 2015 at 4:54 AM, Karl Beecher wrote: > Hi, > > I have set up Hadoop on a remote machine and have been trying to get my > local instance of Pig to contact it, largely following the instruc

Re: "order by" and "distinct" in one job?

2015-06-08 Thread Rohini Palaniswamy
If order by and distinct have the same key, it is possible to combine them into one mapreduce job. But the current distributed order by uses range partitioning, and the same keys can go to different reducers. Tagging distinct along with that will require more work and is not something we are planning to do

Re: filter by query result

2015-06-08 Thread Rohini Palaniswamy
http://pig.apache.org/docs/r0.14.0/basic.html#cast-relations On Wed, May 27, 2015 at 8:34 AM, pth001 wrote: > Hi, > > I am new to pig. First I queried a hive table (x = LOAD 'x' USING > org.apache.hive.hcatalog.pig.HCatLoader();) and got a single record/value. > How can I use this single value
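A sketch of the relation-to-scalar cast the linked doc describes, assuming the single-row relation x has a numeric column named threshold (the column and the second input are hypothetical):

    x = LOAD 'x' USING org.apache.hive.hcatalog.pig.HCatLoader();   -- single record, per the question
    data = LOAD 'events' AS (id:chararray, score:int);
    -- referencing x.threshold in an expression treats the single-row relation as a scalar
    keep = FILTER data BY score > (int) x.threshold;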

Re: "order by" and "distinct" in one job?

2015-06-08 Thread Rohini Palaniswamy
> and/or > Q: Can you provide a brief explanation of "range partitioning ... same keys go to different reducers"? > Michael > On Mon, Jun 8, 2015 at 4:08 PM, Rohini Palaniswamy <rohini.adi...@gmail.com> wrote: > I

Re: Same pig script running slower with Tez as compared with run in Mapred mode

2015-07-07 Thread Rohini Palaniswamy
Sachin, Can you attach your pig script and pig client log as well, as I asked earlier? Regards, Rohini On Tue, Jul 7, 2015 at 2:43 AM, Sachin Sabbarwal wrote: > Hi Guys > I'm using Apache Pig version 0.14.0 (r1640057) and 0.5.3 TEZ. > I am running a pig script in the following 2 scenarios: > 1.

Re: About pig support for hbase-1.0.1.1

2015-09-02 Thread Rohini Palaniswamy
Please send questions like this to d...@pig.apache.org. It is not possible for us to compile against different versions of hbase and publish a different set of jars and installation. We already do that for hadoop 1.x and 2.x and adding more to that mix is a pain and increases the number of combinatio

Re: Flatten keyword in Pig

2015-09-02 Thread Rohini Palaniswamy
You can do FLATTEN + TOTUPLE() UDF. On Fri, Aug 28, 2015 at 11:20 AM, Arvind S wrote: > read as a single string field and use REPLACE .. you will have to use it 4 > times ..one for each of (,),{ & } .. > > *Cheers !!* > Arvind > > On Fri, Aug 28, 2015 at 7:29 PM, Simha G wrote: > > > Hi Aravind,

Re: Hive UDF's vs. "native" Pig UDF's

2015-09-09 Thread Rohini Palaniswamy
Daniel, Not sure you saw this. We will have to document the performance implications of hive udfs. Does the wrapping/unwrapping cause significant overhead to impact performance or is it negligible? Regards, Rohini On Mon, Jul 27, 2015 at 8:50 AM, Eyal Allweil < eyal_allw...@yahoo.com.invalid>

Re: group all Reverse order

2015-09-14 Thread Rohini Palaniswamy
You will have to use an ORDER BY inside a nested foreach after the GROUP statement. On Sat, Sep 12, 2015 at 8:49 PM, 李运田 wrote: > when I use group all, I find that the order reverses. is there a setting I > can use not to reverse the order. > data: > (1,a) > (2,b) > (all,{(2,b),(1,a)}) > BUT I want to g
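A minimal sketch of the nested ORDER BY for the (1,a)/(2,b) example in the question:

    A = LOAD 'data' AS (id:chararray, val:chararray);
    G = GROUP A ALL;
    B = FOREACH G {
            sorted = ORDER A BY id ASC;   -- keeps (1,a) before (2,b) inside the bag
            GENERATE group, sorted;
        };
    DUMP B;   -- (all,{(1,a),(2,b)})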

Re: Login with Kerberos keytab ?

2016-01-18 Thread Rohini Palaniswamy
Can't you set up a cron to kinit periodically? If you need pig to do it, it will have to be a new jira. None of the clients (hadoop, pig, hive) do it now. On Wed, Jan 6, 2016 at 6:35 AM, Niels Basjes wrote: > Hi, > > When I run a Pig job on a Kerberos secured cluster it uses the tickets > obta

Re: python UDF invocation or memory problem

2016-01-18 Thread Rohini Palaniswamy
Run it in local mode after doing export PIG_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof" . Then you should be able to look into the heapdump and see where you are leaking memory in your UDF. On Thu, Dec 10, 2015 at 9:23 AM, wrote: > Hi Pig community, > > I am runni

Re: Set configuration properties for a Pig script within the script.

2016-02-23 Thread Rohini Palaniswamy
Pig's HBaseStorage will automatically pick the values from hbase-site.xml if it is in classpath and store to that HBase instance. On Mon, Feb 22, 2016 at 10:42 AM, Parth Sawant wrote: > I'm using Pig Hbase storage. I need to utilize parameters set within the > hbase-site.xml to pass the value of
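A sketch of a load that relies on that behavior, with hypothetical table and column names; no connection properties appear in the script because they come from the hbase-site.xml on the classpath:

    rows = LOAD 'hbase://my_table'
           USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:col1 cf:col2', '-loadKey true')
           AS (rowkey:chararray, col1:chararray, col2:chararray);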

Welcome to our new Pig PMC member Xuefu Zhang

2016-02-24 Thread Rohini Palaniswamy
It is my pleasure to announce that Xuefu Zhang is our newest addition to the Pig PMC. Xuefu is a long time committer of Pig and has been actively involved in driving the Pig on Spark effort for the past year. Please join me in congratulating Xuefu !!! Regards, Rohini

Welcome our new Pig PMC chair Daniel Dai

2016-03-23 Thread Rohini Palaniswamy
Hi folks, I am very happy to announce that we elected Daniel Dai as our new Pig PMC Chair and it is official now. Please join me in congratulating Daniel. Regards, Rohini

Re: Retrieving Pig script from MR job config

2016-05-27 Thread Rohini Palaniswamy
You can find the pig script in pig.script setting. It is base64 encoded and you will have to decode it. If the script is too long, it will be truncated to 10K lines. Regards, Rohini On Tue, May 10, 2016 at 7:27 AM, Harish Gopalan wrote: > Hi, > > Is it possible to retrieve the original pig scri

Re: should I set a different number of mappers?

2016-05-27 Thread Rohini Palaniswamy
15K mappers on a 4 node system will definitely crash it unless you have tuned yarn (RM, NM) well. That many mappers reading data off a few disks in parallel can create a disk storm and disk can also turn out to be your bottleneck. Pig creates 1 map per 128MB (pig.maxCombinedSplitSize default value)

Re: ToDate does not parse the date properly

2016-05-27 Thread Rohini Palaniswamy
http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html You need to use 'yyyy-MM-dd HH:mm:ss.SSS' instead of 'yyyy-MM-DD HH:mm:ss.SSS'. DD stands for day of the year and dd stands for day of the month. The 11th day of the year can only be in January. So the month always comes out as Janu
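A small sketch showing the corrected pattern passed to ToDate (the input column name is hypothetical):

    A = LOAD 'events' AS (dt:chararray);
    -- 'dd' is day of month; 'DD' (day of year) makes every month resolve to January here
    B = FOREACH A GENERATE ToDate(dt, 'yyyy-MM-dd HH:mm:ss.SSS') AS ts;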

Re: data discrepancies related to parallelism

2016-06-01 Thread Rohini Palaniswamy
Kurt, Did you find the problem? Regards, Rohini On Thu, May 5, 2016 at 1:41 PM, Kurt Muehlner wrote: > Hello all, > > I posted this issue in the Tez user group earlier today, where it was > suggested I also post it here. We have a Pig/Tez application exhibiting > data discrepancies which oc

Re: should I set a different number of mappers?

2016-06-01 Thread Rohini Palaniswamy
. Here it is: > > Block size 128MB, 300TB of raw data storage (100TB if you account for > replication) and each of the 4 nodes has 384GB RAM > > Does that change your answer? > > Thanks again!! > > On 27 May 2016 at 17:09, Rohini Palaniswamy > wrote: > > 15K map

Re: Retrieving Pig script from MR job config

2016-06-01 Thread Rohini Palaniswamy
ry pig script > that is run on the system i.e YARN ? In other words I would like to find > out recurrent pig job executions provided it is the same code that is > executing. I guess I have to match it by retrieving the Abstract Syntax > tree but not very sure. > > Regards >

Re: HBase-Pig issues

2016-06-01 Thread Rohini Palaniswamy
Can you go to the Resource Manager UI and look at the diagnostics and task logs of job_1464584017709_0003 to see what the actual stacktrace is? Most likely there are some connection issues with either hdfs or hbase and these can retry for a really long time before erroring out. Only that can explai

Re: Schema issue while storing multiple pig outputs using CSVExcelStorage

2016-07-11 Thread Rohini Palaniswamy
Can you try in Pig 0.16? Niels fixed this in https://issues.apache.org/jira/browse/PIG-4689 On Mon, Jul 4, 2016 at 7:05 AM, Eyal Allweil wrote: > I can replicate these results on Pig 0.14. > Did anyone open a Jira issue for this? > > > On Thursday, March 10, 2016 12:24 PM, Sarath Sasidharan

Re: Code works in MR but not in Tez

2016-07-12 Thread Rohini Palaniswamy
Are you sure it worked in MR? You should have got an error like *Scalar has more than one row in the output. 1st : (), 2nd : () (common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be "foo::bar")* cd1.first == cd2.second should be written as cd1::first == cd2::second. Refer h
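A sketch of the :: disambiguation, using hypothetical cd1/cd2 relations modeled on the names in the thread:

    cd1 = LOAD 'cd1' AS (first:chararray, val:int);
    cd2 = LOAD 'cd2' AS (second:chararray, val:int);
    C = CROSS cd1, cd2;
    -- after a CROSS or JOIN, qualify fields with ::; cd1.first would be treated as a scalar lookup
    M = FILTER C BY cd1::first == cd2::second;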

Re: About Multiple Join in Pig

2016-11-29 Thread Rohini Palaniswamy
You can check if your current jar has the class by running jar -tvf /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar | grep Hasher Did you compile datafu after applying patch from https://issues.apache.org/jira/browse/DATAFU-47 ? Only then the class will be in the jar as that

Re: How to test the efficiency of multiple join

2016-12-07 Thread Rohini Palaniswamy
Limit 4 would make processing of join stop after 4 records. It is not a good idea to add it if you are testing performance of join. On Tue, Dec 6, 2016 at 8:13 PM mingda li wrote: > Thanks for your quick reply. If so, I can use the limit operator to compare > > good and bad join plan. It takes t

Re: [ANNOUNCE] Welcome new Pig Committer - Liyun Zhang

2016-12-20 Thread Rohini Palaniswamy
Congratulations Liyun !!! On Mon, Dec 19, 2016 at 10:25 PM, Jianfeng (Jeff) Zhang < jzh...@hortonworks.com> wrote: > Congratulations Liyun! > > > > Best Regard, > Jeff Zhang > > > > > > On 12/20/16, 11:29 AM, "Pallavi Rao" wrote: > > >Congratulations Liyun! > >

[ANNOUNCE] Welcome new Pig Committer - Adam Szita

2017-05-22 Thread Rohini Palaniswamy
Hi all, It is my pleasure to announce that Adam Szita has been voted in as a committer to Apache Pig. Please join me in congratulating Adam. Adam has been actively contributing to core Pig and Pig on Spark. We appreciate all the work he has done and are looking forward to more contributions fro

Re: Intermittent failures using PigTest

2017-06-09 Thread Rohini Palaniswamy
It would help if you have the stacktrace for the job failure. Regards, Rohini On Fri, May 26, 2017 at 11:54 AM, Eli Levine wrote: > Greetings, Pig community. I am using PigUnit (PigTest.java) in a unit > test in Apache Calcite [1] and have observed intermittent test > failures [2]. Happens some

Re: [ANNOUNCE] Apache Pig 0.17.0 released

2017-06-22 Thread Rohini Palaniswamy
Thanks Adam for being the Release Manager and getting this important release out. Pig on Spark is another milestone that will benefit users looking for improved execution times and migrating out of mapreduce. Regards, Rohini On Wed, Jun 21, 2017 at 2:05 AM, Adam Szita wrote: > The Pig team is

Re: HIVE or PIG - For building DQ framework

2017-07-10 Thread Rohini Palaniswamy
If you are loading data once and performing multiple operations on it, Pig should perform better due to its multiquery optimizations. If the data size is very small there might not be a difference and you can go with what is easy for you to code. I would suggest benchmarking with both Pig and Hive
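A sketch of the load-once, multiple-operations pattern that benefits from the multiquery optimization (column names and checks are hypothetical):

    raw = LOAD 'transactions' AS (id:chararray, amount:double, country:chararray);
    null_amounts = FILTER raw BY amount IS NULL;
    by_country   = FOREACH (GROUP raw BY country) GENERATE group AS country, COUNT(raw) AS cnt;
    -- both STOREs share the single LOAD above, so the input is scanned only once
    STORE null_amounts INTO 'dq/null_amounts';
    STORE by_country   INTO 'dq/counts_by_country';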

Re: Any idea how to debug this?

2017-08-09 Thread Rohini Palaniswamy
Are the MR jobs succeeding or failing? Is there anything in the stderr logs of the Oozie launcher? On Sun, Aug 6, 2017 at 4:36 AM, Ronald Green wrote: > Hi! > > I have an HDP 1.3 (old, I know...) cluster that's running Pig 0.14 scripts > through Oozie. > > There's a rare nuisance that's driving

Re: Any idea how to debug this?

2017-08-10 Thread Rohini Palaniswamy
unning with the last message in the log predating MR job completion : > > INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer. > MapReduceLauncher > - Running jobs are [job_xxx] > > > On Wed, Aug 9, 2017 at 9:25 PM, Rohini Palaniswamy < > rohini.adi...@gmail.co

Re: Snappy compression with Pig

2018-05-03 Thread Rohini Palaniswamy
Can you give the full stack trace? On Tue, May 1, 2018 at 6:35 AM, Alex Soto wrote: > Hello, > > I am using Pig 0.17.0 and I am trying to enable Snappy compression for > temporary files. > I installed Snappy on all the Hadoop nodes: > > sudo yum install snappy snappy-devel > ln -

Re: Add patches from command line

2018-05-08 Thread Rohini Palaniswamy
You cannot include a patch from the command line. You need to compile the source with the patch applied and use that new jar. Regards, Rohini On Fri, May 4, 2018 at 10:39 AM, Tad Zhang wrote: > Hi All, > > So I found out that ToDate was not including daylight saving changes. > It is fixed by version

Re: tagFile and tagPath Option for orcStorage

2018-08-10 Thread Rohini Palaniswamy
No. It is available only with PigStorage. You can raise a jira if you think that would be useful. On Fri, Aug 3, 2018 at 1:40 PM, Moiz Arafat wrote: > Hi, > > Is there an option available in OrcStorage similar to PigStorage's tagFile > and tagPath? > > thanks, > Moiz >

Re: How to execute pig script from java

2018-08-10 Thread Rohini Palaniswamy
Pig does not have any server. The client directly launches jobs on the YARN cluster. You can just use the APIs in http://pig.apache.org/docs/r0.17.0/api/org/apache/pig/PigServer.html to execute scripts from your java program. On Sun, Jul 29, 2018 at 8:24 PM, Atul Raut wrote: > How to execute pig

Re: How to execute pig script from java

2018-08-10 Thread Rohini Palaniswamy
, Aug 10, 2018 at 1:36 PM, Rohini Palaniswamy wrote: > Pig does not have any server. The client directly launches jobs on the > YARN cluster. You can just use the APIs in http://pig.apache.org/docs/ > r0.17.0/api/org/apache/pig/PigServer.html to execute scripts from your > java program

Re: Submitting multiple Pig Scripts on the same Session

2019-01-22 Thread Rohini Palaniswamy
If you are using PigServer and submitting programmatically via same jvm, it should automatically reuse the application if the requested AM resources are same. https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezSessionManager.java#L242-L245 On Fri, Ja

Re: Blog post on recent Pig content contributed to Apache DataFu

2019-01-22 Thread Rohini Palaniswamy
Thanks Eyal. dedup() sounds interesting and can find good use in nested foreach for picking latest record. Unfortunate that you had to resort to CountDistinctUpTo because of memory issues. We have run into similar issues as well and have plans to optimize the nested count distinct for handling mill

Re: Avro vs Parquet performance on Pig

2019-02-11 Thread Rohini Palaniswamy
You might need https://issues.apache.org/jira/browse/PIG-4092 On Thu, Feb 7, 2019 at 3:54 PM Russell Jurney wrote: > Sorry if this isn't helpful, but the other obvious thing is to store > intermediate data in Parquet whenever you repeat code/data that can be > shared between jobs. If tests indic

Re: Delete hdfs directory afterpig execution

2019-02-11 Thread Rohini Palaniswamy
> However the fs command throws an error What error do you get? Is it "Could not find schema file" ? > Also is there a guarantee that the fs command will be executed in order ? Yes. Whenever fs commands are encountered, pig executes the statements prior to it, executes the fs command and then e
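A sketch of the ordering guarantee described above, with hypothetical paths (the exact rm flags depend on the Hadoop version):

    A = LOAD '/data/intermediate' AS (id:chararray, val:int);
    B = FILTER A BY val > 0;
    STORE B INTO '/data/final';
    -- runs only after every statement above it has finished executing
    fs -rm -r -skipTrash /data/intermediate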

Re: Re: Submitting multiple Pig Scripts on the same Session

2019-02-14 Thread Rohini Palaniswamy
input files for faster (less time) processing. Unfortunately, I haven't > seen much gain here on 100 megabytes input files when testing with exectype > tez_local. Furthermore, the pig script on tez_local mode wouldn't find the > input files. I had to prefix file paths with hdfs:/

Re: tracking metrics

2019-11-21 Thread Rohini Palaniswamy
You should look at the job counters and start and end time to get that information. PigStats and PigProgressNotificationListener (https://pig.apache.org/docs/r0.17.0/test.html#pig-statistics) are other ways to get that information if you are invoking pig programmatically. On Mon, Nov 18, 2019 at