Ray,
Looking at the EvalFunc interface, I cannot see a way or loophole to do
it. EvalFunc does not have a reference to Job or JobConf object to add
credentials to it. It has getCacheFiles() to add files to DistributedCache,
but no method to add credentials. We should probably add one.
On Tue, Sep 25, 2012 at 5:09 PM, Alan Gates wrote:
> You can use the UDFContext to pass information for the UDF in the JobConf
> without writing files.
>
> Alan.
>
> On Sep 25, 2012, at 10:48 AM, Rohini Palaniswamy wrote:
>
> > Ray,
> > Looking at the EvalFunc interface, I cannot see
> > On Tue, Sep 25, 2012 at 5:09 PM, Alan Gates
> > wrote:
> > >
> > > > You can use the UDFContext to pass information for the UDF in the
> > JobConf
> > > > without writing files.
> > > >
> > > > Alan.
> > > >
>
Hi Pankaj,
Pig depends on jline-0.9.94.jar, which is packaged into
pig-withouthadoop.jar or pig.jar when you build Pig and is included in a
Pig installation. However, the pig jar in maven does not include jline.
Please add jline (http://repo1.maven.org/maven2/jline/jline/0.9.94/) to
your mav
Hadoop has an option to set the user classpath first -
mapreduce.user.classpath.first
Regards,
Rohini
On Wed, Oct 31, 2012 at 11:49 AM, Mohit Anchlia wrote:
> Any suggestions on how one can override the jar files in hadoop path to
> give preference to jars used in "register" command?
>
> On Tue,
> Thanks,
> Ray
>
> On Fri, Sep 28, 2012 at 1:54 AM, Rohini Palaniswamy <
> rohini.adi...@gmail.com
> > wrote:
>
> > Ray,
> >In the frontend, you can do a new JobConf(HBaseConfiguration.create())
> > and pass that to TableMapReduceUtil.initCredentials(
You can also use a comma separated list of file paths in the load command.
On Sat, Dec 22, 2012 at 6:13 AM, Alan Gates wrote:
> Yes. See http://pig.apache.org/docs/r0.10.0/basic.html#load for a
> discussion of how to use globs in file paths.
>
> Alan.
>
> On Dec 21, 2012, at 10:38 PM, Mohit An
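A quick sketch of both approaches (the paths and delimiter are hypothetical):

```pig
-- comma-separated list of paths in a single LOAD
a = LOAD '/data/day1,/data/day2' USING PigStorage(',');
-- glob form covering the same directories
b = LOAD '/data/day{1,2}' USING PigStorage(',');
```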
Take a look at http://pig.apache.org/docs/r0.10.0/cont.html#Parameter-Sub
("Specifying Parameters Using the Declare Statement").
You can do this in your case
%declare page_input_path `echo $input_path | sed 's/output/output\/page/g'`
Or you can use embedded python (
http://pig.apache.org/docs/r0.10.
You can also use -Dmapred.cache.archives= to
ship the tar file using distributed cache. Hadoop will take care of
untarring the file and putting it in the current directory if the extension
is one of .zip, .tar, .tgz or .tar.gz. This is a feature of
hadoop's distributed cache.
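As a sketch, shipping a tarball this way from the command line might look like the following (the archive path, link name, and script name are hypothetical):

```shell
# Hadoop unpacks the .tgz and symlinks it as 'mylib' in the task's working directory
pig -Dmapred.cache.archives=hdfs:///libs/mylib.tgz#mylib myscript.pig
```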
Regards,
Rohini
On
java.io.tmpdir should work. If you are running a python script, more space is
required as the jars will be cached in the java.io.tmpdir directory. Alternatively, you
can try a different location for it by using -Dpython.cachedir=
or you can skip the caching using -Dpython.cachedir.skip=true. My guess is
that /f
Jon,
Those are good areas to check. A few things I have seen regarding those are:
1) JythonScriptEngine - PythonInterpreter is static and is not suitable for
multiple runs if the script names are the same (hit this issue in PIG-2433 unit
tests).
2) QueryParserDriver - There is a static cache with macr
You should be fine using tmpfiles; that's the way to do it.
Otherwise you will have to copy the file to hdfs and call
DistributedCache.addFileToClassPath yourself (basically what the tmpfiles
setting is doing). But the problem there as you mentioned is cleaning up
the hdfs file after the job compl
The number of maps depends on the number of input splits. mapred.map.tasks
is just a hint, which the InputFormat may or may not honor. With Pig, you
can try the pig.maxCombinedSplitSize configuration to control the number of
maps based on input size. For example, a 1G split size can be specified
as Dpig.maxCo
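For instance, a 1G target (1073741824 bytes) could be passed on the command line like this (the script name is hypothetical):

```shell
# 1G = 1024 * 1024 * 1024 = 1073741824 bytes per combined split
pig -Dpig.maxCombinedSplitSize=1073741824 myscript.pig
```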
r month or so we probably update our CDH4 to whatever is there.
> > Will it still work? Will it be safe for the cluster or for my job? Who
> > knows what will be implemented there?
> >
> > You see, I can understand the code, find such a solution, but I won't be
> >
> corresponding authentication and cannot access the file, which has been
> written with another user.
>
> Any ideas of what to try?
>
> On Sun, Feb 17, 2013 at 8:22 AM, Rohini Palaniswamy <
> rohini.adi...@gmail.com
> > wrote:
>
> > Hi Eugene,
> >
We should make PIG_HOME configurable. Can you create a jira and upload a
patch?
Thanks,
Rohini
On Sat, Mar 2, 2013 at 6:14 PM, Robert wrote:
> It looks like the pig shell script in v0.11 exports PIG_HOME without first
> checking to see if it already exists.
>
> from line 78 in /bin/pig:
> # th
Hi Praveen,
Are you running a secure cluster - secure hadoop and hbase? Can you
check what the stacktrace is in the log of the pig launcher job of Hadoop Oozie?
Regards,
Rohini
On Thu, Mar 14, 2013 at 2:28 AM, Praveen Bysani wrote:
> Hi,
>
> I am trying to run a simple pig script that uses HbaseSto
Jeff,
1) It should not. If it does push, then it is a bug in pig.
2) I think it should be fine.
3) Look at PColFilterExtractor and PartitionFilterOptimizer
Regards,
Rohini
On Thu, Mar 14, 2013 at 1:31 PM, Jeff Yuan wrote:
> I am writing a loader for a storage format, which partitions by a
there a way to get a reference
> to the logical query plan?
>
> Thanks again.
>
> On Thu, Mar 14, 2013 at 1:51 PM, Rohini Palaniswamy
> wrote:
> > Jeff,
> >
> > 1) It should not. If it does push, then it is a bug in pig.
> >
> > 2) I thin
> <property>
>   <name>hadoop.rpc.protection</name>
>   <value>authentication</value>
> </property>
> <property>
>   <name>hadoop.security.auth_to_local</name>
>   <value>DEFAULT</value>
> </property>
> So I guess I may not be using a secure hadoop/hbase. I am not sure what you
> meant by the log of pig launcher job of hadoop oozie. Do you mean the log
> in Job Tracker for this job
n%3Ahbasetable.pig
> does not exist
> at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:468)
>
>
> But when i checked the logs manually at
> /var/log/hadoop-mapreduce.0.2/userlogs/ for a similar job, the
> stderr and stdout are empty and syslog has no exception/errors.
Nice summarization, Koji. I wish we had some object that holds byte[] and
length instead of a plain byte[] as the return type of serialize() and the
method parameter of deserialize(). That would enable buffer reuse and cut
down on some of the copying. At least there is one copy we can cut down
without any API changes by having
Not sure what you are exactly trying to capture, but one workaround I can
think of is writing your own log4j appender and capturing the log
information.
-Rohini
On Thu, Mar 21, 2013 at 10:13 AM, Cheolsoo Park wrote:
> Hi Jeff,
>
> You're right that those methods in HJob.java throw a
> Unsuppor
Congrats Prashant !!
On Thu, May 2, 2013 at 3:58 PM, Ashutosh Chauhan wrote:
> Congrats Prashant. Hopefully your contributions to Pig will keep flowing in
> :)
>
> Ashutosh
>
>
> On Thu, May 2, 2013 at 3:41 PM, Mike Sukmanowsky wrote:
>
> > Congrats!
> >
> >
> > On Thu, May 2, 2013 at 3:56 PM,
I think we should fix it in pig if it is a regression from pig 0.10.
Shubam,
If the script works fine for you in pig 0.10, can you open a jira for
the issue with 0.11?
Regards,
Rohini
On Fri, Sep 6, 2013 at 1:51 PM, Bill Graham wrote:
> The getSignature method basically generates a string
It hits this error when json-simple-1.1.jar is not in the classpath. You can
get around that by adding it to PIG_CLASSPATH apart from registering the
jar. The problem is with java classloading, where it fails to load the
exception class (ParseException) thrown by a constructor of the class
(AvroStorage
PM, j.barrett Strausser <
j.barrett.straus...@gmail.com> wrote:
> I ended up just using the .23.9 hadoop release without any issue.
>
>
>
>
> On Mon, Sep 30, 2013 at 8:54 PM, Rohini Palaniswamy <
> rohini.adi...@gmail.com
> > wrote:
>
> > It hits
Can you try with Hadoop 0.23.8 or 0.23.9?
-Rohini
On Mon, Dec 2, 2013 at 11:26 AM, Uttam Kumar wrote:
> Hi All,
>
> I am trying to run PIG 12 with Hadoop 0.23.1 and getting following error
> msg, Can someone please help and suggest what I am missing. I can run PIG
> in local mode without any
Please join us for the Pig User Group Meetup this quarter at LinkedIn on
Fri Mar 14. We have some interesting talks lined up on the recent
developments in Pig.
RSVP at http://www.meetup.com/PigUser/events/160604192/
Tentative lineup for this meetup:
Pig on Tez
Pig on Storm
Intel Graph Builder
Pig
Congrats Aniket!
On Wed, Jan 15, 2014 at 10:12 AM, Mona Chitnis wrote:
> Congrats Aniket! Good work!
>
> --
>
> Mona Chitnis
> Software Engineer, Hadoop Team
> Yahoo!
>
>
>
> On Wednesday, January 15, 2014 9:17 AM, Xuefu Zhang
> wrote:
>
> Congratulations, Aniket!
>
> --Xuefu
>
>
>
> On Tue, J
Thanks Julien. Great job last year.
Congratulations, Cheolsoo!!! Well deserved. Great job past 2 years with
awesome number of commits and reviews.
On Thu, Mar 20, 2014 at 2:07 AM, Lorand Bendig wrote:
> Congratulations, Cheolsoo!
>
> --Lorand
>
>
> On 03/20/2014 02:03 AM, Julien Le Dem wrote:
This looks like a bug. Can you please file a jira with steps to reproduce?
On Fri, Apr 18, 2014 at 2:45 PM, Alex Rasmussen wrote:
> I'm using PigStorage(',') for all stores.
>
> I agree about the expensiveness of CROSS, but I'm still kind of confused as
> to why it would lose records in this cas
Congratulations Lorand !!!
On Sun, Jun 22, 2014 at 2:47 PM, Xuefu Zhang wrote:
> Many congrats, Lorand!
>
> --Xuefu
>
>
> On Sun, Jun 22, 2014 at 12:54 PM, Daniel Dai
> wrote:
>
> > Congratulations!
> >
> > On Sun, Jun 22, 2014 at 7:00 AM, Jarek Jarcec Cecho
> > wrote:
> > > Congratulations L
Thanks Daniel and Cheolsoo for wrapping up all the issues and making this
release possible.
On Fri, Jul 4, 2014 at 10:46 PM, Cheolsoo Park wrote:
> Thank you Daniel for all your hard work! 0.13 is a very important release
> with many new features.
>
>
> On Fri, Jul 4, 2014 at 10:24 PM, Daniel D
Oops. Missed Aniket :)
On Fri, Jul 4, 2014 at 11:03 PM, Rohini Palaniswamy wrote:
> Thanks Daniel and Cheolsoo for wrapping up all the issues and making this
> release possible.
>
>
>
> On Fri, Jul 4, 2014 at 10:46 PM, Cheolsoo Park
> wrote:
>
>> Thank you Danie
forward. I would like to especially call out and thank Achal Soni and
Mark Wagner who did major refactoring of the Pig code to support multiple
execution engines.
Cheolsoo Park
Daniel Dai
Aniket Mokashi
Rohini Palaniswamy
Lorand Bendig
Philip (flip) Kromer
Jarek Jarcec Cecho
Nezih Yigitbasi
Prashant
Lorand,
Isn't fetch optimization supposed to be only for DUMP and not STORE?
-Rohini
On Tue, Oct 14, 2014 at 6:47 PM, lulynn_2008 wrote:
> Hi Lorand,
> The query runs fine if I disable fetch. Thanks for your help. Could you
> tell why we need to disable fetch?
> BTW, I was using pig-0.13.0 a
Thanks Daniel for the hard work, wrapping up so many loose ends and driving
it to a very stable release.
Thanks to all the other contributors as well without whom the release would
not have been possible. There are a lot of new contributors in this release
and it is very nice to see the Pig communit
Welcome Karuna. You can find the required information in
https://cwiki.apache.org/confluence/display/PIG/HowToContribute.
Regards,
Rohini
On Fri, Dec 12, 2014 at 2:31 AM, Karuna Devanagavi <
karuna.devanag...@gmail.com> wrote:
>
> Hello,
>
>
> I am new to the contribution team and I would like to
You don't have to do that. You just need to copy the hdfs-site.xml,
mapred-site.xml and yarn-site.xml of the cluster configuration and put that
in your eclipse classpath.
On Thu, Dec 18, 2014 at 6:09 PM, 李运田 wrote:
>
> hi all.
> I want to use pig in eclipse. My hadoop (yarn) cluster and eclipse are
> > > On Mar 18, 2015, at 7:28 PM, Xuefu Zhang
> wrote:
> > >> >
> > >> > Congratulations, Rohini!
> > >> >
> > >> > --Xuefu
> > >> >
> > >> > On Wed, Mar 18, 2015 at 6:48 PM, Cheolsoo Park >
> > >>
Niels,
I plan to have PIG-3038 in next two weeks which should simplify
accessing secure hbase, but it will only be in 0.16 and that is at least
3-4 months away.
In the meantime, a hacky way to get this done is:
When running the pig script from commandline, do
## Makes bin/pig add hbase jars
Do you have HADOOP_HOME and HADOOP_CONF_DIR exported pointing to your
installation of hadoop?
On Mon, Jun 8, 2015 at 4:54 AM, Karl Beecher wrote:
> Hi,
>
> I have set up Hadoop on a remote machine and have been trying to get my
> local instance of Pig to contact it, largely following the instruc
If order by and distinct have the same key, it is possible to combine them
into one mapreduce job. But the current distributed order by uses range
partitioning, and the same keys can go to different reducers. Tagging along
distinct to that would require more work and is not something we are planning
to do.
http://pig.apache.org/docs/r0.14.0/basic.html#cast-relations
On Wed, May 27, 2015 at 8:34 AM, pth001 wrote:
> Hi,
>
> I am new to pig. First I queried a hive table (x = LOAD 'x' USING
> org.apache.hive.hcatalog.pig.HCatLoader();) and got a single record/value.
> How can I use this single value
>
> and/or
>
> Q: Can you provide a brief explanation of "range partitioning ... same keys
> go to different reducers" ?
>
>
> Michael
>
>
> On Mon, Jun 8, 2015 at 4:08 PM, Rohini Palaniswamy <
> rohini.adi...@gmail.com>
> wrote:
>
> > I
Sachin,
Can you attach your pig script and pig client log, as I asked
earlier?
Regards,
Rohini
On Tue, Jul 7, 2015 at 2:43 AM, Sachin Sabbarwal
wrote:
> Hi Guys
> I'm using Apache Pig version 0.14.0 (r1640057) and 0.5.3 TEZ.
> I am running a pig script in the following 2 scenarios:
> 1.
Please send questions like this to d...@pig.apache.org. It is not possible
for us to compile against different versions of hbase and publish different
sets of jars and installations. We already do that for hadoop 1.x and 2.x, and
adding more to that mix is a pain and increases the number of combinatio
You can do FLATTEN + TOTUPLE() UDF
On Fri, Aug 28, 2015 at 11:20 AM, Arvind S wrote:
> read as a single string field and use REPLACE .. you will have to use it 4
> times ..one for each of (,),{ & } ..
>
> *Cheers !!*
> Arvind
>
> On Fri, Aug 28, 2015 at 7:29 PM, Simha G wrote:
>
> > Hi Aravind,
Daniel,
Not sure you saw this. We will have to document the performance
implications of hive udfs. Does the wrapping/unwrapping cause significant
overhead to impact performance or is it negligible?
Regards,
Rohini
On Mon, Jul 27, 2015 at 8:50 AM, Eyal Allweil <
eyal_allw...@yahoo.com.invalid>
You will have to use an ORDER BY inside a nested foreach after the GROUP
statement.
On Sat, Sep 12, 2015 at 8:49 PM, 李运田 wrote:
> when I use GROUP ALL, I find that the order reverses. Is there a setting I
> can use to not reverse the order?
> data:
> (1,a)
> (2,b)
> (all,{(2,b),(1,a)})
> BUT I want to g
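A sketch of the nested ORDER BY suggested above, assuming records like those in the question (relation and field names are hypothetical):

```pig
data = LOAD 'input' USING PigStorage(',') AS (id:chararray, val:chararray);
grpd = GROUP data ALL;
out = FOREACH grpd {
    ordered = ORDER data BY id ASC;  -- keeps the bag in ascending order
    GENERATE group, ordered;
}
```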
Can't you set up a cron to kinit periodically? If you need pig to do it,
it will have to be a new jira. None of the clients (hadoop, pig, hive) do
it now.
On Wed, Jan 6, 2016 at 6:35 AM, Niels Basjes wrote:
> Hi,
>
> When I run a Pig job on a Kerberos secured cluster it uses the tickets
> obta
Run it in local mode after doing export
PIG_OPTS="-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/tmp/heapdump.hprof" . Then you should be able to look
into the heapdump and see where you are leaking memory in your UDF.
On Thu, Dec 10, 2015 at 9:23 AM, wrote:
> Hi Pig community,
>
> I am runni
Pig's HBaseStorage will automatically pick the values from hbase-site.xml
if it is in classpath and store to that HBase instance.
On Mon, Feb 22, 2016 at 10:42 AM, Parth Sawant
wrote:
> I'm using Pig Hbase storage. I need to utilize parameters set within the
> hbase-site.xml to pass the value of
It is my pleasure to announce that Xuefu Zhang is our newest addition to
the Pig PMC. Xuefu is a long time committer of Pig and has been actively
involved in driving the Pig on Spark effort for the past year.
Please join me in congratulating Xuefu !!!
Regards,
Rohini
Hi folks,
I am very happy to announce that we elected Daniel Dai as our new Pig
PMC Chair and it is official now. Please join me in congratulating Daniel.
Regards,
Rohini
You can find the pig script in pig.script setting. It is base64 encoded and
you will have to decode it. If the script is too long, it will be truncated
to 10K lines.
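Decoding the value is a one-liner; here is a sketch using a sample value constructed in place (a real job would read pig.script from the job configuration):

```shell
# stand-in for the base64-encoded value of the pig.script property
encoded=$(printf "a = LOAD 'in'; DUMP a;" | base64)
# decode it back to the original script text
printf '%s' "$encoded" | base64 -d
```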
Regards,
Rohini
On Tue, May 10, 2016 at 7:27 AM, Harish Gopalan
wrote:
> Hi,
>
> Is it possible to retrieve the original pig scri
15K mappers on a 4 node system will definitely crash it unless you have
tuned yarn (RM, NM) well. That many mappers reading data off a few disks in
parallel can create a disk storm, and disk can also turn out to be your
bottleneck. Pig creates 1 map per 128MB (pig.maxCombinedSplitSize default
value)
http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
You need to use 'yyyy-MM-dd HH:mm:ss.SSS' instead of 'yyyy-MM-DD
HH:mm:ss.SSS'. DD stands for day of the year and dd stands for day of the
month. The 11th day of the year can only be in January, so the month always
comes out as Janu
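The pitfall can be seen directly with SimpleDateFormat (the date here is illustrative):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DayOfYearPitfall {
    public static void main(String[] args) throws ParseException {
        String input = "2016-03-15";
        // 'DD' parses 15 as day-of-year, which overrides the month field
        Date wrong = new SimpleDateFormat("yyyy-MM-DD").parse(input);
        Date right = new SimpleDateFormat("yyyy-MM-dd").parse(input);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd");
        System.out.println(out.format(wrong)); // 2016-01-15 -- January, as described above
        System.out.println(out.format(right)); // 2016-03-15
    }
}
```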
Kurt,
Did you find the problem?
Regards,
Rohini
On Thu, May 5, 2016 at 1:41 PM, Kurt Muehlner
wrote:
> Hello all,
>
> I posted this issue in the Tez user group earlier today, where it was
> suggested I also post it here. We have a Pig/Tez application exhibiting
> data discrepancies which oc
. Here it is:
>
> Block size 128MB, 300TB of raw data storage (100TB if you account for
> replication) and each of the 4 nodes has 384GB RAM
>
> Does that change your answer?
>
> Thanks again!!
>
> On 27 May 2016 at 17:09, Rohini Palaniswamy
> wrote:
> > 15K map
ry pig script
> that is run on the system i.e YARN ? In other words I would like to find
> out recurrent pig job executions provided it is the same code that is
> executing. I guess I have to match it by retrieving the Abstract Syntax
> tree but not very sure.
>
> Regards
>
Can you go to the Resource Manager UI and look at the diagnostics and task
logs of job_1464584017709_0003 to see what the actual stacktrace is? Most
likely there are some connection issues with either hdfs or hbase and these
can retry for a really long time before erroring out. Only that can explai
Can you try in Pig 0.16? Niels fixed this in
https://issues.apache.org/jira/browse/PIG-4689
On Mon, Jul 4, 2016 at 7:05 AM, Eyal Allweil wrote:
> I can replicate these results on Pig 0.14.
> Did anyone open a Jira issue for this?
>
>
> On Thursday, March 10, 2016 12:24 PM, Sarath Sasidharan
Are you sure it worked in MR? You should have got an error like
*Scalar has more than one row in the output. 1st : (), 2nd :()
(common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be
"foo::bar" )*
cd1.first == cd2.second should be written as cd1::first == cd2::second.
Refer h
You can check if your current jar has the class by running
jar -tvf /home/hadoop-user/pig-branch-0.15/lib/datafu-pig-incubating-1.3.1.jar
| grep Hasher
Did you compile datafu after applying patch from
https://issues.apache.org/jira/browse/DATAFU-47 ? Only then the class will
be in the jar as that
Limit 4 would make processing of the join stop after 4 records. It is not a
good idea to add it if you are testing the performance of the join.
On Tue, Dec 6, 2016 at 8:13 PM mingda li wrote:
> Thanks for your quick reply. If so, I can use the limit operator to compare
>
> good and bad join plan. It takes t
Congratulations Liyun !!!
On Mon, Dec 19, 2016 at 10:25 PM, Jianfeng (Jeff) Zhang <
jzh...@hortonworks.com> wrote:
> Congratulations Liyun!
>
>
>
> Best Regard,
> Jeff Zhang
>
>
>
>
>
> On 12/20/16, 11:29 AM, "Pallavi Rao" wrote:
>
> >Congratulations Liyun!
>
>
Hi all,
It is my pleasure to announce that Adam Szita has been voted in as a
committer to Apache Pig. Please join me in congratulating Adam. Adam has
been actively contributing to core Pig and Pig on Spark. We appreciate all
the work he has done and are looking forward to more contributions fro
It would help if you have the stacktrace for the job failure.
Regards,
Rohini
On Fri, May 26, 2017 at 11:54 AM, Eli Levine wrote:
> Greetings, Pig community. I am using PigUnit (PigTest.java) in a unit
> test in Apache Calcite [1] and have observed intermittent test
> failures [2]. Happens some
Thanks Adam for being the Release Manager and getting this important
release out. Pig on Spark is another milestone that will benefit users
looking for improved execution times and migrating out of mapreduce .
Regards,
Rohini
On Wed, Jun 21, 2017 at 2:05 AM, Adam Szita wrote:
> The Pig team is
If you are loading data once and performing multiple operations on it, Pig
should perform better due to its multiquery optimizations. If the data size
is very small there might not be a difference and you can go with what is
easy for you to code. I would suggest benchmarking with both Pig and Hive
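A minimal sketch of the multiquery pattern described above (relation names and paths are hypothetical):

```pig
raw    = LOAD 'input' USING PigStorage('\t') AS (k:chararray, v:int);
a_rows = FILTER raw BY k == 'a';
b_rows = FILTER raw BY k == 'b';
-- both STOREs share the single LOAD; multiquery optimization
-- executes them in one pass over the input
STORE a_rows INTO 'out_a';
STORE b_rows INTO 'out_b';
```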
Are the MR jobs succeeding or failing? Is there anything in the stderr logs
of the Oozie launcher?
On Sun, Aug 6, 2017 at 4:36 AM, Ronald Green wrote:
> Hi!
>
> I have an HDP 1.3 (old, I know...) cluster that's running Pig 0.14 scripts
> through Oozie.
>
> There's a rare nuisance that's driving
unning with the last message in the log predating MR job completion :
>
> INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> MapReduceLauncher
> - Running jobs are [job_xxx]
>
>
> On Wed, Aug 9, 2017 at 9:25 PM, Rohini Palaniswamy <
> rohini.adi...@gmail.co
Can you give the full stack trace?
On Tue, May 1, 2018 at 6:35 AM, Alex Soto wrote:
> Hello,
>
> I am using Pig 0.17.0 and I am trying to enable Snappy compression for
> temporary files.
> I installed Snappy on all the Hadoop nodes:
>
> sudo yum install snappy snappy-devel
> ln -
You cannot include a patch from the commandline. You need to compile the
source with the patch applied and use that new jar.
Regards,
Rohini
On Fri, May 4, 2018 at 10:39 AM, Tad Zhang wrote:
> Hi All,
>
> So I found out the ToDate was not including daylight saving changes.
> It is fixed by version
No. It is available only with PigStorage. You can raise a jira if you think
that would be useful.
On Fri, Aug 3, 2018 at 1:40 PM, Moiz Arafat
wrote:
> Hi,
>
> Is there an option available in OrcStorage similar to PigStorage's tagFile
> and tagPath?
>
> thanks,
> Moiz
>
Pig does not have any server. The client directly launches jobs on the YARN
cluster. You can just use the APIs in
http://pig.apache.org/docs/r0.17.0/api/org/apache/pig/PigServer.html to
execute scripts from your java program.
On Sun, Jul 29, 2018 at 8:24 PM, Atul Raut wrote:
> How to execute pig
, Aug 10, 2018 at 1:36 PM, Rohini Palaniswamy
wrote:
> Pig does not have any server. The client directly launches jobs on the
> YARN cluster. You can just use the APIs in http://pig.apache.org/docs/
> r0.17.0/api/org/apache/pig/PigServer.html to execute scripts from your
> java program
If you are using PigServer and submitting programmatically via same jvm, it
should automatically reuse the application if the requested AM resources
are same.
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezSessionManager.java#L242-L245
On Fri, Ja
Thanks Eyal. dedup() sounds interesting and can find good use in nested
foreach for picking latest record. Unfortunate that you had to resort to
CountDistinctUpTo because of memory issues. We have run into similar issues
as well and have plans to optimize the nested count distinct for handling
mill
You might need https://issues.apache.org/jira/browse/PIG-4092
On Thu, Feb 7, 2019 at 3:54 PM Russell Jurney
wrote:
> Sorry if this isn't helpful, but the other obvious thing is to store
> intermediate data in Parquet whenever you repeat code/data that can be
> shared between jobs. If tests indic
> However the fs command throws an error
What error do you get? Is it "Could not find schema file" ?
> Also is there a guarantee that the fs command will be executed in order?
Yes. Whenever an fs command is encountered, pig executes the statements
prior to it, executes the fs command and then e
; input files for faster (less time) processing. Unfortunately, I haven't
> seen much gain here on 100 megabytes input files when testing with exectype
> tez_local. Furthermore, the pig script on tez_local mode wouldn't find the
> input files. I had to prefix file paths with hdfs:/
You should look at the job counters and start and end time to get that
information. PigStats and PigProgressNotificationListener (
https://pig.apache.org/docs/r0.17.0/test.html#pig-statistics) are other
ways to get that information if you are invoking pig programmatically.
On Mon, Nov 18, 2019 at