pig scripts to java application. the scripts
> have been run in command line.
> I'm wondering how to do this. I just want to run a pig file (*.pig) in java
> code.
>
> Any advice would be appreciated.
>
> Thanks,
>
> - Youngwoo
>
--
Best Regards
Jeff Zhang
is implemented as a preprocessor
> on the script.
>
> Olga
>
> -----Original Message-----
> From: rakesh kothari [mailto:rkothari_...@hotmail.com]
> Sent: Thursday, October 07, 2010 11:47 AM
> To: pig-u...@hadoop.apache.org
> Subject: Passing parameters to Pig Script using Java
>
>
> Hi,
>
> I have a pig script that needs certain parameters (passed using "-p" in pig
> shell) to execute. Is there a way to pass these parameters if I want to
> execute this script using "PigServer" after registering the script using
> PigServer.registerScript() ?
>
> Thanks,
> -Rakesh
>
>
>
--
Best Regards
Jeff Zhang
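For reference, later Pig releases (0.9 and onward) added registerScript overloads that accept a parameter map directly, so the "-p" values can be passed programmatically. A minimal sketch, assuming such a version; the script name and parameter values are only illustrations:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class RunParameterizedScript {
    public static void main(String[] args) throws IOException {
        PigServer pig = new PigServer(ExecType.LOCAL);
        // Values the script references as $input / $output (hypothetical names)
        Map<String, String> params = new HashMap<String, String>();
        params.put("input", "/data/in");
        params.put("output", "/data/out");
        // Overload that runs parameter substitution before registering
        pig.registerScript("myscript.pig", params);
    }
}
```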
the output using Hadoop
FileSystem API, and then using org.apache.pig.data.DataReaderWriter to
read the output line by line.
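A rough sketch of that approach (paths are hypothetical; it assumes the output was written in Pig's binary format, e.g. BinStorage, which DataReaderWriter of Pig releases from that era can decode):

```java
import java.io.EOFException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pig.data.DataReaderWriter;
import org.apache.pig.data.Tuple;

public class ReadPigOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical part file produced by the Pig job
        FSDataInputStream in = fs.open(new Path("/pig/output/part-00000"));
        try {
            while (true) {
                // readDatum deserializes one datum (tuple, bag, map, ...)
                Tuple t = (Tuple) DataReaderWriter.readDatum(in);
                System.out.println(t);
            }
        } catch (EOFException eof) {
            // end of the part file
        } finally {
            in.close();
        }
    }
}
```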
On Fri, Oct 8, 2010 at 3:03 AM, Corbin Hoenes wrote:
> anyone ever read a pig output file with bags/tuples into a java map reduce
> program?
--
Best Regards
of RAM.
>>>> - Intel Quad with 4GB of RAM.
>>>>
>>>> Well I was aware that hadoop has overhead and that it won't be done in
>>>> half
>>>> an hour (time in local divided by number of nodes). But I was surprised
>>>> to
>>>> see this morning it took 7 hours to complete!!!
>>>>
>>>> My configuration was made according to this link:
>>>>
>>>>
>>>> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>>>>
>>>> My question is simple: Is it normal?
>>>>
>>>> Cheers
>>>>
>>>>
>>>> Vincent
>>>>
>>>>
>>
>
>
--
Best Regards
Jeff Zhang
when mapred.job.tracker is "local".
>
>
>
> It's not clear to me what the reduce capacity of my cluster is :)
>
> On 10/08/2010 01:00 PM, Jeff Zhang wrote:
>>
>> I guess your reduce task number may be 1, which makes the reduce phase
>> very slow.
>>
>
BTW, you can look at the job tracker web UI to see which part of the
job takes the most time.
On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang wrote:
> No, I mean whether your mapreduce job's reduce task number is 1.
>
> And could you share your pig script, then others can rea
r_04
> <http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_04>
> 0.00%
>
>
> 8-Oct-2010 14:18:11
>
>
>
> Error: GC overhead limit exceeded
>
>
> 0
> <http://prog7.lan:50030/ta
Vincent,
Just want to remind you that you need to restart your cluster after
the reconfiguration.
On Fri, Oct 8, 2010 at 7:04 PM, Jeff Zhang wrote:
> Try to increase the heap size on of task by setting
> mapred.child.java.opts in mapred-site.xml. The default value is
> -Xmx200m
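For reference, the setting Jeff mentions would go into mapred-site.xml roughly like this (the 512m value is only an illustration; pick a size that fits your nodes):

```
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```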
Xmx1536m but I'm afraid that my nodes will start to
> swap memory...
>
> Should I continue in this direction? Or is it already too much and I should
> search the problem somewhere else?
>
> Thanks
>
> -Vincent
>
>
> On 10/08/2010 03:04 PM, Jeff Zhan
181]
> 2010-10-13 14:58:44,191 [Thread-4-SendThread] INFO
> org.apache.zookeeper.ClientCnxn - Server connection successful
>
> and stays there. Has anyone tried running it against hbase 0.89 or is 0.20.6
> the only last supported version?
>
> -GS
>
--
Best Regards
Jeff Zhang
BTW, I guess you did not do the second thing; it seems it always tries to
connect to localhost's zookeeper
On Thu, Oct 14, 2010 at 8:20 AM, Jeff Zhang wrote:
> Hi George
>
> Here are three things that should be taken care of when you use HBaseStorage():
>
>
> 1. Register
>>> > 2010-10-13 14:58:44,182 [Thread-4-SendThread] INFO
>>> > org.apache.zookeeper.ClientCnxn - Attempting connection to server
>>> > localhost/127.0.0.1:2181
>>> > 2010-10-13 14:58:44,188 [Thread-4-SendThread] INFO
>>> > org.apache.zookeeper.ClientCnxn - Priming connection to
>>> > java.nio.channels.SocketChannel[connected
>>> > local=/127.0.0.1:54359remote=localhost/
>>> > 127.0.0.1:2181]
>>> > 2010-10-13 14:58:44,191 [Thread-4-SendThread] INFO
>>> > org.apache.zookeeper.ClientCnxn - Server connection successful
>>> >
>>> > and stays there. Has anyone tried running it against hbase 0.89 or is
>>> > 0.20.6
>>> > the only last supported version?
>>> >
>>> > -GS
>>> >
>>>
>>
>>
>
--
Best Regards
Jeff Zhang
>> > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>> > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> > > at
>> > >
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>> > >
>> >
>> > The script is pretty simple right now:
>> >
>> > rows = LOAD 'cassandra://localhost:9160/...' USING CassandraIndexReader()
>> > as
>> > > (col1, col2, col3);
>> > > dump rows;
>> > > grouped = GROUP rows BY col1;
>> > > dump grouped;
>> > >
>> >
>> > The first dump works fine, while the second just dies with the above
>> error.
>> > Strangely when I store it on disc and then load it with PigStorage()
>> again
>> > it just works as expected.
>> >
>> > Am I doing something wrong with my Custom Loader?
>> >
>> > Regards,
>> > Chris
>> >
>>
>
--
Best Regards
Jeff Zhang
casts from byte array are not supported for this
> loader.
>
> * @throws IOException if there is an exception during LoadCaster
> * construction
> */
>
> public LoadCaster getLoadCaster() throws IOException {
>
> return new Utf8StorageConverter
apred.JobClient.submitJob(JobClient.java:742)
> at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
> at
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
> at java.lang.Thread.run(Thread.java:619)
>
>
>
> Please, can someone help me??
>
> Ruth
>
>
--
Best Regards
Jeff Zhang
utFormat.java:55)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:258)
> ... 7 more
> 3092 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:///output"
> 3092 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
--
Best Regards
Jeff Zhang
ckend.hadoop.hbase.HBaseStorage('content:field1
> anchor:field1a anchor:field2a') as (content_field1, anchor_field1a,
> anchor_field2a);
>
> dump raw;
>
> ---
> what else am I missing?
--
Best Regards
Jeff Zhang
where my hbase/conf/hbase-site.xml file is? Not
> sure how would this get passed to the HBaseStorage class?
>
> On Nov 19, 2010, at 5:09 PM, Jeff Zhang wrote:
>
>> Does the mapreduce job start? Could you check the logs on the hadoop side?
>>
ppreciate any kinds of comments.
>
> doc:
> http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data
>
> jira:
> https://issues.apache.org/jira/browse/PIG-1748
>
> Many thanks,
> Lin
>
--
Best Regards
Jeff Zhang
te the regular expression?
>
> I tried the following:
>
> A = FILTER B BY (name matches 'abc\|.*');
>
> but it does not work. I cannot use 'abc|.*' because it will match anything.
>
> Any ideas are appreciated.
>
> Thanks,
>
> Zhen
>
--
Best Regards
Jeff Zhang
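For reference: Pig's matches operator uses Java regex semantics, and the Pig string literal itself consumes one level of backslash escaping, so the usual fix is to double the backslash, i.e. A = FILTER B BY (name matches 'abc\\|.*');. The underlying Java regex behaves like this (a small standalone check):

```java
import java.util.regex.Pattern;

public class PipeRegexDemo {
    public static void main(String[] args) {
        // The regex abc\|.* means: literal "abc|" followed by anything.
        // In Java (and Pig) source, the backslash itself must be doubled.
        String regex = "abc\\|.*";
        System.out.println(Pattern.matches(regex, "abc|def")); // matches
        System.out.println(Pattern.matches(regex, "abcdef"));  // no literal pipe
        System.out.println(Pattern.matches(regex, "xyz"));     // no match
    }
}
```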
deas? Is this even possible with piglatin?
>
>
>
--
Best Regards
Jeff Zhang
ic alias dump, from all
>>> the other stuff being logged, to be able to trigger further process.
>>>
>>> STREAM THROUGH seems to be one way to trigger a process; it's
>>> just that it seems not suitable for the kind of process we are looking at,
>>> because it gets run in the hadoop cluster.
>>>
>>> any thought?
>>>
>>> J
>>
>
>
--
Best Regards
Jeff Zhang
http://homepages.dcc.ufmg.br/~charles/
> UFMG - ICEx - Dcc
> Cel.: 55 31 87741485
> Tel.: 55 31 34741485
> Lab.: 55 31 34095840
>
--
Best Regards
Jeff Zhang
.apache.org/pig/owl
>
> Thanks
> Alex
>
--
Best Regards
Jeff Zhang
> >grunt> dump raw;
> >..
> >Input(s):
> >Successfully read 4 records (825 bytes) from:
> >"hdfs://localhost:9000/user/aholmes/test.v1.avro"
> >
> >Output(s):
> >Successfully stored 4 records (46 bytes) in:
> >"hdfs://localhost:9000/tmp/temp2039109003/tmp1924774585"
> >
> >Counters:
> >Total records written : 4
> >Total bytes written : 46
> >..
> >(r1,1)
> >(r2,2)
> >(r1,1)
> >(r2,2)
> >
> >I'm sure I'm doing something wrong (again)!
> >
> >Many thanks,
> >Alex
>
>
>
--
Best Regards
Jeff Zhang
$0;");
>pigServer.store("tmp_table_limit", "/user/hadoop/shi.txt");
> I always get error:
> 14/12/30 17:28:33 WARN hadoop20.PigJobControl: falling back to default
> JobControl (not using hadoop 0.20 ?)
> java.lang.NoSuchFieldException: runnerState
> at java.lang.Class.getDeclaredField(Class.java:1948)
> at
> org.apache.pig.backend.hadoop20.PigJobControl.(PigJobControl.java:51)
>
>
>
>
>
>
>
> help!!
--
Best Regards
Jeff Zhang
>
> >> >> Hi all,
> >> >>
> >> >> Now it's official that Rohini Palaniswamy is our new Pig PMC chair.
> >> Please
> >> >> join me in congratulating Rohini for her new role. Congrats!
> >> >>
> >> >> Thanks!
> >> >> Cheolsoo
> >> >>
> >>
> >>
>
>
--
Best Regards
Jeff Zhang
I tried to use pig-0.14 as my dependency in the pom file. But it looks like
there's no tez engine in either pig-0.14.jar or pig-0.14-h2.jar? Is that
missing, or is it intentional? Thanks
--
Best Regards
Jeff Zhang
I hit the issue in the following link; it seems restarting the job history
server can fix it. But I am confused why pig would depend on the job
history server. Does anyone know? Thanks
--
Best Regards
Jeff Zhang
The issue I mentioned
http://stackoverflow.com/questions/29784532/pig-keeps-trying-to-connect-to-job-history-server-and-fails
On Tue, Sep 27, 2016 at 8:38 PM, Jeff Zhang wrote:
>
>
> I hit the issue as the following link, seems restarting job history server
> can fix the issue. B
Can I cancel job if I launch it using PigRunner ? Thanks
--
Best Regards
Jeff Zhang
Hi Folks,
Zeppelin just released 0.7 today, which is the first release to support pig.
Users can do all the things in zeppelin that they do in the grunt shell. Besides,
you can take advantage of zeppelin's visualization features to visualize the
pig output. Here's one article I wrote with the details. I would
Congratulations, Adam!
Ke, Xianda wrote on Tuesday, May 23, 2017 at 10:22 AM:
> Congrats, Adam!
>
> Regards,
> Xianda
>
> -----Original Message-----
> From: Zhang, Liyun [mailto:liyun.zh...@intel.com]
> Sent: Tuesday, May 23, 2017 9:11 AM
> To: d...@pig.apache.org
> Cc: user@pig.apache.org
> Subject: RE: [ANNOUNCE
Nice job Eyal, it's very helpful for the community
Best Regard,
Jeff Zhang
On 3/31/15, 5:19 AM, "Eyal Allweil" wrote:
>Hi all,
>
>I'm glad to announce a new version of Pig-Eclipse. We've been using it
>for a few weeks where I work (PayPal) and I thin
You can use pig's java API to debug your script in eclipse.
Here's one simple example.
public static void main(String[] args) throws IOException {
PigServer pig = new PigServer(ExecType.LOCAL);
pig.registerScript("myscript.pig");
}
Best Regard,
Jeff Zhang
What's your purpose for that?
Best Regard,
Jeff Zhang
On 7/11/15, 9:29 AM, "Andy Srine" wrote:
>Folks,
>
>Is there an easy way to store the output of the FS commands to a variable
>in a pig script? Either native pig or even a solution using embedded
>Python
>will help.
>
>Thanks,
>Andy
> Application application_1436453941326_0020 failed 2 times due to AM
> Container for appattempt_1436453941326_0020_02 exited with exitCode:
>1
Could you check the yarn app log ?
Best Regard,
Jeff Zhang
On 7/10/15, 5:38 PM, "Divya Gehlot" wrote:
>Hi
>
Is com.pig.udf.PigUDF on your classpath ?
Best Regard,
Jeff Zhang
From: Divya Gehlot <divya.htco...@gmail.com>
Date: Saturday, July 11, 2015 at 12:16 AM
To: "user@pig.apache.org" <user@pig.apache.org>, Jianfeng Zhang
This document should be helpful for you
https://wiki.apache.org/pig/PigSkewedJoinSpec
Best Regard,
Jeff Zhang
On 7/14/15, 4:56 AM, "Gagan Juneja" wrote:
>Hi Team,
>
>We are using Pig intensively in our various projects. We are doing
>optimizations for that we w
It should be a classpath issue. Did you set PIG_HOME? Maybe you still
point PIG_HOME to pig version 0.13.
Best Regard,
Jeff Zhang
On 7/14/15, 12:00 PM, "Antoine Lafleur" wrote:
>Evening,
>
> Sorry to bother, in case of anyone do have the same issue, it seem
BTW, You can use the following command to get the classpath
pig -printCmdDebug
Best Regard,
Jeff Zhang
On 7/14/15, 10:17 PM, "Jianfeng (Jeff) Zhang"
wrote:
>It should be classpath issue. Did you set the PIG_HOME ? Maybe you still
>point PIG_HOME to pig version 0.13
>
Do you find the joda jar under
/share/hadoop/tools/lib/joda-time-2.5.jar ?
Best Regard,
Jeff Zhang
On 7/15/15, 1:42 PM, "Antoine Lafleur" wrote:
>Hi,
>
>The result of the command is :
>
>Find hadoop at /usr/local/hadoop/bin/hadoop
>dry run:
>HADOOP_CLASSPA
More context & logs would be helpful
Best Regard,
Jeff Zhang
On 7/23/15, 12:31 AM, "sajid mohammed" wrote:
>i am trying to create totuple but getting some parse errors
Of course, you can. AvroStorage is a built-in UDF; you just need to put the
pig jar on the classpath.
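A minimal sketch of what that looks like in a script (paths are hypothetical; in recent Pig releases AvroStorage lives in org.apache.pig.builtin, so no REGISTER line is needed):

```
events = LOAD '/data/events.avro' USING AvroStorage();
DUMP events;
STORE events INTO '/data/events_out' USING AvroStorage();
```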
Best Regard,
Jeff Zhang
On 1/11/16, 4:22 AM, "John Smith" wrote:
>Hi,
>
>I have a java code that generates pig script. I am wondering if there is
>option to execut
Do you use PigServer.registerScript(fileName) ? Then what errors do you
see if you use AvroStorage ?
Best Regard,
Jeff Zhang
On 1/11/16, 9:36 AM, "John Smith" wrote:
>hi,
>
>i think you are answering something different. I need execute whole pig
>script using P
Congratulations Liyun!
Best Regard,
Jeff Zhang
On 12/20/16, 11:29 AM, "Pallavi Rao" wrote:
>Congratulations Liyun!
I would suggest you use tez, which launches just one yarn app for one pig
script (e.g. run the script with "pig -x tez").
http://pig.apache.org/docs/r0.16.0/perf.html#enable-tez
Best Regard,
Jeff Zhang
On 3/17/17, 3:24 AM, "Mohammad Tariq" wrote:
>Hi group,
>
>In any real world pig script we end up wi
In that case, you can use the PARALLEL keyword to control the number of
reduce tasks of each mr job. The number of map tasks is determined by
the InputFormat.
See.
http://pig.apache.org/docs/r0.16.0/basic.html
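For example (relation names and paths are only illustrations):

```
A = LOAD 'input' AS (k:chararray, v:int);
B = GROUP A BY k PARALLEL 10;           -- this MR job runs 10 reduce tasks
C = FOREACH B GENERATE group, SUM(A.v);
STORE C INTO 'output';
```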
Best Regard,
Jeff Zhang
On 3/18/17, 6:03 PM, "Mohammad Tariq" wrote:
Awesome. I have also integrated the spark engine in the zeppelin notebook.
https://github.com/zjffdu/zeppelin/tree/ZEPPELIN-2615
Best Regard,
Jeff Zhang
On 6/23/17, 5:09 AM, "Rohini Palaniswamy" wrote:
>Thanks Adam for being the Release Manager and getting this important