I do that kind of streaming on HDFS files using Hadoop streaming, outside of Pig. I assume you could do it from inside Pig too, but I haven't tested that.
William F Dowling
Sr Technical Specialist, Software Engineering
Thomson Reuters
0 +1 215 823 3853

From: Moore, Michael A. [mailto:[email protected]]
Sent: Tuesday, June 07, 2011 3:14 PM
To: [email protected]
Subject: Re: Loading Files with Comment Lines

Possibly. Can I do that if the file is already in HDFS?

______________________________________
Michael Moore :: [email protected] <mailto:[email protected]>
The Johns Hopkins University Applied Physics Laboratory
0B7B17EE1AE2A80B pgp
BC31 A861 9726 8211 F79F 7E21 0B7B 17EE 1AE2 A80B pgp fingerprint

On Jun 7, 2011, at 3:12 PM, <[email protected]> wrote:

Can you stream it through grep -v '^#' ?

From: Moore, Michael A. [mailto:[email protected]]
Sent: Tuesday, June 07, 2011 3:04 PM
To: [email protected]
Subject: Loading Files with Comment Lines

Hello all-

I've got a quick question and Google isn't proving to be much help. I've got a big file that has a few lines in it prefaced with a pound sign (#) to indicate they are to be ignored. I would like to LOAD this file using PigStorage. Is there a way to do this, or is it handled automatically?

The data might look something like this:

    # Data Source: Project A
    # Contact MMoore with Questions
    # SenderId RecipientId
    1 2
    3 5
    6 7
    #2 1
    3 6
    11 7

Thanks!
-Michael
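A minimal sketch of the Hadoop streaming approach from the first reply, written in Python as a stand-in for `grep -v '^#'`: Hadoop streaming hands each input line to the mapper on stdin and collects what the mapper writes to stdout, so the mapper only has to drop lines that start with `#`. The script name `strip_comments.py` and the use of a map-only job are assumptions for illustration, not the setup described in the thread.

```python
#!/usr/bin/env python3
"""Hypothetical strip_comments.py: a Python equivalent of `grep -v '^#'`.

Assumed to run as the mapper of a map-only Hadoop streaming job: input
lines arrive on stdin, and every line that does not start with '#' is
written unchanged to stdout.
"""
import sys
from typing import Iterable, Iterator


def drop_comment_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only lines whose first character is not '#' (like grep -v '^#')."""
    for line in lines:
        if not line.startswith("#"):
            yield line


def main() -> None:
    # Lines read from sys.stdin keep their trailing newline, so they are
    # written back exactly as received.
    for line in drop_comment_lines(sys.stdin):
        sys.stdout.write(line)


if __name__ == "__main__":
    main()
```

Like the `^#` anchor in grep, `startswith("#")` only matches a `#` in the first column, so a line with leading whitespace before the `#` is kept. The script can be checked locally with `cat sample.txt | python3 strip_comments.py` (where `sample.txt` is a hypothetical copy of the data) before shipping it with the job and passing it as the streaming `-mapper`.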

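To illustrate what should remain for PigStorage to load once the comments are gone, here is a small Python sketch that filters the sample data and splits each surviving line into a (SenderId, RecipientId) pair. It assumes the sample holds one whitespace-separated pair of integers per line, as suggested by the `# SenderId RecipientId` header comment; `parse_edges` and `SAMPLE` are hypothetical names.

```python
"""Hypothetical illustration: parse the sample data into (sender, recipient) pairs.

Assumes one whitespace-separated pair of integers per line, with every
line starting with '#' treated as a comment and skipped.
"""
from typing import List, Tuple

SAMPLE = """\
# Data Source: Project A
# Contact MMoore with Questions
# SenderId RecipientId
1 2
3 5
6 7
#2 1
3 6
11 7
"""


def parse_edges(text: str) -> List[Tuple[int, int]]:
    edges = []
    for line in text.splitlines():
        # Skip blank lines and every line whose first character is '#'.
        if not line or line.startswith("#"):
            continue
        sender, recipient = line.split()
        edges.append((int(sender), int(recipient)))
    return edges


if __name__ == "__main__":
    print(parse_edges(SAMPLE))  # prints [(1, 2), (3, 5), (6, 7), (3, 6), (11, 7)]
```

Note that the `#2 1` row is dropped entirely: under a "starts with #" rule it is a commented-out record, not a record whose first field happens to contain `#`.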