Thanks, Bill. Any ideas on how to hide the location of the HDFS files from the end user?
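
One direction that might work, as a rough sketch only: keep the real HDFS layout inside a small resolver that turns the logical location from the load statement (e.g. 'scheme://SampleTable') into the actual paths, so the user only ever sees the table name. The class name, the base directory, and the data/schema sub-paths below are all hypothetical and not part of any existing API. In a custom loader this could presumably be called from LoadFunc.setLocation(String, Job) before the input path is set on the job.

```java
import java.net.URI;

// Hypothetical helper: maps a logical table URI to hidden HDFS paths.
// The directory layout (<baseDir>/<table>/data and /schema) is an assumption.
final class TableLocationResolver {
    private final String baseDir;

    TableLocationResolver(String baseDir) {
        this.baseDir = baseDir;
    }

    // Path of the exported flat files for the given logical location.
    public String dataPath(String logicalLocation) {
        return baseDir + "/" + tableName(logicalLocation) + "/data";
    }

    // Path of the schema file uploaded alongside the data.
    public String schemaPath(String logicalLocation) {
        return baseDir + "/" + tableName(logicalLocation) + "/schema";
    }

    // Extracts the table name from e.g. "scheme://SampleTable".
    // Only letters, digits and underscores are accepted, so a user
    // cannot smuggle path segments into the resolved location.
    static String tableName(String logicalLocation) {
        String table = URI.create(logicalLocation).getAuthority();
        if (table == null || !table.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException(
                "Not a valid table location: " + logicalLocation);
        }
        return table;
    }
}
```

With a base directory of, say, /data/exports, the load statement on 'scheme://SampleTable' would resolve to /data/exports/SampleTable/data, and the exporter job would write to the same resolver-derived path, which keeps it decoupled from the loader.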

On Tue, Dec 11, 2012 at 9:42 PM, Bill Graham <[email protected]> wrote:

> I think the latter would be better. Since the LoadFunc would be decoupled
> from the data exporter you could schedule the exporting independent of the
> loading. We do something similar, without the $query part.
>
> On Tue, Dec 11, 2012 at 1:10 AM, Prashant Kommireddi <[email protected]> wrote:
>
> > I was working on a LoadFunc and needed some ideas/second opinion on the
> > best way to do this:
> >
> >    1. We use an API to download data from database as flat-files.
> >       - A query is given with table name and fields required to extract
> >         data
> >    2. Once 1. is done upload data to HDFS
> >    3. Upload the schema file to HDFS
> >    4. LoadFunc to read the schema file and parse data
> >
> > A strict requirement is to hide the details of the location of these HDFS
> > files from the user issuing the pig query. For a user it could look as
> > simple as:
> >
> > A = load 'scheme://SampleTable' using CustomLoader('$query');
> >
> > User here only issues the load statement on table with a query and API
> > calls for importing from database could happen in the background.
> >
> > What would be the best way to do this? Is it better to do the above as part
> > of LoadFunc, or would it rather be beneficial to do it separate and somehow
> > communicate the location from API import to LoadFunc?
> >
> > Thanks,
> >
> > Prashant
>
> --
> *Note that I'm no longer using my Yahoo! email address. Please email me at
> [email protected] going forward.*
