[ 
https://issues.apache.org/jira/browse/AVRO-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130638#comment-17130638
 ] 

Shay Elbaz commented on AVRO-1130:
----------------------------------

Hi [~rskraba],

Thanks for answering.

In our implementation we overwrote SortedKeyValueFile with breaking changes, 
since we didn't need the original implementation.

To avoid breaking the public API in the PR, I thought of the following design:
 # adding an `avro.output.sortedkv.relativeIndexPath` job config, which defaults to null.
 # if not set, keep the current SKVF behavior, with index and data files in the same 
directory:

{code:java}
output_path/part-00000/data
output_path/part-00000/index
output_path/part-00001/data
output_path/part-00001/index
output_path/part-00002/data
output_path/part-00002/index{code}

 # if set, all data files go to the same output directory, and the index 
directory is resolved from the data path and the relative path, for example:
dataPath =                     `/foo/bar/hive/table_name/dt=123`
relativeIndexPath =        `../../../index/table_name/dt=123`
absolute index path =    `/foo/bar/index/table_name/dt=123`

 
{code:java}
// data files for hive:
foo/bar/hive/table_name/dt=20200101/part-00000-data
foo/bar/hive/table_name/dt=20200101/part-00001-data
foo/bar/hive/table_name/dt=20200101/part-00002-data

// index files:
foo/bar/index/table_name/dt=20200101/part-00000-index
foo/bar/index/table_name/dt=20200101/part-00001-index
foo/bar/index/table_name/dt=20200101/part-00002-index
{code}
 

 # adding a `relativeIndexPath` string property to 
SortedKeyValueFile.[Reader/Writer].Options to apply the above logic
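
To make the path resolution concrete, here is a minimal sketch of how the index directory could be derived from the data directory. It is only an illustration: the class `IndexPathSketch` and the method `resolveIndexDir` are hypothetical names, and it uses `java.nio.file.Path` with POSIX-style paths to stay self-contained, whereas the actual change would presumably work with Hadoop's `Path`. It uses a relative path of `../../../index/table_name/dt=123`, and treats a null value as "keep the index next to the data".

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Avro: shows only the path arithmetic.
public class IndexPathSketch {

    /**
     * Resolves the index directory for a given data directory.
     * A null relativeIndexPath means the option is unset, so the
     * index stays alongside the data (the current behavior).
     */
    static String resolveIndexDir(String dataDir, String relativeIndexPath) {
        Path data = Paths.get(dataDir);
        if (relativeIndexPath == null) {
            return data.toString();
        }
        // resolve() appends the relative path; normalize() collapses the ".." segments.
        return data.resolve(relativeIndexPath).normalize().toString();
    }

    public static void main(String[] args) {
        String dataDir = "/foo/bar/hive/table_name/dt=123";
        String relative = "../../../index/table_name/dt=123";
        System.out.println(resolveIndexDir(dataDir, relative));
    }
}
```

Three `..` segments climb from `dt=123` up to `/foo/bar`, so the relative path has to name `table_name` again to land in a per-table index directory.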

Please tell me what you think.

> MapReduce Jobs can output write SortedKeyValueFiles directly
> ------------------------------------------------------------
>
>                 Key: AVRO-1130
>                 URL: https://issues.apache.org/jira/browse/AVRO-1130
>             Project: Apache Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.7.1
>            Reporter: Jeremy Lewi
>            Priority: Minor
>
> It would be nice if MapReduce jobs could write directly to 
> SortedKeyValueFile's.
> harsh@'s response on this thread http://goo.gl/OT1rN for some more 
> information on what needs to be done.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
