[
https://issues.apache.org/jira/browse/AVRO-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130638#comment-17130638
]
Shay Elbaz commented on AVRO-1130:
----------------------------------
Hi [~rskraba],
Thanks for answering.
In our implementation we overwrote SortedKeyValueFile with breaking changes,
since we didn't need the original implementation.
To avoid breaking the public API in the PR, I thought of the following design:
# adding an `avro.output.sortedkv.relativeIndexPath` job config, which defaults to null.
# if not set, keep current SKVF behavior - index+data files in the same
directory:
{code:java}
output_path/part-00000/data
output_path/part-00000/index
output_path/part-00001/data
output_path/part-00001/index
output_path/part-00002/data
output_path/part-00002/index{code}
# if set, all data files go to the same output directory, and the index
directory is resolved from the relative path, for example:
dataPath = `/foo/bar/hive/table_name/dt=123`
relativeIndexPath = `../../../index/table_name/dt=123`
absolute index path = `/foo/bar/index/table_name/dt=123`
{code:java}
// data files for hive:
foo/bar/hive/table_name/dt=20200101/part-00000-data
foo/bar/hive/table_name/dt=20200101/part-00001-data
foo/bar/hive/table_name/dt=20200101/part-00002-data
//index files:
foo/bar/index/table_name/dt=20200101/part-00000-index
foo/bar/index/table_name/dt=20200101/part-00001-index
foo/bar/index/table_name/dt=20200101/part-00002-index
{code}
# adding `relativeIndexPath` string property to
SortedKeyValueFile.[Reader/Writer].Options to apply the above logic
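To make the path logic in the list above concrete, here is a minimal sketch, assuming POSIX-style paths. `RelativeIndexPathSketch` and its methods are hypothetical names used only for illustration and are not existing Avro API; an actual patch would probably work with Hadoop paths rather than `java.nio.file`.

```java
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper, not part of Avro: it only illustrates the proposed layout.
final class RelativeIndexPathSketch {

    // Formats the per-task prefix, e.g. part-00000.
    static String partName(int part) {
        return String.format(Locale.ROOT, "part-%05d", part);
    }

    // Data file location for one part.
    static String dataFile(String outputDir, String relativeIndexPath, int part) {
        if (relativeIndexPath == null) {
            // Option not set: keep the current layout, part-NNNNN/data.
            return outputDir + "/" + partName(part) + "/data";
        }
        // Option set: all data files sit directly in the output directory.
        return outputDir + "/" + partName(part) + "-data";
    }

    // Index file location for one part.
    static String indexFile(String outputDir, String relativeIndexPath, int part) {
        if (relativeIndexPath == null) {
            // Option not set: keep the current layout, part-NNNNN/index.
            return outputDir + "/" + partName(part) + "/index";
        }
        // Option set: resolve the relative path against the data directory
        // and collapse the ".." segments.
        String indexDir = Paths.get(outputDir)
                .resolve(relativeIndexPath)
                .normalize()
                .toString();
        return indexDir + "/" + partName(part) + "-index";
    }

    public static void main(String[] args) {
        String dataDir = "/foo/bar/hive/table_name/dt=20200101";
        String relative = "../../../index/table_name/dt=20200101";
        System.out.println(dataFile(dataDir, relative, 0));
        System.out.println(indexFile(dataDir, relative, 0));
        // With the option unset, the current directory-per-part layout is kept.
        System.out.println(indexFile("output_path", null, 0));
    }
}
```

Returning the current layout when the option is null means existing jobs would see no change unless they opt in.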
Please tell me what you think.
> MapReduce Jobs can output write SortedKeyValueFiles directly
> ------------------------------------------------------------
>
> Key: AVRO-1130
> URL: https://issues.apache.org/jira/browse/AVRO-1130
> Project: Apache Avro
> Issue Type: New Feature
> Components: java
> Affects Versions: 1.7.1
> Reporter: Jeremy Lewi
> Priority: Minor
>
> It would be nice if MapReduce jobs could write directly to
> SortedKeyValueFiles.
> See harsh@'s response on this thread http://goo.gl/OT1rN for some more
> information on what needs to be done.