[jira] [Commented] (AVRO-1130) MapReduce Jobs can output write SortedKeyValueFiles directly

Shay Elbaz (Jira) Thu, 04 Jun 2020 10:18:54 -0700


    [ 
https://issues.apache.org/jira/browse/AVRO-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126087#comment-17126087
 ]


Shay Elbaz commented on AVRO-1130:
----------------------------------

Is this still relevant? My team implemented a Hive-compatible variant that 
writes the data and index files to separate tables:

```

hive_data_table/dt=20200101/part-00000-data
hive_data_table/dt=20200101/part-00001-data
hive_data_table/dt=20200101/part-00002-data

hive_index_table/dt=20200101/part-00000-index
hive_index_table/dt=20200101/part-00001-index
hive_index_table/dt=20200101/part-00002-index
```

 

This way we can create an external table over the data files, which is not 
possible with the current implementation. Note that we use it in a Spark 
application, not an MR job.
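
To make the layout concrete, here is a rough sketch of how a data file and its 
index file line up. The helper names are hypothetical and not part of Avro or 
of our patch. The table names and the `-data`/`-index` suffixes come from the 
listing above. That a data file and its index share the same partition and part 
number is an assumption read off that listing.

```python
from pathlib import PurePosixPath

# Hypothetical helpers for illustration only; not part of Avro.
# Table names and file suffixes mirror the listing above.
DATA_TABLE = "hive_data_table"
INDEX_TABLE = "hive_index_table"


def part_paths(partition, part, data_table=DATA_TABLE, index_table=INDEX_TABLE):
    """Return the (data, index) paths for one part of one partition."""
    name = f"part-{part:05d}"
    data = PurePosixPath(data_table, partition, f"{name}-data")
    index = PurePosixPath(index_table, partition, f"{name}-index")
    return str(data), str(index)


def index_path_for(data_path, data_table=DATA_TABLE, index_table=INDEX_TABLE):
    """Map a data file path to the index path assumed to accompany it."""
    p = PurePosixPath(data_path)
    if p.parts[0] != data_table or not p.name.endswith("-data"):
        raise ValueError(f"not a data file under {data_table}: {data_path}")
    # Swap the table root and the suffix; keep the partition directories.
    index_name = p.name[: -len("-data")] + "-index"
    return str(PurePosixPath(index_table, *p.parts[1:-1], index_name))
```

Because the index files live under a different root, an external table pointed 
at `hive_data_table` only ever sees the data files.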

The changes to SortedKeyValueFile were minimal, as one would expect.

Should we open a PR with our implementation?

> MapReduce Jobs can output write SortedKeyValueFiles directly
> ------------------------------------------------------------
>
>                 Key: AVRO-1130
>                 URL: https://issues.apache.org/jira/browse/AVRO-1130
>             Project: Apache Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.7.1
>            Reporter: Jeremy Lewi
>            Priority: Minor
>
> It would be nice if MapReduce jobs could write directly to 
> SortedKeyValueFiles.
> See harsh@'s response on this thread http://goo.gl/OT1rN for some more 
> information on what needs to be done.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
