pyckle commented on code in PR #3264:
URL: https://github.com/apache/parquet-java/pull/3264#discussion_r2259391067
##########
parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnReaderBase.java:
##########
@@ -445,30 +445,39 @@ void writeValue() {
* @param converter a converter that materializes the values in this
column in the current record
* @param writerVersion writer version string from the Parquet file being
read
*/
- ColumnReaderBase(
+ protected ColumnReaderBase(
ColumnDescriptor path, PageReader pageReader, PrimitiveConverter
converter, ParsedVersion writerVersion) {
+ this(path, readDict(path, pageReader), pageReader, converter,
writerVersion);
+ }
+
+ protected ColumnReaderBase(
+ ColumnDescriptor path, Dictionary dictionary, PageReader
dataPageReader, PrimitiveConverter converter, ParsedVersion writerVersion) {
this.path = Objects.requireNonNull(path, "path cannot be null");
- this.pageReader = Objects.requireNonNull(pageReader, "pageReader cannot be
null");
+ this.pageReader = Objects.requireNonNull(dataPageReader, "pageReader
cannot be null");
this.converter = Objects.requireNonNull(converter, "converter cannot be
null");
this.writerVersion = writerVersion;
this.maxDefinitionLevel = path.getMaxDefinitionLevel();
- DictionaryPage dictionaryPage = pageReader.readDictionaryPage();
Review Comment:
The PageReader returns a DictionaryPage which is then decoded in the
constructor:
`dictionary = dictionaryPage.getEncoding().initDictionary(path,
dictionaryPage);`
As `getEncoding()` returns an enum, there's no way to override the method of
`initDictionary()` to return a cached instance.
Adding a `getDictionary()` function to the `PageReader` interface would
therefore be necessary. In order not to break any downstream `PageReader`
implementations, a default method must be added, which seems unclean to me, but
maybe it's OK.
I implemented this so you can see what it looks like. If you prefer this
approach, I can add the caching logic to the `ColumnChunkPageReadStore` too.
Another option could be adding a function `decodeDictionary()` to
`DictionaryPage`
Let me know what you think is cleanest.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]