[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851779#comment-17851779 ]
ASF GitHub Bot commented on TIKA-4252:
--------------------------------------

tballison commented on PR #1778:
URL: https://github.com/apache/tika/pull/1778#issuecomment-2145904427

   Ha, @nddipiazza. I did earlier this morning. I chose your choices over mine in the merge, largely. See https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17851727#comment-17851727

   What we now need to do is figure out how to serialize+deserialize ParseContext with as little work as possible. :D

> PipesClient#process - seems to lose the Fetch input metadata?
> -------------------------------------------------------------
>
>                 Key: TIKA-4252
>                 URL: https://issues.apache.org/jira/browse/TIKA-4252
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Nicholas DiPiazza
>            Priority: Major
>             Fix For: 3.0.0
>
>
> when calling:
> PipesResult pipesResult = pipesClient.process(new FetchEmitTuple(request.getFetchKey(),
>         new FetchKey(fetcher.getName(), request.getFetchKey()),
>         new EmitKey(), tikaMetadata, HandlerConfig.DEFAULT_HANDLER_CONFIG,
>         FetchEmitTuple.ON_PARSE_EXCEPTION.SKIP));
> the tikaMetadata is not present in the fetch data when the fetch method is
> called.
>
> It's OK through this part:
> UnsynchronizedByteArrayOutputStream bos =
>         UnsynchronizedByteArrayOutputStream.builder().get();
> try (ObjectOutputStream objectOutputStream = new ObjectOutputStream(bos)) {
>     objectOutputStream.writeObject(t);
> }
> byte[] bytes = bos.toByteArray();
> output.write(CALL.getByte());
> output.writeInt(bytes.length);
> output.write(bytes);
> output.flush();
>
> i verified the bytes have the expected metadata from that point.
>
> UPDATE: found issue
>
> org.apache.tika.pipes.PipesServer#parseFromTuple
>
> is using a new Metadata when it should only use empty metadata if fetch tuple
> metadata is null.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
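
For illustration only, the behavior described in the UPDATE above (reuse the tuple's metadata, and fall back to an empty Metadata only when the tuple carries none) might look roughly like the sketch below. This is not the actual PipesServer#parseFromTuple code. The nested Metadata class is a hypothetical stand-in for org.apache.tika.metadata.Metadata so the example compiles without Tika on the classpath, and metadataForParse is a made-up helper name.

```java
import java.util.HashMap;
import java.util.Map;

public class MetadataFallbackSketch {

    // Hypothetical stand-in for org.apache.tika.metadata.Metadata,
    // reduced to a simple string map for this sketch.
    static final class Metadata {
        final Map<String, String> values = new HashMap<>();

        void set(String name, String value) {
            values.put(name, value);
        }

        String get(String name) {
            return values.get(name);
        }

        int size() {
            return values.size();
        }
    }

    // Hypothetical helper: keep the metadata supplied with the fetch tuple,
    // and create an empty Metadata only when the tuple has none.
    static Metadata metadataForParse(Metadata tupleMetadata) {
        return tupleMetadata != null ? tupleMetadata : new Metadata();
    }

    public static void main(String[] args) {
        Metadata fromClient = new Metadata();
        fromClient.set("example-key", "example-value");

        // Caller-supplied metadata survives instead of being replaced.
        Metadata used = metadataForParse(fromClient);
        System.out.println(used.get("example-key"));

        // A null input yields a fresh, empty Metadata.
        System.out.println(metadataForParse(null).size());
    }
}
```

The point of the sketch is only the null check: the buggy behavior reported above corresponds to always taking the `new Metadata()` branch, which discards whatever the client sent.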