[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845299#comment-17845299 ]
ASF GitHub Bot commented on TIKA-4252:
--------------------------------------

tballison commented on code in PR #1753:
URL: https://github.com/apache/tika/pull/1753#discussion_r1596634451


##########
tika-core/src/main/java/org/apache/tika/pipes/PipesServer.java:
##########
@@ -455,33 +455,33 @@ private Fetcher getFetcher(FetchEmitTuple t) {
         }
     }
 
-    protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple t, Fetcher fetcher) {
-        FetchKey fetchKey = t.getFetchKey();
+    protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple fetchEmitTuple, Fetcher fetcher) {
+        FetchKey fetchKey = fetchEmitTuple.getFetchKey();
+        Metadata fetchResponseMetadata = new Metadata();

Review Comment:
   The metadata that goes in the fetchemittuple was envisioned to be user-injected metadata that passed through the parse process and was emitted (provenance metadata). I think we need to put both metadatas on the fetchemittuple.



> PipesClient#process - seems to lose the Fetch input metadata?
> -------------------------------------------------------------
>
>                 Key: TIKA-4252
>                 URL: https://issues.apache.org/jira/browse/TIKA-4252
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Nicholas DiPiazza
>            Priority: Major
>             Fix For: 3.0.0
>
>
> when calling:
>             PipesResult pipesResult = pipesClient.process(new FetchEmitTuple(request.getFetchKey(),
>                     new FetchKey(fetcher.getName(), request.getFetchKey()),
>                     new EmitKey(), tikaMetadata, HandlerConfig.DEFAULT_HANDLER_CONFIG,
>                     FetchEmitTuple.ON_PARSE_EXCEPTION.SKIP));
> the tikaMetadata is not present in the fetch data when the fetch method is called.
>
> It's OK through this part:
>             UnsynchronizedByteArrayOutputStream bos = UnsynchronizedByteArrayOutputStream.builder().get();
>             try (ObjectOutputStream objectOutputStream = new ObjectOutputStream(bos)) {
>                 objectOutputStream.writeObject(t);
>             }
>             byte[] bytes = bos.toByteArray();
>             output.write(CALL.getByte());
>             output.writeInt(bytes.length);
>             output.write(bytes);
>             output.flush();
>
> i verified the bytes have the expected metadata from that point.
>
> UPDATE: found issue
>
> org.apache.tika.pipes.PipesServer#parseFromTuple
>
> is using a new Metadata when it should only use empty metadata if fetch tuple metadata is null.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
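
For readers following the issue, here is a minimal sketch of the behaviour the reporter describes in the UPDATE: metadata attached to a tuple survives Java serialization, and the receiving side should fall back to empty metadata only when the tuple carries none. It uses only the JDK. `Tuple`, `roundTrip`, and `resolveMetadata` are hypothetical stand-ins invented for this illustration and are not Tika's `FetchEmitTuple`, `Metadata`, or `PipesServer` code. The review comment above suggests a different design (keeping both the user-injected metadata and the fetch response metadata on the tuple), which this sketch does not attempt to show.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class MetadataFallbackSketch {

    /** Hypothetical stand-in for a fetch/emit tuple; NOT Tika's FetchEmitTuple. */
    static class Tuple implements Serializable {
        private static final long serialVersionUID = 1L;
        final String fetchKey;
        final HashMap<String, String> metadata; // may be null

        Tuple(String fetchKey, HashMap<String, String> metadata) {
            this.fetchKey = fetchKey;
            this.metadata = metadata;
        }
    }

    /** Serializes and deserializes the tuple, as a client/server hand-off would. */
    static Tuple roundTrip(Tuple t) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(t);
        }
        try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (Tuple) ois.readObject();
        }
    }

    /** Uses the tuple's metadata when present; creates empty metadata only if it is null. */
    static Map<String, String> resolveMetadata(Tuple t) {
        return t.metadata != null ? t.metadata : new HashMap<>();
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, String> userMetadata = new HashMap<>();
        userMetadata.put("source", "user-injected");

        // Metadata supplied by the caller is still there after the round trip.
        Tuple received = roundTrip(new Tuple("doc-1", userMetadata));
        System.out.println(resolveMetadata(received));

        // A tuple without metadata falls back to an empty map.
        System.out.println(resolveMetadata(new Tuple("doc-2", null)));
    }
}
```

The point of the sketch is the null check in `resolveMetadata`: unconditionally creating fresh metadata on the receiving side would discard whatever the caller attached, which matches the symptom reported above.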