[ 
https://issues.apache.org/jira/browse/TIKA-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084569#comment-18084569
 ] 

Adrian Bird commented on TIKA-4746:
-----------------------------------

Sorry, that's not something I know how to do.

 

> tika-4.0.0-alpha1 - General Documentation Comments
> --------------------------------------------------
>
>                 Key: TIKA-4746
>                 URL: https://issues.apache.org/jira/browse/TIKA-4746
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Adrian Bird
>            Priority: Major
>
> Here are some comments/thoughts etc. from looking at the updated, and 
> unchanged, documentation. Some comments may not be valid if associated code 
> changes have also been made (which I haven't checked). 
> I've only really looked at the Tika App / Pipes / File System combination, 
> although I have skimmed most of the others.
> On the web documentation pages there is a static 'Contents' table in the top 
> right that doesn't move when you scroll down. It is missing on the following 
> pages:
> - using-tika/index.adoc
> - configuration/index.adoc
> - maintainers/index.adoc
> Also, these pages don't open when you click the closed triangle image.
> General point - sometimes you use 'Solr' and 'Kafka' and sometimes 'Apache 
> Solr' and 'Apache Kafka'. Should they all be one or the other?
> using-tika/cli/index.adoc
> - Command Line Options - this lists a subset of the options. Shouldn't it
> list all of them, i.e. cover the same list that is output by `--help`?
> - Tika Pipes processing (the first one) - I think this could be removed as it 
> is covered in detail later.
> - Extract Markdown from a file - I think this section would fit better
> after `Extract metadata as JSON`.
> - How Pipes mode is activated (2nd bullet) - with the released code I get an 
> exception if I specify both `--input` and `--output`.
> - How Pipes mode is activated (2nd bullet) - some of the options are not in 
> the Batch Options list below.
> - Tika Pipes Options - I would expect this list to match what is output when 
> doing `--help`.
> - Tika Pipes Examples - the formatting of these examples differs from that
> of the ones above.
> pipes/index.adoc
> - question - why is there a section on Emitters in this page, rather than in 
> the Emitters page?
> pipes/getting-started.adoc
> - JSON Configuration - 1st Note - EMIT_INTERMEDIATE_RESULTS is also a 
> placeholder token
> - JSON Configuration example - there should be a '=' in '--config 
> tika-config.json'
> pipes/iterators.adoc
> - why does it say 'they are not wrapped in a baseConfig block'? This is the
> only mention of 'baseConfig' in the documentation.
> pipes/configuration.adoc
> - Filesystem-to-filesystem pipeline - EMIT_INTERMEDIATE_RESULTS is also a 
> placeholder token
> pipes/parse-modes.adoc
> - Content Handler Types - this mentions 'ContentHandlerFactory' and 
> 'parseContext' which seem like Java names and not JSON Config names.
> - CLI Usage - should it be "The tika-app pipes processor ..." rather than
> 'batch'?
> pipes/unpack-config.adoc
> - Quick Start - 'ParseMode.UNPACK' doesn't reflect what is in the config.
> - Configuration Options - this should say that these options are defined 
> within the 'unpack-config' key.
> - Enabling Frictionless Output - is 'UnpackConfig' ok here or should it be
> 'unpack-config'?
> - CLI Usage - I don't see the '--unpack' option in the `--help` output.
>  
> pipes/timeouts.adoc
> - CLI Usage - the output from '--help' doesn't seem to show that '--fork' 
> etc. can be used in Pipes mode.
> pipes/troubleshooting.adoc
> - Log levels and sensitive data - I didn't see any documentation about 
> logging in general and setting log levels.
> pipes/plugins/filesystem.adoc
> - Complete Pipeline Example - EMIT_INTERMEDIATE_RESULTS is also a placeholder 
> token
> - File System Reporter (file-system-reporter) - Configuration - statusFile -
> Does this have to be an absolute path rather than a relative path? If so, it
> would be worth saying.
> - Status file schema - counts - in my file I'm seeing 'statusCounts' and not 
> 'counts'
> - Status file schema - timestamp - in my file I'm seeing 'lastUpdate' rather 
> than 'timestamp'
> - Status file schema - in my file I'm also seeing 'started'
> configuration/index.adoc
> - I don't see a general overview of the configuration structure here (I know
> Pipes configuration is covered elsewhere). A user who is new to Tika and
> starting with V4 needs more of an overview than is currently here, e.g. it
> should cover the top-level keys in a JSON config file.
> - There are no links to the VML Parsers, External Parser and Tess4J OCR pages.
> configuration/digesters.adoc
> - Supported Algorithms - the output from '--help' does not mention the last 
> three - should it?
>  
> migration-to-4x/serialization-4x.adoc
> - Friendly Naming Convention - not really specific to this page, but how do
> users know what the friendly names are? The '--list-*' options all
> produce class names.
> advanced/index.adoc 
> - it looks like all the topic entries are geared towards using the Java API 
> and aren't available through the CLI and JSON configuration, with the 
> exception of 'Setting Limits'. Is it worth adding text to this effect?
>  
> advanced/language-detection.adoc 
> - Overriding Model Selection - says 'Or via Tika’s JSON configuration
> mechanism if you are using SelfConfiguring component loading' - how can it be
> specified in a JSON config file?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
