syedamisbahh opened a new pull request, #2171: URL: https://github.com/apache/tika/pull/2171
This pull request includes multiple code refactorings aimed at improving clarity, readability, and maintainability in the Apache Tika codebase. The changes preserve original functionality while making the code more expressive and modular.

Refactorings Applied

1. Extract Method + Decompose Conditional
   Location: MediaTypeRegistry#getSupertype()
   - Replaced deeply nested if-else blocks with helper methods like isXmlSubtype(), isTextType(), isEmptyType(), etc.
   - Used early returns to simplify control flow and reduce cyclomatic complexity.

2. Rename Variable
   Locations: MediaTypeRegistry.java, JsonPipesIterator.java
   Renamed variables to improve self-documentation:
   - type → mediaType
   - t → tuple
   - r → reader

3. Introduce Explaining Constants
   Location: TextStatistics#looksLikeUTF8()
   - Replaced magic numbers (e.g. 0x20, 0x80, 0xc0) with named constants for better readability and understanding of UTF-8 byte range logic.

**Note**: I am awaiting access to the Apache Tika Jira issue tracker to file a formal issue. Once granted access, I will:
- Create the corresponding [TIKA-XXXX] issue.
- Update this PR title and description to include the issue reference.
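
For readers who have not opened the diff, the first and third refactorings described above can be sketched roughly as follows. This is an illustration only, not the code in this PR or in Tika: the class name RefactoringSketch, the string-based supertypeOf() method, the helper signatures, the constant names, and the simplified rule set are all assumptions made for the example. Only the helper names isXmlSubtype() and isTextType() and the values 0x20, 0x80, and 0xc0 come from the description.

```java
// Hypothetical sketch; none of these names or signatures are taken from Tika itself.
final class RefactoringSketch {

    // Explaining constants: exclusive upper bounds of the UTF-8 byte ranges.
    static final int CONTROL_END = 0x20;         // 0x00..0x1f: control characters
    static final int ASCII_END = 0x80;           // 0x20..0x7f: single-byte characters
    static final int CONTINUATION_END = 0xc0;    // 0x80..0xbf: continuation bytes
    static final int TWO_BYTE_LEAD_END = 0xe0;   // 0xc0..0xdf: lead byte of a 2-byte sequence
    static final int THREE_BYTE_LEAD_END = 0xf0; // 0xe0..0xef: lead byte of a 3-byte sequence
    static final int FOUR_BYTE_LEAD_END = 0xf8;  // 0xf0..0xf7: lead byte of a 4-byte sequence

    // Assumed media type names, used as plain strings to keep the example self-contained.
    static final String APPLICATION_XML = "application/xml";
    static final String TEXT_PLAIN = "text/plain";
    static final String OCTET_STREAM = "application/octet-stream";

    /** True if the unsigned byte value (0..255) is a control character. */
    static boolean isControl(int b) {
        return b < CONTROL_END;
    }

    /**
     * Number of continuation bytes expected after the given unsigned byte
     * value (0..255), or -1 if the byte cannot start a UTF-8 sequence.
     */
    static int expectedContinuationBytes(int b) {
        if (b < ASCII_END) return 0;            // single-byte character
        if (b < CONTINUATION_END) return -1;    // a continuation byte, not a lead byte
        if (b < TWO_BYTE_LEAD_END) return 1;
        if (b < THREE_BYTE_LEAD_END) return 2;
        if (b < FOUR_BYTE_LEAD_END) return 3;
        return -1;                              // 0xf8..0xff never appear in UTF-8
    }

    // Decomposed conditional: each predicate names one branch of the old if-else chain.
    static boolean isXmlSubtype(String mediaType) {
        return mediaType.endsWith("+xml");
    }

    static boolean isTextType(String mediaType) {
        return mediaType.startsWith("text/");
    }

    /** Simplified supertype lookup written with early returns instead of nesting. */
    static String supertypeOf(String mediaType) {
        if (mediaType == null) return null;
        if (isXmlSubtype(mediaType)) return APPLICATION_XML;
        if (isTextType(mediaType) && !TEXT_PLAIN.equals(mediaType)) return TEXT_PLAIN;
        if (!OCTET_STREAM.equals(mediaType)) return OCTET_STREAM;
        return null; // application/octet-stream is the root and has no supertype
    }

    public static void main(String[] args) {
        System.out.println(supertypeOf("application/rss+xml"));
        System.out.println(expectedContinuationBytes(0xe2));
    }
}
```

In this sketch each guard clause reads as one rule, so adding or reordering a rule touches a single line, and the range constants make it possible to check the byte classification against the UTF-8 layout without decoding hex literals by hand.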