dpol1 opened a new pull request, #1931:
URL: https://github.com/apache/stormcrawler/pull/1931
A collection of small but concrete correctness and robustness improvements.
- `FetcherBolt`: catch `NumberFormatException` when parsing crawl delay and
max thread values from metadata; log a warning and fall back to defaults
instead of crashing the bolt
- `ConfigurableTopology`, `URLFilters`: replace `e.printStackTrace()` with
proper SLF4J `LOG.error()` calls
- `CloudSearchUtils`: fix misleading error message ("must be score" →
"must NOT be score"); replace manual `MessageDigest` boilerplate with
`DigestUtils.sha512Hex()`
- `S3CacheChecker`, `S3Cacher`: use the `StandardCharsets.UTF_8` overload of
`URLEncoder.encode()` to remove an unnecessary checked exception
- `JsRenderingDetector`: use `CharsetIdentification.getCharsetFast()` to
detect the actual document charset instead of hardcoding UTF-8
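
As a rough illustration of two of the items above, here is a minimal, dependency-free sketch. It is not the code from this PR: the class name `PrSketch` and the helpers `parseLongOrDefault` and `encodeKey` are hypothetical, and `System.err` stands in for the SLF4J warning so the example runs without extra libraries. It assumes Java 10 or later, where `URLEncoder.encode(String, Charset)` is available.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PrSketch {

    // Hypothetical helper (not the actual FetcherBolt code): parse a numeric
    // metadata value, falling back to a default when it is missing or malformed
    // instead of letting NumberFormatException propagate.
    public static long parseLongOrDefault(String raw, long defaultValue) {
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            // The PR logs a warning through SLF4J at this point; System.err
            // is used here only to keep the sketch self-contained.
            System.err.println("Invalid numeric value '" + raw
                    + "', using default " + defaultValue);
            return defaultValue;
        }
    }

    // Hypothetical helper: the Charset overload of URLEncoder.encode does not
    // declare UnsupportedEncodingException, so no try/catch is required.
    public static String encodeKey(String url) {
        return URLEncoder.encode(url, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(parseLongOrDefault("1500", 1000L)); // valid value is used
        System.out.println(parseLongOrDefault("fast", 1000L)); // malformed value falls back
        System.out.println(encodeKey("https://example.com/a b"));
    }
}
```

With the older `URLEncoder.encode(String, String)` overload, the same call would force callers to catch or declare `UnsupportedEncodingException`, even though UTF-8 is always supported.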