chhsiao90 commented on PR #1250: URL: https://github.com/apache/incubator-stormcrawler/pull/1250#issuecomment-2227655082
> The solutions applied in okhttp and httpclient classes seem very different. It would greatly simply review to have 2 different PRs that can be reviewed independently. Sure, I'll separate the httpclient in a separated PR. > My gut reaction is that adding even more shared state to the classes is not a good way to go. Actually, those newly added fields are not shared state. They are just the http options retrieved from the conf. The easiest and most readable way is to just have a local field conf, and we always build the http client from the conf. But that would introduce unnecessary map.get operations as the conf should keep unchanged when it was loaded. So I decided to have these option fields to be loaded during the `void configure(conf)` method call. And I think that's how most other storm crawler classes do, ex: JSoupParserBolt. What do you think? > The more I look at this the more I think there is a major design flaw. The HttpProtocol classes should not be reused for HTTP calls that have different configs. There should be separate HttpProtocol class instances created for each different config. In #1247, there are 2 proxies described, HTTP calls for those 2 proxies should not be made with the same HTTPProtocol instance. I think it's might be a common case to round-robin a large proxy pool (may have thousands or more ips). So it makes sense to me to have a single HttpProtocol, and select a proxy from ProxyManager whenever it was called. If we'd like to have multiple HttpProtocol instances for each proxy, it means that we may round-robin the HttpProtocol in ProtocolFactory. And I'm not sure if it's easy to implement it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@stormcrawler.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org