chhsiao90 commented on PR #1250:
URL: 
https://github.com/apache/incubator-stormcrawler/pull/1250#issuecomment-2227655082

   > The solutions applied in okhttp and httpclient classes seem very 
different. It would greatly simply review to have 2 different PRs that can be 
reviewed independently.
   
   Sure, I'll separate the httpclient in a separated PR.
   
   > My gut reaction is that adding even more shared state to the classes is 
not a good way to go.
   
   Actually, those newly added fields are not shared state. They are just the 
http options retrieved from the conf. 
   
   The easiest and most readable way is to just have a local field conf, and we 
always build the http client from the conf. But that would introduce 
unnecessary map.get operations as the conf should keep unchanged when it was 
loaded.
   
   So I decided to have these option fields to be loaded during the `void 
configure(conf)` method call. And I think that's how most other storm crawler 
classes do, ex: JSoupParserBolt.
   
   What do you think?
   
   > The more I look at this the more I think there is a major design flaw. The 
HttpProtocol classes should not be reused for HTTP calls that have different 
configs. There should be separate HttpProtocol class instances created for each 
different config. In #1247, there are 2 proxies described, HTTP calls for those 
2 proxies should not be made with the same HTTPProtocol instance.
   
   I think it's might be a common case to round-robin a large proxy pool (may 
have thousands or more ips). So it makes sense to me to have a single 
HttpProtocol, and select a proxy from ProxyManager whenever it was called.
   
   If we'd like to have multiple HttpProtocol instances for each proxy, it 
means that we may round-robin the HttpProtocol in ProtocolFactory. And I'm not 
sure if it's easy to implement it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@stormcrawler.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to