rzo1 commented on code in PR #1587:
URL: https://github.com/apache/stormcrawler/pull/1587#discussion_r2178397592


##########
core/pom.xml:
##########
@@ -264,6 +265,12 @@ under the License.
                        <scope>test</scope>
                </dependency>
 
+               <dependency>
+                       <groupId>org.apache.httpcomponents.core5</groupId>

Review Comment:
   StormCrawler is currently using HttpClient version 4, along with core 
components also on version 4. We should either upgrade to version 5 or maintain 
consistency by sticking with version 4 throughout.



##########
core/src/main/java/org/apache/stormcrawler/protocol/okhttp/HttpProtocol.java:
##########
@@ -352,7 +352,7 @@ public ProtocolResponse getProtocolOutput(String url, final 
Metadata metadata)
 
             final String lastModified = 
metadata.getFirstValue(HttpHeaders.LAST_MODIFIED);
             if (StringUtils.isNotBlank(lastModified)) {
-                rb.header("If-Modified-Since", 
HttpHeaders.formatHttpDate(lastModified));
+                rb.header("If-Modified-Since", formatHttpDate(lastModified));

Review Comment:
   Can this be replaced by 
https://hc.apache.org/httpcomponents-core-4.4.x/current/httpcore/apidocs/org/apache/http/HttpHeaders.html#IF_MODIFIED_SINCE
 ?



##########
core/src/main/java/org/apache/stormcrawler/protocol/okhttp/HttpProtocol.java:
##########
@@ -352,7 +352,7 @@ public ProtocolResponse getProtocolOutput(String url, final 
Metadata metadata)
 
             final String lastModified = 
metadata.getFirstValue(HttpHeaders.LAST_MODIFIED);
             if (StringUtils.isNotBlank(lastModified)) {
-                rb.header("If-Modified-Since", 
HttpHeaders.formatHttpDate(lastModified));
+                rb.header("If-Modified-Since", formatHttpDate(lastModified));
             }
 
             final String ifNoneMatch = metadata.getFirstValue("etag", 
protocolMDprefix);

Review Comment:
   
https://hc.apache.org/httpcomponents-core-4.4.x/current/httpcore/apidocs/org/apache/http/HttpHeaders.html#ETAG
 ?



##########
core/src/main/java/org/apache/stormcrawler/Metadata.java:
##########
@@ -109,14 +109,15 @@ public String[] getValues(String key, String prefix) {
     }
 
     public String[] getValues(String key) {
-        String[] values = md.get(key);
+        if (key == null || key.isEmpty()) return null;
+        String[] values = md.getOrDefault(key, md.get(key.toLowerCase()));

Review Comment:
   I would be fine to drop the backward compatibility here. We could mention in 
the changelog, that the next SC release have fixed odd http headers (or we 
stick with the compatibility, but we should drop it some day because it makes 
our code a bit more complex to re-think / track why it was added). If people 
would like to have the backward compatibility for those headers, we should 
maybe add a code comment referencing the issue? wdyt?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@stormcrawler.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to