Re: Release?

2024-05-03 Thread Richard Zowalla
Hi folks,

I quickly chatted with Julien off-list.
If no one objects, I am going to propose a first release candidate next week!

Best
Richard

On 2024/05/02 16:30:22 Richard Zowalla wrote:
> Ok so anything else?
> We have release docs available ;-)
> Anyone want to act as Release Manager for our first ASF release?
> 
> Think we are ready issue-wise.
> 
> Regards
> Richard 
> 
> 
> On 29 April 2024 at 16:48:42 CEST, Julien Nioche wrote:
> >Thanks Ayush
> >
> >I have fixed the license headers in
> >https://github.com/apache/incubator-stormcrawler/pull/1201
> >
> >Julien
> >
> >On Mon, 29 Apr 2024 at 15:18, Ayush Saxena  wrote:
> >
> >> Should be great.
> >> Was looking around the code to see if there are any potential issues which
> >> could block the vote.
> >> Little bit curious around some files having "Licensed to DigitalPebble Ltd
> >> under one or more" [1]
> >>
> >> Should we ditch such license headers? Not sure if they are allowed or not;
> >> [2] just mentions the standard license header.
> >>
> >> There are some files here in this directory [3] referring to DigitalPebble,
> >> if not required we can consider dropping before the release
> >>
> >> Some files tend to have different header as compared to one mentioned in
> >> the official doc [4], it mentions reading the NOTICE file & stuff
> >>
> >> Just reading the incubator vote checklist [4], if everything is good as per
> >> this doc, We should be good to go.
> >>
> >> Thanx Richard for initiating the discussion!!!
> >>
> >> -Ayush
> >>
> >> [1]
> >>
> >> https://github.com/apache/incubator-stormcrawler/blob/main/core/src/test/java/org/apache/stormcrawler/indexer/BasicIndexingTest.java
> >> [2] https://www.apache.org/legal/src-headers#headers
> >> [3]
> >>
> >> https://github.com/apache/incubator-stormcrawler/tree/main/core/src/test/resources
> >> [4]
> >>
> >> https://cwiki.apache.org/confluence/display/INCUBATOR/Incubator+Release+Checklist
> >>
> >> On Mon, 29 Apr 2024 at 19:16, Richard Zowalla  wrote:
> >>
> >> > Hi all,
> >> >
> >> > what do we need to do to run our first ASF release?
> >> > Personally, I would love to see [1] in 3.0.
> >> >
> >> > Don't think we have any other formal blockers?
> >> >
> >> > Regards
> >> > Richard
> >> >
> >> >
> >> > [1] https://github.com/apache/incubator-stormcrawler/pull/1199
> >> >
> >>
> 


Re: [I] Newer Elasticsearch Version deprecate the REST High Level Client in favour of the Java API Client [incubator-stormcrawler]

2024-05-03 Thread via GitHub


rzo1 closed issue #945: Newer Elasticsearch Version deprecate the REST High 
Level Client in favour of the Java API Client
URL: https://github.com/apache/incubator-stormcrawler/issues/945


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@stormcrawler.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Newer Elasticsearch Version deprecate the REST High Level Client in favour of the Java API Client [incubator-stormcrawler]

2024-05-03 Thread via GitHub


rzo1 commented on issue #945:
URL: 
https://github.com/apache/incubator-stormcrawler/issues/945#issuecomment-2092820610

   We dropped ES, so closing this issue is ok now.





Re: [I] Add config to shard based on instance number instead of field [incubator-stormcrawler]

2024-05-03 Thread via GitHub


rzo1 closed issue #489: Add config to shard based on instance number instead of 
field
URL: https://github.com/apache/incubator-stormcrawler/issues/489





Re: [I] ES IndexerBold - Fix behaviour of afterBulk [incubator-stormcrawler]

2024-05-03 Thread via GitHub


rzo1 closed issue #992: ES IndexerBold - Fix behaviour of afterBulk
URL: https://github.com/apache/incubator-stormcrawler/issues/992





[PR] #1207 -- add forbidden-apis [incubator-stormcrawler]

2024-05-03 Thread via GitHub


tballison opened a new pull request, #1208:
URL: https://github.com/apache/incubator-stormcrawler/pull/1208

this just adds the plugin. I'll update the repo… o pass it in follow-on 
commits. This is just a WIP.
   
   Thank you for contributing to Apache StormCrawler.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there an issue associated with this PR? Is it referenced in the commit message?
   
   - [ ] Does your PR title start with `#XXXX` where `XXXX` is the issue number you are trying to resolve?
   
   - [ ] Has your PR been rebased against the latest commit within the target 
branch (typically main)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   - [ ] Is the code properly formatted with `mvn git-code-format:format-code 
-Dgcf.globPattern=**/*`?
   
   ### For code changes:
   
   - [ ] Have you ensured that the full suite of tests is executed via `mvn 
clean verify`?
   - [ ] Have you written or updated unit tests to verify your changes?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file?
   - [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file?
   
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions for 
build issues and submit an update to your PR as soon as possible.
   





Re: [I] Add forbidden-apis [incubator-stormcrawler]

2024-05-03 Thread via GitHub


tballison commented on issue #1207:
URL: 
https://github.com/apache/incubator-stormcrawler/issues/1207#issuecomment-2093053737

   Working on this here: 
https://github.com/apache/incubator-stormcrawler/pull/1208





Re: [I] Add forbidden-apis [incubator-stormcrawler]

2024-05-03 Thread via GitHub


tballison commented on issue #1207:
URL: 
https://github.com/apache/incubator-stormcrawler/issues/1207#issuecomment-2093158410

   K, that's ready for review.





Re: [PR] #1207 -- add forbidden-apis [incubator-stormcrawler]

2024-05-03 Thread via GitHub


jnioche merged PR #1208:
URL: https://github.com/apache/incubator-stormcrawler/pull/1208





Re: [PR] #1207 -- add forbidden-apis [incubator-stormcrawler]

2024-05-03 Thread via GitHub


jnioche commented on PR #1208:
URL: 
https://github.com/apache/incubator-stormcrawler/pull/1208#issuecomment-2093285732

   thanks @tballison 





[I] Apple Silicon emulation issue in unit tests [incubator-stormcrawler]

2024-05-03 Thread via GitHub


joshfischer1108 opened a new issue, #1209:
URL: https://github.com/apache/incubator-stormcrawler/issues/1209

   When compiling StormCrawler from source on Apple Silicon, we are hitting 
timeouts in the Selenium tests because the container runs under emulation.

   ## Steps to reproduce:
   
   Using an Apple M3:
   From the top level directory run:
   ```
   mvn clean install
   ```
   
   First we get this warning.
   ```
   The architecture 'amd64' for image 'selenium/standalone-chrome:120.0' (ID 
sha256:deff784da2138b912b66e2941cc976ced4ecba3a4e6941ca3bfa2b8c6b75) does 
not match the Docker server architecture 'arm64'. This will cause the container 
to execute much more slowly due to emulation and may lead to timeout failures.
   ```
   Then we get this error:
   ```
   [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
27.48 s <<< FAILURE! -- in 
org.apache.stormcrawler.protocol.selenium.ProtocolTest
   [ERROR] org.apache.stormcrawler.protocol.selenium.ProtocolTest.testBlocking 
-- Time elapsed: 27.44 s <<< ERROR!
   org.awaitility.core.ConditionTimeoutException: Condition with 
org.apache.stormcrawler.protocol.selenium.ProtocolTest was not fulfilled within 
10 seconds.
   ```
   
   
   Then the error should appear
   
   





Re: [I] Add forbidden-apis [incubator-stormcrawler]

2024-05-03 Thread via GitHub


rzo1 closed issue #1207: Add forbidden-apis
URL: https://github.com/apache/incubator-stormcrawler/issues/1207





Re: [I] Apple Silicon emulation issue in unit tests [incubator-stormcrawler]

2024-05-03 Thread via GitHub


joshfischer1108 commented on issue #1209:
URL: 
https://github.com/apache/incubator-stormcrawler/issues/1209#issuecomment-2093827076

   I'm looking at the test and see the settings below.  Are these the timeouts?  I've 
changed them to much higher values such as `10`, and the tests still seem to 
time out at about the same point on my machine (around 27 seconds).
   
   ```
   timeouts.put("implicit", 1);
   timeouts.put("pageLoad", 1);
   timeouts.put("script", 1);
   
   conf.put("selenium.timeouts", timeouts);
   ```
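
The exception quoted in the issue is an Awaitility `ConditionTimeoutException` reporting 10 seconds, which matches Awaitility's default wait, so the limit being hit may be the test's own polling wait rather than the `selenium.timeouts` map. The sketch below is a stdlib-only illustration of that idea, not StormCrawler or Awaitility code: the class `PollingWait` and its method `waitUntil` are hypothetical names, and the "ready on the 20th poll" condition is only a stand-in for a slow, emulated container.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper, for illustration only.
final class PollingWait {

    // Polls the condition until it returns true or atMost has elapsed.
    // Returns true if fulfilled in time, false on timeout or interruption.
    static boolean waitUntil(BooleanSupplier condition, Duration atMost, Duration pollInterval) {
        long deadline = System.nanoTime() + atMost.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a slow container: the condition holds only on the 20th poll.
        AtomicInteger polls = new AtomicInteger();
        BooleanSupplier ready = () -> polls.incrementAndGet() >= 20;

        // A short deadline gives up before the condition can become true.
        boolean shortWait = waitUntil(ready, Duration.ofMillis(50), Duration.ofMillis(10));

        // The same condition with a longer deadline has room to succeed.
        polls.set(0);
        boolean longWait = waitUntil(ready, Duration.ofSeconds(5), Duration.ofMillis(10));

        System.out.println("short wait fulfilled: " + shortWait);
        System.out.println("long wait fulfilled: " + longWait);
    }
}
```

If that reading is right, raising the values in `selenium.timeouts` would not move the failure point; the deadline of the wait around the assertion would have to change instead.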





Re: [I] Apple Silicon emulation issue in unit tests [incubator-stormcrawler]

2024-05-03 Thread via GitHub


joshfischer1108 commented on issue #1209:
URL: 
https://github.com/apache/incubator-stormcrawler/issues/1209#issuecomment-2093832637

   I forgot to add the link.  [Here is where I am 
looking](https://github.com/apache/incubator-stormcrawler/blob/c1088fb3ff3ca9ca99bcce108d8bb2b40b97c094/core/src/test/java/org/apache/stormcrawler/protocol/selenium/ProtocolTest.java#L90)





[PR] #1209 fix for emulation error in tests run on silicon [incubator-stormcrawler]

2024-05-03 Thread via GitHub


joshfischer1108 opened a new pull request, #1210:
URL: https://github.com/apache/incubator-stormcrawler/pull/1210

   This addresses the container emulation issue referenced in #1209 
   
   ### For all changes:
   - [x] Is there an issue associated with this PR? Is it referenced in the commit message?
   
   - [x] Does your PR title start with `#XXXX` where `XXXX` is the issue number you are trying to resolve?
   
   - [x] Has your PR been rebased against the latest commit within the target 
branch (typically main)?
   
   - [x] Is your initial contribution a single, squashed commit?
   
   - [x] Is the code properly formatted with `mvn git-code-format:format-code 
-Dgcf.globPattern=**/*`?
   
   ### For code changes:
   
   - [x] Have you ensured that the full suite of tests is executed via `mvn 
clean verify`?
   - [x] Have you written or updated unit tests to verify your changes?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file?
   - [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file?
   
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions for 
build issues and submit an update to your PR as soon as possible.
   





Re: [PR] #1209 fix for emulation error in tests run on silicon [incubator-stormcrawler]

2024-05-03 Thread via GitHub


rzo1 commented on code in PR #1210:
URL: 
https://github.com/apache/incubator-stormcrawler/pull/1210#discussion_r1589882794


##
core/src/test/java/org/apache/stormcrawler/protocol/selenium/ProtocolTest.java:
##
@@ -51,7 +51,8 @@ public class ProtocolTest extends AbstractProtocolTest {
 private static final Logger LOG = 
LoggerFactory.getLogger(ProtocolTest.class);
 
 private static final DockerImageName SELENIUM_IMAGE =
-DockerImageName.parse("selenium/standalone-chrome:120.0");
+DockerImageName.parse("seleniarm/standalone-chromium:latest")

Review Comment:
   Wonder if we can use a fixed tag? The reasoning is that "latest" can vary 
between environments / runs, making reproducibility difficult.
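
One way to combine a fixed tag with the architecture mismatch from #1209 would be to pick the image per host architecture. A minimal sketch, assuming the two image repositories from the diff above; the class `SeleniumImages`, the method `imageFor`, and the `PINNED_ARM_TAG` value are hypothetical placeholders, and a real fixed tag would still have to be looked up in the image registry. It also assumes the JVM's `os.arch` matches the Docker server architecture, which need not always hold.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class SeleniumImages {

    // Image and tag from the original line in the diff.
    static final String AMD64_IMAGE = "selenium/standalone-chrome:120.0";

    // Placeholder: replace with a real fixed tag; deliberately not "latest".
    static final String PINNED_ARM_TAG = "REPLACE_WITH_FIXED_TAG";
    static final String ARM64_IMAGE = "seleniarm/standalone-chromium:" + PINNED_ARM_TAG;

    // Maps a JVM os.arch value to the image name to use.
    static String imageFor(String osArch) {
        String arch = osArch == null ? "" : osArch.toLowerCase(Locale.ROOT);
        boolean arm = arch.equals("aarch64") || arch.equals("arm64");
        return arm ? ARM64_IMAGE : AMD64_IMAGE;
    }

    public static void main(String[] args) {
        System.out.println(imageFor(System.getProperty("os.arch")));
    }
}
```

The returned string could then be handed to `DockerImageName.parse(...)` in the test, so each architecture always resolves to the same pinned image.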


