[GH] (incubator-stormcrawler): Workflow run "Publish SNAPSHOTs" is working again!

2024-10-18 Thread GitBox


The GitHub Actions job "Publish SNAPSHOTs" on incubator-stormcrawler.git has 
succeeded.
Run started by GitHub user jnioche (triggered by jnioche).

Head commit for run:
da735bc7612346a9eaf4e0ae5e9df50fc1d88253 / jnioche 

Minor: Regenerated License File for 6ba510c91848dd3d8bce09dc65ac97194eea829c

Signed-off-by: GitHub 

Report URL: 
https://github.com/apache/incubator-stormcrawler/actions/runs/11412897996

With regards,
GitHub Actions via GitBox



[GH] (incubator-stormcrawler): Workflow run "Java CI with Maven" failed!

2024-10-18 Thread GitBox


The GitHub Actions job "Java CI with Maven" on incubator-stormcrawler.git has 
failed.
Run started by GitHub user jnioche (triggered by jnioche).

Head commit for run:
65c2e3253f66b54f6862b85364def3764bb1aa98 / Julien Nioche 

Connect to a remote instance using web sockets

Signed-off-by: Julien Nioche 

Report URL: 
https://github.com/apache/incubator-stormcrawler/actions/runs/11401161712

With regards,
GitHub Actions via GitBox



[GH] (incubator-stormcrawler): Workflow run "Java CI with Maven" failed!

2024-10-18 Thread GitBox


The GitHub Actions job "Java CI with Maven" on incubator-stormcrawler.git has 
failed.
Run started by GitHub user jnioche (triggered by jnioche).

Head commit for run:
65c2e3253f66b54f6862b85364def3764bb1aa98 / Julien Nioche 

Connect to a remote instance using web sockets

Signed-off-by: Julien Nioche 

Report URL: 
https://github.com/apache/incubator-stormcrawler/actions/runs/11401201893

With regards,
GitHub Actions via GitBox



[PR] Connect to a remote instance using web sockets [incubator-stormcrawler]

2024-10-18 Thread via GitHub


jnioche opened a new pull request, #1361:
URL: https://github.com/apache/incubator-stormcrawler/pull/1361

   Tested by running the Playwright (PW) image in a Docker container:
   
   ```
   pw-server:
     container_name: pw-server
     image: mcr.microsoft.com/playwright:v1.47.0
     command:
       - /bin/sh
       - -c
       - |
         cd /home/pwuser
         npx -y playwright@1.47.0 run-server --port 3000
     ports:
       - 3000:3000
     restart: always
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@stormcrawler.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GH] (incubator-stormcrawler): Workflow run "Java CI with Maven" is working again!

2024-10-18 Thread GitBox


The GitHub Actions job "Java CI with Maven" on incubator-stormcrawler.git has 
succeeded.
Run started by GitHub user jnioche (triggered by jnioche).

Head commit for run:
9a9f41e2ed9c7f7ee9cc394fe16527956557b582 / Julien Nioche 

Fixed format

Signed-off-by: Julien Nioche 

Report URL: 
https://github.com/apache/incubator-stormcrawler/actions/runs/11401272346

With regards,
GitHub Actions via GitBox



[PR] Bugfix nofollow instructions in rel tags ignored [incubator-stormcrawler]

2024-10-18 Thread via GitHub


jnioche opened a new pull request, #1362:
URL: https://github.com/apache/incubator-stormcrawler/pull/1362

   ... when the tag has more than one value
   
   As specified [here](https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes/rel):
   
   > the value of the rel attribute, which, if present, must have a value that 
is an unordered set of unique space-separated keywords.
   
   We currently assume that `nofollow` is the entire value of the `rel` attribute, so if any other keyword is specified alongside it, the `nofollow` instruction is missed.
   
   This PR includes a test case and a fix.
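
The behaviour described above can be illustrated with a minimal sketch. This is not the code from the PR: the class name `RelNoFollowSketch` and the method `hasNoFollow` are hypothetical and exist only for illustration. It assumes the `rel` value has already been extracted as a string, splits it on whitespace, and compares each keyword case-insensitively instead of comparing the whole value to `nofollow`.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not StormCrawler's actual implementation.
public class RelNoFollowSketch {

    /** Returns true if any space-separated keyword in the rel value is "nofollow". */
    public static boolean hasNoFollow(String relValue) {
        if (relValue == null) {
            return false;
        }
        // rel holds a set of space-separated keywords, so check each one
        // rather than comparing the entire attribute value.
        for (String keyword : relValue.trim().split("\\s+")) {
            if ("nofollow".equals(keyword.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasNoFollow("nofollow"));             // true
        System.out.println(hasNoFollow("noopener NoFollow ugc")); // true
        System.out.println(hasNoFollow("noopener noreferrer"));   // false
    }
}
```

A check that compares the whole value, such as `"nofollow".equals(relValue)`, would return false for the second input, which is the failure mode the PR describes.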





Re: [PR] #620 Add support for shards - SolrSpout [incubator-stormcrawler]

2024-10-18 Thread via GitHub


mvolikas commented on code in PR #1343:
URL: 
https://github.com/apache/incubator-stormcrawler/pull/1343#discussion_r1806320925


##
external/solr/setup-solr.sh:
##


Review Comment:
   In our team, we currently have an ES deployment for StormCrawler. The main 
reason for not using Solr in the first place was that we could not have many 
spouts running in parallel and were not sure how well it would scale. I think 
this will no longer be the case after issues #620 and #621 get resolved, and we 
would then prefer Solr for many reasons: the Apache ecosystem, more experience 
using it, and easier deployment (at least in our use case), to name a few.


