Nice!
I think we can touch the links in "Resources": "Powered-By" and
"Presentations".
I don't know if we will find replacements for the links that are gone, but it
might also be an option to drop them or migrate them to the website.
Regards
Richard
> On 16.07.2025 at 23:42, Sebastian wrote:
Hi,
We uploaded our StormCrawler logo for automatic logo generation [1]. The source
file is available here [2] or in our StormCrawler website repository.
Does anyone have decent Photoshop skills? I think we should consider the
following:
* Update our logo by creating an SVG version that scales
Thanks Dave.
Added the logo :)
> On 08.07.2025 at 21:16, Dave Fisher wrote:
>
> ASF assets can use a project’s logo in various places, but only if we provide it.
>
> See https://www.apache.org/logos/about.html for how to submit the logo.
>
> Best,
> Dave
Hi,
I would most likely add an additional addValues(…) overload that takes an array
as a parameter, as I think the use of arrays was intentional.
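A minimal sketch of that idea, assuming a simplified stand-in class: MetadataSketch and its fields are hypothetical and are not the actual org.apache.stormcrawler.Metadata implementation. Values are kept as a String[] per key, and an array-based addValues overload sits next to a collection-based one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a multi-valued metadata container (illustration only). */
class MetadataSketch {
    // Values are stored as arrays, mirroring the "arrays were intentional" assumption.
    private final Map<String, String[]> md = new HashMap<>();

    /** Collection-based variant: appends to whatever is already stored under the key. */
    void addValues(String key, Collection<String> values) {
        if (values == null || values.isEmpty()) {
            return;
        }
        List<String> merged = new ArrayList<>();
        String[] existing = md.get(key);
        if (existing != null) {
            merged.addAll(Arrays.asList(existing));
        }
        merged.addAll(values);
        md.put(key, merged.toArray(new String[0]));
    }

    /** Additional array-based overload; delegates so both variants behave the same. */
    void addValues(String key, String[] values) {
        if (values == null || values.length == 0) {
            return;
        }
        addValues(key, Arrays.asList(values));
    }

    /** Returns the stored values, or null if the key is unknown. */
    String[] getValues(String key) {
        return md.get(key);
    }
}
```

Delegating from the array overload keeps a single code path for merging, so callers can pass whichever type they already hold.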
Regards
Richard
> On 06.07.2025 at 14:43, Dávid Szigecsán wrote:
>
> Hi team,
>
> I've been working with the org.apache.stormcrawler.Metadata API and noticed
> an inc
The Apache StormCrawler team is pleased to announce the release of version
3.4.0 of Apache StormCrawler.
StormCrawler is a collection of resources for building low-latency,
customisable and scalable web crawlers on Apache Storm.
Apache StormCrawler 3.4.0 source distribution is available for d
Hi,
This vote passes with the following +1 and no -1:
Julien Nioche (binding)
Markos Volikas (binding)
Richard Zowalla (binding)
Thanks for testing. I’ll proceed.
Regards
Richard
Here is my own +1 (binding)
On 2025/06/24 07:00:54 Richard Zowalla wrote:
> Hi folks,
>
> I have posted a 1st release candidate for the Apache StormCrawler[version]
> release and it is ready for testing.
>
> StormCrawler 3.4.0 introduces several enhancements, most notably the
Hi folks,
I have posted a 1st release candidate for the Apache StormCrawler[version]
release and it is ready for testing.
StormCrawler 3.4.0 introduces several enhancements, most notably the
integration of generative AI via a new LLM-based text extractor compatible with
OpenAI APIs, enabling a
Thanks for all of your feedback.
I will try to get an RC online for a vote this week.
Regards
Richard
> On 18.06.2025 at 18:58, Tim Allison wrote:
>
> +1
>
> On Tue, Jun 17, 2025 at 7:16 AM Richard Zowalla wrote:
>>
>> Hi all,
>>
>> What do you think about
Hi all,
What do you think about running a release soon?
We have a few things:
1.) Removed incubator notes
2.) Possibly LLM Support (pending PR)
3.) Updates in the SOLR area (pending PR)
4.) Multiple dependency updates including Storm 2.8.1
WDYT?
Regards
Richard
Oh s*** :) you are right. Thanks, Dave.
For StormCrawler we already have it :)
On Friday, 02.05.2025 at 11:48 -0700, Dave Fisher wrote:
> Did you mean to send this to d...@storm.apache.org?
>
> > On May 2, 2025, at 11:42 AM, Richard Zowalla
> > wrote:
> >
>
Hi all,
I have updated our Storm website to be built automatically via GitHub
Actions on push to "main".
You have the possibility to either push to "main" and wait for a deploy
_or_ create a branch in the site repo and do a staging deploy via a
manually triggered action. The website is then a
FYI: Vote passed :)
I’ll ask our mentors via Slack how to proceed now. I guess we need to add the
resolution to the next board agenda.
Regards
Richard
> Begin forwarded message:
>
> From: Richard Zowalla
> Subject: [RESULT] [VOTE] Graduate Apache StormCrawler (Incubati
The general@a.o vote is here:
https://lists.apache.org/thread/d2z9mdy5ott8hrygoov7p496p0w9hcq0
> On 22.04.2025 at 07:09, Richard Zowalla wrote:
>
> Hi,
>
> This vote passes with the following votes:
>
> PJ Fanning (IPMC, binding)
> Ayush S
Hi,
This vote passes with the following votes:
PJ Fanning (IPMC, binding)
Ayush Saxena (IPMC, binding)
Julien Nioche
Dave Fisher (IPMC, binding)
Sebastian Nagel
Tim Allison (IPMC, binding)
Markos Volikas
Richard Zowalla (IPMC, binding)
I’ll bring the vote to the general@a.o
Hi all,
here is my
+1 (PPMC, IPMC, binding)
Richard
On Thursday, 17.04.2025 at 12:46 +0200, Richard Zowalla wrote:
> Hi all,
>
> We’ve got positive feedback on the DISCUSS threads for graduation
> [1,2], and would like to start an official VOTE thread now.
> If this
immediately below be and hereby are appointed
to serve as the initial members of the Apache StormCrawler Project:
* Tim Allison
* Sebastian Nagel
* Julien Nioche
* Markos Volikas
* Richard Zowalla
NOW, THEREFORE, BE IT FURTHER RESOLVED, that Richard Zowalla be appointed to
the office of
.
>>>>
>>>> Example Incubator list threads for Uniffle.
>>>> * https://lists.apache.org/thread/bgsvlm383o33wt29nbs0vjvndywpmmnz
>>>> (discuss)
>>>> * https://lists.apache.org/thread/mn40kvo7m1mppy5h3vco2wlbmp7kjkx0
>>> (vote)
>
7kjkx0 (vote)
> >
> > On 2025/04/12 11:59:31 Julien Nioche wrote:
> > > Hi PJ,
> > >
> > > Now that Richard has kindly fixed the website and volunteered to be PMC
> > > chair, what are the next steps towards graduation?
> > >
> > > Have a good
Hi Tim,
I think it is in the nature of things here. If we can avoid confusing the user,
it would be beneficial to fix it, imho.
I had the same thing a while back and simply increased the max pending limit.
Regards
Richard
> On 01.04.2025 at 22:52, Tim Allison wrote:
>
> All,
>
> I recently
Hi all,
I would be willing to serve as chair.
If anyone else wants to do it, I am also fine with it.
Regards
Richard
On 4 April 2025 17:47:54 CEST, Julien Nioche wrote:
>Hello,
>
>We recently discussed steps towards graduation and one of them would be to
>have a PMC chair.
>Is anyone willing t
on, 24 Mar 2025 at 20:37, Richard Zowalla wrote:
>
>> Hi all,
>>
>> First draft is here:
>> https://github.com/apache/incubator-stormcrawler/wiki/Apache-Maturity-Model-Assessment-for-StormCrawler
>>
>> Think we need to add some cross references and
Hi all,
First draft is here:
https://github.com/apache/incubator-stormcrawler/wiki/Apache-Maturity-Model-Assessment-for-StormCrawler
I think we need to add some cross references and discuss adding a dedicated
page for the existing PPMC / committers to our website.
Regards
Richard
> On 24.03.
Hi everyone,
I'm okay with graduating, though I'm not sure if the rest of the IPMC will feel
the same way about our (small) community.
Overall, SC is a well-established project that has been around for over ten
years and has consistently had users. There are also other TLPs with a niche
focus,
- Checked hashes and signatures of the src distribution
- Checked the staging page.
- Checked artifacts in the staging repo
- Built from source with full tests
- Ran a dockerized crawl with the opensearch / playwright module
+1 (binding) - rzo1 (PPMC,IPMC)
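For readers unfamiliar with the hash check in the list above, here is a minimal sketch in Java, assuming a SHA-512 checksum published as a plain hex string next to the source archive. The names ChecksumSketch, sha512Hex and matches are illustrative only and are not taken from any release script used by the project.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative helper for comparing a SHA-512 digest with a published hex value. */
class ChecksumSketch {

    /** Computes the SHA-512 digest of the given bytes as a lowercase hex string. */
    static String sha512Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // %02x renders each byte as exactly two hex digits
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a standard algorithm on the Java platform
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    /**
     * Compares the computed digest with the expected value.
     * Assumes expectedHex holds only the hex digest (no file name suffix).
     */
    static boolean matches(byte[] data, String expectedHex) {
        return sha512Hex(data).equalsIgnoreCase(expectedHex.trim());
    }
}
```

For a real archive the bytes could come from Files.readAllBytes(path); a streaming digest would be kinder to memory for large files. This covers only the hash; verifying the .asc signature is a separate step.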
Minor issues:
- We need to update th
No strong opinion from my side. Not a Solr user ;-)
On 10 March 2025 15:55:39 CET, Tim Allison wrote:
>Should I wait for:
>https://github.com/apache/incubator-stormcrawler/pull/1488
>
>before cutting RC1?
>
>On Wed, Mar 5, 2025 at 8:54 AM Richard Zowalla wrote:
>
>>
>> I don't think I'll manage to include
>> https://github.com/apache/incubator-stormcrawler/issues/621 in this
>> release.
>>
>> There is still testing and performance comparison to be done, and
>> probably some design decisions to be made when I open the PR.
>
Good news ;-) no report for us this month.
Original Message
From: Justin Mclean
Reply-To: gene...@incubator.apache.org
To: Incubator General
Subject: Re: [MENTORS] March report timeline - reports due March 2025
Date: Tue, 4 Mar 2025 16:46:17 +1100
> HI,
>
> StormCraw
We reported last month:
https://cwiki.apache.org/confluence/display/INCUBATOR/February2025#stormcrawler
I don’t think there is much value in reporting again 3 weeks later, but we
can copy and paste the Feb report… WDYT?
> On 03.03.2025 at 04:20, jmcl...@apache.org wrote:
>
> Dear podling,
>
Feel free :-)
On 21 February 2025 12:34:52 CET, Tim Allison wrote:
>I’d like to get two bug fixes in if possible. Early this coming week?
>
>I’m happy to be release manager again unless anyone else would like it?
>
>On Fri, Feb 21, 2025 at 4:15 AM Richard Zowalla wrote:
>
Hi,
Since Storm 2.8.0 has been out for a few weeks now, what do you think about
doing an SC release?
Regards
Richard
I think there is no additional logic inside the WARC processing.
So if you re-fetch (the old content), you will have an
additional copy.
(Disclaimer: I am not using WARC.)
> On 07.02.2025 at 12:57, Tim Allison wrote:
>
> Again, this is more for a user@ list. Sorry.
>
Do we have a large enough user base? Or do we want to rely on GitHub
discussions for such questions?
I am open to any approach.
Regards
Richard
> On 07.02.2025 at 13:22, PJ Fanning wrote:
>
> Lists can be created with https://selfserve.apache.org/ but I would
> suggest that maybe there should
Hi Tim,
I think it is just not implemented. Feel free to add code to support that
configuration.
Regards
Richard
> On 05.02.2025 at 15:31, Tim Allison wrote:
>
> This should be a question for our user@ mailing list, but I don't think
> we've set that up yet?
>
> I'm not sure if this is u
s, so I am still very new to this process.
> If I am doing something wrong or missing anything important, I would
> greatly appreciate it if you could let me know.🥹
>
> On Sun, 26 Jan 2025 at 5:54 PM, Richard Zowalla <r...@apache.org> wrote:
>
>> Hi,
>>
>>
Hi,
Maybe it would also be good to subscribe to the dev@ list? :)
Your mails are going into moderation and that would be a way to circumvent
that.
Regards
Richard
> On 26.01.2025 at 05:52, 홍용준 wrote:
>
> The following is a naturally translated email:
> --
>
> Hi Julien,
>
> I apologize fo
- Checked hashes with
https://gist.github.com/rzo1/28d493b24307e253635b0af9e78b0b02
- Checked signatures with
https://gist.github.com/rzo1/28d493b24307e253635b0af9e78b0b02
- Built from source with tests
- Ran a crawl (opensearch)
+1 (binding)
rzo1 (ppmc + ipmc)
On 2024/12/03 14:24:28 Tim Allis
- Checked hashes
- Checked signatures with
- Built from source with tests
- Ran a crawl (opensearch)
- (re-checked the licenses while working on the RC1 fixes)
+1 (binding)
rzo1 (ppmc + ipmc)
On Friday, 22.11.2024 at 16:46 -0500, Tim Allison wrote:
> Hi folks,
>
> I have posted a 2nd re
- Checked hashes with
https://gist.github.com/rzo1/28d493b24307e253635b0af9e78b0b02
- Checked signatures with
https://gist.github.com/rzo1/28d493b24307e253635b0af9e78b0b02
- Built from source with tests
- Ran a crawl (opensearch)
+1 (binding)
rzo1 (ppmc + ipmc)
> On 16.11.2024 at 20:00,
Might be a good idea, I guess.
On 12 November 2024 21:44:29 CET, Tim Allison wrote:
>Looks like the last release of Storm was on 2.23.0:
>https://github.com/apache/storm/blob/v2.7.0/pom.xml#L110
>
>Should we downgrade stormcrawler?
>
>On Tue, Nov 12, 2024 at 12:40 PM Richa
Actually, as we are solely an SDK, it might be OK to ship, because we are not
included in containers but rather in the Storm runtime environment.
I cannot remember which version Storm 2.7.0 is running, but if there is a
difference, we should most likely downgrade.
Regards and thx
Richard
On 1
ien Nioche wrote:
>>> > > Hi,
>>> > >
>>> > > I have left some comments on
>>> > > https://github.com/apache/incubator-stormcrawler/pull/1343
>>> > > Would be great to have more people testing this, any Apache SO
Any other opinions? Otherwise, I will assume lazy consensus.
Regards
Richard
On 3 November 2024 15:43:55 CET, Julien Nioche wrote:
>+1
>
>jnioche
>
>On Sun, 3 Nov 2024 at 14:40, Richard Zowalla wrote:
>
>> Hi all,
>>
>> we just found, that our Java-base
+1 (no blockers from my side)
On 4 November 2024 19:16:10 CET, Tim Allison wrote:
>There's been quite a bit of recent work. Should we aim for the next release
>in the next week or so? Are there any blockers?
>
>WDYT?
>
>Thank you!
>
>Best,
>
> Tim
Hi all,
we just found that our Java-based topologies in the Maven archetypes
are actually broken due to missing imports:
https://github.com/apache/incubator-stormcrawler/issues/1389
Since nobody complained and most people are actually using the flux
definition for topologies, I would like to sug
The Apache StormCrawler (Incubating) team is pleased to announce the release of
version 3.1.0 of Apache StormCrawler (Incubating).
StormCrawler is a collection of resources for building low-latency,
customisable and scalable web crawlers on Apache Storm.
Apache StormCrawler (Incubating) 3.1.0 s
Hi folks,
This vote passed with the following votes
+1 fanningpj (IPMC, PPMC, binding)
+1 ayushtkn (IPMC, PPMC, binding)
+1 rzo1 (IPMC, PPMC, binding)
+1 jnioche (PPMC)
I’ll set up the vote on the incubator list.
I will include (a) the correct tag link and (b) the fixed sha512sum in dev/dist
so
an issue now)
* Run a local as well as a remote topology using playwright and opensearch
Regards
Richard
> On 13.09.2024 at 11:49, Richard Zowalla wrote:
>
>
> Hi folks,
>
> I have posted a 1st release candidate for the Apache StormCrawler
> (Incubating) [version] releas
for test case inputs
>> * nexus published jars look ok
>>
>> Some nits that I have raised as issues in the issue tracker.
>>
>> On 2024/09/13 13:40:18 Julien Nioche wrote:
>>> Thanks Richard
>>>
>>> Followed the checklist and ran a crawl with
Hi folks,
I have posted a 1st release candidate for the Apache StormCrawler (Incubating)
[version] release and it is ready for testing.
This is the 2nd release under the ASF umbrella. Notably, it contains the new
playwright module, which can be used for
dynamic scraping.
Thank you to everyo
Great. Let me prepare an RC ;-)
On 2024/09/12 10:01:13 Ayush Saxena wrote:
> +1, good to have regular releases
>
> -Ayush
>
> > On 12 Sep 2024, at 3:27 PM, Julien Nioche wrote:
> >
> > +1
> >
> > Thanks
> >
> >> On Thu, 12 Se
Hi all,
We have some fixes, new features (playwright) and dependency updates inside
StormCrawler.
WDYT about doing 3.1.0?
Regards
Richard
The Apache StormCrawler (Incubating) team is pleased to announce the release of
version 3.0 of Apache StormCrawler (Incubating).
StormCrawler is a collection of resources for building low-latency,
customisable and scalable web crawlers on Apache Storm.
Apache StormCrawler (Incubating) 3.0 source
Hi all,
Thanks for your review and vote for Apache StormCrawler (Incubating) 3.0
Release Candidate 2
I'm happy to announce the vote has passed:
4 binding votes, 1 non-binding, no +0 or -1 votes.
Thanks for reviewing and voting.
+4 (binding) +1, from:
- PJ Fanning
- Dave Fisher
- Ri
Hi all,
this vote passes with the following +1 being cast:
+1 Tim Allison (binding, PPMC & IPMC)
+1 Josh Fischer
+1 Dave Fisher (binding, mentor & IPMC)
+1 Julien Nioche (PPMC)
+1 Richard Zowalla (binding, PPMC & IPMC)
+1 Ayush Saxena (binding, mentor & IPMC)
I will move
.
- Checked checksums + asc signatures
- Re-checked licenses.
Regards
Richard
On Tuesday, 07.05.2024 at 11:12 +0200, Richard Zowalla wrote:
> Hi folks,
>
> I have posted a 2nd release candidate for the Apache StormCrawler
> (Incubating) 3.0 release and it is ready for testing.
>
Great, thanks for your thoughts. I think we won't re-roll for now (given we
don't know the opinions on general@ and we still can do the LEGAL path in the
near future).
Thx
Richard
On 10 May 2024 16:49:07 CEST, Dave Fisher wrote:
>
>
>> On May 9, 2024, at 9:50 PM,
>
>> On May 9, 2024, at 1:45 PM, Richard Zowalla wrote:
>>
>> https://dist.apache.org/repos/dist/dev/incubator/stormcrawler/stormcrawler-3.0/
>> is contained in the mail in the "Source" section?
>>
>> The Rat excludes are defined in the r
>Dave
>
>> On May 9, 2024, at 9:45 AM, Tim Allison wrote:
>>
>> +1
>>
>> shasum checks out for source
>> Built locally on ubuntu with Java 17
>>
>> Thank you!
>>
>> On 2024/05/07 09:12:55 Richard Zowalla wrote:
>>> Hi f
>> I checked the signature and checksum and they are good. LICENSE and NOTICE
>> are good for a source release. The GitHub tag includes the .github directory
>> and .asf.yaml while the source release adds DEPENDENCIES
>>
>> Best,
>> Dave
>>
>>> On
wrote:
>>
>> +1
>>
>> shasum checks out for source
>> Built locally on ubuntu with Java 17
>>
>> Thank you!
>>
>> On 2024/05/07 09:12:55 Richard Zowalla wrote:
>>> Hi folks,
>>>
>>> I have posted a 2nd release
Hi folks,
I have posted a 2nd release candidate for the Apache StormCrawler
(Incubating) 3.0 release and it is ready for testing.
The previous VOTE was cancelled because building from source (without
an initialized git repo) wasn't possible.
This is our first release after joining the ASF incuba
Guys, I need to re-roll that release.
It isn't buildable from source because of the git formatting plugin
requiring .git to be present.
Stay tuned for the next try.
On Tuesday, 07.05.2024 at 10:04 +0200, Richard Zowalla wrote:
> Hi folks,
>
> I have posted a first release
Hi folks,
I have posted a first release candidate for the Apache StormCrawler
(Incubating) 3.0 release and it is ready for testing.
This is our first release after joining the ASF incubator as a
podling. It is a breaking change with renamings in the group ids and
the removal of the elasticsearch
Turns out that it needs to be set up by INFRA.
I opened https://issues.apache.org/jira/browse/INFRA-25764 for it.
Regards
Richard
On Monday, 06.05.2024 at 20:58 +0200, Richard Zowalla wrote:
> Hi all,
>
> I was trying to put up a 3.0 release candidate for a vote.
>
> Sadly, i
Hi all,
I was trying to put up a 3.0 release candidate for a vote.
Sadly, it fails with an unexpected error (http 400) for deploying to
the staging repositories:
[INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-deploy-plugin:2.8.2:deploy (default-deploy) on project stormcrawl
Hi folks,
I quickly chatted with Julien off-list.
If no one objects, I am going to propose a first release candidate next week!
Best
Richard
On 2024/05/02 16:30:22 Richard Zowalla wrote:
> Ok so anything else?
> We have release docs available ;-)
> Anyone want to act as Release Manage
rawler/blob/main/core/src/test/java/org/apache/stormcrawler/indexer/BasicIndexingTest.java
>> [2] https://www.apache.org/legal/src-headers#headers
>> [3]
>>
>> https://github.com/apache/incubator-stormcrawler/tree/main/core/src/test/resources
>> [4]
>>
>> https:/
Hi all,
what do we need to do to run our first ASF release?
Personally, I would love to see [1] in 3.0.
Don't think we have any other formal blockers?
Regards
Richard
[1] https://github.com/apache/incubator-stormcrawler/pull/1199
Hi,
I think we should go for 3.0.0 because of the major breaking change.
In addition, we need to ensure that SC 2.x users have a smooth
transition and know that we are now incubating at the ASF.
I guess we should send some pointers to the old users list on Google?
Best
Richard
On Wednesday,