Re: Druid Operator to Apache Software Foundation - Call for Maintainers

2025-07-29 Thread Gian Merlino
one after donation? > > > Best regards, > Eyal Yurman > > > On Tue, Jul 29, 2025 at 11:41 AM Gian Merlino wrote: > > > That's a good question. I am not sure, but we can figure it out. > > > > The main question is whether the IP Clearance requirements

Re: Druid Operator to Apache Software Foundation - Call for Maintainers

2025-07-29 Thread Gian Merlino
That's a good question. I am not sure, but we can figure it out. The main question is whether the IP Clearance requirements have been met. I think they *may* have been met with the ICLAs but let's be sure. I think the next step would be to write a summary about the history of the operator, cove

Re: [DISCUSS] Hadoop ingestion support

2025-07-01 Thread Gian Merlino
> > > > Best, > > Krishna Thirumalasetty > > Roku Inc. > > > > On Tue, Jun 17, 2025 at 3:28 PM Eyal Yurman wrote: > > > > > Sharing as another data point - > > > > > > We still use YARN to run Hadoop-based batch ingestion. Very

Re: [DISCUSS] Dropping support for Java 11

2025-06-18 Thread Gian Merlino
35 - since Hadoop doesn't support Java 17 yet, I think that > means we would also have to drop that too. I'm on board, but wondering > if that is too aggressive? > > On Tue, Jun 17, 2025 at 2:15 PM Gian Merlino wrote: > > > > Actually, I wonder if Druid 35 w

Re: [DISCUSS] Dropping support for Java 11

2025-06-17 Thread Gian Merlino
. We could target an upgrade to Jetty 12 and a dropping of Java 11 both for Druid 35. Gian On 2025/06/17 19:17:22 Gian Merlino wrote: > This sounds good to me. > > On 2025/06/09 20:11:41 Clint Wylie wrote: > > Following up on this, I want to propose the first release of 2026 for

Re: [DISCUSS] Hadoop ingestion support

2025-06-17 Thread Gian Merlino
I'm on board with this. I also think we should deprecate it ASAP, starting in the next major release. It'd be nice to also build a migration guide that helps people move from Hadoop ingestion to SQL/MSQ ingestion, and from YARN to K8S pod runners. Gian On 2025/06/09 20:10:03 Clint Wylie wrote:

Re: [DISCUSS] Dropping support for Java 11

2025-06-17 Thread Gian Merlino
port will disappoint some users, the benefits for > > > the broader community outweigh the downsides. > > > > > > On Tue, Dec 17, 2024 at 11:32 PM Gian Merlino wrote: > > > > > > > Regarding Hadoop: if core Druid code starts requiring Java 17, we mi

Re: [DISCUSS] Dropping support for Java 11

2025-06-17 Thread Gian Merlino
support for Java 11. We will never see zero Hadoop usage in the > > > > community. > > > > While dropping Hadoop support will disappoint some users, the benefits > > > > for > > > > the broader community outweigh the downsides. > > > >

[ANNOUNCE] New committer: Cece Mei

2025-06-16 Thread Gian Merlino
Druids, I am pleased to announce that we have invited Cece Mei (apache id: yqm) to be a committer, and she has accepted. Congratulations and welcome! Gian

[ANNOUNCE] New committer: Phua Guan Wei

2025-06-16 Thread Gian Merlino
Druids, I am pleased to announce that we have invited Phua Guan Wei (apache id: gwphua) to become a committer, and he has accepted. Congratulations and welcome! Gian

[ANNOUNCE] New committer: Jesse Tuğlu

2025-06-16 Thread Gian Merlino
Druids, I am pleased to announce that we have invited Jesse Tuğlu (apache id: tuglu) to become a committer, and he has accepted. Congratulations and welcome! Gian

[ANNOUNCE] New Druid PMC chair: Abhishek Agarwal

2025-05-30 Thread Gian Merlino
Druids, I am pleased to announce that the Druid PMC has decided to begin a tradition of changing chairs periodically, as has been adopted by many other Apache projects. This is a sign of maturity and we believe it will be beneficial for the project's long term health. It has been an honor for

Re: [ANNOUNCE] New Druid PMC chair: Abhishek Agarwal

2025-05-30 Thread Gian Merlino
Lol, oops, you can see at the bottom of my mail that I started from another project's announcement as inspiration. Please ignore the bit about Ruben and focus your congratulations on Abhishek :) Gian On 2025/05/30 23:51:24 Gian Merlino wrote: > Druids, > > I am pleased to ann

Re: [DISCUSS] Ingestion lag problem

2025-04-28 Thread Gian Merlino
I have replied in the issue with some thoughts on the root causes of ingestion lag, and pointers to some recent work on one of the root causes. (I believe there are two roots.) Gian On 2025/04/15 09:29:01 Frank Chen wrote: > Hi Gian and Maytas, > > I'm writing this email to you to bring an old

Re: Remove Google Analytics from the Druid Website

2025-04-28 Thread Gian Merlino
Thank you, I have replied on the PR. On 2025/04/28 14:18:22 Niall Pemberton wrote: > Hi Druid Team, > > I appreciate the work you've done to remove Google Analytics(GA) - thank > you - however, some of the old doc versions still contain it and the > download page. > > I created the following PR

Re: Druid sql upper function

2025-04-11 Thread Gian Merlino
’ … ‘z’ < ‘A’ > < ‘D’, and therefore ‘druid’ < ‘dRuid’ < ‘fluid’ < ‘Druid’ < ‘DRUID’)? > > There’s another collation where ‘d’ < ‘D’ < ‘e’ < ‘E’ etc. This puts ‘druid’, > ‘dRuid’, ‘Druid’, ‘DRUID’ next to each other, so that the query "UPPER(name)

New label "Upgrade Note"

2025-04-11 Thread Gian Merlino
After the discussion on https://github.com/apache/druid/issues/17891, I added a new label "Upgrade Note": https://github.com/apache/druid/labels/Upgrade%20Note Typically when pulling together release notes, the release manager will look at changes tagged "Release Notes" and "Incompatible". The r

Re: Druid sql upper function

2025-04-11 Thread Gian Merlino
String indexes in Druid are case-sensitive. UPPER (and LOWER) can use the index, but they must do it through index scans, rather than point index lookups. So index usage will be less efficient. Gian On 2025/04/09 18:50:45 vikas thakur wrote: > On Wed, Apr 9, 2025 at 2:49 PM vikas thakur > wrot
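
The difference between a point index lookup and an index scan can be illustrated with a minimal sketch. It is not Druid's implementation: the sorted value dictionary, the row-id map, and the function names below are all hypothetical stand-ins for a case-sensitive string index.

```python
from bisect import bisect_left

# Hypothetical, simplified stand-in for a case-sensitive string index:
# a sorted dictionary of distinct values, each mapped to matching row ids.
DICTIONARY = ["DRUID", "Druid", "dRuid", "druid", "fluid"]
ROWS = {"DRUID": [4], "Druid": [0, 7], "dRuid": [2], "druid": [1, 5], "fluid": [3, 6]}


def point_lookup(value):
    """name = value: a single binary search finds the one matching entry."""
    i = bisect_left(DICTIONARY, value)
    if i < len(DICTIONARY) and DICTIONARY[i] == value:
        return sorted(ROWS[value])
    return []


def upper_scan(value):
    """UPPER(name) = value: every dictionary entry has to be examined."""
    rows = []
    for entry in DICTIONARY:
        if entry.upper() == value:
            rows.extend(ROWS[entry])
    return sorted(rows)
```

In this sketch the scan still avoids reading rows that do not match, but it visits the whole dictionary, which is why it costs more than the binary search.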

Re: CVE-2025-27888: Apache Druid: Server-Side Request Forgery and Cross-Site Scripting

2025-04-05 Thread Gian Merlino
Hi David, For 1) ordinarily we only release fixes for the most recent major version. In this case we did two major versions (31 and 32), because 32 had a major change in that it removed all legacy non-SQL compliant null handling behaviors. If you need a patched version of v29 you can apply the pat

Re: CI stuff

2025-03-19 Thread Gian Merlino
Bit late to the thread, but, thank you! Enhancements to tests are important and I'm happy they run a bit faster now. On Mon, Feb 17, 2025 at 3:45 AM Zoltan Haindrich wrote: > Hey All, > > Recently I've worked with Akshat to improve on the CI pipeline. > Key changes are that the execution of the

Re: Add rb-druid-indexer to druid website src

2025-03-19 Thread Gian Merlino
I see the PR has been approved. Congrats! On Thu, Mar 13, 2025 at 3:47 AM Miguel Vicente Álvarez Adsuara < malva...@redborder.com> wrote: > Hi Apache team, > > At redBorder, we developed rb-druid-indexer as one of our internal > services. The idea is to simplify indexing task submissions in distr

Re: [DISCUSS] Dropping support for Java 11

2024-12-17 Thread Gian Merlino
Regarding Hadoop: if core Druid code starts requiring Java 17, we might run into issues with running that core Druid code inside the remote Hadoop M/R processes. People would need to update their YARN runners to Java 17. And given Hadoop doesn't officially support Java 17 yet, this might cause p

Re: [Discuss] deprecating and eventually removing non-sql compatible configs

2024-12-16 Thread Gian Merlino
Thanks Clint. I'm looking forward to the code and testing matrix being simplified! I do want to note that in addition to the defaults being SQL compatible since Druid 28, the docs also list the legacy mode as deprecated. So the timing makes sense for removal in Druid 32. Gian On Mon, Dec 16, 202

New committer: Wei Liu

2024-08-29 Thread Gian Merlino
Hi Druids, The Druid PMC has invited Wei Liu (github username "AlbericByte", apache id "albericbyte") to become a committer and we are pleased to announce that he has accepted. Wei has been a consistent contributor over the past year, with accepted PRs in areas like documentation improvements, de

Druid Summit 2024- save the date + call for proposals

2024-07-25 Thread Gian Merlino
Hi Druids, We are planning to host an in-person Druid Summit event on October 22 in the San Francisco bay area. Registration will be opening soon, so save the date! I am writing to say that if you are interested in speaking, please send in a proposal here: https://forms.gle/R7o9GyV6WBDdR72E9. We

Re: [DISCUSS] Dropping support for Java 8

2024-07-25 Thread Gian Merlino
trictly related, but with 8 still supported maybe > > it just seems like we are kind of slow and cautious, but if we drop 8, > > it seems like our java support is in a strange place of moderately old > > versions if we only officially support 11 and 17, given 21 is also an

Re: Add to mailing list

2024-07-25 Thread Gian Merlino
Hi Razin, You can subscribe to the list by emailing dev-subscr...@druid.apache.org. Please let us know if you have any trouble with that. Thank you, Gian On Wed, Jul 24, 2024 at 9:36 AM Razin Bouzar wrote: > Please add me to the mailing list. > > -- > RAZIN BOUZAR > Monitoring Cloud | Salesfor

Re: [DISCUSS] Dropping support for Java 8

2024-07-16 Thread Gian Merlino
I think this is a good move. Let's give users some warning by deprecating it first prior to removal. IMO, good timing would be to deprecate Java 8 in the next major Druid release (Druid 31). That means a doc update, release note update, and updating the start scripts to log a warning that suppor

Re: [VOTE] Release Apache Druid 29.0.0 [RC1]

2024-02-16 Thread Gian Merlino
eb console is pushing more people to change their arrayIngestMode. Gian On 2024/02/16 22:24:23 Gian Merlino wrote: > I just learned that arrayIngestMode is not actually new, just > https://github.com/apache/druid/pull/15588 is. However this will still make > it more likely that peopl

Re: [VOTE] Release Apache Druid 29.0.0 [RC1]

2024-02-16 Thread Gian Merlino
ink it should wait for 30, given the impact that can happen if people end up with mixed types without planning for it. On Fri, Feb 16, 2024 at 2:16 PM Gian Merlino wrote: > Thanks for managing this release! > > My vote is -0, let me explain why. I am concerned about usability issues >

Re: [VOTE] Release Apache Druid 29.0.0 [RC1]

2024-02-16 Thread Gian Merlino
Thanks for managing this release! My vote is -0, let me explain why. I am concerned about usability issues with the new arrayIngestMode feature. There are various issues when mixing MVD strings and string arrays in the same column: as soon as arrays show up in a column, various "classic MVD-style"

Re: on removing 'auto' strategy from native search query

2023-11-20 Thread Gian Merlino
We don't have usage data, but my sense is that the search query is not commonly used, and among people that use the search query, it's not common to rely on "druid.query.search.searchStrategy: auto". So I think it would be ok to remove the feature and have "auto" be an alias for "useIndexes", espec

Druid Summit 2023 — call for speakers!

2023-09-10 Thread Gian Merlino
Hey Druids, I am excited to write to you about this year's Druid Summit (https://druidsummit.org/), an event being held virtually on December 5–6, 2023. The call for speakers is open here: https://docs.google.com/forms/d/e/1FAIpQLSfoBZNh_IpSCT59fsYdTSSK92hYa7Rxf_7Fu0yBRCbK8ZwJdg/viewform A title

Re: CVEs in contrib extensions

2023-09-05 Thread Gian Merlino
I think it would be OK to have a policy that contrib extension dependencies are not proactively screened for CVEs. If we adopt such a policy, we do need to make it clear to people that they should do their own screening of any contrib extensions they use. However, we can't extend that policy to sa

Re: New Committer : Soumyava Das

2023-08-23 Thread Gian Merlino
Congratulations!! On Mon, Aug 21, 2023 at 9:13 AM Karan Kumar wrote: > Hello everyone, > > The Project Management Committee (PMC) for Apache Druid has invited > Soumyava to become a committer and we are pleased to announce that > Soumyava has accepted. > > Soumyava has been a consistent contribu

Re: New Committer : Adarsh Sanjeev

2023-08-23 Thread Gian Merlino
Congratulations!! On Mon, Aug 21, 2023 at 8:14 AM Karan Kumar wrote: > Hello everyone, > > The Project Management Committee (PMC) for Apache Druid has invited > Adarsh to become a committer and we are pleased to announce that > Adarsh has accepted. > > Adarsh has been a consistent contributor fo

Re: [DISCUSS] Druid 28 dropping support for Hadoop 2

2023-07-19 Thread Gian Merlino
d off already, and the next release (28) is meant to not have it. Does anyone have some spare cycles to do (2)? Gian On 2023/06/28 06:42:08 Gian Merlino wrote: > I'd like to propose dropping support for Hadoop 2 in Druid 28. Not the very > next release (which I assume will be Druid 27) b

Re: About maintaining the Helm's Chart of Apache Druid

2023-07-17 Thread Gian Merlino
will just remove the > code. > > On Wed, Mar 1, 2023 at 7:14 AM Gian Merlino wrote: > > > Not as far as I _know_, I mean. > > > > On 2023/03/01 01:43:43 Gian Merlino wrote: > > > Not as far as I do. I think we're stuck since nobody has volunteered to > >

Re: group-by v1

2023-07-17 Thread Gian Merlino
+1 to removing it. The only benefit I am aware of is the same one that you mentioned. But I don't think this needs to block removing the old v1 algo. On Wed, Jul 12, 2023 at 4:07 AM Clint Wylie wrote: > Is anyone opposed to removing group-by v1? I think it would allow us > to simplify quite a l

Re: request to join dev group

2023-07-06 Thread Gian Merlino
Hi Tanya, Welcome! You can subscribe by sending an email to dev-subscr...@druid.apache.org. Gian On 2023/07/04 06:41:02 Tanya Mary wrote: > request to join dev group

Re: [DISCUSS] Druid 28 dropping support for Hadoop 2

2023-06-29 Thread Gian Merlino
1:53 PM Karan Kumar > wrote: > > > In favour of dropping hadoop 2 support . Another point is the lack of > > security and vulnerability fixes in hadoop2. > > > > > > > > On Wed, Jun 28, 2023 at 12:17 PM Clint Wylie wrote: > > > > >

[DISCUSS] Druid 28 dropping support for Hadoop 2

2023-06-27 Thread Gian Merlino
I'd like to propose dropping support for Hadoop 2 in Druid 28. Not the very next release (which I assume will be Druid 27) but the one after that, likely late 2023 timeframe. In 2021, we had a discussion about moving away from Hadoop 2: https://lists.apache.org/thread/zmc389trnkh6x444so8mdb2h0x0no

Re: Requirements for relaxing restrictions on github actions usage

2023-06-02 Thread Gian Merlino
+1, allowing CI to run without an explicit button push by committers will help encourage new contributors. The requirements seem OK. I looked through our repo and I don't see any external actions (they are all in "github" or "actions"). We do have ".github/workflows/labeler.yml" that fires on p

Roadmap event: call for speakers

2023-05-30 Thread Gian Merlino
Hi Druids, We are looking to put on a virtual event called "Druid.NEXT" in June highlighting things that people in the community are working on. This is a call for speakers for that event! Date is TBD, but likely late June. The event will be on the shorter side, about meetup-length (an hour or tw

Re: Error message: "Error: Resource limit exceeded

2023-05-15 Thread Gian Merlino
Hi Alaka, There's a bit of text cut off in the error message. The full one is something like: "Time ordering is not supported for a Scan query with %,d segments per time chunk and a row limit of %,d. " + "Try reducing your query limit below maxRowsQueuedForOrdering (cu

Re: Question regarding new development

2023-03-27 Thread Gian Merlino
Looks like the conversation is now in https://github.com/apache/druid/issues/13948. On Sat, Mar 18, 2023 at 8:00 AM Sergiu Ungureanu wrote: > Hi Team, > > Yesterday I raised a question in #dev channel in slack > > https://apachedruidworkspace.slack.com/archives/C030CMF6B70/p1679085073683509 > >

CI requiring approval for external contributors

2023-03-27 Thread Gian Merlino
Recently, ASF GitHub repos had their defaults for GitHub Actions changed to "always require approval for external contributors". In Slack, Karan pointed out that Airflow has recently submitted a ticket to have that changed back: https://issues.apache.org/jira/browse/INFRA-24200. IMO, we should do t

Re: About maintaining the Helm's Chart of Apache Druid

2023-02-28 Thread Gian Merlino
Not as far as I _know_, I mean. On 2023/03/01 01:43:43 Gian Merlino wrote: > Not as far as I do. I think we're stuck since nobody has volunteered to do > one of the two necessary things: > > 1) shepherd this code through the IP clearance process, or > 2) analyze its provenance enou

Re: About maintaining the Helm's Chart of Apache Druid

2023-02-28 Thread Gian Merlino
Not as far as I do. I think we're stuck since nobody has volunteered to do one of the two necessary things: 1) shepherd this code through the IP clearance process, or 2) analyze its provenance enough to determine that IP clearance isn't necessary. If anyone is willing to do one of the above it would be

Re: [Discuss] S3 buckets or IT tests

2023-02-22 Thread Gian Merlino
I think the ticket you're referring to is https://issues.apache.org/jira/browse/INFRA-23952. It would definitely be valuable to run S3 integration tests as part of the automated test suite in GitHub Actions. If Infra is willing to provide a bucket for this purpose then we would certainly be abl

Re: moving druid-core, extendedset, druid-hll into druid-processing

2023-02-06 Thread Gian Merlino
I support this. I don't feel like the separation between core and processing is buying us very much. On Mon, Jan 23, 2023 at 5:12 PM Clint Wylie wrote: > Hi all, > > I want to discuss moving druid-core, extendedset, and druid-hll into > druid-processing to simplify our code structure and depende

Re: [DISCUSS] Release 24.0.1

2022-10-18 Thread Gian Merlino
Thank you for volunteering! On Mon, Oct 17, 2022 at 7:00 AM Kashif Faraz wrote: > Hi Abhishek > > If you haven't started with the release process already, I would like to > volunteer to perform this release so that we can expedite it. > Please let me know if that works for you. > > Regards > Kas

Druid Summit on the road

2022-09-06 Thread Gian Merlino
Hey Druids, I am excited to write to you about upcoming events in this year's edition of Druid Summit, which is being conducted as a series of more local in-person events. I hope it gives you a chance to meet people near you in the Druid community. Attendance is free of charge. I personally will

Re: Intermediate segment persistence

2022-09-06 Thread Gian Merlino
Hey Pramod, If it's a minor change I recommend raising a PR. Generally raising an issue first is a good idea for bigger changes, where it is helpful to have some discussion prior to the code showing up. But for smaller changes, we can go directly to the code. You can post the PR here too, or in S

Re: [E] [DISCUSS] Hadoop 3, dropping support for Hadoop 2.x for 24.0

2022-08-08 Thread Gian Merlino
It's always good to deprecate things for some time prior to removing them, so we don't need to (nor should we) remove Hadoop 2 support right now. My vote is that in this upcoming release, we should deprecate it. The main problem in my eyes is the one Abhishek brought up: the dependency management s

Re: Next Druid release version scheme

2022-07-06 Thread Gian Merlino
users by making too many API changes, look no > further than Guava.) > > Julian > > > On Jul 6, 2022, at 1:53 AM, Gian Merlino wrote: > > My proposal for the next release is that we merely drop the leading "0." > and don't change anything else about our dev pr

Re: Next Druid release version scheme

2022-07-06 Thread Gian Merlino
> Does my current extension work with new releases? > > Can I do a rolling upgrade of druid to the next version? > > > > The more things that are versioned the better, but (2) and (4) have been > > the things that have been most important to me in the past. > > >

Re: [DISCUSS] Removing code related to `FireHose`

2022-07-06 Thread Gian Merlino
I am in favor of immediately removing FiniteFirehoseFactory and marking EventReceiverFirehoseFactory deprecated. Then, later on we can remove InputRowParser and EventReceiverFirehoseFactory. On Fri, Jun 24, 2022 at 4:41 AM Abhishek Agarwal wrote: > I didn’t include them (RealtimeIndexTask and >

Re: Vulnerability Report [Misconfigured DMARC Record Flag]

2022-06-21 Thread Gian Merlino
Hey Zeus, You should have received a response to this report from the Apache Security Team (secur...@apache.org). In the future, please note that security reports should be sent to secur...@apache.org, not the dev list. On Tue, Jun 21, 2022 at 1:04 PM Cyber Zeus wrote: > Hi team > kindly update

New PMC member: Abhishek Agarwal

2022-06-07 Thread Gian Merlino
Hey Druids, The Druid PMC has invited Abhishek Agarwal (asf id abhishek, github id abhishekagarwal87) to become a PMC member, and we are pleased to announce that he has accepted. Abhishek has authored dozens of commits, participated in nearly 200 code reviews, and is release manager for the upcomi

Re: EJB interceptor binding API is not available

2022-06-04 Thread Gian Merlino
Hi Maithri, I haven't encountered something like this before so I'm not sure what's causing it. Is it reproducible? If you could provide some steps for someone else to see the same thing you're seeing — maybe it relies on a particular Java version, or particular Druid version, or something — then

Re: Next Druid release version scheme

2022-05-27 Thread Gian Merlino
Yeah, I'd say the next one after 24.0 would be 25.0. The idea is really just to remove the leading zero and thereby communicate the accurate state of the project: it has been stable and production-ready for a long time. Some people see the leading zero and interpret that as a sign of an immature or

Re: [DISCUSS] Druid 0.23 release

2022-05-26 Thread Gian Merlino
I'm supportive of changing the versioning to something without the leading zero in the next release where this is practical. If it's the one after 0.23.0, then I would go with 24.0. IMO, going with 1.0 would send a message that this is the first mature release. But that isn't the case: we have been

Re: Limitations of automated unused segment kill logic (Issue #10876 and PR #10877)

2022-05-05 Thread Gian Merlino
I just took a look, and it looks like a few other people did too. Sorry it took so long! I do think that "review for a review" is a good way to go! Thanks for volunteering. On Mon, May 2, 2022 at 12:12 PM Lucas Capistrant wrote: > Hi all, > > I'm writing in regards to my enhancement pr

Re: [GitHub] [druid] cryptoe commented on a diff in pull request #12339: Make AWS WebIdentityToken actually working and usable from inside EKS.

2022-04-04 Thread Gian Merlino
I thought these emails were supposed to go to comm...@druid.apache.org? I do see a bunch on that list from today, so maybe this was a weird gitbox snafu. On Sun, Apr 3, 2022 at 10:53 PM GitBox wrote: > > cryptoe commented on code in PR #12339: > URL: https://github.com/apache/druid/pull/12339#di

Re: 0.23

2022-03-24 Thread Gian Merlino
I agree it's a good time to do a release. Most of the release-manager steps involve having commit privileges, but nevertheless, you might find it interesting to read about the process: https://github.com/apache/druid/blob/master/distribution/asf-release-process-guide.md You've actually already don

Multi-stage queries

2022-02-25 Thread Gian Merlino
Hey Druids, I recently posted a proposal on GitHub about adding multi-stage distributed queries to Druid: https://github.com/apache/druid/issues/12262 I think it'll be a powerful advancement in what Druid is capable of, and I'm interested in what people think. It's also going to be a lot of work

Re: Apache Druid Slack

2022-01-21 Thread Gian Merlino
It sounds like a good idea to me. It's not ideal that the current Slack workspace is hard for new people to join. On Thu, Jan 20, 2022 at 10:15 AM Vadim Ogievetsky wrote: > I think that the PMC should create a new Slack channel for Apache Druid and > shift the community towards using it away fro

Re: [E] [DISCUSS] Patch to fix new vulnerabilities in log4j

2021-12-20 Thread Gian Merlino
I think doing a 0.22.2 would be worth it for users' peace of mind, even if Druid isn't vulnerable by default. Just because people are on edge about log4j-related stuff right now. In case other people agree, I created an 0.22.2 branch just now. Is anyone able to release-manage this one? Btw, John a

Re: Apache Druid security advisory: critical vulnerability CVE-2021-44228 in Apache Log4j

2021-12-13 Thread Gian Merlino
To clarify about the mitigations: the "-Dlog4j2.formatMsgNoLookups=true" mitigation that has been floating around the Internet is *not effective* for log4j 2.8.2, which was used by Druid 0.22.0 and other recent versions. If you are going to stay on an older version of Druid, do not use this mitigat

Re: Need Help Benchmarking Druid

2021-12-11 Thread Gian Merlino
Hey Abdel, Feel free to DM me on ASF Slack. The info to join is here: https://druid.apache.org/community/ On Fri, Dec 3, 2021 at 9:11 AM Abdelouahab Khelifati wrote: > Hello, > > I am Abdel, a researcher of Computer Science and I am working on a > benchmarking paper on time series database sys

Re: [RESULT][VOTE] Release Apache Druid 0.22.1 [RC2]

2021-12-11 Thread Gian Merlino
Thank you for running this release! On Sat, Dec 11, 2021 at 12:28 AM Jihoon Son wrote: > Thanks to everyone who participated in the vote! The vote has passed > with 3 binding +1s. > > Gian Merlino: +1 (binding) > Clint Wylie: +1 (binding) > Jonatha

Re: [VOTE] Release Apache Druid 0.22.1 [RC2]

2021-12-10 Thread Gian Merlino
+1 on releasing 0.22.1-rc2 I verified: - hashes / gpg - unit tests - compared the src and bin packages against 0.22.0 to make sure there were no unexpected changes - attempted to trigger the jndi lookup functionality; it triggered on 0.22.0 but not 0.22.1-rc2 - verified that task logs look normal
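
The hash check in a release verification like the one above can be sketched as follows. This assumes the release artifact is published alongside a SHA-512 checksum given as a bare hex string; the function names are illustrative, and the GPG signature check is a separate step not covered here.

```python
import hashlib
import hmac


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare the computed digest with the published one (assumed bare hex)."""
    return hmac.compare_digest(sha512_of(path), published_hex.strip().lower())
```

Reading in chunks keeps memory use flat for large release tarballs.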

Re: Log4j vulnerability - hotfix?

2021-12-10 Thread Gian Merlino
nds for Druid are more challenging than for > projects on the slightly newer versions of log4j2, perhaps it would be > appropriate to put out one or two more patch releases, against 0.21 > and/or 0.20? I know our installation is still on 0.21, which is less > than 2 months old.

Re: [VOTE] Release Apache Druid 0.22.1 [RC1]

2021-12-10 Thread Gian Merlino
My vote is 0 on this release. I verified the usual things, and compared the src and bin packages against 0.22.0 to make sure there were no unexpected changes. That all looks OK to me. But there is an issue with weird errors at the end of logfiles for processes that exit normally. It's especially n

Re: Log4j vulnerability - hotfix?

2021-12-10 Thread Gian Merlino
We're working on this right now and will be getting a vote / release for 0.22.1 out asap. Btw, the log4j announcement mentions a mitigation that does work for our current version (2.8.2). It's part (b) here, specifying "%m{nolookups}" in the PatternLayout configuration: https://lists.apache.org/th
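
As a rough illustration of the "%m{nolookups}" mitigation, the sketch below rewrites a log4j2 PatternLayout pattern string so that the message converter carries the {nolookups} option. The helper name and the sample pattern are assumptions for illustration, and the regex is deliberately simple: it does not handle width modifiers such as %-5m or escaped %% sequences.

```python
import re

# Matches the message converter (%m, %msg, %message) only when it is not
# already followed by an option block or by more letters (e.g. %marker).
_MESSAGE_CONVERTER = re.compile(r"%(message|msg|m)(?![A-Za-z{])")


def add_nolookups(pattern):
    """Append {nolookups} to each bare message converter in a pattern."""
    return _MESSAGE_CONVERTER.sub(r"%\1{nolookups}", pattern)
```

For example, a pattern such as "%d{ISO8601} %p [%t] %c - %m%n" would become "%d{ISO8601} %p [%t] %c - %m{nolookups}%n". Any rewritten pattern should still be reviewed by hand before it is deployed.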

Re: Need help in understanding real-time ingestion task pause behavior during checkpointing

2021-12-02 Thread Gian Merlino
Harini, those are interesting findings. I'm not sure if the two pauses are necessary, but my thought is that it ideally shouldn't matter because the supervisor shouldn't be taking that long to handle its notices. A couple things come to mind about that: 1) Did you see what specifically the supervi

Re: Push-down of operations for SystemSchema tables

2021-11-29 Thread Gian Merlino
're on the same pathway with ordered scan > query, so I could rebase on top of that and break into a smaller set of > PRs, nonetheless the conceptual approach and direction is something that I > think will work. > > Thanks! > Jason > > > > > > > On Wed, May

Re: Druid-specific Calcite keywords

2021-11-05 Thread Gian Merlino
PLE. In > a location that expects an identifier (e.g. after FROM), BERNOULLI > will be converted into an identifier. Thus you can use BERNOULLI as a > table name. > > Julian > > On Thu, Nov 4, 2021 at 2:18 PM Gian Merlino wrote: > > > > Hey Druids, > > > > I

Druid-specific Calcite keywords

2021-11-04 Thread Gian Merlino
Hey Druids, I'm looking into how to add keywords to Druid's SQL dialect, and I wanted to ask if anyone has enough familiarity with Calcite to point at some info about how to do that without needing to modify Calcite itself?

Druid Summit 2021

2021-09-28 Thread Gian Merlino
Hey Druids, I am excited to write to you about Druid Summit (https://druidsummit.org/), an event being held virtually on November 9–10, 2021. The entire Apache Druid community is welcome, and registration is free. It would also be great to see a bunch of people from the community giving talks abo

Re: [Proposal] - Kafka Input Format for headers, key and payload parsing

2021-09-21 Thread Gian Merlino
ill know > how to use this feature. And it'll help us better understand how it's > supposed to work. (Perhaps it could have answered the two questions above) > > >>> Absolutely agree with you, I will do that along with other review > comments from the code. >

Re: [Proposal] - Kafka Input Format for headers, key and payload parsing

2021-09-16 Thread Gian Merlino
re to get all your replies 🙂 On Tue, Sep 14, 2021 at 10:10 PM Gian Merlino wrote: > Hey Lokesh, > > The concept and API look solid to me! Thank you for writing this up. I > agree with Ben's comment. This will be really useful functionality. > > I have a few questions ab

Re: compression strategy concurrency

2021-09-14 Thread Gian Merlino
Hey Rahul, What kind of errors are you seeing? I ran the test a few times with a bumped up number of threads, and I did see a few problems but they were in the Closer. It looks like a single Closer is used for every thread, which is bad because Closers are not thread-safe (they are built around an

Re: [Proposal] - Kafka Input Format for headers, key and payload parsing

2021-09-14 Thread Gian Merlino
Hey Lokesh, The concept and API look solid to me! Thank you for writing this up. I agree with Ben's comment. This will be really useful functionality. I have a few questions about how it would work: 1) How is the timestamp exposed exactly? I see there is a recordTimestampLabelPrefix, but what i

Re: Get Druid Service details in runtime (via extension)

2021-08-23 Thread Gian Merlino
> while locating org.apache.druid.discovery.DiscoveryDruidNode > for the 3rd parameter of > com.custom.MyEmitterModule.getEmitter(MyEmitterModule.java:39) > > According to the error, it looks like I cannot add DiscoveryDruidNode > because it does not have @Inject or a zero-argument constructor.

Re: Get Druid Service details in runtime (via extension)

2021-08-22 Thread Gian Merlino
Does the "getNodeRole()" method on DiscoveryDruidNode do what you want? On Fri, Aug 20, 2021 at 3:07 PM Jeet Patel wrote: > Hi all, > > Is there a way to know what druid services are running in a DruidNode > (Not > talking about the HTTP APIs)? > I went through druid-server module, class > Dr

Re: Apache Druid Project Structure

2021-08-18 Thread Gian Merlino
ew comers who are looking to contribute to the > project and make them feel more confident knowing the project layout. > > Thank you, > Jeet > > On 2021/08/17 17:12:33, Gian Merlino wrote: > > Hey Jeet, > > > > I think it is a case of "it seemed like a goo

Re: Apache Druid Project Structure

2021-08-17 Thread Gian Merlino
Hey Jeet, I think it is a case of "it seemed like a good idea at the time". Some things about the current layout do work well: one is that there is actually a lot of common query engine code between anything that handles queries. That's historical, broker, peon, and indexer. That common query engi

Re: [E] Re: Question about merging groupby v2 spill files

2021-08-11 Thread Gian Merlino
ngineering > > M 508 561 6427 > 1908 S. First St > Champaign, IL 61822 > > <http://www.facebook.com/verizonmedia> <http://twitter.com/verizonmedia> > <https://www.linkedin.com/company/verizon-media/> > <http://www.instagram.com/verizonmedia> > > >

Re: Question about merging groupby v2 spill files

2021-08-10 Thread Gian Merlino
Hey Will, The sorting that happens on the data servers is really useful, because it means the Broker can do its part of the query fully streaming instead of buffering things up. At one point we had a similar problem in ingestion (you could have a ton of spill files if you had a lot of sketches) a

Re: Interested in contributing an article to your site

2021-07-30 Thread Gian Merlino
Hi Angela, There are a couple of places on the Druid website where we include content from the community. 1) If Sisu Data uses Druid internally, or produces Druid-based products, it would be appropriate to describe Sisu's usage of Druid on our Powered By page: https://druid.apache.org/druid-power

Re: ItemsSketch Aggregator in druid-datasketches extension

2021-07-23 Thread Gian Merlino
er risk of using too much heap memory. The only advantage (2) has is that you don't need a Direct version of the ItemsSketch for it to work. On Fri, Jul 23, 2021 at 1:35 PM Gian Merlino wrote: > Hey Michael, > > Very cool! > > To answer your question: it is critical to have a Bu

Re: ItemsSketch Aggregator in druid-datasketches extension

2021-07-23 Thread Gian Merlino
Hey Michael, Very cool! To answer your question: it is critical to have a BufferAggregator. Some context: there are 3 kinds of aggregators: - Aggregator: stores intermediate state on heap; is used during ingestion and by the non-vectorized timeseries query engine. Required, or else some queries

Re: druid can't parse string

2021-07-16 Thread Gian Merlino
Including the original poster in case they are not on the dev list themselves (hello!). On Fri, Jul 16, 2021 at 9:44 AM Gian Merlino wrote: > Druid stores strings as UTF-8 and from a storage and query basis, it > should work fine with any language. The > "wikiticker-2015-09-12-s

Re: druid can't parse string

2021-07-16 Thread Gian Merlino
Druid stores strings as UTF-8 and from a storage and query basis, it should work fine with any language. The "wikiticker-2015-09-12-sampled.json.gz" dataset used for the tutorial has strings in a variety of languages (check the "page" field): https://druid.apache.org/docs/latest/tutorials/index.htm

Re: A question about a potential bug in Druid Joins

2021-06-24 Thread Gian Merlino
Thanks for writing back and confirming that the suggested workaround is effective. As far as comments on the future: when we do a release with #10942, your original query will work ok too, but I wonder if it might still be better-performing to do the + 0. Long typed keys might be faster than strin

Re: A question about a potential bug in Druid Joins

2021-06-24 Thread Gian Merlino
le, FACTS.api_client_id, FACTS.api_client_count > > FROM FACTS > > LEFT JOIN DIM ON FACTS.api_client_id = DIM.api_client_id > > So the “api_client_id” field is `long` type in both > “inline_data” and “inline_dimension_api_clients_1” datasources. However, > when doing a join, the makeLongProcessor method wi

Re: Enabling dependabot in our github repository

2021-06-08 Thread Gian Merlino
Here's a running list of PRs opened by the dependabot: https://github.com/apache/druid/pulls?q=is%3Apr+author%3Aapp%2Fdependabot On Mon, Jun 7, 2021 at 12:22 PM Gian Merlino wrote: > There's been some extra discussion on this PR: > https://github.com/apache/druid/pull/11079 >

Re: Enabling dependabot in our github repository

2021-06-07 Thread Gian Merlino
There's been some extra discussion on this PR: https://github.com/apache/druid/pull/11079 I just +1'ed it, but I wanted to come back here to say that IMO, we should avoid getting in the habit of blindly applying these updates without testing. There have been lots of situations in the past where a harmle
