Hi all, Unless there are some objections or new suggestions, I will open a voting thread next Monday, thanks~
Best, Rui On Wed, Nov 15, 2023 at 4:11 PM Rui Fan <1996fan...@gmail.com> wrote: > Thanks to Hangxiang and Feng's feedback! > > Best, > Rui > > On Wed, Nov 15, 2023 at 3:37 PM Feng Jin <jinfeng1...@gmail.com> wrote: > >> Hi, Rui >> >> Thanks for driving this. over LGTM. >> >> I agree that redirecting stdout to LOG at first as it is not the default >> behavior . >> However, if there are other scenarios in the future, we can consider >> adding >> additional parameters to redirect stdout to a separate file. >> >> >> Best, >> Feng >> >> On Wed, Nov 15, 2023 at 2:23 PM Hangxiang Yu <master...@gmail.com> wrote: >> >> > Hi, Rui. >> > Overall LGTM now. >> > Thanks for driving this again! >> > >> > On Wed, Nov 15, 2023 at 2:07 PM Rui Fan <1996fan...@gmail.com> wrote: >> > >> > > Hi Hangxiang, >> > > >> > > Thanks for your feedback! >> > > >> > > > I saw some users may debug operator correctness using print() or >> > > System.out >> > > > which may bring lots of info. >> > > > It's fine if files are separate, otherwise these will make >> > > taskMnanager.log >> > > > roll quickly. >> > > > But I think it's fine if we don't use it as default first. >> > > >> > > Got it, we will support redirect System.out to LOG, and keep >> > > original behaviour(DEFAULT enum) as default value. >> > > >> > > >> > > > > Do you mean like this: LOG.info("taskName {} : {}", taskName, >> > > > > userLogContext)? >> > > > >> > > > Yes, That's what I mean. >> > > >> > > Sounds make sense, I added a new option to FLIP[1], >> > > it's taskmanager.system-out.log.thread-name.enabled: false. >> > > >> > > Why log the thread name here? >> > > >> > > Flink's RichFunction can get subtaskId, but redirecting >> > > System.out to LOG is public code, so it is difficult to get subtaskId. >> > > After consideration, a workaround solution is to log thread name, >> > > because the thread name of the default Task Thread contains the >> > > task name and subtask id. And this solution can support all threads, >> > > because some non-task threads may also call println. >> > > >> > > [1] https://cwiki.apache.org/confluence/x/4guZE >> > > >> > > Best, >> > > Rui >> > > >> > > On Wed, Nov 15, 2023 at 10:26 AM Hangxiang Yu <master...@gmail.com> >> > wrote: >> > > >> > > > Hi, Rui. >> > > > >> > > > As I mentioned before: the user didn't really want >> > > > > to log into taskmanager.out, it just happened by accident. >> > > > > So, if users change the System.out to LOG.info, it still happen. >> > > > >> > > > Thanks for the feedback. >> > > > I saw some users may debug operator correctness using print() or >> > > System.out >> > > > which may bring lots of info. >> > > > It's fine if files are separate, otherwise these will make >> > > taskMnanager.log >> > > > roll quickly. >> > > > But I think it's fine if we don't use it as default first. >> > > > >> > > > Do you mean like this: LOG.info("taskName {} : {}", taskName, >> > > > > userLogContext)? >> > > > > >> > > > >> > > > Yes, That's what I mean. >> > > > >> > > > On Sat, Nov 11, 2023 at 12:54 AM Piotr Nowojski < >> > > piotr.nowoj...@gmail.com> >> > > > wrote: >> > > > >> > > > > Thanks! :) >> > > > > >> > > > > Best, Piotrek >> > > > > >> > > > > czw., 9 lis 2023 o 16:15 Rui Fan <1996fan...@gmail.com> >> napisał(a): >> > > > > >> > > > > > Hi Piotr, >> > > > > > >> > > > > > Thanks for your feedback! >> > > > > > >> > > > > > > Or implement your own loop? It shouldn't be more than a >> couple of >> > > > > lines. >> > > > > > >> > > > > > Implementing it directly is fine, I have updated the FLIP. >> > > > > > And this logic can be found in the `isLineEnded` method. >> > > > > > >> > > > > > Best, >> > > > > > Rui >> > > > > > >> > > > > > On Thu, Nov 9, 2023 at 11:00 PM Piotr Nowojski < >> > > > piotr.nowoj...@gmail.com >> > > > > > >> > > > > > wrote: >> > > > > > >> > > > > > > Hi Rui, >> > > > > > > >> > > > > > > > I see java8 doesn't have `Arrays.equals(int[] a, int >> > aFromIndex, >> > > > int >> > > > > > > > aToIndex, int[] b, int bFromIndex, int bToIndex)`, >> > > > > > > > and java11 has it. Do you have any other suggestions for >> java8? >> > > > > > > >> > > > > > > Maybe use `ByteBuffer.wrap`? >> > > > > > > >> > > > > > > ByteBuffer.wrap(array, ..., >> ...).equals(ByteBuffer.wrap(array2, >> > > ..., >> > > > > > ...)) >> > > > > > > >> > > > > > > This shouldn't have overheads as far as I remember. >> > > > > > > >> > > > > > > Or implement your own loop? It shouldn't be more than a >> couple of >> > > > > lines. >> > > > > > > >> > > > > > > Best, >> > > > > > > Piotrek >> > > > > > > >> > > > > > > czw., 9 lis 2023 o 06:43 Rui Fan <1996fan...@gmail.com> >> > > napisał(a): >> > > > > > > >> > > > > > > > Hi Piotr, Archit, Feng and Hangxiang: >> > > > > > > > >> > > > > > > > Thanks a lot for your feedback! >> > > > > > > > >> > > > > > > > Following is my comment, please correct me if I >> misunderstood >> > > > > anything! >> > > > > > > > >> > > > > > > > To Piotr: >> > > > > > > > >> > > > > > > > > Is there a reason why you are suggesting to copy out bytes >> > from >> > > > > `buf` >> > > > > > > to >> > > > > > > > `bytes`, >> > > > > > > > > instead of using `Arrays.equals(int[] a, int aFromIndex, >> int >> > > > > > aToIndex, >> > > > > > > > int[] b, int bFromIndex, int bToIndex)`? >> > > > > > > > >> > > > > > > > I see java8 doesn't have `Arrays.equals(int[] a, int >> > aFromIndex, >> > > > int >> > > > > > > > aToIndex, int[] b, int bFromIndex, int bToIndex)`, >> > > > > > > > and java11 has it. Do you have any other suggestions for >> java8? >> > > > > > > > >> > > > > > > > Also, this code doesn't run in production. As the comment of >> > > > > > > > System.lineSeparator(): >> > > > > > > > >> > > > > > > > > On UNIX systems, it returns {@code "\n"}; on Microsoft >> > > > > > > > > Windows systems it returns {@code "\r\n"}. >> > > > > > > > >> > > > > > > > So Mac and Linux just return one character, we will compare >> > > > > > > > one byte directly. >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > To Feng: >> > > > > > > > >> > > > > > > > > Will they be written to the taskManager.log file by >> default >> > > > > > > > > or the taskManager.out file? >> > > > > > > > >> > > > > > > > I prefer LOG as the default value for >> > > taskmanager.system-out.mode. >> > > > > > > > It's useful for job stability and doesn't introduce >> significant >> > > > > impact >> > > > > > to >> > > > > > > > users. Also, our production has already used this feature >> for >> > > > > > > > more than 1 years, it works well. >> > > > > > > > >> > > > > > > > However, I write the DEFAULT as the default value for >> > > > > > > > taskmanager.system-out.mode, because when the community >> > > introduces >> > > > > > > > new options, the default value often selects the original >> > > behavior. >> > > > > > > > >> > > > > > > > Looking forward to hear more thoughts from community about >> this >> > > > > > > > default value. >> > > > > > > > >> > > > > > > > > If we can make taskmanager.out splittable and rolling, >> would >> > it >> > > > be >> > > > > > > > > easier for users to use this feature? >> > > > > > > > >> > > > > > > > Making taskmanager.out splittable and rolling is a good >> choice! >> > > > > > > > I have some concerns about it: >> > > > > > > > >> > > > > > > > 1. Users may also want to use LOG.info in their code and >> just >> > > > > > > > accidentally use System.out.println. It is possible that >> they >> > > > will >> > > > > > > > also find the logs directly in taskmanager.log. >> > > > > > > > 2. I'm not sure whether the rolling strategy is easy to >> > > implement. >> > > > > > > > If we do it, it's necessary to define a series of flink >> > options >> > > > > > similar >> > > > > > > > to log options, such as: fileMax(how many files should be >> > > > > retained), >> > > > > > > > fileSize(The max size each file), fileNamePatten (The >> suffix >> > of >> > > > > file >> > > > > > > > name), >> > > > > > > > 3. Check the file size periodically: all logs are written by >> > log >> > > > > > plugin, >> > > > > > > > they can check the log file size after writing. However, >> > > > System.out >> > > > > > > > are written directly. And flink must start a thread to >> check >> > > the >> > > > > > latest >> > > > > > > > taskmanager.out size periodically. If it's too quick, >> most of >> > > job >> > > > > > > aren't >> > > > > > > > necessary. If it's too slow, the file size cannot be >> > controlled >> > > > > > > properly. >> > > > > > > > >> > > > > > > > Redirect it to LOG.info may be a reasonable and easy choice. >> > > > > > > > The user didn't really want to log into taskmanager.out, it >> > just >> > > > > > > > happened by accident. >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > To Hangxiang: >> > > > > > > > >> > > > > > > > > 1. I have a similar concern as Feng. Will we redirect to >> > > another >> > > > > log >> > > > > > > file >> > > > > > > > > not taskManager.log ? >> > > > > > > > >> > > > > > > > Please see my last comment, thanks! >> > > > > > > > >> > > > > > > > > taskManager.log contains lots of important information >> like >> > > init >> > > > > log. >> > > > > > > It >> > > > > > > > > will be rolled quickly if we redirect out and error here. >> > > > > > > > >> > > > > > > > IIUC, this issue isn't caused by System.out, and it can >> happen >> > if >> > > > > user >> > > > > > > > call a lot of LOG.info. As I mentioned before: the user >> didn't >> > > > really >> > > > > > > want >> > > > > > > > to log into taskmanager.out, it just happened by accident. >> > > > > > > > So, if users change the System.out to LOG.info, it still >> > happen. >> > > > > > > > >> > > > > > > > > 2. Since we have redirected to LOG mode, Could we also log >> > the >> > > > > > subtask >> > > > > > > > info >> > > > > > > > > ? It may help us to debug granularly. >> > > > > > > > >> > > > > > > > I'm not sure what `log the subtask info` means. Let me >> confirm >> > > with >> > > > > you >> > > > > > > > first. >> > > > > > > > Do you mean like this: LOG.info("taskName {} : {}", >> taskName, >> > > > > > > > userLogContext)? >> > > > > > > > >> > > > > > > > Best, >> > > > > > > > Rui >> > > > > > > > >> > > > > > > > On Thu, Nov 9, 2023 at 11:47 AM Hangxiang Yu < >> > > master...@gmail.com> >> > > > > > > wrote: >> > > > > > > > >> > > > > > > > > Hi, Rui. >> > > > > > > > > Thanks for the proposal. It sounds reasonable. >> > > > > > > > > I have some questions, PTAL: >> > > > > > > > > 1. I have a similar concern as Feng. Will we redirect to >> > > another >> > > > > log >> > > > > > > file >> > > > > > > > > not taskManager.log ? >> > > > > > > > > taskManager.log contains lots of important information >> like >> > > init >> > > > > log. >> > > > > > > It >> > > > > > > > > will be rolled quickly if we redirect out and error here. >> > > > > > > > > 2. Since we have redirected to LOG mode, Could we also log >> > the >> > > > > > subtask >> > > > > > > > info >> > > > > > > > > ? It may help us to debug granularly. >> > > > > > > > > >> > > > > > > > > On Thu, Nov 9, 2023 at 9:47 AM Feng Jin < >> > jinfeng1...@gmail.com >> > > > >> > > > > > wrote: >> > > > > > > > > >> > > > > > > > > > Hi, Rui. >> > > > > > > > > > >> > > > > > > > > > Thank you for initiating this proposal. >> > > > > > > > > > >> > > > > > > > > > I have a question regarding redirecting stdout and >> stderr >> > to >> > > > LOG: >> > > > > > > > > > >> > > > > > > > > > Will they be written to the taskManager.log file by >> default >> > > or >> > > > > the >> > > > > > > > > > taskManager.out file? >> > > > > > > > > > If we can make taskmanager.out splittable and rolling, >> > would >> > > it >> > > > > be >> > > > > > > > easier >> > > > > > > > > > for users to use this feature? >> > > > > > > > > > >> > > > > > > > > > Best, >> > > > > > > > > > Feng >> > > > > > > > > > >> > > > > > > > > > On Thu, Nov 9, 2023 at 3:15 AM Archit Goyal >> > > > > > > > <argo...@linkedin.com.invalid >> > > > > > > > > > >> > > > > > > > > > wrote: >> > > > > > > > > > >> > > > > > > > > > > Hi Rui, >> > > > > > > > > > > >> > > > > > > > > > > Thanks for the proposal. >> > > > > > > > > > > >> > > > > > > > > > > The proposed solution of supporting System out and >> err to >> > > be >> > > > > > > > redirected >> > > > > > > > > > to >> > > > > > > > > > > LOG or discarded and introducing an enum and two >> options >> > to >> > > > > > manage >> > > > > > > > > this, >> > > > > > > > > > > seems reasonable. >> > > > > > > > > > > >> > > > > > > > > > > +1 >> > > > > > > > > > > >> > > > > > > > > > > Thanks, >> > > > > > > > > > > Archit Goyal >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > From: Piotr Nowojski <pnowoj...@apache.org> >> > > > > > > > > > > Date: Wednesday, November 8, 2023 at 7:38 AM >> > > > > > > > > > > To: dev@flink.apache.org <dev@flink.apache.org> >> > > > > > > > > > > Subject: Re: [DISCUSS] FLIP-390: Support System out >> and >> > err >> > > > to >> > > > > be >> > > > > > > > > > > redirected to LOG or discarded >> > > > > > > > > > > Hi Rui, >> > > > > > > > > > > >> > > > > > > > > > > Thanks for the proposal. >> > > > > > > > > > > >> > > > > > > > > > > +1 I don't have any major comments :) >> > > > > > > > > > > >> > > > > > > > > > > One nit. In `SystemOutRedirectToLog` in this code: >> > > > > > > > > > > >> > > > > > > > > > > System.arraycopy(buf, count - >> > > > LINE_SEPARATOR_LENGTH, >> > > > > > > > bytes, >> > > > > > > > > 0, >> > > > > > > > > > > LINE_SEPARATOR_LENGTH); >> > > > > > > > > > > return Arrays.equals(LINE_SEPARATOR_BYTES, >> > > bytes) >> > > > > > > > > > > >> > > > > > > > > > > Is there a reason why you are suggesting to copy out >> > bytes >> > > > from >> > > > > > > `buf` >> > > > > > > > > to >> > > > > > > > > > > `bytes`, >> > > > > > > > > > > instead of using `Arrays.equals(int[] a, int >> aFromIndex, >> > > int >> > > > > > > > aToIndex, >> > > > > > > > > > > int[] b, int bFromIndex, int bToIndex)`? >> > > > > > > > > > > >> > > > > > > > > > > Best, >> > > > > > > > > > > Piotrek >> > > > > > > > > > > >> > > > > > > > > > > śr., 8 lis 2023 o 11:53 Rui Fan <1996fan...@gmail.com >> > >> > > > > > napisał(a): >> > > > > > > > > > > >> > > > > > > > > > > > Hi all! >> > > > > > > > > > > > >> > > > > > > > > > > > I would like to start a discussion of FLIP-390: >> Support >> > > > > System >> > > > > > > out >> > > > > > > > > and >> > > > > > > > > > > err >> > > > > > > > > > > > to be redirected to LOG or discarded[1]. >> > > > > > > > > > > > >> > > > > > > > > > > > In various production environments, either cloud >> native >> > > or >> > > > > > > physical >> > > > > > > > > > > > machines, the disk space that Flink TaskManager can >> use >> > > is >> > > > > > > limited. >> > > > > > > > > > > > >> > > > > > > > > > > > In general, the flink users shouldn't use the >> > > > > > > `System.out.println` >> > > > > > > > in >> > > > > > > > > > > > production, >> > > > > > > > > > > > however this may happen when the number of Flink >> jobs >> > and >> > > > job >> > > > > > > > > > developers >> > > > > > > > > > > > is very large. Flink job may use System.out to >> output a >> > > > large >> > > > > > > > amount >> > > > > > > > > of >> > > > > > > > > > > > data >> > > > > > > > > > > > to the taskmanager.out file. This file will not >> roll, >> > it >> > > > will >> > > > > > > > always >> > > > > > > > > > > > increment. >> > > > > > > > > > > > Eventually the upper limit of what the TM can be >> used >> > for >> > > > is >> > > > > > > > reached. >> > > > > > > > > > > > >> > > > > > > > > > > > We can support System out and err to be redirected >> to >> > LOG >> > > > or >> > > > > > > > > discarded, >> > > > > > > > > > > > the LOG can roll and won't increment forever. >> > > > > > > > > > > > >> > > > > > > > > > > > This feature is useful for SREs who maintain Flink >> > > > > > environments, >> > > > > > > > they >> > > > > > > > > > can >> > > > > > > > > > > > redirect System.out to LOG by default. Although the >> > cause >> > > > of >> > > > > > this >> > > > > > > > > > problem >> > > > > > > > > > > > is >> > > > > > > > > > > > that the user's code is not standardized, for SRE, >> > > pushing >> > > > > > users >> > > > > > > to >> > > > > > > > > > > modify >> > > > > > > > > > > > the code one by one is usually a very time-consuming >> > > > > operation. >> > > > > > > > It's >> > > > > > > > > > also >> > > > > > > > > > > > useful for job stability where System.out is >> > accidentally >> > > > > used. >> > > > > > > > > > > > >> > > > > > > > > > > > Looking forward to your feedback, thanks~ >> > > > > > > > > > > > >> > > > > > > > > > > > [1] >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fx%2F4guZE&data=05%7C01%7Cargoyal%40linkedin.com%7C937821de7bd846e6b97408dbe070beae%7C72f988bf86f141af91ab2d7cd011db47%7C0%7C0%7C638350547072823674%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=zEv6B0Xiq2SNuU6Fm%2BAXnH%2BRILbm6Q0uDRbN7h6iaPM%3D&reserved=0 >> > > > > > > > > > > <https://cwiki.apache.org/confluence/x/4guZE> >> > > > > > > > > > > > >> > > > > > > > > > > > Best, >> > > > > > > > > > > > Rui >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > -- >> > > > > > > > > Best, >> > > > > > > > > Hangxiang. >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > >> > > > -- >> > > > Best, >> > > > Hangxiang. >> > > > >> > > >> > >> > >> > -- >> > Best, >> > Hangxiang. >> > >> >