Thank you, Kostas, for spending time on my case.
Related to the issue I mentioned, I have another issue caused by having a
lot of files to list. From the error msg, I understand that the listing was
taking more than 30s, and the JM thought that it hung and killed it. Is it
possible to increase th
I see,
Thanks for the clarification.
Cheers,
Kostas
> On Sep 25, 2018, at 8:51 AM, Averell wrote:
>
> Hi Kostas,
>
> I use PROCESS_CONTINUOUSLY mode, and checkpoint interval of 20 minutes. When
> I said "Within that 15 minutes, checkpointing process is not triggered
> though" in my previous e
Hi Kostas,
I use PROCESS_CONTINUOUSLY mode and a checkpoint interval of 20 minutes. When
I said "Within that 15 minutes, checkpointing process is not triggered
though" in my previous email, I was not complaining that checkpointing is not
running; I meant that the slowness is not due to ongoing chec
Hi Averell,
Can you describe your settings in a bit more detail?
For example, are you reading in PROCESS_CONTINUOUSLY mode or PROCESS_ONCE?
What is your checkpoint interval?
These questions are to help understand why checkpoints are not processed
within these 15 minutes.
Kostas
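
For readers following the thread, here is a minimal sketch of where the two settings asked about above are configured in a Flink streaming job (Scala API). The input path, the use of TextInputFormat, and the 60-second directory scan interval are placeholders chosen for illustration and are not taken from this thread; the 20-minute checkpoint interval is the value mentioned elsewhere in the thread.

```scala
import org.apache.flink.api.java.io.TextInputFormat
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.source.FileProcessingMode
import org.apache.flink.streaming.api.scala._

object FileSourceSettingsSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Checkpoint interval in milliseconds (20 minutes).
    env.enableCheckpointing(20 * 60 * 1000L)

    // Hypothetical input location, for illustration only.
    val inputPath = "s3://example-bucket/input/"
    val format = new TextInputFormat(new Path(inputPath))

    // PROCESS_CONTINUOUSLY re-scans the directory every `interval` ms;
    // PROCESS_ONCE would scan it a single time and then finish.
    val lines: DataStream[String] = env.readFile(
      format,
      inputPath,
      FileProcessingMode.PROCESS_CONTINUOUSLY,
      60 * 1000L // assumed scan interval, not from the thread
    )

    lines.print()
    env.execute("file-source-settings-sketch")
  }
}
```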
> On Sep 25, 2018, at 8:08 AM
Hi Kostas,
Yes, applying the filter on the 100K files takes time, and the delay of 15
minutes I observed is definitely caused by the large number of files and the
cost of each individual file status check. However, the delay is much
smaller when checkpointing is off.
Within that 15 minutes, checkpoint
Hi Averell,
Happy to hear that the problem is no longer there, and if you have more news
from your debugging, let us know.
The thing that I wanted to mention is that from what you are describing, the
problem does not seem to be related to checkpointing, but to the fact that
applying your filt
Hi Vino, and all,
I tried skipping the step that gets the file status, and found that the
problem is not there any more. I guess doing that for every single file out
of 100K+ files on S3 caused some issue with checkpointing.
Still trying to find the cause, but with lower priority now.
Thanks for your h
Please refer to this version:
===
import java.util.Date
import org.apache.flink.api.common.io.FilePathFilter
import org.apache.flink.core.fs.Path
import org.slf4j.LoggerFactory
object SdcFilePathFilter {
private val TIME_FORMAT = new java.text.SimpleDateFormat("MMdd
hhmm
Hi Vino,
I am using a custom FileInputFormat, but the mentioned problem only comes
when I try a custom FilePathFilter.
My whole file for that custom FilePathFilter is quoted below.
Regarding enabling DEBUG, for which classes/packages should I turn it on? I
am afraid that turning DEBUG on at t
Hi Averell,
Is this all the custom code for "CustomFileSource"?
If not, can you share the entire file with us, and if you can set the log
level to DEBUG, it will help you analyze and locate the problem.
If you can't come to a conclusion, you can share the log with us.
Thanks, vino.
Averell wrote on 20
Good day everyone,
I have about 100 thousand files to read, and a custom FilePathFilter with a
simple filterPath method defined as below (the custom part is only to check
file-size and skip files with size = 0)
override def filterPath(filePath: Path): Boolean = {
filePath
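
Since the snippet above is cut off, here is a minimal sketch of what a filter of the kind described (skip files whose size is 0) could look like. It is not the poster's actual code: the class name SkipEmptyFilesFilter is hypothetical, and the sketch assumes the file size is read through Flink's FileSystem.getFileStatus, which means one status lookup per file.

```scala
import org.apache.flink.api.common.io.FilePathFilter
import org.apache.flink.core.fs.Path

// Hypothetical class name, used only for this illustration.
class SkipEmptyFilesFilter extends FilePathFilter {

  // FilePathFilter contract: return true if the path should be IGNORED.
  override def filterPath(filePath: Path): Boolean = {
    // Resolves the file system for this path's scheme (may throw IOException).
    val fs = filePath.getFileSystem
    // One status lookup per path; against a remote store such as S3 this
    // is a separate request for every file that is listed.
    val status = fs.getFileStatus(filePath)
    // Keep directories so they can still be traversed; skip empty files.
    !status.isDir && status.getLen == 0L
  }
}
```

A filter like this would be attached to the input format with format.setFilesFilter(new SkipEmptyFilesFilter) before the format is passed to readFile. With about 100 thousand files, the per-file status lookup is the part of the filter that is repeated on every directory scan.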