I think that's actually very use-case specific.
Your code will never see the malformed record, because it is dropped by
the input format.
Other applications might rely on complete input and would prefer an
exception, so they are notified about invalid input.
Flink's CsvInputFormat has a parameter "lenient" which, when enabled, skips lines it cannot parse.
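For reference, a minimal sketch of the lenient flag in the Scala DataSet
API (the path and the row type here are made-up examples):

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
// With lenient = true, the CsvInputFormat silently skips lines it
// cannot parse instead of failing the job with a parse exception.
val rows = env.readCsvFile[(String, Int)](
  "s3n://my-bucket/input.csv",  // hypothetical path
  lenient = true)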
Yes, it is. I'm checking the number of columns per line to filter out
malformed records.
*Sent from my ZenFone
On Oct 8, 2015 1:19 PM, "Stephan Ewen" wrote:
> There is probably a different CSV input format implementation which drops
> invalid lines (too long lines).
>
> Is that actually desired behavior, simply dropping malformed input?
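A minimal sketch of that kind of column-count pre-filter (the comma
delimiter, the arity of 5, and the path are assumptions):

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val expectedColumns = 5  // hypothetical arity
val clean = env.readTextFile("s3n://my-bucket/input.csv")
  // split with limit -1 keeps trailing empty fields, so short rows
  // are not accidentally accepted
  .filter(_.split(",", -1).length == expectedColumns)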
There is probably a different CSV input format implementation which drops
invalid lines (too long lines).
Is that actually desired behavior, simply dropping malformed input?
On Thu, Oct 8, 2015 at 7:12 PM, KOSTIANTYN Kudriavtsev <
kudryavtsev.konstan...@gmail.com> wrote:
> Hm, you were right
>
Hm, you were right.
I checked all the files one by one and found an issue with a line in one of
them... It's really unexpected for me, since I ran a Spark job on the same
dataset and the "wrong" rows were filtered out without issues.
Thanks for the help!
Thank you,
Konstantin Kudryavtsev
On Thu, Oct 8, 2015
Ah, that makes sense!
The problem is not in the core runtime, it is in the delimited input
format. It probably looks for the line split character and never finds it,
so it starts buffering a super large line (gigabytes), which leads to
the OOM exception.
Can you check whether the line split character matches what your file actually uses?
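If the delimiter is the problem, it can be set explicitly on the input
format; a sketch (the path is made up; setDelimiter comes from
DelimitedInputFormat):

import org.apache.flink.api.scala._
import org.apache.flink.api.java.io.TextInputFormat
import org.apache.flink.core.fs.Path

val env = ExecutionEnvironment.getExecutionEnvironment
val format = new TextInputFormat(new Path("s3n://my-bucket/input.csv"))
// DelimitedInputFormat keeps buffering until it sees this byte sequence;
// if the file uses different line endings, the default is never found
// and the buffer grows until the heap is exhausted.
format.setDelimiter("\n")
val lines = env.createInput(format)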
What confuses me: I'm running Flink on YARN with the following
command: ./yarn-session.sh -n 4 -jm 2096 -tm 5000
so I expect each TaskManager to have almost 5 GB of RAM available, but in
the TaskManager panel I found that each TaskManager has the following
configuration:
Flink Managed Memory: 2460 mb
CPU cores: 4
10/08/2015 16:25:48 CHAIN DataSource (at
com.epam.AirSetJobExample$.main(AirSetJobExample.scala:31)
(org.apache.flink.api.java.io.TextInputFormat)) -> Filter (Filter at
com.epam.AirSetJobExample$.main(AirSetJobExample.scala:31)) -> FlatMap
(FlatMap at count(DataSet.scala:523))(1/1) switched to
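For what it's worth, a gap between the container size and the
managed-memory figure is expected; a rough back-of-the-envelope, assuming
the defaults of that era (a YARN heap cutoff around 25% and
taskmanager.memory.fraction = 0.7):

container size:  5000 MB                          (-tm 5000)
JVM heap:       ~5000 * 0.75              = ~3750 MB  (after YARN cutoff)
managed memory: ~0.7 * (3750 - net buffers) = ~2500 MB

which is in the same ballpark as the 2460 MB shown in the panel; the rest
of the heap stays available to user code.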
Can you paste the exception stack trace?
On Thu, Oct 8, 2015 at 6:15 PM, KOSTIANTYN Kudriavtsev <
kudryavtsev.konstan...@gmail.com> wrote:
> It's a DataSet program that performs simple filtering, a cross join, and
> aggregation.
>
> I'm using the Hadoop S3 FileSystem (not EMR), since Flink's S3 connector
> doesn't work at all.
It's a DataSet program that performs simple filtering, a cross join, and
aggregation.
I'm using the Hadoop S3 FileSystem (not EMR), since Flink's S3 connector
doesn't work at all.
Currently I have 3 TaskManagers with 5000 MB each, but I tried different
configurations and they all lead to the same exception.
*Sent from my ZenFone
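A minimal sketch of the shape of such a job (the path, delimiter, and
arity are made up):

import org.apache.flink.api.scala._

object JobSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val rows = env.readTextFile("s3n://my-bucket/data.csv") // Hadoop S3 scheme
      .map(_.split(",", -1))
      .filter(_.length == 5)       // simple filtering
    val pairs = rows.cross(rows)   // cross join
    println(pairs.count())         // count() triggers execution
  }
}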
Hi Konstantin,
Flink uses managed memory only for its internal processing (sorting, hash
tables, etc.).
If you allocate too much memory in your user code, it can still fail with
an OOME. This can also happen for large broadcast sets.
Can you check how much memory the JVM allocated and how much of it is actually used?
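A quick way to log those numbers from inside user code (plain JVM calls,
nothing Flink-specific):

val rt = Runtime.getRuntime
println(s"heap: max=${rt.maxMemory() >> 20} MB, " +
  s"committed=${rt.totalMemory() >> 20} MB, " +
  s"free=${rt.freeMemory() >> 20} MB")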
Can you give us a bit more background? What exactly is your program doing?
- Are you running a DataSet program, or a DataStream program?
- Is it one simple source that reads from S3, or are there multiple
sources?
- What operations do you apply on the CSV file?
- Are you using Flink's S3 connector or the Hadoop S3 file system?
Hi guys,
I'm running Flink on EMR with 2 m3.xlarge instances (each 16 GB RAM) and am
trying to process 3.8 GB of CSV data from S3. I'm surprised that Flink
failed with OutOfMemoryError: Java heap space.
I tried to find the reason:
1) identify the TaskManager process with the command ps aux | grep TaskManager
2) then bui