How were the Parquet files you are trying to read generated? Same
version of libraries? I am successfully using the following Scala code
to read Parquet files using the HadoopInputFormat wrapper. Maybe try
that in Java?
val hadoopInputFormat =
new HadoopInputFormat[Void, GenericRecord](new
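The snippet above is cut off; a rough, self-contained sketch of the same approach (assuming parquet-avro's AvroParquetInputFormat and Flink's mapreduce HadoopInputFormat wrapper; the input path is a placeholder):

import org.apache.avro.generic.GenericRecord
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.hadoop.mapreduce.HadoopInputFormat
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.parquet.avro.AvroParquetInputFormat

val env = ExecutionEnvironment.getExecutionEnvironment
val job = Job.getInstance()
// Flink wraps the Hadoop/Parquet input format; records come back as (key, value) pairs
val hadoopInputFormat = new HadoopInputFormat[Void, GenericRecord](
  new AvroParquetInputFormat[GenericRecord], classOf[Void], classOf[GenericRecord], job)
FileInputFormat.addInputPath(job, new Path("hdfs:///path/to/parquet"))  // placeholder path
val records = env.createInput(hadoopInputFormat)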
Thanks for your detailed explanation.
I'm having pretty frequent issues with the exception below. It basically always
ends up killing my cluster after forcing a large number of job restarts. I just
can't keep Flink up & running.
I am running Flink 1.1.3 on EMR 5.2.0. I already tried updating the emrfs-site
config fs.s3.maxConnections.
Anyone seen this before:
Caused by: java.io.IOException: Received an event in channel 0 while still
having data from a record. This indicates broken serialization logic. If you
are using custom serialization code (Writable or Value types), check their
serialization routines. In the case of Kryo
Samra, as I was quickly looking at your code I only saw the
ExecutionEnvironment for the read and not the StreamExecutionEnvironment
for the write. Glad to hear that this worked for batching. Like you, I am very
much a Flink beginner who just happened to have tried out the batch write to
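For reference, the difference the reply is pointing at is which environment drives the program; a minimal contrast (variable names are illustrative):

import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

// DataSet (batch) programs are driven by an ExecutionEnvironment
val batchEnv = ExecutionEnvironment.getExecutionEnvironment
// DataStream (streaming) programs need a StreamExecutionEnvironment instead
val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment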
Hi Markus,
Thanks! This was very helpful! I realize what the issue is now. I followed
what you did and I am able to write data to S3 if I do batch processing,
but not stream processing. Do you know what the difference is and why it
would work for one and not the other?
Sam
On Wed, Jan 11, 2017 a
Thank you for the reply. I have found the issue; my bad, I was trying to
write from IntelliJ in local mode to a remote HDFS. If I run it in the proper
execution mode it works fine now.
On Wed, Jan 11, 2017 at 2:13 AM, Fabian Hueske wrote:
> Hi,
>
> the exception says
> "org.apache.hadoop.hdfs.protocol.Alrea
Sam, don't point the variables at files; point them at the directories
containing the files. Do you have the fs.s3.impl property defined?
Concrete example:
/home/markus/hadoop-config directory has one file "core-site.xml" with
the following content:
fs.s3.impl
org.apache.hadoop.f
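With its XML markup restored, that core-site.xml would look roughly like this; the value is truncated in the original message, so the FileSystem class shown here is only an assumption:

<configuration>
  <property>
    <name>fs.s3.impl</name>
    <!-- assumed implementation class; the original excerpt ends at "org.apache.hadoop.f" -->
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  </property>
</configuration>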
Hi,
I think this is a case for the ProcessFunction that was recently added and
will be included in Flink 1.2.
ProcessFunction allows you to register timers (so the 5-second timeout can be
addressed). You can maintain the fault-tolerance guarantees if you collect
the records in managed state. That way th
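A rough sketch of that idea (in Flink 1.2 itself the rich variant is called RichProcessFunction; the sketch uses the later ProcessFunction form, assumes a keyed stream of Strings, and the 5-second flush and all names are illustrative):

import org.apache.flink.api.common.state.ListStateDescriptor
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector
import scala.collection.JavaConverters._

class BufferingFunction extends ProcessFunction[String, Seq[String]] {

  // buffered records live in managed (key-partitioned) state, so they are
  // included in checkpoints and restored after a failure
  private lazy val buffer = getRuntimeContext.getListState(
    new ListStateDescriptor[String]("buffer", classOf[String]))

  override def processElement(value: String,
      ctx: ProcessFunction[String, Seq[String]]#Context,
      out: Collector[Seq[String]]): Unit = {
    buffer.add(value)
    // flush at the latest 5 seconds after this element arrived
    ctx.timerService().registerProcessingTimeTimer(
      ctx.timerService().currentProcessingTime() + 5000L)
  }

  override def onTimer(timestamp: Long,
      ctx: ProcessFunction[String, Seq[String]]#OnTimerContext,
      out: Collector[Seq[String]]): Unit = {
    val batch = Option(buffer.get()).map(_.asScala.toSeq).getOrElse(Seq.empty)
    if (batch.nonEmpty) {
      out.collect(batch)
      buffer.clear()
    }
  }
}

// timers require a keyed stream, e.g.: stream.keyBy(x => x).process(new BufferingFunction)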
Hi,
Sorry if this was already asked.
For performance reasons (streaming as well as batch) I'd like to "group"
messages (let's say in batches of 1000) before sending them to my sink (Kafka,
but mainly ES) so that I have a smaller overhead.
I've seen the "countWindow" operation but if I'm not w
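For comparison, the countWindow idea mentioned above looks roughly like this (source and types are placeholders); its drawback is that an incomplete batch only leaves the window once the 1000th element arrives, which is exactly where the ProcessFunction/timer approach from the previous answer helps:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow
import org.apache.flink.util.Collector

val env = StreamExecutionEnvironment.getExecutionEnvironment
val messages: DataStream[String] = env.socketTextStream("localhost", 9999)  // placeholder source

// group messages into batches of 1000 before they reach the sink
val batches: DataStream[Seq[String]] = messages
  .countWindowAll(1000)
  .apply { (window: GlobalWindow, elements: Iterable[String], out: Collector[Seq[String]]) =>
    out.collect(elements.toSeq)
  }
// batches.addSink(...)  // e.g. a bulk Elasticsearch or Kafka sink

Note that countWindowAll runs with parallelism 1; a keyed countWindow(1000) keeps the parallelism but batches per key.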
Hi Markus,
Thanks for your help. I created an environment variable in IntelliJ for
FLINK_CONF_DIR to point to the flink-conf.yaml and in it defined
fs.hdfs.hadoopconf to point to the core-site.xml, but when I do that, I get
the error: java.io.IOException: No file system found with scheme s3,
refer
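For reference, fs.hdfs.hadoopconf in flink-conf.yaml is expected to point at the directory containing the Hadoop config files rather than at core-site.xml itself; a minimal sketch, reusing the /home/markus/hadoop-config directory from Markus's example:

# flink-conf.yaml (sketch)
# point at the directory that holds core-site.xml, not at the file
fs.hdfs.hadoopconf: /home/markus/hadoop-config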
Hi,
what's the best way to read a compressed (bz2 / gz) XML file, splitting
it at a specific XML tag?
So far I've been using Hadoop's TextInputFormat in combination with
Mahout's XmlInputFormat ([0]) with env.readHadoopFile(). Whereas the
plain TextInputFormat can handle compressed data, the XmlInp
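For context, the setup described above is wired up roughly like this (assuming Mahout's XmlInputFormat, whose package has moved between Mahout versions, and its xmlinput.start/xmlinput.end keys; tag and path are placeholders):

import org.apache.flink.api.scala._
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.mahout.text.wikipedia.XmlInputFormat

val env = ExecutionEnvironment.getExecutionEnvironment
val job = Job.getInstance()
// the start and end tag of the element to split on
job.getConfiguration.set("xmlinput.start", "<page>")
job.getConfiguration.set("xmlinput.end", "</page>")
val xmlRecords = env.readHadoopFile(
  new XmlInputFormat, classOf[LongWritable], classOf[Text],
  "hdfs:///path/to/data.xml.bz2", job)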
Did you manage to push yet?
Thanks
-Original Message-
From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
Sent: Tuesday, December 13, 2016 11:12 AM
To: user@flink.apache.org
Subject: Re: Avro Parquet/Flink/Beam
Hi Billy,
no, ParquetIO is in an early stage and won't be included in
0.4.0.
Hi,
Ok. I think I get it.
WHAT IF:
Assume we create an addKeyedSource(...) which will allow us to add a source
that makes some guarantees about the data.
And assume this source returns simply the Kafka partition id as the result
of this 'hash' function.
Then if I have 10 kafka partitions I would r
Hello,
I’m trying to understand the guarantees made by Flink’s Elasticsearch sink in
terms of message delivery. According to (1), the ES sink offers at-least-once
guarantees. This page doesn’t differentiate between flink-elasticsearch and
flink-elasticsearch2, so I have to assume for the moment
Hi Estela,
it's correct that Flink's runtime has a dependency on
com.fasterxml.jackson.core:jackson-core:jar:2.7.4:compile
com.fasterxml.jackson.core:jackson-databind:jar:2.7.4:compile
com.fasterxml.jackson.core:jackson-annotations:jar:2.7.4:compile
First of all it's important to understand wher
Hi CVP,
changing the parallelism from 1 to 2 with every TM having only one slot
will inevitably introduce another network shuffle operation between the
sources and the keyed co-flat map. This might be the source of your
slowdown, because before everything was running on one machine without any
ne
Hi Aljoscha,
I have realized that the output stream is not defined separately in the
code below, and hence the input values are ending up in the sink. After
defining a separate output stream it works.
We have now confirmed that the windows are processed separately as per the
groupings.
Thanks.
Hi,
(I'm just getting back from holidays, therefore the slow response. Sorry
for that.)
I think you can simulate the way Storm windows work by using a
GlobalWindows assigner and having a custom Trigger and/or Evictor and also
some special logic in your WindowFunction.
About mergeable state, we're
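A skeletal version of that suggestion (using the built-in CountTrigger and CountEvictor as stand-ins for a custom trigger/evictor; key, types, and the count of 100 are illustrative):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows
import org.apache.flink.streaming.api.windowing.evictors.CountEvictor
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow
import org.apache.flink.util.Collector

val env = StreamExecutionEnvironment.getExecutionEnvironment
val stream = env.fromElements(("a", 1), ("a", 2), ("b", 3))  // placeholder input

val result = stream
  .keyBy(_._1)
  .window(GlobalWindows.create())
  .trigger(CountTrigger.of[GlobalWindow](100L))   // decides *when* to emit
  .evictor(CountEvictor.of[GlobalWindow](100L))   // decides *what* to keep
  .apply { (key: String, window: GlobalWindow, elements: Iterable[(String, Int)],
            out: Collector[(String, Int)]) =>
    // the "special logic" from the suggestion goes into this window function
    out.collect((key, elements.map(_._2).sum))
  }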
Hi Guiliano,
thanks for bringing up this issue.
A "ClassCastException: X cannot be cast to X" often points to a classloader
issue.
So it might actually be a bug in Flink.
I assume you submit the same application (same jar file) with the same
command, right?
Did you cancel the job before resubmitti
Hi,
the exception says
"org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
Failed to CREATE_FILE /y=2017/m=01/d=10/H=18/M=12/_part-0-0.in-progress for
DFSClient_NONMAPREDUCE_1062142735_3".
I would assume that your output format tries to create a file that already
exists.
Maybe you need
Hi,
Flink supports two types of state:
1) Key-partitioned state
2) Non-partitioned operator state (Checkpointed interface)
Key-partitioned state is internally organized by key and can be "simply"
rehashed. The actual implementation is more involved to make this
efficient. This document contains d
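As a small illustration of state type 1 (key-partitioned state): a per-key counter in a RichFlatMapFunction, a sketch with illustrative names, to be used on a keyed stream such as stream.keyBy(_._1).flatMap(new CountPerKey):

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.ValueStateDescriptor
import org.apache.flink.api.scala._
import org.apache.flink.util.Collector

class CountPerKey extends RichFlatMapFunction[(String, Long), (String, Long)] {

  // the counter lives in key-partitioned state, so it can be redistributed
  // across parallel instances when the job is rescaled
  private lazy val count = getRuntimeContext.getState(
    new ValueStateDescriptor[Long]("count", createTypeInformation[Long], 0L))

  override def flatMap(in: (String, Long), out: Collector[(String, Long)]): Unit = {
    val updated = count.value() + 1
    count.update(updated)
    out.collect((in._1, updated))
  }
}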