I've seen this type of error when using Kryo with a Cascading scheme I'd
created.
In my case it happened when serializing a large object graph, where some of the
classes didn't have no-arg constructors.
The general fix was to set an instantiator strategy for Kryo - see:
https://github.com/Scal
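For reference, the fix looks roughly like this with plain Kryo and Objenesis
(untested sketch; the class name is just an illustration, not code from the
scheme above):

import com.esotericsoftware.kryo.Kryo;
import org.objenesis.strategy.StdInstantiatorStrategy;

public class KryoSetupSketch {
    public static Kryo createKryo() {
        Kryo kryo = new Kryo();
        // Fall back to Objenesis for classes without a no-arg constructor,
        // so deserialization does not fail trying to invoke one.
        kryo.setInstantiatorStrategy(new StdInstantiatorStrategy());
        return kryo;
    }
}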
On Thu, Feb 18, 2016 at 6:59 PM, Thomas Lamirault
wrote:
> We are trying flink in HA mode.
Great to hear!
> We set in the flink yaml :
>
> state.backend: filesystem
>
> recovery.mode: zookeeper
> recovery.zookeeper.quorum:
>
> recovery.zookeeper.path.root:
>
> recovery.zookeeper.storageDir:
>
Hi !
We are trying flink in HA mode.
Our application is a streaming application with windowing mechanism.
We set in the flink yaml :
state.backend: filesystem
recovery.mode: zookeeper
recovery.zookeeper.quorum:
recovery.zookeeper.path.root:
recovery.zookeeper.storageDir:
recovery.back
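For reference, a minimal sketch of how those HA entries typically look in
flink-conf.yaml; the host names and paths below are placeholders, not our
real values:

state.backend: filesystem
recovery.mode: zookeeper
recovery.zookeeper.quorum: zkhost1:2181,zkhost2:2181,zkhost3:2181
recovery.zookeeper.path.root: /flink
recovery.zookeeper.storageDir: hdfs:///flink/recovery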
Hello Aljoscha ,
You mentioned: '.. Yes, this is right if your temperatures don’t have any
other field on which you could partition them. '.
What I am failing to understand is that if temperatures are partitioned on
some other field (in my use-case, I have one such: the
temp_reading_timestamp), th
I tried to implement your idea but I'm getting NullPointerExceptions from
the AvroInputFormat. Any idea what I'm doing wrong?
See the code below:
public static void main(String[] args) throws Exception {
// set up the execution environment
final StreamExecutionEnvironment env =
StreamExec
I guess I need to set the parallelism for the FlatMap to 1 to make sure I
read one file at a time. The downside I see with this is that I will not be
able to read in parallel from HDFS (and the files are huge).
I'll give it a try and see how much performance I lose.
cheers Martin
On Thu, Feb 18, 2
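Something like this is what I have in mind (untested sketch; the flatMap name
and record type are placeholders for the file-reading function discussed in
this thread):

DataStream<String> paths = ...; // one HDFS path per element
DataStream<MyRecord> records = paths
    .flatMap(new ReadWholeFileFlatMap()) // hypothetical flatMap that reads one file per element
    .setParallelism(1);                  // force a single reader instance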
My current solution is:
List<String> paths = new ArrayList<>();
File dir = new File(BASE_DIR);
for (File f : dir.listFiles()) {
    paths.add(f.getName());
}
DataSet<String> mail = env.fromCollection(paths).map(new FileToString(BASE_DIR)).
The FileToString basically does a map that returns FileUtils.toString(new
Hi,
if you want to iterate through a DataSet you can simply use the map
function on the DataSets instead of for loops.
In your example you have nested loops; instead of this you can join the two
datasets
and then perform the map function.
It looks like you may want to implement a k-means algorithm
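Since the nested loops effectively pair every centroid with every point, a
cross (a join where every pair matches) followed by the per-pair logic keeps
the structure closest; a rough sketch, assuming DataSet<Centroid> centroids
and DataSet<Tuple2<Integer, Point>> clusteredPoints as in the k-means example
(names and field layout are assumptions):

DataSet<Tuple3<Centroid, Integer, Point>> pairs = centroids
    .cross(clusteredPoints)
    .with(new CrossFunction<Centroid, Tuple2<Integer, Point>, Tuple3<Centroid, Integer, Point>>() {
        @Override
        public Tuple3<Centroid, Integer, Point> cross(Centroid centroid, Tuple2<Integer, Point> element) {
            // do the per-(centroid, element) work here instead of in nested collect() loops
            return new Tuple3<>(centroid, element.f0, element.f1);
        }
    });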
Hi to all,
I want to apply a map function to every file in a folder. Is there an easy
way (or an already existing InputFormat) to do that?
Best,
Flavio
Thanks Stephan
On Thu, Feb 18, 2016 at 3:00 PM, Stephan Ewen wrote:
> You are right, the checkpoints should contain all offsets.
>
> I created a Ticket for this:
> https://issues.apache.org/jira/browse/FLINK-3440
>
>
>
>
> On Thu, Feb 18, 2016 at 10:15 AM, agaoglu wrote:
>
>> Hi,
>>
>> On a rel
Hello there,
I have been stuck on how to iterate over a DataSet, perform operations,
and return a new, modified DataSet, similar to list operations, as shown
below.
Eg:
for (Centroid centroid : centroids.collect()) {
    for (Tuple2 element : clusteredPoints.collect()) {
        //perform nece
Thanks Stephan, problem solved.
here is my configuration
org.apache.maven.plugins
maven-shade-plugin
2.4.3
package
shade
my.main.class
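(The plugin section above lost its XML tags in the archive; a hedged
reconstruction of what such a maven-shade-plugin block usually looks like,
with my.main.class kept as the placeholder from the original mail:)

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>my.main.class</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>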
Hi Ufuk - thanks for the 2016 roadmap - glad to see changing parallelism is
the first bullet :) Mesos support also sounds great, we're currently
running job and task managers on Mesos statically via Marathon.
Hi Stephan - thanks, that trick sounds pretty clever, I will try wrapping
my head around
Awesome, thanks Suneel. :D
I made the changes to support our use case, which needed flatMap behavior
(index 2 docs, or zero docs, per incoming element) instead of map, and we
also needed to create either an IndexRequest or an UpdateRequest depending
on the element.
-Zach
On Thu, Feb 18, 2016 at 2:06 AM S
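In case it helps anyone else, a minimal sketch of that flatMap-style behavior
against the elasticsearch2 connector's ElasticsearchSinkFunction; the Event
type and its methods are assumptions for illustration, not the actual code:

import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.streaming.connectors.elasticsearch2.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch2.RequestIndexer;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.Requests;

public class EventSink implements ElasticsearchSinkFunction<Event> {
    @Override
    public void process(Event element, RuntimeContext ctx, RequestIndexer indexer) {
        if (element.shouldSkip()) {
            return; // emit zero documents for this element
        }
        if (element.isNew()) {
            indexer.add(Requests.indexRequest()
                .index("events")
                .type("event")
                .id(element.getId())
                .source(element.toJson()));
        } else {
            indexer.add(new UpdateRequest("events", "event", element.getId())
                .doc(element.toJson()));
        }
    }
}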
Martin,
I think you can approximate this in an easy way like this:
- On the client, you traverse your directories to collect all files that
you need, collect all file paths in a list.
- Then you have a source "env.fromElements(paths)".
- Then you flatMap and in the FlatMap, run the Avro inp
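A rough sketch of that flatMap, assuming an Avro-generated class MyAvroRecord;
the main point is to drive the AvroInputFormat through its normal lifecycle
(configure, create splits, open, read, close) inside the function:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.io.AvroInputFormat;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FileInputSplit;
import org.apache.flink.core.fs.Path;
import org.apache.flink.util.Collector;

public class AvroFileFlatMap extends RichFlatMapFunction<String, MyAvroRecord> {
    @Override
    public void flatMap(String filePath, Collector<MyAvroRecord> out) throws Exception {
        AvroInputFormat<MyAvroRecord> format =
            new AvroInputFormat<>(new Path(filePath), MyAvroRecord.class);
        format.configure(new Configuration()); // lifecycle step that is easy to forget
        for (FileInputSplit split : format.createInputSplits(1)) {
            format.open(split);
            while (!format.reachedEnd()) {
                MyAvroRecord record = format.nextRecord(null);
                if (record != null) {
                    out.collect(record);
                }
            }
            format.close();
        }
    }
}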
Hi Zach!
Yes, changing parallelism is pretty high up the priority list. The good
news is that "scaling in" is the simpler part of changing the parallelism
and we are pushing to get that in soon.
Until then, there is only a pretty ugly trick that you can do right now to
"rescale' the state:
1)
Hi!
A lot of those dependencies are pulled in by Hadoop (for example the
configuration / HTTP components).
In 1.0-SNAPSHOT, the HTTP components dependency has been shaded away in
Hadoop, so it should not bother you any more.
One solution you can always do is to "shade" your dependencies in your
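A sketch of the "shade your dependencies" idea as a maven-shade-plugin
relocation; the httpcomponents package and the shaded prefix are just
examples, not a recommendation for specific packages:

<configuration>
  <relocations>
    <relocation>
      <pattern>org.apache.http</pattern>
      <shadedPattern>my.project.shaded.org.apache.http</shadedPattern>
    </relocation>
  </relocations>
</configuration>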
You are right, the checkpoints should contain all offsets.
I created a Ticket for this:
https://issues.apache.org/jira/browse/FLINK-3440
On Thu, Feb 18, 2016 at 10:15 AM, agaoglu wrote:
> Hi,
>
> On a related and a more exaggerated setup, our kafka-producer (flume) seems
> to send data to a
Hi guys,
You probably have noticed: I found a lot of old dependencies (http component
3.1, apache configuration 1.6, etc.) in Flink, which lead to errors like
this:
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.http.conn.ssl.SSLConnectionSocketFactory
Is there an
Combiners in streaming are a bit tricky, from their semantics:
1) Combiners always hold data back, through the preaggregation. That adds
latency and also means the values are not in the actual windows
immediately, where a trigger may expect them.
2) In batch, a combiner combines as long as there
Hi Javier,
sorry for the late response. In the Error Mapping of Kafka, it says that
code 15 means: ConsumerCoordinatorNotAvailableCode.
https://github.com/apache/kafka/blob/0.9.0/core/src/main/scala/kafka/common/ErrorMapping.scala
How many brokers did you put into the list of bootstrap servers?
C
They would be awesome, but it’s not yet possible in Flink Streaming, I’m afraid.
> On 18 Feb 2016, at 10:59, Stefano Baghino
> wrote:
>
> I think combiners are pretty awesome for certain cases to minimize network
> usage (the average use case seems to fit perfectly), maybe it would be
> worth
I think combiners are pretty awesome for certain cases to minimize network
usage (the average use case seems to fit perfectly), maybe it would be
worthwhile adding a detailed description of the approach to the docs?
On Thu, Feb 18, 2016 at 10:47 AM, Aljoscha Krettek
wrote:
> @Nirmalya: Yes, this
@Nirmalya: Yes, this is right if your temperatures don’t have any other field on
which you could partition them.
@Stefano: Under some circumstances it would be possible to use a combiner
(I’m using the name as Hadoop MapReduce would use it, here). When the
assignment of elements to windows hap
Hi,
On a related and a more exaggerated setup, our kafka-producer (flume) seems
to send data to a single partition at a time and switches it every few
minutes. So when I run my flink datastream program for the first time, it
starts on the *largest* offsets and shows something like this:
. Fetched
Thanks, Aljoscha, for the explanation. Isn't there a way to apply the
concept of the combiner to a streaming process?
On Thu, Feb 18, 2016 at 3:56 AM, Nirmalya Sengupta <
sengupta.nirma...@gmail.com> wrote:
> Hello Aljoscha
>
> Thanks very much for clarifying the role of Pre-Aggregation (rather
Hey Zach!
Sounds like a great use case.
On Wed, Feb 17, 2016 at 3:16 PM, Zach Cox wrote:
> However, the savepoint docs state that the job parallelism cannot be changed
> over time [1]. Does this mean we need to use the same, fixed parallelism=n
> during reprocessing and going forward? Are there
Thanks Zach, I have a few minor changes locally too; I'll push a PR out
tomorrow that has your changes too.
On Wed, Feb 17, 2016 at 5:13 PM, Zach Cox wrote:
> I recently did exactly what Robert described: I copied the code from this
> (closed) PR https://github.com/apache/flink/pull/1479, modified