Logged FLINK-5517 for upgrading hbase version to 1.3.0
On Mon, Jan 16, 2017 at 5:26 PM, Ted Yu wrote:
> hbase uses Guava 12.0.1 and Flink uses 18.0 where Stopwatch.<init>()V is
> no longer accessible.
> HBASE-14963 removes the use of Stopwatch at this location.
>
> hbase 1.3.0 RC has passed voting per
hbase uses Guava 12.0.1 and Flink uses 18.0 where Stopwatch.<init>()V is no
longer accessible.
HBASE-14963 removes the use of Stopwatch at this location.
hbase 1.3.0 RC has passed the voting period.
Please use 1.3.0, where you wouldn't see the IllegalAccessError.
On Mon, Jan 16, 2017 at 4:50 PM, Giuliano Ca
Hello,
I'm trying to use HBase in one of my stream transformations and I'm running
into the Guava/Stopwatch dependency problem:
java.lang.IllegalAccessError: tried to access method
com.google.common.base.Stopwatch.<init>()V from class
org.apache.hadoop.hbase.zookeeper.MetaTableLocator
Reading on the p
Hi,
there are only *two* interfaces defined at the moment:
*OneInputStreamOperator*
and
*TwoInputStreamOperator*.
Is there any way to define an operator with an arbitrary number of inputs?
My other concern is how to maintain *backpressure* in the operator.
Let's say I read events from two Kafka s
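For the multi-input part, a minimal sketch of one way around the one/two-input limit today, assuming all streams share the same type (the fromElements sources below are stand-ins for the Kafka consumers):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MultiInputSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Stand-ins for the Kafka sources.
        DataStream<String> a = env.fromElements("a1", "a2");
        DataStream<String> b = env.fromElements("b1");
        DataStream<String> c = env.fromElements("c1");
        // union() merges any number of same-typed streams into one logical
        // input, sidestepping the one/two-input limit of the operator interfaces.
        DataStream<String> merged = a.union(b, c);
        merged.print();
        env.execute("multi-input sketch");
    }
}

For heterogeneous types you would chain connect() pairwise instead. Backpressure is governed per input by network-buffer availability, regardless of how the streams are merged.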
Hi Gordon,
Thanks for getting back to me. The ticket looks good, but I’m going to need to
do something similar for our homegrown sinks. It sounds like just having the
affected sinks participate in checkpointing is enough of a solution - is there
anything special about `SinkFunction[T]` extendin
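Not the thread's actual code, but a minimal sketch of one way a homegrown sink can participate in checkpointing via the Flink 1.2 CheckpointedFunction interface; the buffer-and-flush logic is illustrative:

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class FlushingSink implements SinkFunction<String>, CheckpointedFunction {

    private final List<String> buffer = new ArrayList<>();

    @Override
    public void invoke(String value) {
        buffer.add(value); // buffer instead of writing straight away
    }

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) {
        // flush on checkpoint, so everything received so far is durable
        // in the external system before the checkpoint completes
        flush();
    }

    @Override
    public void initializeState(FunctionInitializationContext ctx) {
        // nothing restored in this sketch; a real sink would typically also
        // keep the buffer in operator state to survive failures between checkpoints
    }

    private void flush() {
        // hypothetical write of `buffer` to the external system goes here
        buffer.clear();
    }
}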
Thanks very much.
Regards,
Charith
On Mon, Jan 16, 2017 at 12:49 PM, Ufuk Celebi wrote:
> This is exposed via the REST API under:
>
> /jobs/:jobid/vertices/:vertexid/subtasktimes
>
> The SubtasksTimesHandler serves this endpoint.
>
> – Ufuk
>
>
> On Mon, Jan 16, 2017 at 6:20 PM, Charith Wickram
Can you check the log files of the TaskManagers and JobManager?
There is no obvious reason why the collection should not work.
On another note: the rolling file sink might be what you are looking for.
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/filesystem_sink.htm
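A minimal sketch of the bucketing file sink from that page (Flink 1.2, requires the flink-connector-filesystem dependency; path and sizes are placeholders):

import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

// assuming `stream` is a DataStream<String>
BucketingSink<String> sink = new BucketingSink<>("/base/path");
sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HHmm")); // one directory per time bucket
sink.setBatchSize(1024 * 1024 * 400); // roll part files at ~400 MB
stream.addSink(sink);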
This is exposed via the REST API under:
/jobs/:jobid/vertices/:vertexid/subtasktimes
The SubtasksTimesHandler serves this endpoint.
– Ufuk
On Mon, Jan 16, 2017 at 6:20 PM, Charith Wickramarachchi
wrote:
> Thanks very much. I noticed the times recorded in the web interface. I was a
> little he
I would put this differently: the "auto.offset.reset" policy is only used
if there are no valid committed offsets for a topic.
See here:
http://stackoverflow.com/documentation/apache-kafka/5449/consumer-groups-and-offset-management
(don't be confus
You also need to have a new group.id for this to work.
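Putting both points together, a minimal sketch (topic name and servers are placeholders):

import java.util.Properties;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
// a fresh group.id means no committed offsets exist yet...
props.setProperty("group.id", "my-fresh-group");
// ...so this policy actually takes effect and reading starts at the beginning
props.setProperty("auto.offset.reset", "earliest");
env.addSource(new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props))
   .print();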
Thanks very much. I noticed the times recorded in the web interface. I was
a little hesitant to use that, as I might make a mistake when recording the
times. Could you please point me to the code in Flink where these runtimes
are computed?
Thanks,
Charith
On Mon, Jan 16, 2017 at 3:15 AM, Ufuk Celebi wrot
+1 to what Fabian said. Regarding the memory consumption: Flink's back
pressure mechanism also depends on this, because the availability of
(network) buffers determines how fast an operator can produce data. If no
buffers are available, the producing operator will slow down.
On Mon, Jan 16, 2017 at
One job can have multiple data sources. You would then have one stream per
source like this:
You can then create separate operator graphs from this so that you have
three separate computations on the data, maybe like this:
They will be executed separately. However, if you don't want to combine t
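A self-contained sketch of what such a job could look like (fromElements sources as stand-ins for real ones):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ThreePipelines {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> s1 = env.fromElements("a", "b");
        DataStream<Integer> s2 = env.fromElements(1, 2, 3);
        DataStream<Long> s3 = env.fromElements(10L, 20L);
        // three separate operator graphs, no shared edges
        s1.filter(v -> !v.isEmpty()).print();
        s2.filter(v -> v % 2 == 0).print();
        s3.print();
        env.execute("three pipelines in one job"); // executed as a single job
    }
}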
Yes, exactly. This is a little cumbersome at the moment, but there are plans to
improve this after 1.2 is released.
– Ufuk
On 16 January 2017 at 16:33:49, tao xiao (xiaotao...@gmail.com) wrote:
> Hi Ufuk,
>
> Thank you for the reply. I want to know what the difference is between
> state.backen
Hi Abdul,
You may want to check out FLIP-13 "side outputs": https://goo.gl/6KSYd0 . Once
we have this feature, you should be able to collect the data to an
external distributed store and use it later on demand.
BTW, can you explain your use case in more detail, so that people here
may h
Hi Ufuk,
Thank you for the reply. I want to know what the difference is between
state.backend.fs.checkpoint.dir
and state.checkpoints.dir in this case? Does state.checkpoints.dir store the
metadata that points to the checkpoint that is stored in
state.backend.fs.checkpoint.dir?
On Mon, 16 Jan 2017
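As a sketch of how I understand the two options relate in flink-conf.yaml (Flink 1.2; paths are placeholders):

state.backend: filesystem
# where the state backend writes the actual checkpoint data
state.backend.fs.checkpoint.dir: hdfs:///flink/checkpoints
# where the metadata of *externalized* checkpoints is written, so that a
# cancelled or failed job can later be resumed from it
state.checkpoints.dir: hdfs:///flink/ext-checkpoints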
Hi Miguel,
thank you for opening the issue!
Changes/improvements to the documentation are also typically handled with
JIRAs and pull requests [1]. Would you like to give it a try and improve
the community detection docs?
Cheers,
-Vasia.
[1]: https://flink.apache.org/contribute-documentation.html
Hi Nico, Ufuk,
Thanks for diving into this issue.
@Nico
I don't think that's the problem. The code can be reproduced exactly in
Java. I am using a different constructor for the ListStateDescriptor than you did:
You used:
> public ListStateDescriptor(String name, TypeInformation typeInfo)
>
While I used:
>
The first issue is not a problem with idiomatic Scala - we make all our data
objects immutable.
Second... yeah, I guess it makes sense.
Thanks for the clarification.
Best regards,
Dmitry
On Mon, Jan 16, 2017 at 1:27 PM, Fabian Hueske wrote:
> One of the reasons is to ensure that data cannot be modified
One of the reasons is to ensure that data cannot be modified after it left
a thread.
A function that emits the same object several times (in order to reduce
object creation & GC) might accidentally modify emitted records if they
were put as objects in a queue.
Moreover, it is easier to control t
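A tiny plain-Java illustration (nothing Flink-specific) of the hazard:

import java.util.ArrayList;
import java.util.List;

public class MutationHazard {
    public static void main(String[] args) {
        List<int[]> queue = new ArrayList<>();
        int[] record = new int[]{1};
        queue.add(record);   // "emit" by reference, no copy/serialization
        record[0] = 99;      // producer reuses the object...
        // ...and the consumer sees the mutated value: prints 99, not 1
        System.out.println(queue.get(0)[0]);
    }
}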
Hi Ufuk,
Do you know the reason for serializing data between different
threads?
Also, thanks for the link!
Best regards,
Dmitry
On Mon, Jan 16, 2017 at 1:07 PM, Ufuk Celebi wrote:
> Hey Dmitry,
>
> this is not possible if I'm understanding you correctly.
>
> A task chain is execut
Hey Dmitry,
this is not possible if I'm understanding you correctly.
A task chain is executed by a single task thread and hence it is not
possible to continue processing before the record "leaves" the thread,
which only happens when the next task thread or the network stack
consumes it.
Hand ove
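If you do want to force a handover between threads, a sketch of breaking the chain explicitly (assuming `lines` is a DataStream<String> and `env` the execution environment):

// starts a new chain at this operator, so records from the previous
// operator cross a thread boundary instead of being passed along in-thread
DataStream<String> nonEmpty = lines
    .filter(s -> !s.isEmpty())
    .startNewChain();

// or disable chaining for the whole job:
env.disableOperatorChaining();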
Hi Dawid,
regarding the original code, I couldn't reproduce this with the Java code I
wrote and my guess is that the second parameter of the ListStateDescriptor is
wrong:
.asQueryableState(
    "type-time-series-count",
    new ListStateDescriptor[KeyedDataPoint[java.lang.Integer]]
Hey Dawid! I talked offline with Nico last week and he took this over.
He also suggested removing the list queryable state variant
altogether, which makes a lot of sense to me (at least with the current
state of things). @Nico: could you open an issue for it?
Nico also found a difference in your c
Hello,
I created the JIRA issue at:
https://issues.apache.org/jira/browse/FLINK-5506
Is it possible to submit suggestions to the documentation?
If so, where can I do so?
I actually did this based on the example at this page (possible Flink
versions aside):
https://flink.apache.org/news/2015/08/
Hey!
This is possible with the upcoming 1.2 version of Flink (also in the
current snapshot version):
https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/checkpoints.html#externalized-checkpoints
You have to manually activate it via the checkpoint config (see docs).
Ping me if you h
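A sketch of that activation (Flink 1.2 API; the interval is a placeholder):

import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000); // checkpoint every 60s
// retain the checkpoint when the job is cancelled, so it can be resumed later
env.getCheckpointConfig().enableExternalizedCheckpoints(
    ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);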
Hi Aljoscha,
I was able to identify the root cause of the problem. It is my first map
function using ValueState. But first, assignTimestampsAndWatermarks()
is called after the Kafka connector is created:
FlinkKafkaConsumer09 carFlinkKafkaConsumer09 =
new FlinkKafkaConsumer09<
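For context, a sketch of how that assignment typically looks; CarEvent and its timestamp field are hypothetical stand-ins for the actual types:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

DataStream<CarEvent> cars = env
    .addSource(carFlinkKafkaConsumer09)
    .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<CarEvent>(Time.seconds(5)) {
            @Override
            public long extractTimestamp(CarEvent event) {
                return event.timestamp; // event time in epoch millis (hypothetical field)
            }
        });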
Unfortunately no. You only get the complete execution time after the
job has finished. What you can do is browse to the web interface and
check the runtime for each operator there (assuming that each
iterative process is a separate operator).
Does this help?
On Mon, Jan 16, 2017 at 7:36 AM, Chari
A user reported that outer joins on the Table API and SQL compute wrong
results:
https://issues.apache.org/jira/browse/FLINK-5498
2017-01-15 20:23 GMT+01:00 Till Rohrmann :
> I found two problematic issues with Mesos HA mode which breaks it:
>
> https://issues.apache.org/jira/browse/FLINK-5495
>
Hi!
State is only persisted as part of checkpoints or savepoints.
If both are not used, then state is not persistent across restarts.
Stephan
On Mon, Jan 16, 2017 at 8:47 AM, Janardhan Reddy <
janardhan.re...@olacabs.com> wrote:
> Hi,
>
> Is the value state persisted across job restart/deploym
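A sketch of enabling that persistence (paths and interval are placeholders):

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PersistentStateSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
        env.enableCheckpointing(10_000); // snapshot keyed state every 10s
        // ... build the job; ValueState is now part of each checkpoint and is
        // restored on recovery (use savepoints for planned redeployments)
    }
}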
Hi!
I think Yury pointed out the correct diagnosis. Caching the classes across
multiple jobs in the same session can cause these types of issues.
For YARN single-job deployments, Flink 1.2 will not do any dynamic
classloading any more, but start with everything in the application
classpath.
For Y
@Giuliano: any updates? Very curious to figure out what's causing
this. As Fabian said, this is most likely a class loading issue.
Judging from the stack trace, you are not running with YARN but a
standalone cluster. Is that correct? Class-loading-wise, nothing
changed between Flink 1.1 and Flink 1.
Hi Niels,
I think the biggest problem for keyed sources is that Flink must be able to
co-locate key-partitioned state with the pre-partitioned data.
This might work if the key is the partition ID, i.e., not the original key
attribute that was hashed to assign events to partitions.
Flink could need
Hi Fabian,
A case consists of all events sharing the same case id. This id is
what we initially key the stream by.
The order of these events is the trace.
For example,
caseid: case1, consisting of event1, event2, event3. Start time 11:00,
end 11:05, run time 5 minutes
caseid: case12, consisting