Hi. You should send your PR to the apache/flink-web repository, not to your own
flink-web fork.
Regards,
Chiwan Park
> On Jun 5, 2015, at 2:46 PM, Lokesh Rajaram wrote:
>
> Hello,
>
> For JIRA FLINK-2155 updated the document and created a pull request with
> flink-web project as https://github.com/
Hello,
For JIRA FLINK-2155 I updated the document and created a pull request against the
flink-web project: https://github.com/lokeshrajaram/flink-web/pull/1
I followed the "How to Contribute" guide to create this pull request. Can someone
please help me verify whether this pull request is the right way of updating
flin
Thanks Stephan for clarifying :)
@kostas: I am just playing around with some ideas. Only in my head so far,
so let's not worry about these things.
On Thu, Jun 4, 2015 at 6:33 PM Kostas Tzoumas wrote:
> Wouldn't this kind of cross-task communication break the whole dataflow
> abstraction? How can r
I see your problem. One way to solve the problem is to implement a special
PredictOperation which takes a tuple (id, vector) and returns a tuple (id,
labeledVector). You can take a look at the implementation for the vector
prediction operation.
But we can also discuss adding an ID field to t
I think it is not a problem of join hints, but rather of too little memory
for the join operator. If you set the temporary directory, then the job
will be split in smaller parts and thus each operator gets more memory.
Alternatively, you can increase the memory you give to the Task Managers.
The p
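(A hedged sketch of the two options above for the ALS case discussed in this
thread; the setter names assume FlinkML's ALS estimator, and the path and
values are made up:)

  import org.apache.flink.ml.recommendation.ALS

  // Option 1: give ALS a temporary path so intermediate results are written
  // to disk and the job is split into smaller parts.
  val als = ALS()
    .setIterations(10)
    .setNumFactors(10)
    .setTemporaryPath("/tmp/als-intermediate")

  // Option 2: give the TaskManagers more memory, e.g. in conf/flink-conf.yaml:
  //   taskmanager.heap.mb: 4096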
I think that the NPE in the second condition is a bug in HashTable.
I just found that ConnectedComponents with small memory segments causes the
same error. (I thought I fixed the bug, but it is still alive.)
Regards,
Chiwan Park
> On Jun 5, 2015, at 2:35 AM, Felix Neutatz wrote:
>
> now the question is
Now the question is: which join in the ALS implementation is the problem? :)
2015-06-04 19:09 GMT+02:00 Andra Lungu :
> Hi Felix,
>
> Passing a JoinHint to your function should help.
> see:
>
> http://mail-archives.apache.org/mod_mbox/flink-user/201504.mbox/%3ccanc1h_vffbqyyiktzcdpihn09r4he4oluiur
I am in principle with Ufuk on that, but let's not rush this into the
release. It is not a public API after all.
On Thu, Jun 4, 2015 at 5:23 PM, Ufuk Celebi wrote:
>
> On 04 Jun 2015, at 17:02, Maximilian Michels wrote:
>
> > I think ResultPartition is a pretty accurate description of what it i
Hi,
I have the following use case: I want to do regression for a time series
dataset like:
id, x1, x2, ..., xn, y
id = point in time
x = features
y = target value
In the Flink framework I would map this to a LabeledVector (y,
DenseVector(x)). (I don't want to use the id as a feature)
When I ap
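(For illustration, a hedged Scala sketch of that mapping; it assumes the rows
are already parsed into a DataSet[Array[Double]] named rows with the layout
id, x1, ..., xn, y — all names here are made up:)

  import org.apache.flink.api.scala._
  import org.apache.flink.ml.common.LabeledVector
  import org.apache.flink.ml.math.DenseVector

  // Drop the id (first field), use the last field as the label and the rest
  // as features.
  val labeled: DataSet[LabeledVector] = rows.map { fields =>
    val y  = fields.last
    val xs = fields.slice(1, fields.length - 1)
    LabeledVector(y, DenseVector(xs))
  }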
Hi Felix,
Passing a JoinHint to your function should help.
see:
http://mail-archives.apache.org/mod_mbox/flink-user/201504.mbox/%3ccanc1h_vffbqyyiktzcdpihn09r4he4oluiursjnci_rwc+c...@mail.gmail.com%3E
Cheers,
Andra
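(A hedged sketch of what passing a join hint looks like on the Scala DataSet
API; the data set names and the chosen hint are placeholders, and the exact
API may differ slightly depending on your Flink version:)

  import org.apache.flink.api.scala._
  import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint

  // Tell the optimizer which join strategy to use instead of letting it guess.
  val joined = ratings
    .join(factors, JoinHint.REPARTITION_SORT_MERGE)
    .where(0)
    .equalTo(0)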
On Thu, Jun 4, 2015 at 7:07 PM, Felix Neutatz
wrote:
> after bug fix:
>
> for 1
After the bug fix:
for 100 blocks and standard JVM heap space:
Caused by: java.lang.RuntimeException: Hash join exceeded maximum number of
recursions, without reducing partitions enough to be memory resident.
Probably cause: Too many duplicate keys.
at
org.apache.flink.runtime.operators.hash.MutableHa
Wouldn't this kind of cross-task communication break the whole dataflow
abstraction? How can recovery be implemented if we allowed something like
this?
On Thu, Jun 4, 2015 at 5:14 PM, Stephan Ewen wrote:
> That is not what Ufuk said. You can use a singleton auxiliary task that
> communicates in
For linear regression, the main tasks are computing the covariance
matrix and X * y, which can both be parallelized well, and then you
need to solve a linear equation whose dimension is the number of
features. So if the number of features is small, it actually makes
sense to do the setup in Fl
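(Written out, this is just the normal equation; with X the n x d data matrix
and y the target vector,

  \[ (X^\top X)\, w \;=\; X^\top y \quad\Longrightarrow\quad w = (X^\top X)^{-1} X^\top y \]

so the system to solve is only d x d, independent of the number of data points n.)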
On 04 Jun 2015, at 17:02, Maximilian Michels wrote:
> I think ResultPartition is a pretty accurate description of what it is: a
> partition of the result of an operator. ResultStream on the other hand,
> seems very generic to me. Just because we like to think of Flink nowadays
> as a "streaming
That is not what Ufuk said. You can use a singleton auxiliary task that
communicates in both directions with the vertices and acts as a coordinator
between vertices on the same level.
On Thu, Jun 4, 2015 at 2:55 PM, Gyula Fóra wrote:
> Thank you!
> I was aware of the iterations as a possibility,
I think ResultPartition is a pretty accurate description of what it is: a
partition of the result of an operator. ResultStream on the other hand,
seems very generic to me. Just because we like to think of Flink nowadays
as a "streaming data flow" engine, we don't have to change the core
classes' na
I am using Eclipse Kepler.
I tried to replicate the same problem in another workspace.
When I try to test the plugin using a JUnit Plug-in Test, it throws a
ClassNotFoundException.
However, when I try to test it as a JUnit Test, it works fine.
Am I missing something here?
I agree that given a small data set it's probably better to solve the
linear regression problem directly. However, I'm not so sure how well this
performs if the data gets really big (more in terms of number of data
points). But maybe we can find something like a sweet spot when to switch
between bo
Aljoscha Krettek created FLINK-2163:
---
Summary: VertexCentricConfigurationITCase sometimes fails on Travis
Key: FLINK-2163
URL: https://issues.apache.org/jira/browse/FLINK-2163
Project: Flink
+1 for simple learning for simple cases.
Where normal equations have a reasonable condition number, using them is
good.
For large sparse systems, however, SGD with AdaGrad will crush direct
solutions, even for linear problems.
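(For reference, a sketch of the per-coordinate AdaGrad step size being referred
to, with g_{t,i} the i-th gradient component at step t:

  \[ G_{t,i} = \sum_{\tau \le t} g_{\tau,i}^2, \qquad
     w_{t+1,i} = w_{t,i} - \frac{\eta}{\sqrt{G_{t,i}} + \epsilon}\, g_{t,i} \]

so frequently updated coordinates get smaller steps than rarely updated ones.)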
On Thu, Jun 4, 2015 at 2:38 PM, Mikio Braun
wrote:
> It's true th
Thank you!
I was aware of the iterations as a possibility, but I was wondering if we
might have "lateral" communications.
Ufuk Celebi wrote (on Thursday, Jun 4, 2015, 13:29):
>
> On 04 Jun 2015, at 12:46, Stephan Ewen wrote:
>
> > There is no "lateral communication" right now. Typical patt
On Thu, Jun 4, 2015 at 1:26 PM, Till Rohrmann wrote:
> Maybe also the default learning rate of 0.1 is set too high.
>
Could be.
But grid search on learning rate is pretty standard practice. Running
multiple learning engines at the same time with different learning rates is
pretty plausible.
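(A hedged Scala sketch of such a grid over step sizes with FlinkML's linear
regression; the candidate values and the trainingData set are made up, and
model selection on a validation set is left out:)

  import org.apache.flink.ml.regression.MultipleLinearRegression

  // Train one model per candidate step size; compare them afterwards on
  // held-out data.
  val stepSizes = Seq(0.0001, 0.001, 0.01, 0.1)
  val models = stepSizes.map { stepSize =>
    val mlr = MultipleLinearRegression()
      .setStepsize(stepSize)
      .setIterations(100)
    mlr.fit(trainingData)
    (stepSize, mlr)
  }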
Al
It's true that we can and should look into methods to make SGD more
resilient; however, especially for linear regression, which even has a
closed-form solution, all this seems excessive.
I mean, in the end, if the number of features is small (let's say less
than 2000), the best way is to compute
Big +1 :)
On 06/04/2015 01:33 PM, Robert Metzger wrote:
> I would also say that in particular big changes should include an update to
> the documentation as well!
>
> I'll add a rule to the guidelines and I'll start annoying you to write
> documentation in pull requests.
>
> On Thu, Jun 4, 2015
On 03 Jun 2015, at 17:00, Robert Metzger wrote:
> What is the status of the 0.9 release planning.
>
> It seems like many of the open issues from the document have been closed.
> When do you think are we able to fork off the "release-0.9" branch and
> create the first RC ?
It would be great to
On 04 Jun 2015, at 13:10, Maximilian Michels wrote:
> Rename what to streams? Do you mean "ResultPartition" => "StreamPartition"?
Exactly along those lines, but maybe "ResultStream".
> I'm not sure if that makes it easier to understand what the classes do.
It fits better into the terminology
Till Rohrmann created FLINK-2162:
Summary: Implement adaptive learning rate strategies for SGD
Key: FLINK-2162
URL: https://issues.apache.org/jira/browse/FLINK-2162
Project: Flink
Issue Type:
I would also say that in particular big changes should include an update to
the documentation as well!
I'll add a rule to the guidelines and I'll start annoying you to write
documentation in pull requests.
On Thu, Jun 4, 2015 at 1:06 PM, Maximilian Michels wrote:
> +1 for your proposed changes,
Resolved in https://issues.apache.org/jira/browse/FLINK-2070.
I'll update the documentation.
On Thu, Jun 4, 2015 at 12:22 AM, Stephan Ewen wrote:
> I'll prepare a fix...
>
> On Wed, Jun 3, 2015 at 10:24 PM, Stephan Ewen wrote:
>
> > +1 for printOnTaskManager(prefix)
> >
> > +1 for deprecating
On 04 Jun 2015, at 12:46, Stephan Ewen wrote:
> There is no "lateral communication" right now. Typical pattern is to break
> it up in two operators that communicate in an all-to-all fashion.
You can look at the iteration tasks: the iteration sync task is communicating
with the iteration heads
At the moment the current SGD implementation works like this (modulo
regularization):

  newWeights = oldWeights - adaptedStepsize * sumOfGradients / numberOfGradients

where

  adaptedStepsize = initialStepsize / sqrt(iterationNumber)

and sumOfGradients is the simple sum of the gradients for all points in the bat
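(The same update in symbols, as a sketch: with eta_0 the initial step size,
t the iteration number, and B_t the current batch,

  \[ w_{t+1} = w_t - \frac{\eta_0}{\sqrt{t}} \cdot \frac{1}{|B_t|}
     \sum_{x \in B_t} \nabla \ell(w_t, x) \])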
Thanks for helping us debug this.
You can start many TaskManagers in one JVM by using the LocalMiniCluster.
Have a look at this (manually triggered) test, which runs 100 TaskManagers
in one JVM:
https://github.com/apache/flink/blob/master/flink-tests/src/test/java/org/apache/flink/test/manual/No
Rename what to streams? Do you mean "ResultPartition" => "StreamPartition"?
I'm not sure if that makes it easier to understand what the classes do.
On Mon, Jun 1, 2015 at 10:11 AM, Aljoscha Krettek
wrote:
> +1
> I like it. We are a streaming system underneath after all.
> On Jun 1, 2015 10:02 AM
Thanks for your feedback. I am neither running IPSec nor the aesni-intel module.
So far, I could not reproduce the reordering issue. I have also detected that
my code might have created String objects with invalid UTF-16 content in exactly
those jobs that suffered from the reordering. I wanted to
+1 for your proposed changes, Robert. I would argue that it is even more
crucial that big pull requests contain documentation, because a lot of the time
only the contributor can create this documentation. Additionally,
documentation makes reviewing a pull request much easier.
Fragmented documentation is
There is no "lateral communication" right now. Typical pattern is to break
it up in two operators that communicate in an all-to-all fashion.
On Thu, Jun 4, 2015 at 11:52 AM, Gyula Fóra wrote:
> I am simply thinking about the best way to send data to different subtasks
> of the same operator.
>
>
I agree. It does not help with the current unstable tests. However, it
can help to prevent running into instability issues in the future.
On 06/04/2015 11:58 AM, Fabian Hueske wrote:
> I think the problem is less with bugs being introduced by new commits but
> rather bugs which are already in the co
Till Rohrmann created FLINK-2161:
Summary: Flink Scala Shell does not support external jars (e.g.
Gelly, FlinkML)
Key: FLINK-2161
URL: https://issues.apache.org/jira/browse/FLINK-2161
Project: Flink
I think the problem is less about bugs being introduced by new commits and
more about bugs which are already in the code base.
2015-06-04 11:52 GMT+02:00 Matthias J. Sax :
> I have another idea: the problem is, that some commit might de-stabilize
> a former stable test. This in not detected, because t
I have another idea: the problem is that some commit might destabilize
a formerly stable test. This is not detected, because the build was
("accidentally") green and the code is merged.
We could reduce the probability that this happens if a pull request
must pass the test run multiple times (mayb
I am simply thinking about the best way to send data to different subtasks
of the same operator.
Can we go back to the original question? :D
Stephan Ewen wrote (on Wednesday, Jun 3, 2015, 23:45):
> I think that it may be a bit pre-mature to invest heavily into the parallel
> delta-policy w
Aljoscha Krettek created FLINK-2160:
---
Summary: Change Streaming Source Interface to run(Context)/cancel()
Key: FLINK-2160
URL: https://issues.apache.org/jira/browse/FLINK-2160
Project: Flink
Thanks for the feedback and the suggestions.
As Stephan said, the "we have to fix it asap" approach usually does not work well. I
think blocking the master is not an option, exactly for the reasons that Fabian and
Till outlined.
From the comments so far, I don't feel like we are eager to adopt a disable
po
Ufuk Celebi created FLINK-2159:
--
Summary: SimpleRecoveryITCase fails
Key: FLINK-2159
URL: https://issues.apache.org/jira/browse/FLINK-2159
Project: Flink
Issue Type: Bug
Components: Te
Robert Metzger created FLINK-2158:
-
Summary: NullPointerException in DateSerializer.
Key: FLINK-2158
URL: https://issues.apache.org/jira/browse/FLINK-2158
Project: Flink
Issue Type: Bug
Till Rohrmann created FLINK-2157:
Summary: Create evaluation framework for ML library
Key: FLINK-2157
URL: https://issues.apache.org/jira/browse/FLINK-2157
Project: Flink
Issue Type: New Feat
The back-and-forth on the Source interface was unfortunate, yes.
In general, I think that we should not doctor around on other
people's pull requests in semi-secrecy. Some small cosmetic fixes or
rewordings of the commit message are OK. But if the PR needs rework,
then this should be voiced in th
To run operations like insert, update, and delete on persistent files, you
need support from the storage engine.
The Apache ORC data format recently added support for transactional
inserts, updates, and deletes. http://orc.apache.org/
ORC has a Hadoop input format, and you can use that one with F
Yes, I will try it again with the newest update :)
2015-06-04 10:17 GMT+02:00 Till Rohrmann :
> If the first error is not fixed by Chiwans PR, then we should create a JIRA
> for it to not forget it.
>
> @Felix: Chiwan's PR is here [1]. Could you try to run ALS again with this
> version?
>
> Cheer
If the first error is not fixed by Chiwan's PR, then we should create a JIRA
for it so we do not forget it.
@Felix: Chiwan's PR is here [1]. Could you try to run ALS again with this
version?
Cheers,
Till
[1] https://github.com/apache/flink/pull/751
On Thu, Jun 4, 2015 at 10:10 AM, Chiwan Park wrote:
Hi. The second bug is fixed by the recent change in the PR.
But there is just no test case for the first bug.
Regards,
Chiwan Park
> On Jun 4, 2015, at 5:09 PM, Ufuk Celebi wrote:
>
> I think both are bugs. They are triggered by the different memory
> configurations.
>
> @chiwan: is the 2nd error fixe
I think both are bugs. They are triggered by the different memory
configurations.
@chiwan: is the 2nd error fixed by your recent change?
@felix: if yes, can you try the 2nd run again with the changes?
On Thursday, June 4, 2015, Felix Neutatz wrote:
> Hi,
>
> I played a bit with the ALS recomme
Till Rohrmann created FLINK-2156:
Summary: Scala modules cannot create logging file
Key: FLINK-2156
URL: https://issues.apache.org/jira/browse/FLINK-2156
Project: Flink
Issue Type: Bug
Hi,
I played a bit with the ALS recommender algorithm. I used the MovieLens
dataset: http://files.grouplens.org/datasets/movielens/ml-latest-README.html
The rating matrix has 21,063,128 entries (ratings).
I ran the algorithm with 3 configurations:
1. standard JVM heap space:
val als = ALS()
Hi Admin
Do we have insert, update, and remove operations in Apache Flink?
For example: I have 10 million records in my test file. I want to add one
record, update one record, and remove one record from this test file.
How can I implement this with Flink?
Thanks.
Best regards
+1 :-)
On Wed, Jun 3, 2015 at 4:53 PM, Vasiliki Kalavri
wrote:
> Hi Sachin,
>
> great idea to keep a blog! Thanks a lot for sharing :))
>
> -V.
>
> On 3 June 2015 at 16:41, Sachin Goel wrote:
>
> > Hi everyone
> > I'm maintaining a blog detailing my work here:
> > https://tolkienaboutcode.wordp
Yes, this is indeed a big change, but it was openly discussed multiple
times here on the mailing list and in a number of PRs. I am pretty sure
that we do not want to break the source interface any more, but there is
still some open discussion on it. Let us keep an eye on PR 742 where it is
currentl
The tests that Ufuk is referring to are not deterministically failing. This is
about hard-to-debug and hard-to-fix tests where it is not clear who broke them.
Fixing such a test can take several days or even more… So locking the master
branch is not an option IMO.
Deactivating the tests wi
I'm also in favour of quickly fixing the failing test cases but I think
that blocking the master is a kind of drastic measure. IMO this creates a
culture of blaming someone whereas I would prefer a more proactive
approach. When you see a failing test case and know that someone recently
worked on it
I think people should be forced to fix failing tests asap. One way to
go could be to lock the master branch until the test is fixed. If
nobody can push to the master, the pressure is very high for the responsible
developer to get it done asap. Not sure if this is Apache compatible.
Just a thought