That would be better of course. My opinion had to do with
not-implementing-exactly-the-same-thing-twice. Perhaps Till could weigh in
here.
We really do need to come up with a general mechanism for this. Testing
labeled vectors has exactly the same problem. I'll look into how Spark and
sci-kit appro
I am in favor of efficiency. Therefore I would prefer to introduce new
methods, in order to save memory and network traffic. This would also solve
the problem of "how to come up with ids?"
Best regards,
Felix
On 08.06.2015 at 12:52 PM, "Sachin Goel" wrote:
> I think if the user doesn't prov
I think if the user doesn't provide IDs, we can safely assume that they
don't need them. We can simply assign an ID of one as a temporary
measure and return the result without IDs [just to keep the interface
cleaner].
If the IDs are provided, in that case, we simply use those IDs.
A possible te
My gut feeling is also that a `Transformer` would be a good place to
implement feature selection. Then you can reuse it across multiple
algorithms by chaining it in front of them.
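A minimal sketch of that chaining idea, for illustration only. It does not use the FlinkML classes: `Transformer`, `SelectFeatures`, `LinearScorer` and `chain` below are hypothetical, simplified stand-ins that work on plain `Seq[Double]` rows, and the fixed linear scorer merely stands in for any learner.

```scala
object FeatureSelectionSketch {
  // Hypothetical, simplified pipeline stage (not the FlinkML Transformer).
  trait Transformer[A, B] { self =>
    def transform(in: A): B

    // Chain this stage in front of the next one.
    def chain[C](next: Transformer[B, C]): Transformer[A, C] =
      new Transformer[A, C] {
        def transform(in: A): C = next.transform(self.transform(in))
      }
  }

  // Keeps only the entries at the given indices, in the given order.
  final class SelectFeatures(indices: Seq[Int])
      extends Transformer[Seq[Double], Seq[Double]] {
    def transform(in: Seq[Double]): Seq[Double] = indices.map(i => in(i))
  }

  // Stand-in for a learner: a fixed linear scorer over the selected features.
  final class LinearScorer(weights: Seq[Double])
      extends Transformer[Seq[Double], Double] {
    def transform(in: Seq[Double]): Double =
      weights.zip(in).map { case (w, x) => w * x }.sum
  }

  def main(args: Array[String]): Unit = {
    // Row layout assumed here: (id, x1, x2); index 0 is not a feature.
    val pipeline = new SelectFeatures(Seq(1, 2)).chain(new LinearScorer(Seq(0.5, 2.0)))
    println(pipeline.transform(Seq(42.0, 1.0, 3.0)))
  }
}
```

The same `SelectFeatures` instance could be chained in front of any other stage that accepts `Seq[Double]`, which is the reuse argument made above.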
However, I don't know yet what's the best way to realize the IDs. One way
would be to add an ID field to `Vect
Yes. I agree too. It makes no sense for the learning algorithm to have
extra payload. Only relevant data makes sense.
Further, adding ID to the predict operation type definition seems a
legitimate choice. +1 from my side.
Regards
Sachin Goel
On Mon, Jun 8, 2015 at 4:06 PM, Theodore Vasiloudis <
t
I agree with Mikio; ids would be useful overall, and feature selection
should not be a part of learning algorithms. All features in a
LabeledVector should be assumed to be relevant by the learners.
On Mon, Jun 8, 2015 at 12:00 PM, Mikio Braun
wrote:
> Hi all,
>
> I think there are number of issu
Hi all,
I think there are a number of issues here:
- whether or not we generally need ids for our examples. For
time-series, this is a must, but I think it would also help us with
many other things (like partitioning the data, or picking a consistent
subset), so I would think adding (numeric) ids i
You're right Felix. You need to provide the `FitOperation` and
`PredictOperation` for the `Predictor` you want to use and the
`FitOperation` and `TransformOperation` for all `Transformer`s you want to
chain in front of the `Predictor`.
Specifying which features to take could be a solution. However
A more general approach would be to take as input which indices of the
vector to consider as features. After that, the vector can be returned as
such and the user can do what they wish with the non-feature values. This
wouldn't need extending the predict operation, instead this can be
specified in the
We probably need it for the other classes of the pipeline as well, in
order to be able to pass the ID through the whole pipeline.
Best regards,
Felix
On 06.06.2015 at 9:46 AM, "Till Rohrmann" wrote:
> Then you only have to provide an implicit PredictOperation[SVM, (T, Int),
> (LabeledVect
Then you only have to provide an implicit PredictOperation[SVM, (T, Int),
(LabeledVector, Int)] value with T <: Vector in the scope where you call
the predict operation.
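A rough sketch of how such an implicit could be resolved, under simplifying assumptions. None of the types below are the real FlinkML classes: `DenseVector`, `LabeledVector`, `LinearModel` (standing in for SVM), the element-wise `PredictOperation` trait and `predictAll` are hypothetical stand-ins operating on a plain `Seq` rather than a `DataSet`.

```scala
object IdPredictSketch {
  // Simplified stand-ins for the FlinkML types (hypothetical).
  final case class DenseVector(data: Array[Double])
  final case class LabeledVector(label: Double, vector: DenseVector)

  // Hypothetical linear model standing in for SVM.
  final case class LinearModel(weights: Array[Double]) {
    def score(v: DenseVector): Double =
      weights.zip(v.data).map { case (w, x) => w * x }.sum
  }

  // Type class: how to turn one Testing element into one Prediction element.
  trait PredictOperation[Model, Testing, Prediction] {
    def predict(model: Model, input: Testing): Prediction
  }

  // ID-preserving variant: (vector, id) in, (labeled vector, id) out.
  implicit val predictWithId
      : PredictOperation[LinearModel, (DenseVector, Int), (LabeledVector, Int)] =
    new PredictOperation[LinearModel, (DenseVector, Int), (LabeledVector, Int)] {
      def predict(model: LinearModel, input: (DenseVector, Int)): (LabeledVector, Int) = {
        val (vector, id) = input
        (LabeledVector(model.score(vector), vector), id)
      }
    }

  // The compiler picks the operation from the implicit scope at the call site.
  def predictAll[T, P](model: LinearModel, data: Seq[T])(
      implicit op: PredictOperation[LinearModel, T, P]): Seq[P] =
    data.map(op.predict(model, _))

  def main(args: Array[String]): Unit = {
    val model = LinearModel(Array(2.0, -1.0))
    val data = Seq((DenseVector(Array(1.0, 1.0)), 7), (DenseVector(Array(3.0, 2.0)), 8))
    predictAll(model, data).foreach { case (lv, id) =>
      println(s"id=$id prediction=${lv.label}")
    }
  }
}
```

Because the ID travels only through this one operation, callers that do not need IDs are unaffected and pay no extra cost.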
On Jun 6, 2015 8:14 AM, "Felix Neutatz" wrote:
> That would be great. I like the special predict operation better because it
>
That would be great. I like the special predict operation better because
returning the id is only necessary in some cases. The special predict
operation would save this overhead.
Best regards,
Felix
On 04.06.2015 at 7:56 PM, "Till Rohrmann" wrote:
> I see your problem. One way to solve the
I see your problem. One way to solve the problem is to implement a special
PredictOperation which takes a tuple (id, vector) and returns a tuple (id,
labeledVector). You can take a look at the implementation for the vector
prediction operation.
But we can also discuss adding an ID field to t
Hi,
I have the following use case: I want to do regression for a time-series
dataset like:
id, x1, x2, ..., xn, y
id = point in time
x = features
y = target value
In the Flink framework I would map this to a LabeledVector (y,
DenseVector(x)). (I don't want to use the id as a feature)
When I ap