What’s perhaps not clear is that the PrunedFilteredScan trait, which needs
to be implemented to allow for predicate pushdown, must return an
RDD[Row]:
https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/sources/PrunedFilteredScan.html
IgniteSQLRelation implements this to perform Ignite …
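For illustration, a minimal sketch of that contract (the class and body here are hypothetical; only the trait and the buildScan signature come from the Spark API linked above):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
    import org.apache.spark.sql.types.StructType

    // Hypothetical relation showing why the result is untyped: buildScan
    // is required to return RDD[Row], not a typed Dataset.
    class ExampleRelation(override val sqlContext: SQLContext,
                          override val schema: StructType)
      extends BaseRelation with PrunedFilteredScan {

      override def buildScan(requiredColumns: Array[String],
                             filters: Array[Filter]): RDD[Row] =
        // A real implementation (e.g. IgniteSQLRelation) would translate
        // `filters` into an Ignite SQL WHERE clause here.
        sqlContext.sparkContext.emptyRDD[Row]
    }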
Stuart,
From an API standpoint a DataFrame is just a Dataset of Rows, so they have the
same set of methods.
You have a valid point, but it's valid for both Dataset and DataFrame in the
same way. If you provide a function that filters Rows in a DataFrame, the
current integration would also not take advantage …
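To make that concrete, a minimal sketch (df is assumed to be an Ignite-backed DataFrame with an integer "age" column):

    import org.apache.spark.sql.{DataFrame, Row}
    import org.apache.spark.sql.functions.col

    def demo(df: DataFrame): Unit = {
      // Column expression: Catalyst can inspect this and push the filter
      // down to the underlying source.
      val pushedDown = df.filter(col("age") > 18)

      // Arbitrary function over Row: opaque to Catalyst, evaluated in Spark
      // after a full scan; the same applies to a typed Dataset[T].
      val notPushed = df.filter((r: Row) => r.getAs[Int]("age") > 18)

      pushedDown.explain() // a pushdown-capable source reports PushedFilters here
      notPushed.explain()  // no pushed filters
    }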
Val,
Happy to clarify my thoughts. Let’s take an example: say we have an Ignite
cache of Person objects. In Nikolay’s Ignite Spark SQL implementation you
can currently obtain a DataFrame with a column called “age” because that’s
been registered as a field in Ignite. Then you can do something like …
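For reference, roughly how such a DataFrame is obtained (a sketch; config path, table name and master are placeholders, and the constants are assumed to come from ignite-spark's IgniteDataFrameSettings):

    import org.apache.ignite.spark.IgniteDataFrameSettings._
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("example").master("local").getOrCreate()

    // "age" is available as a column because it is registered as a query
    // field for the Person table in Ignite.
    val persons = spark.read
      .format(FORMAT_IGNITE)
      .option(OPTION_CONFIG_FILE, "example-ignite.xml") // placeholder config
      .option(OPTION_TABLE, "person")
      .load()

    persons.filter(persons("age") > 18).show()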
Stuart,
I don't see a reason why it would work with DataFrames, but not with
Datasets - they are pretty much the same thing. If you have any particular
thoughts on this, please let us know.
In any case, I would like to hear from Nikolay as he is an implementor of
this functionality. Nikolay, please …
I believe the suggested approach will not work with the Spark SQL
relational optimisations which perform predicate pushdown from Spark
to Ignite. For that to work we need both the key/val and the
relational fields in a DataFrame schema, as sketched below.
Stuart.
> On 1 Aug 2018, at 04:23, Valentin Kulichenko wrote: …
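A sketch of that combined schema, with the relational fields Catalyst can filter on plus binary _key/_val columns carrying the cache objects (the relational field names here are illustrative):

    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("name", StringType),   // relational field, visible to Catalyst
      StructField("age", IntegerType),   // relational field, visible to Catalyst
      StructField("_key", BinaryType),   // serialised cache key
      StructField("_val", BinaryType)))  // serialised cache value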
I don't think there are concrete plans to remove the _key and _val fields, as
that's pretty hard considering the fact that many users use them and that
they are deeply integrated into the product. However, we have already had
multiple usability and other issues due to their existence, and while
fixing them we g…
Hello folks,
The documentation contains only a small reference to _key and _val usage,
and only for the Ignite SQL APIs (Java, .NET, C++). I tried to clean up all
the documentation code snippets.
As for the GitHub examples, they require a major overhaul. Instead of _key
and _val usage, we need to use S…
Hello, Igniters.
Valentin,
> We never recommend using these fields
Actually, we did:
* Documentation [1]. Please see the "Predefined Fields" section.
* Java Example [2]
* DotNet Example [3]
* Scala Example [4]
> ...hopefully will be removed altogether one day
Th…
Stuart,
The _key and _val fields are quite a dirty hack that was added years ago and
is virtually never used now. We never recommend using these fields and I
would definitely avoid building new features based on them.
Having said that, I'm not arguing against the use case, but we need a better
implementation …
If your predicates and joins are expressed in Spark SQL, you cannot
currently optimise those and also gain access to the key/val objects. If
you went without the Ignite Spark SQL optimisations and expressed your
query in Ignite SQL, you still need to use the _key/_val columns. The
Ignite documentation …
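The Ignite SQL route looks roughly like this (a sketch; the cache name and key/value types are placeholders):

    import org.apache.ignite.Ignition
    import org.apache.ignite.cache.query.SqlFieldsQuery

    val ignite = Ignition.start()
    val cache = ignite.cache[AnyRef, AnyRef]("personCache") // placeholder cache name

    // Plain Ignite SQL: no Catalyst, hence no Spark-side pushdown machinery,
    // and the original key/value objects are only reachable via _key/_val.
    val cursor = cache.query(
      new SqlFieldsQuery("select _key, _val from Person where age > ?").setArgs(Int.box(18)))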
Well, the second approach would use the optimizations, no?
-Val
On Fri, Jul 27, 2018 at 11:49 AM Stuart Macdonald wrote:
> Val,
>
> Yes you can already get access to the cache objects as an RDD or
> Dataset but you can’t use the Ignite-optimised DataFrames with these
> mechanisms. Optimised DataFrames …
Val,
Yes you can already get access to the cache objects as an RDD or
Dataset but you can’t use the Ignite-optimised DataFrames with these
mechanisms. Optimised DataFrames have to be passed through Spark SQL’s
Catalyst engine to allow for predicate pushdown to Ignite. So the
use case we’re talking about …
Stuart, Nikolay,
I really don't like the idea of exposing '_key' and '_val' fields. This is
legacy stuff that hopefully will be removed altogether one day. Let's not
use it in new features.
Actually, I don't think it's even needed. Spark docs [1] suggest two
ways of creating a typed dataset:
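For context, a minimal sketch of two common patterns from the Spark docs (Person and the sample data are illustrative):

    import org.apache.spark.sql.SparkSession

    case class Person(name: String, age: Int) // illustrative value class

    val spark = SparkSession.builder().appName("typed").master("local").getOrCreate()
    import spark.implicits._

    // 1. Build a typed Dataset directly from objects.
    val ds1 = spark.createDataset(Seq(Person("Ann", 32)))

    // 2. Start from an untyped DataFrame and convert with a product encoder.
    val ds2 = spark.createDataFrame(Seq(Person("Bob", 41))).as[Person]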
Here’s the ticket:
https://issues.apache.org/jira/browse/IGNITE-9108
Stuart.
On Friday, 27 July 2018 at 14:19, Nikolay Izhikov wrote:
> Sure.
>
> Please send the ticket number in this thread.
>
> Fri, 27 Jul 2018 at 16:16, Stuart Macdonald <stu...@stuwee.org>:
>
> > Thanks Nikolay …
Sure.
Please send the ticket number in this thread.
Fri, 27 Jul 2018 at 16:16, Stuart Macdonald:
> Thanks Nikolay. For both options if the cache object isn’t a simple type,
> we’d probably do something like this in our Ignite SQL statement:
>
> select cast(_key as binary), cast(_val as binary), …
Thanks Nikolay. For both options if the cache object isn’t a simple type,
we’d probably do something like this in our Ignite SQL statement:
select cast(_key as binary), cast(_val as binary), ...
Which would give us the BinaryObject’s byte[]; then for option 1 we keep
the Ignite format and introduce …
Hello, Stuart.
I like your idea.
> 1. Ignite BinaryObjects, in which case we’d need to supply a Spark Encoder
> implementation for BinaryObjects
> 2. Kryo-serialised versions of the objects.
Seems like the first option is a simple adapter. Am I right?
If yes, I think it's a more efficient way comparing …
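For comparison, a sketch of what option 2 could look like (Person and the deserialize helper are hypothetical; Encoders.kryo is standard Spark):

    import org.apache.spark.sql.{DataFrame, Dataset, Encoder, Encoders}

    case class Person(name: String, age: Int) // placeholder for the cache value class

    // deserialize is a hypothetical helper turning the byte[] produced by
    // `cast(_val as binary)` back into a Person instance.
    def toTyped(df: DataFrame, deserialize: Array[Byte] => Person): Dataset[Person] = {
      implicit val enc: Encoder[Person] = Encoders.kryo[Person]
      df.map(row => deserialize(row.getAs[Array[Byte]]("_val")))
    }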