Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-07-06 Thread via GitHub
zhuqi-lucas commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-3041327170 Thank you @alamb. A minor topic: I may pick up this PR: http://github.com/apache/datafusion/pull/13933 To use this user-defined index or parquet SortColumn metad

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-07-06 Thread via GitHub
alamb commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-3041322628 > User-Defined Index. I think this is a really good term -- I will update the blog post in https://github.com/apache/datafusion-site/pull/79 to use that -- This is an aut

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-07-06 Thread via GitHub
alamb commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-3041320800 > Thank you @alamb @JigaoLuo @adriangb, I agree the current example is the start, we can fu

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-07-05 Thread via GitHub
zhuqi-lucas commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-3040702910 Thank you @alamb @JigaoLuo @adriangb, I agree the current example is the start, we can further add more advanced examples!

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-07-05 Thread via GitHub
JigaoLuo commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-3039853245 > > Hi @zhuqi-lucas, > > While proofreading the blog, I had one major general question: **What are the limitations of such an embedded index

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-07-05 Thread via GitHub
adriangb commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-3039806254 Index suggestion: a tablesample index. And a general thought: exploring these sorts of indexes could do very cool stuff for DataFusion in general in terms of pushing us t

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-07-05 Thread via GitHub
alamb commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-3039796047 > Hi @zhuqi-lucas, > > While proofreading the blog, I had one major general question: **What are the limitations of such an embedded index?

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-07-05 Thread via GitHub
JigaoLuo commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-3039204719 Hi @zhuqi-lucas, While proofreading the blog, I had one major general question: **What are the limitations of such an embedded index?** - Is it limited to just one emb

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-07-03 Thread via GitHub
alamb closed issue #16374: Add an example of embedding indexes *inside* a parquet file URL: https://github.com/apache/datafusion/issues/16374

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-21 Thread via GitHub
JigaoLuo commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2993567391 @alamb @zhuqi-lucas Thank you for this issue and the PR. This could significantly aid query processing on Parquet. I was previously **never** aware of `key_value_metadat

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-13 Thread via GitHub
zhuqi-lucas commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2969754548 I am also preparing an advanced_embedding_indexes example, to follow once the simple one is merged.

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-13 Thread via GitHub
zhuqi-lucas commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2969605248 Thank you @alamb @adriangb. I submitted a simple example PR for review; I can add more examples as follow-ups: https://github.com/apache/datafusion/pull/16395

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-12 Thread via GitHub
adriangb commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2967987004 Very excited about this!

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-12 Thread via GitHub
alamb commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2966448392 Nice @zhuqi-lucas -- BTW I am not sure how easy it will be to use the parquet APIs to do this (specifically write arbitrary bytes to the inner writer) so it may take some fiddlin
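A rough sketch of the file layout being discussed, assuming the index bytes are written after the last data page but before the footer, and that their location is recorded in the footer's key/value metadata (the thread mentions `key_value_metadata`, but this placement is an assumption here). It is a toy model, not the real format: the footer below is JSON rather than Parquet's Thrift-encoded `FileMetaData`, and `INDEX_LOCATION_KEY`, `write_toy_file`, and `read_embedded_index` are hypothetical names, not parquet or DataFusion APIs.

```python
import json
import struct
from typing import Dict, Optional

MAGIC = b"PAR1"  # Parquet files start and end with this 4-byte magic
INDEX_LOCATION_KEY = "example_index_location"  # hypothetical metadata key


def write_toy_file(data_pages: bytes, index_bytes: bytes, kv: Dict[str, str]) -> bytes:
    """Toy layout: MAGIC | data pages | index bytes | footer | footer length | MAGIC."""
    buf = bytearray(MAGIC)
    buf += data_pages
    index_offset = len(buf)  # the index starts right after the last data page
    buf += index_bytes
    kv = dict(kv)
    # Record "offset:length" so a reader can find the index from the footer alone.
    kv[INDEX_LOCATION_KEY] = f"{index_offset}:{len(index_bytes)}"
    # Stand-in for the Thrift-encoded FileMetaData footer (JSON for simplicity).
    footer = json.dumps({"key_value_metadata": kv}).encode("utf-8")
    buf += footer
    buf += struct.pack("<I", len(footer))  # 4-byte little-endian footer length
    buf += MAGIC
    return bytes(buf)


def read_embedded_index(file_bytes: bytes) -> Optional[bytes]:
    """Locate the index via the footer; return None if the key is absent."""
    if file_bytes[:4] != MAGIC or file_bytes[-4:] != MAGIC:
        raise ValueError("missing magic bytes")
    (footer_len,) = struct.unpack("<I", file_bytes[-8:-4])
    footer = json.loads(file_bytes[-8 - footer_len:-8])
    location = footer["key_value_metadata"].get(INDEX_LOCATION_KEY)
    if location is None:
        return None
    offset, length = (int(part) for part in location.split(":"))
    return file_bytes[offset:offset + length]


toy = write_toy_file(b"<row group bytes>", b"<index bytes>", {"writer": "demo"})
print(read_embedded_index(toy))  # b'<index bytes>'
```

Because Parquet readers locate column chunks through offsets stored in the footer, bytes placed between the data pages and the footer are never visited by a reader that does not know about the extra key.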

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-12 Thread via GitHub
zhuqi-lucas commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2966416266 take

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-12 Thread via GitHub
zhuqi-lucas commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2966419013 I am interested in this, and I want to get familiar with embedding indexes.

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-12 Thread via GitHub
zhuqi-lucas commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2966460212 Thank you @alamb, I will investigate and explore the APIs and see what’s possible.

[I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-11 Thread via GitHub
alamb opened a new issue, #16374: URL: https://github.com/apache/datafusion/issues/16374 ### Is your feature request related to a problem or challenge? One of the common criticisms of Parquet-based query systems is that they don't have some particular type of index (e.g. HyperLogLog a
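To make the idea of a user-defined index concrete, here is a sketch of one of the simplest possible kinds: an exact distinct-value index for a single column. It stores the set of distinct values, and a query with an equality predicate (`column = value`) skips any file whose index does not contain the value. The JSON serialization and the names `build_distinct_index`, `may_contain`, and `prune_files` are hypothetical choices for this illustration, not part of DataFusion or the parquet crate.

```python
import json
from typing import Dict, Iterable, List


def build_distinct_index(values: Iterable[str]) -> bytes:
    """Serialize the distinct values of one column (hypothetical JSON format)."""
    return json.dumps(sorted(set(values))).encode("utf-8")


def may_contain(index_bytes: bytes, value: str) -> bool:
    """Exact check: False means no row in the file has `value`, so it can be skipped."""
    return value in set(json.loads(index_bytes))


def prune_files(indexes: Dict[str, bytes], value: str) -> List[str]:
    """Keep only files whose index says they can match `column = value`."""
    return [name for name, idx in indexes.items() if may_contain(idx, value)]


# Usage: per-file index bytes, as they might be read back from each file.
indexes = {
    "a.parquet": build_distinct_index(["foo", "bar", "foo"]),
    "b.parquet": build_distinct_index(["baz"]),
}
print(prune_files(indexes, "foo"))  # ['a.parquet']
```

An exact set like this grows with the column's cardinality, so it suits low-cardinality columns; high-cardinality columns would call for a more compact structure.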