zhuqi-lucas commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3034804227
Thank you @alamb, I will submit a draft blog soon in:
https://github.com/apache/datafusion/issues/16372
--
This is an automated message from the Apache Git Service.
To r
alamb commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3033685921
This is so great -- now we just need to write up a blog post 🎣
Thanks again @zhuqi-lucas -- this is going to be great
alamb merged PR #16395:
URL: https://github.com/apache/datafusion/pull/16395
jcsherin commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3027788574
@alamb The overview documentation is very clear and love the ASCII art.
jcsherin commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2179977087
##
datafusion-examples/examples/parquet_embedded_index.rs:
##
@@ -0,0 +1,472 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contrib
zhuqi-lucas commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3027610547
> * 54a9e61
Thank you @alamb, this looks great to me!
> Simplified the code to only write the offset index (the length is stored
inline)
Perfect for this change!
alamb commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3027590526
I think it is now ready to merge, but it would probably be good for someone
else to go over it one last time to make sure it is clear
alamb commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3027588395
Hi @zhuqi-lucas -- spent a while this morning going over this PR carefully
-- it is great!
I hope you don't mind but I made some substantial edits to try and make it
read a bit
zhuqi-lucas commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3026260872
> Thank you @zhuqi-lucas -- I started going through this PR again in detail
>
> I renamed the example to align with the other parquet examples, and I
added it to the list o
zhuqi-lucas commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2178933212
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,402 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
alamb commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2178596921
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,402 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contrib
zhuqi-lucas commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3021686409
Updated the code with the merged PR:
https://github.com/apache/datafusion/pull/16575
And also added more comments.
alamb commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2993532737
> How does it ensure that this extra index can be safely ignored by other
readers? If another parquet reader implementation decides to do a sequential
whole file scan, will it read int
zhuqi-lucas commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2993537466
This is amazing @alamb ! Thanks!
> There are other interesting things that can be done with this setup too
(for example, concatenating parquet files together without having to
alamb commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2993531597
FYI @XiangpengHao and @JigaoLuo -- here is another example of the somewhat
crazy things you can do with parquet
zhuqi-lucas commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2993484604
> wow this is so cool!
>
> I have a question (and I think it's worth adding to the comment for people
like me who are not familiar with parquet internals): How does it ensur
2010YOUY01 commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2993326084
wow this is so cool!
I have a question (and I think it's worth adding to the comment for people
like me who are not familiar with parquet internals):
How does it ensure th
zhuqi-lucas commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2159141300
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,380 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
zhuqi-lucas commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2991840133
> Thank you @zhuqi-lucas -- this is (really) cool. It is definitely blog
post worthy (we have too many cool things that are blog worthy recently - and
not enough time to write th
zhuqi-lucas commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2159136194
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,380 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
zhuqi-lucas commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2159119104
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,380 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
alamb commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2159067305
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,380 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contrib
alamb commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2991749429
> The example print logs, it's good, thanks!
this is so cool!
zhuqi-lucas commented on PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2987414899
Thank you @alamb,
I am excited to report today that I resolved the page index conflicts by
adding a new API in arrow-rs which can write bytes to the buf, and it can make
the buf
zhuqi-lucas commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2154205143
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,363 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
zhuqi-lucas commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2153573700
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,243 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
alamb commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2153064233
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,243 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contrib
zhuqi-lucas commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2148133565
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,243 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
zhuqi-lucas commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2148132009
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,243 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
zhuqi-lucas commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r214877
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,243 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
zhuqi-lucas commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2146810315
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,243 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
zhuqi-lucas commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2144937636
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,243 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more c
alamb commented on code in PR #16395:
URL: https://github.com/apache/datafusion/pull/16395#discussion_r2144899471
##
datafusion-examples/examples/embedding_parquet_indexes.rs:
##
@@ -0,0 +1,243 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contrib
zhuqi-lucas opened a new pull request, #16395:
URL: https://github.com/apache/datafusion/pull/16395
## Which issue does this PR close?
- Closes [#16374](https://github.com/apache/datafusion/issues/16374)
## Rationale for this change
```rust
//! Example: embeddi