Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-07-04 Thread via GitHub
zhuqi-lucas commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3034804227 Thank you @alamb, I will submit a draft blog post soon in: https://github.com/apache/datafusion/issues/16372 -- This is an automated message from the Apache Git Service. To r

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-07-03 Thread via GitHub
alamb commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3033685921 This is so great -- now we just need to write up a blog post 🎣 Thanks again @zhuqi-lucas -- this is going to be great

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-07-03 Thread via GitHub
alamb merged PR #16395: URL: https://github.com/apache/datafusion/pull/16395

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-07-02 Thread via GitHub
jcsherin commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3027788574 @alamb The overview documentation is very clear and I love the ASCII art.

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-07-02 Thread via GitHub
jcsherin commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2179977087 ## datafusion-examples/examples/parquet_embedded_index.rs: ## @@ -0,0 +1,472 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-07-02 Thread via GitHub
zhuqi-lucas commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3027610547 > * 54a9e61 Thank you @alamb looks great to me! > Simplified the code to only write the offset index (the length is stored inline) Perfect for this change!
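A minimal sketch of the "only write the offset, the length is stored inline" idea mentioned above. It uses only the Rust standard library and is not the code from the PR: the magic marker `IDX1` and the helper names `write_index` and `read_index` are hypothetical, chosen for illustration. The assumption is that the writer records a single offset somewhere the reader can find it (for example in the file's key/value metadata), and the blob itself carries its own length.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Hypothetical magic marker identifying the custom index blob.
const INDEX_MAGIC: &[u8; 4] = b"IDX1";

/// Append a length-prefixed index blob at the current position of `out`
/// and return the offset at which the blob starts.
/// Only this offset has to be recorded elsewhere; the length is inline.
fn write_index<W: Write + Seek>(out: &mut W, payload: &[u8]) -> io::Result<u64> {
    let offset = out.stream_position()?;
    out.write_all(INDEX_MAGIC)?;
    // 8-byte little-endian length, stored inline right after the magic.
    out.write_all(&(payload.len() as u64).to_le_bytes())?;
    out.write_all(payload)?;
    Ok(offset)
}

/// Read the index blob back, given only the offset returned by `write_index`.
fn read_index<R: Read + Seek>(input: &mut R, offset: u64) -> io::Result<Vec<u8>> {
    input.seek(SeekFrom::Start(offset))?;

    // Verify the magic so an unrelated offset is rejected instead of misread.
    let mut magic = [0u8; 4];
    input.read_exact(&mut magic)?;
    if &magic != INDEX_MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad index magic"));
    }

    // The inline length tells us how many payload bytes follow.
    let mut len_bytes = [0u8; 8];
    input.read_exact(&mut len_bytes)?;
    let len = u64::from_le_bytes(len_bytes) as usize;

    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    Ok(payload)
}
```

With this layout a reader that knows the offset can recover the whole blob, while a reader that does not know about it never needs to seek there.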

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-07-02 Thread via GitHub
alamb commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3027590526 I think it is now ready to merge, but it would probably be good for someone else to go over it one last time to make sure it is clear

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-07-02 Thread via GitHub
alamb commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3027588395 Hi @zhuqi-lucas -- spent a while this morning going over this PR carefully -- it is great! I hope you don't mind but I made some substantial edits to try and make it read a bit

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-07-01 Thread via GitHub
zhuqi-lucas commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3026260872 > Thank you @zhuqi-lucas -- I started going through this PR again in detail > > I renamed the example to align with the other parquet examples, and I added it to the list o

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-07-01 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2178933212 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,402 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-07-01 Thread via GitHub
alamb commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2178596921 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,402 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-30 Thread via GitHub
zhuqi-lucas commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-3021686409 Updated the code to use the merged PR: https://github.com/apache/datafusion/pull/16575 and also added more comments.

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-21 Thread via GitHub
alamb commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2993532737 > How does it ensure that this extra index can be safely ignored by other readers? If another parquet reader implementation decides to do a sequential whole file scan, will it read int

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-21 Thread via GitHub
zhuqi-lucas commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2993537466 This is amazing @alamb ! Thanks! > There are other interesting things that can be done with this setup too (for example, concatenating parquet files together without having to

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-21 Thread via GitHub
alamb commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2993531597 FYI @XiangpengHao and @JigaoLuo -- here is another example of the somewhat crazy things you can do with parquet

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-21 Thread via GitHub
zhuqi-lucas commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2993484604 > wow this is so cool! > > I have a question (and I think it's worth adding to the comment for people like me that's not familiar with parquet internals): How does it ensur

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-20 Thread via GitHub
2010YOUY01 commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2993326084 wow this is so cool! I have a question (and I think it's worth adding to the comment for people like me who are not familiar with parquet internals): How does it ensure th

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-20 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2159141300 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-20 Thread via GitHub
zhuqi-lucas commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2991840133 > Thank you @zhuqi-lucas -- this is (really) cool. It is definitely blog post worthy (we have too many cool things that are blog worthy recently - and not enough time to write th

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-20 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2159136194 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-20 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2159119104 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-20 Thread via GitHub
alamb commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2159067305 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-20 Thread via GitHub
alamb commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2991749429 > The example print logs, it's good, thanks! this is so cool!

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-19 Thread via GitHub
zhuqi-lucas commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2987414899 Thank you @alamb, I am excited to update today that I resolved the page index conflicts by adding a new API in arrow-rs which can write bytes to the buf, and it can make the buf

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-18 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2154205143 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,363 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-17 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2153573700 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,243 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-17 Thread via GitHub
alamb commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2153064233 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,243 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-15 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2148133565 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,243 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-15 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2148132009 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,243 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-15 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r214877 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,243 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-14 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2146810315 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,243 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-13 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2144937636 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,243 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-13 Thread via GitHub
alamb commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2144899471 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,243 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

[PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-13 Thread via GitHub
zhuqi-lucas opened a new pull request, #16395: URL: https://github.com/apache/datafusion/pull/16395 ## Which issue does this PR close? - Closes [#16374](https://github.com/apache/datafusion/issues/16374) ## Rationale for this change ```rust //! Example: embeddi