karlovnv commented on issue #12454:
URL: https://github.com/apache/datafusion/issues/12454#issuecomment-2356209473

   > > We had a discussion about join of huge table with small table here: 
[#7000 
(comment)](https://github.com/apache/datafusion/issues/7000#issuecomment-2094813305)
   > > There are several approaches discussed:
   
   
   
   > I don't think either of these would actually solve the issue with outer 
joins.
   
   I understand your point. I mentioned the ClickHouse functionality and Dictionaries just to show that sometimes a join is not necessary even when the query asks for one. I agree that this works only for **inner** joins (then we can replace the join with a dictionary lookup).
   ClickHouse keeps copies of dictionaries (as files, as connections to an external DB, etc.) on each node, which avoids having to synchronize the join / dictionary replacement state across the cluster.
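
   To make the idea concrete, here is a minimal sketch (illustrative only, not DataFusion or ClickHouse APIs) of how an inner join against a small, fully replicated table degenerates into a per-row dictionary lookup; rows with no matching key are simply dropped, which is why this only works for inner joins:

   ```rust
   use std::collections::HashMap;

   fn main() {
       // Hypothetical replicated "dictionary": the small table's key column
       // mapped to its payload, held in full on every node.
       let dict: HashMap<u32, &str> =
           [(1, "red"), (2, "green"), (3, "blue")].into_iter().collect();

       // Rows of the large table reference the dictionary by key.
       let facts = vec![(100, 2u32), (101, 3u32), (102, 9u32)];

       // `facts INNER JOIN dict ON facts.key = dict.key` becomes a lookup;
       // key 9 has no match, so row 102 is dropped.
       let joined: Vec<(i32, &str)> = facts
           .iter()
           .filter_map(|&(id, key)| dict.get(&key).map(|&v| (id, v)))
           .collect();

       assert_eq!(joined, vec![(100, "green"), (101, "blue")]);
       println!("{:?}", joined);
   }
   ```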
   
   > in a distributed environment there is no way to share that state without 
some external mechanism
   
   Yes, this is true. Some examples:
   Greenplum, for instance, redistributes portions of data (and state) across the shards to parallelize execution; that is a difficult thing to develop. ClickHouse shards do not exchange data in that manner, so CH allows joining only two distributed tables, no more.
   That's why some people choose Greenplum for its ability to execute any query, even if slowly, while others choose CH for its speed despite the limitations.
   
   So I think we could treat a small facts table as a kind of dictionary and build a hash join index on each node from a local copy of that table.
   This would let us perform the join without resorting to an external mechanism for cross-node communication.
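   The scheme above is essentially a broadcast hash join. A minimal sketch (names are illustrative, not DataFusion APIs): every node holds a full copy of the small table, builds the hash side locally, and probes it with its own partition of the large table, so no partition data crosses node boundaries:

   ```rust
   use std::collections::HashMap;

   // Build the hash side from the replicated small table.
   fn build_hash_side(small: &[(u32, &'static str)]) -> HashMap<u32, Vec<&'static str>> {
       let mut index: HashMap<u32, Vec<&'static str>> = HashMap::new();
       for &(key, value) in small {
           index.entry(key).or_default().push(value);
       }
       index
   }

   // Probe the local partition of the large table against the index.
   fn probe(
       index: &HashMap<u32, Vec<&'static str>>,
       partition: &[(i64, u32)],
   ) -> Vec<(i64, &'static str)> {
       let mut out = Vec::new();
       for &(row_id, key) in partition {
           if let Some(matches) = index.get(&key) {
               for &v in matches {
                   out.push((row_id, v));
               }
           }
       }
       out
   }

   fn main() {
       let small = [(1u32, "a"), (2, "b")];
       // Two partitions of the large table, as if held by two nodes.
       let partitions = [vec![(10i64, 1u32), (11, 2)], vec![(12, 2), (13, 7)]];

       // Each node builds its own copy of the index and probes locally;
       // no cross-node communication is needed during the join.
       let results: Vec<_> = partitions
           .iter()
           .map(|p| probe(&build_hash_side(&small), p))
           .collect();

       assert_eq!(results[0], vec![(10, "a"), (11, "b")]);
       assert_eq!(results[1], vec![(12, "b")]);
   }
   ```

   The trade-off is memory: every node pays for a full copy of the build side, which is only acceptable when the replicated table is small.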


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

