Hi Franke,
1) We are using 4 identical AWS machines, each with 8 vCPUs, 32 GB RAM and 1 TB of storage.
2) We set up bloom filters on only two of the other string columns, not on
all of them.
3) The data is generic event data, e.g. Syslog.
4) Queries usually filter on a timestamp range, with additional predicates on
other columns (mos
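Because most queries filter on a timestamp range, the ORC storage index (per-row-group min/max statistics) is what lets a reader skip data. The Python sketch below illustrates that min/max pruning under simplifying assumptions: each row group keeps only the min and max of TS and of one string column, here called `host` for illustration. The names `RowGroupStats`, `may_match` and `prune_row_groups` are hypothetical and are not part of ORC or Hive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group statistics, similar in spirit to an ORC row index entry."""
    ts_min: int    # smallest TS value in the row group (e.g. epoch seconds)
    ts_max: int    # largest TS value in the row group
    host_min: str  # smallest value of a hypothetical string column "host"
    host_max: str  # largest value of that column


def may_match(stats: RowGroupStats, ts_lo: int, ts_hi: int, host: str) -> bool:
    """Return False only when the statistics prove that no row can satisfy
    ts_lo <= TS <= ts_hi AND host = <host>."""
    if stats.ts_max < ts_lo or stats.ts_min > ts_hi:
        return False  # the requested timestamp range does not overlap this row group
    if host < stats.host_min or host > stats.host_max:
        return False  # the equality value lies outside the column's min/max
    return True  # cannot rule it out: the row group must be read


def prune_row_groups(groups: List[RowGroupStats], ts_lo: int, ts_hi: int, host: str) -> List[int]:
    """Indices of the row groups a reader would still have to scan."""
    return [i for i, g in enumerate(groups) if may_match(g, ts_lo, ts_hi, host)]


if __name__ == "__main__":
    groups = [
        RowGroupStats(0, 999, "a", "m"),
        RowGroupStats(1000, 1999, "a", "z"),
        RowGroupStats(2000, 2999, "n", "z"),
    ]
    print(prune_row_groups(groups, 1500, 2500, "b"))  # -> [1]
```

Min/max pruning works best when the data is roughly sorted by TS, so that each row group covers a narrow timestamp range; if the rows are loaded in random TS order, almost every row group overlaps the query range and little can be skipped.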
What is your hardware setup?
Are the bloom filters necessary on all columns? They usually only make sense
for non-numeric columns. Updating bloom filters takes time and should be avoided
where they do not make sense.
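A minimal Python sketch of a Bloom filter shows both sides of this advice: every value written costs k hash computations (the update cost), and a lookup can only answer "definitely absent" or "possibly present". The bit-array size `m=1024` and `k=3` hash functions are illustrative assumptions, not ORC's defaults (ORC sizes its filters from a target false-positive probability), and the class name `BloomFilter` is hypothetical.

```python
import hashlib
from typing import List


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash positions per value (illustrative parameters)."""

    def __init__(self, m: int = 1024, k: int = 3) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for clarity rather than compactness

    def _positions(self, value: str) -> List[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, value: str) -> None:
        # Every write pays k hash computations: this is the load-time cost.
        for p in self._positions(value):
            self.bits[p] = 1

    def might_contain(self, value: str) -> bool:
        # False means "definitely not in this row group"; True only means "maybe".
        return all(self.bits[p] for p in self._positions(value))


if __name__ == "__main__":
    bf = BloomFilter()
    for host in ["web-01", "web-02", "db-01"]:
        bf.add(host)
    print(bf.might_contain("web-01"))  # True: added values are never reported absent
```

Since a reader can only skip a row group when the filter says "definitely absent", the filter pays off for equality lookups on high-cardinality string columns (host names, message IDs). For a numeric column such as TS, range predicates are already served by the min/max index, and a Bloom filter cannot answer range queries at all.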
Can you provide an example of the data and the select queries that you execute
o
Hi,
I'm using the ORC format for our table storage. The table has a timestamp
column (say TS) and 25 other columns. The other ORC properties we are using
are the storage index and bloom filters. We are loading 100 million records into
this table on a 4-node cluster.
Our source table is a text table with C