I was also thinking about how Bing doesn't use posting lists
<http://bitfunnel.org/strangeloop/> and also compiles its queries
<https://github.com/BitFunnel/NativeJIT>!
About the queries, I would've thought the overhead wouldn't be as high as
for queries in an RDBMS, since those are applied to each row, while in
search they are applied to each bitset.

On Mon, Jan 23, 2017 at 6:04 PM, Jeff Wartes <[email protected]> wrote:

>
>
> I’ve had some curiosity about this question too.
>
>
>
> For a while, I watched for a seastar-like library for the JVM, but
> https://github.com/bestwpw/windmill was the only one I came across, and
> it doesn’t seem to be going anywhere. Since one of the points of the JVM is
> to abstract away the platform, I certainly wonder if the JVM will ever get
> the kinds of machine affinity these other projects see. Your
> one-shard-per-core could probably be faked with multiple JVMs and numactl -
> could be an interesting experiment.
>
>
>
> That said, I’m aware that a phenomenal amount of optimization effort has
> gone into Lucene, and I’d also be interested in hearing about things that
> worked well.
>
>
>
>
>
> *From: *Dorian Hoxha <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Friday, January 20, 2017 at 8:12 AM
> *To: *"[email protected]" <[email protected]>
> *Subject: *How would you architect solr/lucene if you were starting from
> scratch for them to be 10X+ faster/efficient ?
>
>
>
> Hi friends,
>
> I was thinking how scylladb architecture
> <http://www.scylladb.com/technology/architecture/> works compared to
> cassandra which gives them 10x+ performance and lower latency. If you were
> starting lucene and solr from scratch what would you do to achieve
> something similar ?
>
> Different language (rust/c++?) for better SIMD
> <http://blog-archive.griddynamics.com/2015/06/lucene-simd-codec-benchmark-and-future.html>
> ?
>
> Use a GPU with an SSD for posting-list intersection? (not out yet)
>
> Make it in-memory and use better data structures?
>
> Shard on cores like scylladb (so 1 shard for each core on the machine) ?
>
> External cache (like keeping n redis-servers with big ram/network & slow
> cpu/disk just for cache) ??
>
> Use better data structures (like algolia autocomplete radix
> <https://blog.algolia.com/inside-the-algolia-engine-part-2-the-indexing-challenge-of-instant-search/>
> )
>
> Distributing documents by term instead of id
> <http://research.microsoft.com/en-us/um/people/trishulc/papers/Maguro.pdf>
> ?
>
> Using ASIC / FPGA ?
>
>
>
> Regards,
>
> Dorian
>
