Re: Roaring Bitmap UDFs

2017-12-08 Thread David Capwell
Think of a Bloom filter that's more dynamic. It works well when cardinality is low, but it grows quickly and ends up costing more than a Bloom filter as cardinality grows. This data structure supports existence queries, but your email sounds like you want a count. If so, it's not really the best fit. On Dec 8, 2017 5:00 PM, "Niti

Roaring Bitmap UDFs

2017-12-08 Thread Nitin Vijayvargiya
Hi all, I'm working on speeding up distinct count calculations, and it looks like roaring bitmaps (RB) are the newest and meanest way to do set operations. Does anyone here have experience with them? How was the performance compared to HyperLogLog and EWAH? A quick Google search showed me that it's easier

Cannot create external table on S3; class S3AFileSystem not found

2017-12-08 Thread Scott Halgrim
Hi, I’ve been struggling with this for a few hours; hopefully somebody here can help me out. We have a lot of data in Parquet format on S3 and we want to use Hive to query it. I’m running on Ubuntu and we have a MySQL metadata store on AWS RDS. The command in the hive client I’m trying to run