Hi,

Thanks for the input. I use Hive 2 and still have this issue:
1. Hive version 2
2. Hive on Spark engine 1.3.1
3. Spark 1.5.2

I have added the Hive user group to this as well, so hopefully we may get some resolution.

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 12 March 2016 at 19:25, Timur Shenkao <t...@timshenkao.su> wrote:

> Hi,
>
> I have suffered from Hive Streaming and transactions enough, so I can share
> my experience with you.
>
> 1) It's not a problem with Spark. It happens because of "peculiarities" /
> bugs in Hive Streaming. Hive Streaming and transactions are very raw
> technologies. If you look at the Hive JIRA, you'll see several critical bugs
> concerning Hive Streaming and transactions. Some of them are resolved only in
> Hive 2+, but Cloudera & Hortonworks ship their distributions with outdated
> & buggy Hive. So use Hive 2+. Earlier versions of Hive didn't run compaction
> at all.
>
> 2) In Hive 1.1, I issued the following lines:
>
> ALTER TABLE default.foo COMPACT 'MAJOR';
> SHOW COMPACTIONS;
>
> My manual compaction was shown, but it was never fulfilled.
>
> 3) If you use Hive Streaming, it's not recommended (or even forbidden) to
> insert rows into Hive Streaming tables manually. Only the process that
> writes to such a table should insert incoming rows, sequentially. Otherwise
> you'll get unpredictable behaviour.
>
> 4) Ordinary Hive tables are catalogs with text, ORC, etc. files.
> Hive Streaming / transactional tables are catalogs that have numerous
> subcatalogs with a "delta" prefix. Moreover, there are files with a
> "flush_length" suffix in some delta subfolders; "flush_length" files are
> 8 bytes long. The presence of a "flush_length" file in a subfolder means
> that Hive is writing updates to that subfolder right now. When Hive fails
> or is restarted, it begins to write into a new delta subfolder with a new
> "flush_length" file.
> The old "flush_length" file (the one used before the failure) still remains.
> One of the goals of compaction is to delete outdated "flush_length" files.
>
> Not every application / library can read such a folder structure or knows
> the details of the Hive Streaming / transactions implementation. Most
> software solutions still expect ordinary Hive tables as input.
> When they encounter subcatalogs or the special "flush_length" files,
> applications / libraries either "see nothing" (return 0 or an empty result
> set) or stumble over the "flush_length" files (return unexplainable errors).
>
> For instance, Facebook Presto couldn't read subfolders by default unless
> you activated special parameters. But it still stumbles over "flush_length"
> files, as Presto expects legal ORC files in those folders, not 8-byte text
> files.
>
> So, I don't advise you to use Hive Streaming and transactions right now in
> real production systems (24/7/365) with hundreds of millions of events a
> day.
>
> On Sat, Mar 12, 2016 at 11:24 AM, @Sanjiv Singh <sanjiv.is...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I am facing this issue on an HDP setup on which COMPACTION is required,
>> but only once per transactional table, to fetch records with Spark SQL.
>> On the other hand, the Apache setup doesn't require compaction even once.
>>
>> Maybe something got triggered in the metastore after compaction, and
>> Spark SQL started recognizing delta files.
>>
>> Let me know if other details are needed to get to the root cause.
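The delta-subfolder layout Timur describes in point 4 can be checked directly on disk. Below is a minimal Python sketch, not from the thread: the exact `delta_*` directory prefix and `*_flush_length` file suffix are assumptions taken from the description above, so verify them against your Hive version's actual layout before relying on this.

```python
import os

def inspect_table_dir(table_path):
    """Classify a Hive table directory based on the layout described above:
    transactional / streaming tables contain delta_* subdirectories, and an
    in-progress (or orphaned) writer leaves an 8-byte *_flush_length side
    file inside a delta subdirectory."""
    deltas, flush_files = [], []
    for entry in sorted(os.listdir(table_path)):
        full = os.path.join(table_path, entry)
        if os.path.isdir(full) and entry.startswith("delta_"):
            deltas.append(entry)
            for f in sorted(os.listdir(full)):
                if f.endswith("_flush_length"):
                    flush_files.append(os.path.join(entry, f))
    return {
        "transactional": bool(deltas),   # any delta subfolder present?
        "deltas": deltas,                # e.g. ["delta_0000001_0000001"]
        "open_flush_files": flush_files, # side files left by a writer
    }
```

A non-empty `open_flush_files` list would indicate exactly the situation Timur describes: a writer is active, or a failed writer left an orphaned side file that compaction has not yet cleaned up.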
>>
>> Try this. *See the complete scenario:*
>>
>> hive> create table default.foo(id int) clustered by (id) into 2 buckets
>>       STORED AS ORC TBLPROPERTIES ('transactional'='true');
>> hive> insert into default.foo values(10);
>>
>> scala> sqlContext.table("default.foo").count // Gives 0, which is wrong,
>>                                              // because the data is still in delta files
>>
>> Now run a major compaction:
>>
>> hive> ALTER TABLE default.foo COMPACT 'MAJOR';
>>
>> scala> sqlContext.table("default.foo").count // Gives 1
>>
>> hive> insert into foo values(20);
>>
>> scala> sqlContext.table("default.foo").count // Gives 2, *no compaction required*
>>
>> Regards,
>> Sanjiv Singh
>> Mob: +091 9990-447-339
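Timur's point 2, a requested compaction that shows up in SHOW COMPACTIONS but is never fulfilled, can be watched for by parsing that statement's output. The sketch below is hypothetical and not from the thread: the column order (Database, Table, Partition, Type, State, ...) and the terminal state names are assumptions, so adjust them to whatever your Hive version actually prints.

```python
def pending_compactions(show_compactions_output):
    """Parse whitespace-separated SHOW COMPACTIONS output (assumed columns:
    Database, Table, Partition, Type, State, ...) and return the
    (database, table, type) tuples whose State is not terminal, i.e.
    compactions that were requested but have not finished."""
    terminal_states = {"succeeded", "failed"}  # assumed state names
    pending = []
    for line in show_compactions_output.strip().splitlines()[1:]:  # skip header
        cols = line.split()
        if len(cols) < 5:
            continue  # ignore blank / malformed rows
        db, table, _partition, ctype, state = cols[:5]
        if state.lower() not in terminal_states:
            pending.append((db, table, ctype))
    return pending
```

For example, against a (fabricated) two-row listing where `default.foo` is still `initiated` and `default.bar` has `succeeded`, only `("default", "foo", "MAJOR")` comes back, which is the stuck-compaction symptom Timur saw on Hive 1.1.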