Re: Importing currency figures with currency indicator to Hive from CSV

2016-01-16 Thread matshyeq
Try: REGEXP_REPLACE(net,'[^\\d\\.]',''); Thank you, Kind Regards ~Maciek On 15 January 2016 at 17:33, Mich Talebzadeh wrote: > OK this is a convoluted way of doing it using SUBSTR and REGEXP_REPLACE > UDF to get rid of ‘?’ and commas in the currency imported from CSV file > 0: jdb
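
A minimal sketch of that suggestion, assuming the source column is called net and that DECIMAL(18,2) (an illustrative precision) is the target type:

    -- strip everything except digits and the decimal point, then cast
    SELECT CAST(REGEXP_REPLACE(net, '[^\\d\\.]', '') AS DECIMAL(18,2)) AS net_clean
    FROM currency_staging;  -- currency_staging is a hypothetical table name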

Re: Converting date format and excel money format in Hive table

2016-01-15 Thread matshyeq
try: select cast(unix_timestamp('02/10/2014', 'dd/MM/')*1000 as timestamp); Kind Regards ~Maciek On 15 January 2016 at 10:15, Mich Talebzadeh wrote: > Hi, > I am importing an Excel sheet saved as a comma-separated CSV file and > compressed with bzip2 into Hive as an external table with b
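
A minimal sketch of that conversion; the full pattern 'dd/MM/yyyy' is an assumption (the original message is cut off at that point), and the *1000 relies on Hive versions that read a bigint cast to timestamp as milliseconds:

    -- parse dd/MM/yyyy into seconds since epoch, then cast to a timestamp
    SELECT CAST(unix_timestamp('02/10/2014', 'dd/MM/yyyy') * 1000 AS TIMESTAMP);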

Re: Passing parameters to Beeline

2016-01-13 Thread matshyeq
Try *--hivevar name=value*; it works for hive and should work for beeline too: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Beeline–NewCommandLineShell
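
A short sketch of how the substitution is used; the variable name tbl and the JDBC URL are illustrative:

    -- beeline -u jdbc:hive2://localhost:10000 --hivevar tbl=sales_2016
    SELECT COUNT(*) FROM ${hivevar:tbl};
    -- the bare form ${tbl} also resolves when the variable was set via --hivevar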

Re: Bucketing is not leveraged in filter push down ?

2015-09-24 Thread matshyeq
There are JIRAs for that already: HIVE-9523, HIVE-11525. Thank you, Kind Regards ~Maciek On Thu, Sep 24, 2015 at 8:29 AM, Jeff Zhang wrote: > I have one table which is bucketed on column name. Then

Re: WHERE ... NOT IN (...) + NULL values = BUG

2015-07-07 Thread matshyeq
>Obviously, the expected answer is always 2. That's incorrect. It's expected behaviour per the SQL standard, and I would expect every other DB to behave the same way. A direct comparison to NULL yields NULL (unknown), which is never TRUE, no matter whether it is used with <>, =, IN or NOT IN. IS (NOT) NULL is the right way to handle such
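
A tiny illustration of the point, using a hypothetical table t(id) and a literal NULL in the list:

    -- id <> NULL evaluates to NULL (unknown), never TRUE, so no row survives
    SELECT COUNT(*) FROM t WHERE id NOT IN (1, NULL);                -- always 0
    -- filter the NULLs out explicitly to get the intuitive result
    SELECT COUNT(*) FROM t WHERE id NOT IN (1) AND id IS NOT NULL;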

Re: Hive on Spark VS Spark SQL

2015-05-20 Thread matshyeq
From my experience SparkSQL is still way faster than Tez. Also, SparkSQL (even 1.2.1, which I'm on) supports *lateral view*. On Wed, May 20, 2015 at 3:41 PM, Edward Capriolo wrote: > Beyond window queries, hive still has concepts like cube or lateral view > that many "better than hive" systems do
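
A brief lateral view sketch of the feature being referred to; the table and array column names are made up:

    -- one output row per element of the items array
    SELECT o.order_id, item
    FROM orders o
    LATERAL VIEW explode(o.items) itm AS item;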

Re: Default schema

2015-04-14 Thread matshyeq
Regarding what I suggested earlier: "…or setting a param file (like hive-env.sh, hiverc or hive-site.xml…)?" I don't know which property or variable to set. Could you provide an example excerpt? Thank you, Kind Regards ~Maciek On Tue, Apr 14, 2015 at 10:43 PM, Bala Krishna Gangisetty < b...@altiscal
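
One common answer (an assumption here, not something confirmed in the thread) is a USE statement in $HOME/.hiverc, which the Hive CLI executes at startup:

    -- contents of ~/.hiverc; the database name is hypothetical
    USE my_default_db;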

Re: Bucket pruning

2015-03-23 Thread matshyeq
To me there's practically very little difference between partitioning and bucketing (partitioning defines the split criteria explicitly, whereas bucketing does so somewhat implicitly). Hive, however, recognises the latter as a separate feature and handles the two in quite different ways. There's already a featur
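
For contrast, a sketch of the two DDL forms (names and numbers are illustrative): partitioning spells out the split column as directories, while bucketing hashes a column into a fixed number of files:

    -- explicit split criteria: one directory per value of dt
    CREATE TABLE events_part (id BIGINT, payload STRING)
    PARTITIONED BY (dt STRING);

    -- implicit split criteria: rows hashed on id into 32 files
    CREATE TABLE events_buck (id BIGINT, payload STRING)
    CLUSTERED BY (id) INTO 32 BUCKETS;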

Re: Partitioned table and Bucket Map Join

2015-01-30 Thread matshyeq
) might encourage *hive* to join them efficiently in the subsequent stage... On Thu, Jan 29, 2015 at 6:55 PM, murali parimi < muralikrishna.par...@icloud.com> wrote: > agreed! > > On Jan 29, 2015, at 11:42 PM, matshyeq wrote: > > no confusion here. > My use case is exac

Re: Partitioned table and Bucket Map Join

2015-01-29 Thread matshyeq
>> hive.enforce.sorting=true; > Hive version 13. > Any thoughts? > Thanks, > Murali > On Jan 29, 2015, at 07:44 PM, matshyeq wrote: > My hunch is that while partitioning is in fact very similar to bucketing > (actually superior as you have som

Re: Partitioned table and Bucket Map Join

2015-01-29 Thread matshyeq
in the cache as we > have huge memory. But the map-side join didn't happen. What could be the > reason? > Sent from my iPhone > On Jan 29, 2015, at 7:38 AM, matshyeq wrote: > I do have two tables partitioned on the same criteria. >

Partitioned table and Bucket Map Join

2015-01-29 Thread matshyeq
I do have two tables partitioned on the same criteria. Could I still take advantage of Bucket Map Join or, better, Sort Merge Bucket Map Join? How? ~Maciek
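
A sketch of the setup usually required for a Sort Merge Bucket join, drawn from general Hive documentation rather than from this thread (table names, bucket counts and the join key are assumptions):

    -- both tables bucketed and sorted on the join key, with the same bucket count
    CREATE TABLE a (k BIGINT, v STRING)
    CLUSTERED BY (k) SORTED BY (k) INTO 64 BUCKETS;
    CREATE TABLE b (k BIGINT, w STRING)
    CLUSTERED BY (k) SORTED BY (k) INTO 64 BUCKETS;

    SET hive.optimize.bucketmapjoin=true;
    SET hive.optimize.bucketmapjoin.sortedmerge=true;
    SET hive.auto.convert.sortmerge.join=true;

    SELECT a.k, b.w FROM a JOIN b ON a.k = b.k;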

ORC split size and the number of mappers launched

2015-01-28 Thread matshyeq
How exactly is the number of mappers to launch calculated? Are the file format and compression taken into account (256MB of compressed data expands to much more once the mapper decompresses it)? I've created a couple of ORC files (no compression, 1 file = 1 table) with different stripe size settin
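
For reference, a sketch of the knobs involved (values are illustrative): the ORC stripe size is a table property set at write time, while the split size that drives the mapper count is an input-format setting:

    -- write ORC with a 64MB stripe size and no compression
    CREATE TABLE t_orc (id BIGINT, payload STRING)
    STORED AS ORC
    TBLPROPERTIES ('orc.stripe.size'='67108864', 'orc.compress'='NONE');

    -- cap the amount of input (in bytes) handed to each mapper
    SET mapreduce.input.fileinputformat.split.maxsize=268435456;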