Issue w/ CsvBulkUploadTool when column data has "," character

2016-10-07 Thread Riesland, Zack
I am trying to use the CsvBulkUploadTool to get data from Hive to HBase. As I typically do, I created a Hive table w/ the copy of the data that I care about, and with the properties: "row format delimited fields terminated by '|' null defined as 'null' stored as textfile location 'my location'
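The usual way around embedded commas is to keep the pipe delimiter end to end and tell the bulk loader about it: CsvBulkLoadTool takes a custom field delimiter via -d/--delimiter. Below is a minimal sketch of driving it from Java through ToolRunner; the table name, input path, and ZooKeeper quorum are placeholders, not values from this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

public class PipeDelimitedBulkLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // '|' as the field delimiter, so commas inside column values are
        // treated as data rather than as separators.
        int exitCode = ToolRunner.run(conf, new CsvBulkLoadTool(), new String[] {
                "--table", "MY_TABLE",          // placeholder target table
                "--input", "/my/location",      // folder behind the external Hive table
                "--delimiter", "|",             // matches the Hive field terminator
                "--zookeeper", "zk-host:2181"   // placeholder quorum
        });
        System.exit(exitCode);
    }
}

The same flag works from the command line: hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -d '|' ...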

Can I change a String column's size and preserve the data?

2016-10-06 Thread Riesland, Zack
I have a column on a table that is set to varchar(40). I need to increase that 40, but I don't want to lose any of the data in the table. The only suggestions I've seen online involve dropping the column and re-creating it, or creating a new table. But I would like to preserve the name of this
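Phoenix 4.x has no in-place ALTER to widen a VARCHAR, so the commonly suggested route is the one hinted at above: create a second table with the wider column, copy the rows with UPSERT SELECT, and drop the original. A hedged sketch with placeholder names and a placeholder schema (the real DDL would have to mirror the existing table); note that keeping the exact table name would mean copying back under the old name afterwards, since Phoenix has no table rename (see the "Renaming a Phoenix Table" thread below).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class WidenVarcharColumn {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // With auto-commit on, UPSERT SELECT flushes as it runs instead of
            // buffering every row for one giant client-side commit.
            conn.setAutoCommit(true);
            stmt.execute("CREATE TABLE MY_TABLE_V2 ("
                    + " pk VARCHAR NOT NULL PRIMARY KEY,"
                    + " my_col VARCHAR(80))");           // was VARCHAR(40)
            stmt.execute("UPSERT INTO MY_TABLE_V2 SELECT * FROM MY_TABLE");
            stmt.execute("DROP TABLE MY_TABLE");         // only after verifying the copy
        }
    }
}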

Using CsvBulkInsert With compressed Hive data

2016-09-29 Thread Riesland, Zack
For a very long time, we've had a workflow that looks like this: Export data from a compressed, orc hive table to another hive table that is "external stored as text file". No compression specified. Then, we point to the folder "x" behind that new table and use CsvBulkInsert to get data to Hbas

Help w/ table that suddenly keeps timing out

2016-08-29 Thread Riesland, Zack
Our cluster recently had some issue related to network outages*. When all the dust settled, Hbase eventually "healed" itself, and almost everything is back to working well, with a couple of exceptions. In particular, we have one table where almost every (Phoenix) query times out - which was ne

Guidance to improve upsert performance

2016-08-03 Thread Riesland, Zack
Hello, I'm working on a POC to use HBase + Phoenix as a DB layer for a system that consumes several thousand (10,000 to 40,000) messages per second. Our cluster is fairly small: 4 region servers supporting about a dozen tables. We are currently experimenting with salting - our first pass was 4

Help w/ BulkInsert After upgrade/downgrade

2016-05-23 Thread Riesland, Zack
I have a table with a primary key that performs well, as well as 2 indexes, which I created like this: CREATE INDEX _indexed_meterkey_v2 on _indexed_meterkey_immutable_v2 (meter_key) ( and is just some obfuscation for the purposes of posting here) We WERE running Phoenix 4.6, which I had ma

RE: Help with dates

2016-04-05 Thread Riesland, Zack
value: CAST(my_bigint as DATE) Thanks, James On Tue, Apr 5, 2016 at 6:31 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote: I have ms-based, GMT timestamps in BigInt columns in one of my phoenix tables. It’s easy to work with these in Java, but I’m struggling to find the right

Help with dates

2016-04-05 Thread Riesland, Zack
I have ms-based, GMT timestamps in BigInt columns in one of my phoenix tables. It's easy to work with these in Java, but I'm struggling to find the right syntax to easily read them in a simple query. For example: '1458132989477' I know this is Wed, 16 Mar 2016 12:56:29.477 GMT But when I do so
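Per the reply quoted in the RE: entry above, the epoch-millisecond BIGINT can be cast directly in SQL: CAST(my_bigint AS DATE). A small JDBC sketch; the URL and table name are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BigintAsDate {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT CAST(my_bigint AS DATE) FROM my_table LIMIT 10")) {
            while (rs.next()) {
                // e.g. 1458132989477 prints as 2016-03-16
                System.out.println(rs.getDate(1));
            }
        }
    }
}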

leveraging hive.hbase.generatehfiles

2016-02-24 Thread Riesland, Zack
We continue to have issues getting large amounts of data from Hive into Phoenix. BulkLoading is very slow and often fails for very large data sets. I stumbled upon this article that seems to present an interesting alternative: https://community.hortonworks.com/articles/2745/creating-hbase-hfiles

RE: Multiple upserts via JDBC

2016-02-19 Thread Riesland, Zack
eb 16, 2016 at 8:10 AM, Riesland, Zack wrote: > I have a handful of VERY small phoenix tables (< 100 entries). > > > > I wrote some javascript to interact with the tables via servlet + JDBC. > > > > I can query the data almost instantaneously, but upserting is > ex

Multiple upserts via JDBC

2016-02-16 Thread Riesland, Zack
I have a handful of VERY small phoenix tables (< 100 entries). I wrote some javascript to interact with the tables via servlet + JDBC. I can query the data almost instantaneously, but upserting is extremely slow - on the order of tens of seconds to several minutes. The main write operation does
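The standard remedy here is batching: Phoenix buffers mutations on the client, so switching off auto-commit, executing many UPSERTs, and committing once collapses the per-row round trips that make single upserts crawl. A sketch with placeholder table and column names:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchedUpserts {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")) {
            conn.setAutoCommit(false); // don't flush one row at a time
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO SMALL_TABLE (id, val) VALUES (?, ?)")) {
                for (int i = 0; i < 100; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "value-" + i);
                    ps.executeUpdate(); // buffered client-side until commit()
                }
            }
            conn.commit(); // one flush to the server for the whole batch
        }
    }
}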

Workaround for a hive bug?

2016-01-28 Thread Riesland, Zack
I'm looking for help or workaround ideas for a hive bug. I know this is the Phoenix mailing list, but this issue has to do with getting data from hive into phoenix, and I'm hoping someone might have some ideas. Basically: in order to use the CsvBulkExport tool, I take my source data table (comp

Phoenix and Tableau

2016-01-28 Thread Riesland, Zack
Hey folks, Everything I've read online about connecting Phoenix and Tableau is at least a year old. Has there been any progress on an ODBC driver? Any simple hacks to accomplish this? Thanks!

RE: Telco HBase POC

2016-01-20 Thread Riesland, Zack
I have a similar data pattern and 100ms response time is fairly consistent. I’ve been trying hard to find the right set of configs to get closer to 10-20ms with no luck, but I’m finding that 100ms average is pretty reasonable. From: Willem Conradie [mailto:willem.conra...@pbtgroup.co.za] Sent: W

Guidance on how many regions to plan for

2016-01-18 Thread Riesland, Zack
In the past, my struggles with hbase/phoenix have been related to data ingest. Each night, we ingest lots of data via CsvBulkUpload. After lots of trial and error trying to get our largest table to cooperate, I found a primary key that distributes well if I specify the split criteria on table c

RE: Java Out of Memory Errors with CsvBulkLoadTool

2015-12-18 Thread Riesland, Zack
We are able to ingest MUCH larger sets of data (hundreds of GB) using the CSVBulkLoadTool. However, we have found it to be a huge memory hog. We dug into the source a bit and found that HFileOutputFormat.configureIncrementalLoad(), in using TotalOrderPartitioner and KeyValueReducer, ultimatel

RE: CsvBulkUpload not working after upgrade to 4.6

2015-12-14 Thread Riesland, Zack
easiest thing to do here (if you're up for it) is recompile the phoenix jars (at least the fat client jar) against the specific version of HBase that you've got on your cluster. Assuming that all compiles, it should resolve this issue. - Gabriel On Fri, Dec 11, 2015 at 2:01 PM, Riesl

RE: CsvBulkUpload not working after upgrade to 4.6

2015-12-11 Thread Riesland, Zack
, 2015 at 10:41 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote: Thanks Samarth, I’m running hbase 0.98.4.2.2.8.0-3150 and phoenix 4.6.0-HBase-0.98 The hbase stuff is there via the HDP 2.2.8 install. It worked before upgrading to 4.6. From: Samarth Jain [mailto:sama...@apac

RE: Help tuning for bursts of high traffic?

2015-12-10 Thread Riesland, Zack
e Did you get 3800ms for stmt.executeQuery() itself or did that time include time spent in retrieving records via resultSet.next() too? On Thu, Dec 10, 2015 at 7:38 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote: Thanks, I did some experimenting. Now, anytime I get a query t

RE: CsvBulkUpload not working after upgrade to 4.6

2015-12-09 Thread Riesland, Zack
the right hbase-client jar in place? - Samarth On Wed, Dec 9, 2015 at 4:30 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote: This morning I tried running the same operation from a data node as well as a name node, where phoenix 4.2 is completely gone, and I get the exact sam

RE: CsvBulkUpload not working after upgrade to 4.6

2015-12-09 Thread Riesland, Zack
This morning I tried running the same operation from a data node as well as a name node, where phoenix 4.2 is completely gone, and I get the exact same error. From: Riesland, Zack Sent: Tuesday, December 08, 2015 8:42 PM To: user@phoenix.apache.org Subject: CsvBulkUpload not working after

CsvBulkUpload not working after upgrade to 4.6

2015-12-08 Thread Riesland, Zack
I upgraded our cluster from 4.2.2 to 4.6. After a few hiccups, everything seems to be working: I can connect and interact with the DB using Aqua Studio. My web stuff that queries Phoenix works, using the new client jar. My java code to connect and interact with the DB works, using the new clien

RE: Help tuning for bursts of high traffic?

2015-12-07 Thread Riesland, Zack
, 2015, at 12:20 PM, Riesland, Zack <zack.riesl...@sensus.com> wrote: James, 2 quick followups, for whatever they’re worth: 1 – There is nothing phoenix-related in /tmp 2 – I added a ton of logging, and played with the properties a bit, and I think I see a pattern: Watching the

RE: Help tuning for bursts of high traffic?

2015-12-04 Thread Riesland, Zack
uning for bursts of high traffic? Any chance of stack dumps from the debug servlet? Impossible to get anywhere with 'pegged the CPU' otherwise. Thanks. On Dec 4, 2015, at 12:20 PM, Riesland, Zack <zack.riesl...@sensus.com> wrote: James, 2 quick followups, for whate

RE: Help tuning for bursts of high traffic?

2015-12-04 Thread Riesland, Zack
ny further feedback you can provide on this. Hopefully the conversation is helpful to the whole Phoenix community. From: Riesland, Zack Sent: Friday, December 04, 2015 1:36 PM To: user@phoenix.apache.org Cc: geoff.hai...@sensus.com Subject: RE: Help tuning for bursts of high traffic? Thanks, James

RE: Help tuning for bursts of high traffic?

2015-12-04 Thread Riesland, Zack
On Dec 4, 2015, at 6:45 AM, Riesland,

RE: Help tuning for bursts of high traffic?

2015-12-04 Thread Riesland, Zack
4, 2015 at 9:09 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote: SHORT EXPLANATION: a much higher percentage of queries to phoenix return exceptionally slow after querying very heavily for several minutes. LONGER EXPLANATION: I’ve been using Phoenix for about a year as a data

Help tuning for bursts of high traffic?

2015-12-04 Thread Riesland, Zack
SHORT EXPLANATION: a much higher percentage of queries to phoenix return exceptionally slow after querying very heavily for several minutes. LONGER EXPLANATION: I've been using Phoenix for about a year as a data store for web-based reporting tools and it works well. Now, I'm trying to use the

How to end line on Phoenix queries?

2015-12-03 Thread Riesland, Zack
I'm using Phoenix + Aqua Data Studio. For other kinds of (jdbc) connections, I can run multiple queries: select a, b, c from d; select x from y; However, Phoenix doesn't seem to like the trailing semicolon. If I have a semicolon character at the end of a line, I get an error like this: ERROR

Get a count of open connections?

2015-12-03 Thread Riesland, Zack
Is there some way to find out how many open Connections there are to my Phoenix DB?

RE: Help With CSVBulkLoadTool

2015-10-23 Thread Riesland, Zack
have a stack trace from the log output from when you got this error? And could you tell me if the table name that is being complained about there is an index table name? Tracing through the code, it looks like you could get this exception if an index table doesn't exist (or somehow i

RE: Help With CSVBulkLoadTool

2015-10-23 Thread Riesland, Zack
be some kind of configuration issue with your cluster(s), but if that's the case then I would expect that you'd be getting the same error every time. - Gabriel On Wed, Oct 21, 2015 at 2:42 PM, Riesland, Zack wrote: > Hello, > > > > We recently upgraded our Hadoop stack fro

Help With CSVBulkLoadTool

2015-10-21 Thread Riesland, Zack
Hello, We recently upgraded our Hadoop stack from HDP 2.2.0 to 2.2.8 The phoenix version (phoenix-4.2.0.2.2.8.0) and HBase version (0.98.4.2.2.8.0) did not change (from what I can tell). However, some of our CSVBulkLoadTool jobs have started to fail. I'm not sure whether this is related to the

RE: Connecting to Phoenix from AquaDataStudio?

2015-09-30 Thread Riesland, Zack
p 30, 2015 at 2:10 PM, Riesland, Zack wrote: > Thanks Gabriel, > > I replaced all the Hadoop and hbase related jars under Aqua Data > Studio/lib/apache with the appropriate ones from our cluster and I *think* I > made some progress. > > Seems like I'm now missi

RE: Connecting to Phoenix from AquaDataStudio?

2015-09-30 Thread Riesland, Zack
HBase somewhere (earlier) in the classpath. I don't know anything about Aqua Data Studio, but could it be that it somehow bundles support for HBase 0.94 somewhere (or perhaps there is another JDBC driver on the class path that works with HBase 0.94?) - Gabriel On Wed, Sep 30, 2015 at 1:37 PM

Connecting to Phoenix from AquaDataStudio?

2015-09-30 Thread Riesland, Zack
Has anyone been able to use Aqua Data Studio w/ Phoenix? I had success w/ DBVisualizer, but am not able to connect from ADS. I tried to create a “generic JDBC connection” using the connection wizard. I pointed at the appropriate driver jar: 4.2.0.2.2.0.0-2041-client.jar in our case. But when I

Capacity Scheduler Queues in CSVBulkLoadTool?

2015-09-24 Thread Riesland, Zack
Hello, Can someone tell me whether it is possible to specify a Capacity Scheduler queue for the CSVBulkLoadTool's MapReduce job to use? Thanks!
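The tool doesn't appear to expose a queue option of its own, but since it is launched through ToolRunner, a generic -D option should carry the standard MapReduce property into the job. A hedged sketch; the queue name and paths are placeholders.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

public class QueuedBulkLoad {
    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(HBaseConfiguration.create(), new CsvBulkLoadTool(),
                new String[] {
                        "-Dmapreduce.job.queuename=etl", // capacity scheduler queue (placeholder)
                        "--table", "MY_TABLE",
                        "--input", "/data/export",
                        "--zookeeper", "zk-host:2181"
                });
        System.exit(exitCode);
    }
}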

How to force timeout when connection fails

2015-09-03 Thread Riesland, Zack
If I use JDBC to DriverManager.getConnection("myPhoenixURL", myProperties), and HBase is down (say, all the region servers are stopped), it takes a VERY long time to timeout. In fact, I'm not sure it does. The flow just stops at that statement until I bring HBase back to life. I tried setting
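Phoenix passes connection Properties through to the underlying HBase client configuration, so the long hang is usually tamed by lowering the client retry settings rather than by a JDBC-level timeout. A sketch; the values are illustrative, not recommendations.

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class FailFastConnect {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("hbase.client.retries.number", "2"); // default is far higher
        props.setProperty("zookeeper.recovery.retry", "1");    // fewer ZooKeeper retry loops
        props.setProperty("hbase.client.pause", "500");        // ms between retries
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:zk-host:2181", props)) {
            System.out.println("connected");
        }
    }
}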

RE: Help Tuning CsvBulkImport MapReduce

2015-09-01 Thread Riesland, Zack
] Sent: Tuesday, September 01, 2015 6:43 AM To: user@phoenix.apache.org Subject: Re: Help Tuning CsvBulkImport MapReduce On Tue, Sep 1, 2015 at 11:29 AM, Riesland, Zack wrote: > You say I can find information about spills in the job counters. Are > you talking about “failed” map tasks, or is

RE: Help Tuning CsvBulkImport MapReduce

2015-09-01 Thread Riesland, Zack
h really wide rows. How many columns are you importing into your table? - Gabriel On Mon, Aug 31, 2015 at 3:20 PM Riesland, Zack <zack.riesl...@sensus.com> wrote: I’m looking for some pointers on speeding up CsvBulkImport. Here’s an example: I took about 2 billion rows from hive and

Help Tuning CsvBulkImport MapReduce

2015-08-31 Thread Riesland, Zack
I'm looking for some pointers on speeding up CsvBulkImport. Here's an example: I took about 2 billion rows from hive and exported them to CSV. HDFS decided to translate this to 257 files, each about 1 GB. Running the CsvBulkImport tool against this folder results in 1,835 mappers and then 1 re

RE: Exception from RowCounter

2015-07-26 Thread Riesland, Zack
Subject: Re: Exception from RowCounter PHOENIX-1248 is marked as resolved. Are you using a version of Phoenix before this fix? On Sun, Jul 26, 2015 at 7:22 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote: Thanks James, I am not able to use salt_buckets because I need to impor

RE: Exception from RowCounter

2015-07-26 Thread Riesland, Zack
ging that yourself? Thanks, James On Sat, Jul 25, 2015 at 4:04 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote: I decided to start from scratch with my table schema in an attempt to get a better distribution across my regions/region servers. So, I created a table like this: C

RE: Exception from RowCounter

2015-07-26 Thread Riesland, Zack
just salt the data table instead of manually salting it and managing that yourself? Thanks, James On Sat, Jul 25, 2015 at 4:04 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote: I decided to start from scratch with my table schema in an attempt to get a better distribution acros

Exception from RowCounter

2015-07-25 Thread Riesland, Zack
I decided to start from scratch with my table schema in an attempt to get a better distribution across my regions/region servers. So, I created a table like this: CREATE TABLE fma.er_keyed_gz_hashed_indexed_meterkey_immutable ( hashed_key varchar not null, meter_key varchar , ... en

RE: Understanding keys

2015-07-23 Thread Riesland, Zack
ve to scan over more rows than if the primary key (A, B, C) were defined. - Gabriel 1. http://phoenix.apache.org/skip_scan.html On Thu, Jul 23, 2015 at 11:45 AM Riesland, Zack <zack.riesl...@sensus.com> wrote: This is probably a silly question… please humor me: I’m a Java/J

Understanding column splitting

2015-07-23 Thread Riesland, Zack
I want to make sure that I'm splitting my columns as effectively as possible, and I want to make sure I understand the syntax. Suppose I have a table that is 'split on' ('_B0', '_B1', '_B2', '_B3', '_B4', '_B5', '_B6', '_B7', '_B8', '_B9', '_B
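For reference, the syntax in question: SPLIT ON takes a list of key values that become region boundaries on the leading primary-key column, so rows whose keys start with _B0 through _B9 land in separate regions from the start. A minimal sketch with a placeholder schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PreSplitTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // Each listed value becomes a region split point on hashed_key.
            stmt.execute("CREATE TABLE demo ("
                    + " hashed_key VARCHAR NOT NULL PRIMARY KEY,"
                    + " val VARCHAR)"
                    + " SPLIT ON ('_B0','_B1','_B2','_B3','_B4',"
                    + "           '_B5','_B6','_B7','_B8','_B9')");
        }
    }
}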

Understanding keys

2015-07-23 Thread Riesland, Zack
This is probably a silly question... please humor me: I'm a Java/JS developer learning about databases as I go. Suppose I have a table with columns A-Z, and declare the primary key to be (A, B, C). I understand that that forces each row to have a unique A, B, C combination. But what does it me

RE: How fast is upsert select?

2015-07-22 Thread Riesland, Zack
e the rows are in your table. On a 8 node cluster, creating an index with 3 columns (char(15),varchar and date) on a 1 billion row table takes about 1 hour 15 minutes. How many rows does your table have and how wide are they? On Wed, Jul 22, 2015 at 8:29 AM, Riesland, Zack wrote: > Tha

RE: How fast is upsert select?

2015-07-22 Thread Riesland, Zack
en as it does a similar task of reading from a Phoenix table and writes the data into the target table using bulk load. Regards Ravi On Wed, Jul 22, 2015 at 6:23 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote: I want to play with some options for splitting a table to test perfo

How fast is upsert select?

2015-07-22 Thread Riesland, Zack
I want to play with some options for splitting a table to test performance. If I were to create a new table and perform an upsert select * to the table, with billions of rows in the source table, is that like an overnight operation or should it be pretty quick? For reference, we have 6 (beefy)

More help with secondary indexes

2015-07-22 Thread Riesland, Zack
I have a table like this: CREATE TABLE fma. er_keyed_gz_meterkey_split_custid ( meter_key varchar not null, ... sample_point integer not null, ... endpoint_id integer, ... CONSTRAINT pk_rma_er_keyed_filtered PRIMARY KEY (meter_key, sample_point) ) COMPRESSION='G

RE: Help with secondary index

2015-07-21 Thread Riesland, Zack
us know. On Tue, Jul 21, 2015 at 11:39 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote: This is my first time messing with a secondary index in Phoenix. I used this syntax: create index fma_er_keyed_gz_endpoint_id_include_sample_point on fma.er_keyed_gz_meterkey_split_custid

Help with secondary index

2015-07-21 Thread Riesland, Zack
This is my first time messing with a secondary index in Phoenix. I used this syntax: create index fma_er_keyed_gz_endpoint_id_include_sample_point on fma.er_keyed_gz_meterkey_split_custid (endpoint_id) include (sample_point) SALT_BUCKETS = 550; and I get this error: [Error Code: 1029, SQL Sta
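The likely cause: Phoenix caps SALT_BUCKETS at 256, so 550 is out of range. A sketch of the same statement with an in-range value; 128 is only an illustration, and the right count depends on the cluster.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SaltedIndex {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE INDEX fma_er_keyed_gz_endpoint_id_include_sample_point"
                    + " ON fma.er_keyed_gz_meterkey_split_custid (endpoint_id)"
                    + " INCLUDE (sample_point)"
                    + " SALT_BUCKETS = 128"); // must be <= 256
        }
    }
}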

bouncing messages?

2015-07-17 Thread Riesland, Zack
I keep getting emails from the mail system with warnings like the one below. Is anyone else seeing this? This is my work email address and I don't typically have any issues with it... -- Messages to you from the user mailing list seem to have been

RE: How to adjust primary key on existing table

2015-07-14 Thread Riesland, Zack
, Riesland, Zack <zack.riesl...@sensus.com> wrote: Thanks James, That’s what I thought. If I were to make a NEW table with the same columns, is there a simple way to copy the data from the old table to the new one? From: James Taylor [mailto:jamestay...@apache.org]

RE: How to adjust primary key on existing table

2015-07-14 Thread Riesland, Zack
14 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote: Thanks James, To clarify: the column already exists on the table, but I want to add it to the primary key. Is that what your example accomplishes? From: James Taylor [mailto:jamestay...@apache.org]

RE: How to adjust primary key on existing table

2015-07-14 Thread Riesland, Zack
table ALTER TABLE t ADD my_new_col VARCHAR PRIMARY KEY The new column must be nullable and the last existing PK column cannot be nullable and fixed width (or varbinary or array). On Tue, Jul 14, 2015 at 10:01 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote: This is probably

How to adjust primary key on existing table

2015-07-14 Thread Riesland, Zack
This is probably a lame question, but can anyone point me in the right direction for CHANGING an EXISTING primary key on a table? I want to add a column. Is it possible to do that without dropping the table? Thanks!

RE: Phoenix vs Hive

2015-07-14 Thread Riesland, Zack
If the counts are, indeed, different, then the next question is: how are you getting data from hive to phoenix? From: anil gupta [mailto:anilgupt...@gmail.com] Sent: Tuesday, July 14, 2015 3:48 AM To: user@phoenix.apache.org Subject: Re: Phoenix vs Hive You can do major compaction via Hbase shel

RE: Permissions Question

2015-07-10 Thread Riesland, Zack
cture). Assuming that at least one of these works for you (or even if they don't), could you add a ticket in the Phoenix JIRA (https://issues.apache.org/jira/browse/PHOENIX) so that we can track getting a more structural fix for this issue? - Gabriel On Tue, Jul 7, 2015 at 4:53 PM Ries

RE: Permissions Question

2015-07-07 Thread Riesland, Zack
...@gmail.com] Sent: Monday, July 06, 2015 3:11 PM To: user@phoenix.apache.org Subject: Re: Permissions Question The owner of the directory containing HFiles should be 'hbase' user and ownership can set using 'chown' command. On Mon, Jul 6, 2015 at 7:12 AM, Riesland, Zac

Permissions Question

2015-07-06 Thread Riesland, Zack
I've been running CsvBulkLoader as 'hbase' and that has worked well. But I now need to integrate with some scripts that will be run as another user. When I run under a different account, the CsvBulkLoader runs and creates the HFiles, but then encounters permission issues attempting to write the

RE: Help w/ connection issue

2015-07-06 Thread Riesland, Zack
't running on the localhost and/or isn't configured in the local configuration (see http://phoenix.apache.org/bulk_dataload.html). - Gabriel On Mon, Jul 6, 2015 at 12:08 PM Riesland, Zack <zack.riesl...@sensus.com> wrote: Hello, I’m attempting to use the CsvBulkLoader tool

Help w/ connection issue

2015-07-06 Thread Riesland, Zack
Hello, I'm attempting to use the CsvBulkLoader tool from a new edge node. This edge node is not a data node or region server node on our cluster. It is intended to be used for running scripts and interacting with the cluster nodes. I manually installed all the phoenix files (I copied usr/hdp/

RE: Help interpreting CsvBulkLoader issues?

2015-07-01 Thread Riesland, Zack
After some investigation, I think this is a permissions issue. If I run as ‘hbase’, this works consistently. FYI From: Riesland, Zack Sent: Wednesday, July 01, 2015 7:25 AM To: user@phoenix.apache.org Subject: Help interpreting CsvBulkLoader issues? After using the CsvBulkLoader successfully

Help interpreting CsvBulkLoader issues?

2015-07-01 Thread Riesland, Zack
After using the CsvBulkLoader successfully for a few days, I’m getting some strange behavior this morning. I ran the job on a fairly small ingest of data (around 1/2 billion rows). It seemed to complete successfully. I see this in the logs: Phoenix MapReduce Import Upser

RE: How to count table rows from Java?

2015-06-29 Thread Riesland, Zack
rough the connection to 600000 milliseconds (10mins). You can also set the phoenix.query.timeoutMs in your client-side hbase-site.xml and it'll be used for the query timeout for all connections. Thanks, James On Mon, Jun 29, 2015 at 2:44 AM, Riesland, Zack wrote: > Thanks, James! >

RE: How to count table rows from Java?

2015-06-29 Thread Riesland, Zack
r any statement executing through the connection to 600000 milliseconds (10mins). You can also set the phoenix.query.timeoutMs in your client-side hbase-site.xml and it'll be used for the query timeout for all connections. Thanks, James On Mon, Jun 29, 2015 at 2:44 AM, Riesland,

RE: How to count table rows from Java?

2015-06-29 Thread Riesland, Zack
the less. On Friday, June 26, 2015, Riesland, Zack wrote: I wrote a Java program that runs nightly and collects metrics about our hive tables. I would like to include HBase tables in this as well. Since select count(*) is slow and not recommended on Phoenix, what are my alternat

How to count table rows from Java?

2015-06-26 Thread Riesland, Zack
I wrote a Java program that runs nightly and collects metrics about our hive tables. I would like to include HBase tables in this as well. Since select count(*) is slow and not recommended on Phoenix, what are my alternatives from Java? Is there a way to call org.apache.hadoop.hbase.mapreduce.
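Following the advice quoted in the RE: entries above, one option that stays in plain JDBC is SELECT COUNT(1) with the query timeout raised through phoenix.query.timeoutMs (and hbase.rpc.timeout to match, per the "How To Count Rows" thread below). A sketch with placeholder names:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class NightlyRowCount {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("phoenix.query.timeoutMs", "600000"); // 10 minutes
        props.setProperty("hbase.rpc.timeout", "600000");       // keep RPCs alive as long
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181", props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(1) FROM MY_TABLE")) {
            if (rs.next()) {
                System.out.println("rows: " + rs.getLong(1));
            }
        }
    }
}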

RE: Bug in CsvBulkLoad tool?

2015-06-25 Thread Riesland, Zack
jobs, but any kind of job), as then if there is any kind of cleanup or something similar done in the driver program, it'll still get run even if the ssh session gets dropped. - Gabriel On Thu, Jun 25, 2015 at 8:47 PM Riesland, Zack <zack.riesl...@sensus.com> wrote: OH! So ev

RE: Bug in CsvBulkLoad tool?

2015-06-25 Thread Riesland, Zack
n doesn't end even if you drop your ssh connection. - Gabriel On Thu, Jun 25, 2015 at 8:27 PM Riesland, Zack <zack.riesl...@sensus.com> wrote: Thanks Gabriel, Then perhaps I discovered something interesting. After my last email, I created a new table with the exact same scrip

RE: Bug in CsvBulkLoad tool?

2015-06-25 Thread Riesland, Zack
ld be around the same value as well. Could you post the values that you've got on those counters? - Gabriel On Thu, Jun 25, 2015 at 4:41 PM Riesland, Zack <zack.riesl...@sensus.com> wrote: I started writing a long response, and then noticed something: When I created my new

RE: Bug in CsvBulkLoad tool?

2015-06-25 Thread Riesland, Zack
Thu, Jun 25, 2015 at 3:11 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote: Earlier this week I was surprised to find that, after dumping tons of data from a Hive table to an HBase table, about half of the data didn’t end up in HBase. So, yesterday, I created a new Phoenix table.

Bug in CsvBulkLoad tool?

2015-06-25 Thread Riesland, Zack
Earlier this week I was surprised to find that, after dumping tons of data from a Hive table to an HBase table, about half of the data didn't end up in HBase. So, yesterday, I created a new Phoenix table. This time, I'm splitting on the first 6 characters of the key, which gives me about 1700 r

RE: SocketTimeoutException on Update Statistics

2015-06-24 Thread Riesland, Zack
that the fact that the command returns immediately isn't necessarily a bad thing (as long as you're not getting an error from it). - Gabriel On Wed, Jun 24, 2015 at 12:14 PM Riesland, Zack <zack.riesl...@sensus.com> wrote: Quick update: I found that I am able to execute ‘up

RE: SocketTimeoutException on Update Statistics

2015-06-24 Thread Riesland, Zack
causing “update statistics” not to work. From: Riesland, Zack Sent: Wednesday, June 24, 2015 5:07 AM To: user@phoenix.apache.org Subject: RE: SocketTimeoutException on Update Statistics Update: I read through some old posts about timeouts and noticed that HBASE_CONF_PATH was not set on this

RE: How To Count Rows In Large Phoenix Table?

2015-06-24 Thread Riesland, Zack
icient. You have to increase HBase RPC timeout as well - hbase.rpc.timeout. 3. Upgrading to HBase 1.1 will resolve your timeout issues (it has support for long running scanners), but this is probably not the option? -Vlad On Tue, Jun 23, 2015 at 6:19 AM, Riesland, Zack <zack.riesl...@sens

RE: SocketTimeoutException on Update Statistics

2015-06-24 Thread Riesland, Zack
Jun 24 04:57:23 EDT 2015, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=62310: row '' on table '' at region=,,1434377989918.552c1ed6d6d0c65ec30f467ed11ae0c3., hostname=,60020,1434375519767, seqNum=2 (state=08000,code=101) From: Riesland, Zack Sent: Tues

SocketTimeoutException on Update Statistics

2015-06-23 Thread Riesland, Zack
thing (in terms of querying) as if you were to split the regions. - Gabriel On Tue, Jun 23, 2015 at 7:56 PM Riesland, Zack <zack.riesl...@sensus.com> wrote: Thanks Gabriel, That’s all very helpful. I’m not at all sure that the timeouts are related to compactions. This is just

RE: CsvBulkLoad output questions

2015-06-23 Thread Riesland, Zack
rg/update_statistics.html On Tue, Jun 23, 2015 at 5:25 PM Riesland, Zack <zack.riesl...@sensus.com> wrote: This question is mostly a followup based on my earlier mail (below). I’m re-consuming this data, one (5GB) csv file at a time. I see that in consuming this file, there was on

CsvBulkLoad output questions

2015-06-23 Thread Riesland, Zack
Bytes Written=702177539 From: Riesland, Zack Sent: Tuesday, June 23, 2015 9:20 AM To: 'user@phoenix.apache.org' Subject: RE: How To Count Rows In Large Phoenix Table? Anil: Thanks for the tip about mapreduce.RowCounter. That takes about 70 minutes, but it works! Unfortunately, I only go

RE: How To Count Rows In Large Phoenix Table?

2015-06-23 Thread Riesland, Zack
Jun 22, 2015 at 12:08 PM, Ciureanu Constantin <ciureanu.constan...@gmail.com> wrote: Hive can connect to HBase and insert directly into any direction. Don't know if it also works via Phoenix... Counting is too slow on a single threaded job /command line - you should write a m

How To Count Rows In Large Phoenix Table?

2015-06-22 Thread Riesland, Zack
I had a very large Hive table that I needed in HBase. After asking around, I came to the conclusion that my best bet was to: 1 - export the hive table to a CSV 'file'/folder on the HDFS 2 - Use the org.apache.phoenix.mapreduce.CsvBulkLoadTool to import the data. I found that if I tried to pass t

Renaming a Phoenix Table

2015-06-19 Thread Riesland, Zack
Is it possible to rename a table in Phoenix? If so, how? I'm double-checking with the experts because if I screw this up, it will take 3 days to re-ingest all the data...

RE: Phoenix and Hive

2015-06-17 Thread Riesland, Zack
At HadoopSummit, Hortonworks hinted at a solution for this coming later in the year. I think the idea is a single driver that can interact with Hive, HBase, Phoenix, and others, and supports joining data across the connections. They didn’t provide very solid specifics, but there will probably b

RE: How to increase call timeout/count rows?

2015-06-16 Thread Riesland, Zack
15, 2015 at 6:49 AM, Riesland, Zack wrote: > Whenever I run a non-typical query (not filtered by the primary key), > I get an exception like this one. > > > > I tried modifying each of the following in custom hbase-site to > increase the > timeout:

How to increase call timeout/count rows?

2015-06-15 Thread Riesland, Zack
Whenever I run a non-typical query (not filtered by the primary key), I get an exception like this one. I tried modifying each of the following in custom hbase-site to increase the timeout: hbase.client.scanner.timeout.period hbase.regionserver.lease.period hbase.rpc.shortoperation.timeout hbas

RE: Guidance on table splitting

2015-06-15 Thread Riesland, Zack
table splitting Can you provide the Queries which you would be running on your table? Also use the MR Bulkload instead of using the CSV load tool. From: Riesland, Zack [mailto:zack.riesl...@sensus.com] Sent: Monday, June 15, 2015 4:03 PM To: user@phoenix.apache.org

RE: How to change region size limit

2015-06-15 Thread Riesland, Zack
@phoenix.apache.org Subject: RE: How to change region size limit It totally depends on the type of Query you would be running. If its point query then it makes sense else aggregates and top N queries might run slow. More load on the client for deriving final result. From: Riesland, Zack [mailto:zack.riesl

How to change region size limit

2015-06-15 Thread Riesland, Zack
At the Hadoop Summit last week, some guys from Yahoo presented on why it is wise to keep region size fairly small and region count fairly large. I am looking at my HBase config, but there are a lot of numbers that look like they're related to region size. What parameter limits the data size of
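For what it's worth, the parameter that caps a region's data size is hbase.hregion.max.filesize (commonly 10 GB by default). A tiny sketch that just reads the configured value; the fallback passed here is only the usual stock default.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RegionSizeLimit {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        long max = conf.getLong("hbase.hregion.max.filesize", 10737418240L);
        System.out.println("hbase.hregion.max.filesize = " + max + " bytes");
    }
}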

Guidance on table splitting

2015-06-15 Thread Riesland, Zack
I'm new to Hbase and to Phoenix. I needed to build a GUI off of a huge data set from HDFS, so I decided to create a couple of Phoenix tables, dump the data using the CSV bulk load tool, and serve the GUI from there. This all 'works', but as the data set grows, I would like to improve my table