Re: Using Phoenix Bulk Upload CSV to upload 200GB data

2015-09-11 Thread Krishna
1400 mappers on 9 nodes is about 155 mappers per data node, which sounds high to me. There are very few specifics in your mail. Are you using YARN? Can you provide details like table structure, # of rows & columns, etc.? Do you have an error stack? On Friday, September 11, 2015, Gaurav Kanade wrote

Using Phoenix Bulk Upload CSV to upload 200GB data

2015-09-11 Thread Gaurav Kanade
Hi All, I am new to Apache Phoenix (and relatively new to MR in general), but I am trying a bulk insert of a 200GB tab-separated file into an HBase table. This seems to start off fine and kicks off ~1400 mappers and 9 reducers (I have 9 data nodes in my setup). At some point I seem to be running

Re: yet another question...perhaps dumb...JOIN with two conditions

2015-09-11 Thread Aaron Bossert
Don't have a quantum computer...but am on a small supercomputer ;). 1500 cores, 6TB of memory, 40TB of SSD, and a few hundred TB of spinning disks... Sent from my iPhone > On Sep 11, 2015, at 1:23 PM, James Heather wrote: > > With your query as it stands, you're trying to construct 250K*270M pairs

Re: yet another question...perhaps dumb...JOIN with two conditions

2015-09-11 Thread James Heather
With your query as it stands, you're trying to construct 250K*270M pairs before filtering them. That's 67.5 trillion. You will need a quantum computer. I think you will be better off restructuring... James On 11 Sep 2015 5:34 pm, "M. Aaron Bossert" wrote: > AH! Now I get it...I am running on a
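A sketch of the restructuring James suggests, applied to the IP-range lookup described later in the thread (all table and column names here are hypothetical, not from the thread): give both sides an equality key, such as a coarse IP prefix, so Phoenix can hash-join on that key and only apply the range condition within each bucket.

    -- Hypothetical restructuring: bucket both sides by IP prefix so the join
    -- has an equality condition; a range spanning several prefixes must be
    -- pre-split into one row per prefix.
    SELECT e.ip, r.label
    FROM events e
    JOIN ipv4ranges_by_prefix r
      ON e.ip_prefix = r.prefix                     -- equi-join key
    WHERE e.ip BETWEEN r.start_ip AND r.end_ip;     -- range check per bucket

Each left-side row is then compared only against the ranges sharing its prefix, rather than against all 250K ranges.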

Re: yet another question...perhaps dumb...JOIN with two conditions

2015-09-11 Thread M. Aaron Bossert
AH! Now I get it...I am running on a pretty beefy cluster...I would have thought this would work, even if a bit slower. Do you know which timeout settings I would need to alter to get this to work? On Fri, Sep 11, 2015 at 12:26 PM, Maryann Xue wrote: > Yes, I know. That timeout was because Phoenix

Re: yet another question...perhaps dumb...JOIN with two conditions

2015-09-11 Thread Maryann Xue
Yes, I know. That timeout was because Phoenix was doing a CROSS JOIN, which made progress on each row very slow. Even if it could succeed, it would take a long time to complete. Thanks, Maryann On Fri, Sep 11, 2015 at 11:58 AM, M. Aaron Bossert wrote: > So, I've tried it both ways. The IPV4RANGES

Re: yet another question...perhaps dumb...JOIN with two conditions

2015-09-11 Thread M. Aaron Bossert
So, I've tried it both ways. The IPV4RANGES table is small at around 250K rows, while the other table is around 270M rows. I did a bit of googling and see that the error I am seeing is related to HBase timeouts-ish...Here is the description: "Thrown if a region server is passed an unknown scanner

Re: yet another question...perhaps dumb...JOIN with two conditions

2015-09-11 Thread Maryann Xue
Hi Aaron, As Jaime pointed out, it is a non-equi join. And unfortunately it is handled as a CROSS join in Phoenix and thus is not very efficient. For each row from the left side, it will be joined with all of the rows from the right side before the condition is applied to filter the joined results.
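For concreteness, the kind of non-equi join being discussed looks roughly like this (the real query isn't shown in the preview; names are assumed):

    SELECT h.ip, r.country
    FROM hosts h
    JOIN ipv4ranges r
      ON h.ip >= r.start_ip AND h.ip <= r.end_ip;
    -- With no equality condition, Phoenix must consider every
    -- (hosts, ipv4ranges) pair and then filter: a cross join in disguise.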

Re: setting up community repo of Phoenix for CDH5?

2015-09-11 Thread Andrew Purtell
Or once parameterized, add a default-off profile that redefines them all in one shot once the builder activates the profile on the Maven command line with -P ... > On Sep 11, 2015, at 7:05 AM, Andrew Purtell wrote: > > The group IDs and versions can be parameterized in the POM so they can

Re: setting up community repo of Phoenix for CDH5?

2015-09-11 Thread Andrew Purtell
The group IDs and versions can be parameterized in the POM so they can be overridden on the Maven command line with -D. That would be easy and something I think we could get committed without any controversy. > On Sep 11, 2015, at 6:53 AM, James Heather wrote: > > Yes, my plan is to create a

Re: yet another question...perhaps dumb...JOIN with two conditions

2015-09-11 Thread M. Aaron Bossert
Not sure where the problem is, but when I run the suggested query, I get the following error...and when I try it with the sort/merge join hint, I get yet another error: java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException:
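For reference, the sort-merge join hint Aaron mentions is written like this in Phoenix (the query shape is assumed, carried over from the sketch above):

    SELECT /*+ USE_SORT_MERGE_JOIN */ h.ip, r.country
    FROM hosts h
    JOIN ipv4ranges r
      ON h.ip >= r.start_ip AND h.ip <= r.end_ip;
    -- The hint asks Phoenix to use a sort-merge join instead of the
    -- default hash join, trading memory pressure for sorting cost.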

Re: setting up community repo of Phoenix for CDH5?

2015-09-11 Thread James Heather
Yes, my plan is to create a fork of the main repo, so that we can still merge new Phoenix code into the CDH-compatible version. Before that, I do wonder whether it's possible to suggest a few changes to the main repo that would allow for compiling a CDH-compatible version, without needing to m

Re: setting up community repo of Phoenix for CDH5?

2015-09-11 Thread Andrew Purtell
The first step, I think, is a repo with code that compiles. Please initialize it by forking github.com/apache/phoenix so we have common ancestors. Once we have a clear idea (by diff) of what is required, we can figure out if we can support compatibility in some way. > On Sep 9, 2015, at 11:00 PM, Kr

Re: Error: Encountered exception in sub plan [0] execution.

2015-09-11 Thread Maryann Xue
Hi Alberto, Could you please check your server log to see if there's an ERROR, probably something like InsufficientMemoryException? Thanks, Maryann On Fri, Sep 11, 2015 at 7:04 AM, Alberto Gonzalez Mesas wrote: > Hi! > > I create two tables: > > CREATE TABLE "Customers2" ("CustomerID" VARCHAR NOT

Error: Encountered exception in sub plan [0] execution.

2015-09-11 Thread Alberto Gonzalez Mesas
Hi! I create two tables: CREATE TABLE "Customers2" ("CustomerID" VARCHAR NOT NULL PRIMARY KEY, "C"."CustomerName" VARCHAR, "C"."Country" VARCHAR) and CREATE TABLE "Orders2" ("OrderID" VARCHAR NOT NULL PRIMARY KEY, "O"."CustomerID" VARCHAR, "O"."Date" VARCHAR, "O"."ItemID" VARCHAR
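The failing query isn't visible in the preview; a minimal join over these two tables that would exercise a hash-join sub plan might look like:

    SELECT "CustomerName", "Date", "ItemID"
    FROM "Orders2"
    JOIN "Customers2"
      ON "Orders2"."CustomerID" = "Customers2"."CustomerID";
    -- Phoenix executes the build side of the hash join as sub plan [0] and
    -- caches it on the region servers; if that cache can't be allocated,
    -- the sub plan fails, e.g. with InsufficientMemoryException.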

Re: index creation partly succeeds if it times out

2015-09-11 Thread James Heather
Ah, too late, I'm afraid. I dropped it. James On 11/09/15 11:41, rajeshb...@apache.org wrote: James, It should be in the BUILDING state. Can you check what state it's in? Thanks, Rajeshbabu. On Fri, Sep 11, 2015 at 4:04 PM, James Heather <james.heat...@mendeley.com> wrote: Hi Rajeshbabu,

Re: index creation partly succeeds if it times out

2015-09-11 Thread rajeshb...@apache.org
James, It should be in the BUILDING state. Can you check what state it's in? Thanks, Rajeshbabu. On Fri, Sep 11, 2015 at 4:04 PM, James Heather wrote: > Hi Rajeshbabu, > > Thanks--yes--I've done that. I'm now recreating the index with a long > timeout. > > I reported it because it seemed to me
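One way to check the index state from sqlline, assuming a Phoenix version whose SYSTEM.CATALOG exposes these columns:

    SELECT TABLE_NAME, DATA_TABLE_NAME, INDEX_STATE
    FROM SYSTEM.CATALOG
    WHERE TABLE_TYPE = 'i';
    -- INDEX_STATE is a single-character code, e.g. 'b' for BUILDING and
    -- 'a' for ACTIVE (exact codes vary by Phoenix version).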

Re: index creation partly succeeds if it times out

2015-09-11 Thread James Heather
Hi Rajeshbabu, Thanks--yes--I've done that. I'm now recreating the index with a long timeout. I reported it because it seemed to me to be a bug: Phoenix thinks that the index is there, but it's not. It ought to get cleaned up after a timeout. James On 11/09/15 11:32, rajeshb...@apache.org

Re: index creation partly succeeds if it times out

2015-09-11 Thread rajeshb...@apache.org
Hi James, You can drop the partially created index and try the following steps: 1) Add the phoenix.query.timeoutMs property to hbase-site.xml on the Phoenix client side, set to double its default value. 2) Export HBASE_CONF_PATH with the configuration directory where that hbase-site.xml is present. 3) Then

index creation partly succeeds if it times out

2015-09-11 Thread James Heather
I just tried to create an index on a column for a table with 200M rows. Creating the index timed out:

    0: jdbc:phoenix:172.31.31.143> CREATE INDEX idx_lastname ON loadtest.testing (lastname);
    Error: Operation timed out (state=TIM01,code=6000)
    java.sql.SQLTimeoutException: Operation timed out
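Besides raising the client timeout as suggested earlier in the thread, a possible workaround, if the Phoenix version in use supports it (4.5+), is to build the index asynchronously so the CREATE INDEX call returns immediately:

    CREATE INDEX idx_lastname ON loadtest.testing (lastname) ASYNC;
    -- The index remains in the BUILDING state until a separate MapReduce
    -- job (org.apache.phoenix.mapreduce.index.IndexTool) populates it.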