Re: Dynamic Scaling of Accumulo

2023-04-05 Thread Dave Marion
Edit: In their opinion, they didn't see the need to manage the online/ondemand state at anything lower than the Tablet level
Should read: In their opinion, they didn't see the need to manage the online/ondemand state at anything lower than the Table level
On Wed, Apr 5, 2023 at 1:05 PM Dave Marion

Re: Dynamic Scaling of Accumulo

2023-04-05 Thread Dave Marion
I had a chance to talk with some users about how they would use this feature. In their opinion, they didn't see the need to manage the online/ondemand state at anything lower than the Tablet level. If we were to do that, then that would potentially put a management burden on them to make sure that

Re: Dynamic Scaling of Accumulo

2023-04-04 Thread Keith Turner
On Mon, Apr 3, 2023 at 3:33 PM Dave Marion wrote:
>
> I could see that working initially, but I think you would get some drift
> over time as splits or merges happen. In your example, what happens when
Drift is definitely something to consider in the design and users would definitely be changing

Re: Dynamic Scaling of Accumulo

2023-04-03 Thread Dave Marion
I could see that working initially, but I think you would get some drift over time as splits or merges happen. In your example, what happens when later someone adds splits for a - z? How would we know to mark (-inf, a] and (a, b] as HOSTED and (b,c] as ONDEMAND? In a table where the row is time and

Re: Dynamic Scaling of Accumulo

2023-04-03 Thread Keith Turner
On Mon, Apr 3, 2023 at 10:45 AM Dave Marion wrote:
>
> Looking through the code to see what would have to change to remove the
> ondemand table state, I'm struggling to find a way to implement this
> without having an ondemand state. Currently, the ondemand table state is
We could have tablet sta

Re: Dynamic Scaling of Accumulo

2023-04-03 Thread Dave Marion
Looking through the code to see what would have to change to remove the ondemand table state, I'm struggling to find a way to implement this without having an ondemand state. Currently, the ondemand table state is set in ZooKeeper as the ZTABLE_STATE and both the client and the server use it. When

Re: Dynamic Scaling of Accumulo

2023-03-29 Thread Christopher
On Wed, Mar 29, 2023 at 5:33 AM Dave Marion wrote:
>
> > I think we should deprecate support for offline table scanning, since it
> shouldn't be needed with the availability of ScanServers.
>
> Just making sure I understand your suggestion - you mean removing the
> OfflineScanner and the ability t

Re: Dynamic Scaling of Accumulo

2023-03-29 Thread Dave Marion
> I think we should deprecate support for offline table scanning, since it shouldn't be needed with the availability of ScanServers.
Just making sure I understand your suggestion - you mean removing the OfflineScanner and the ability to scan over offline tables in the MapReduce code, but we should

Re: Dynamic Scaling of Accumulo

2023-03-28 Thread Keith Turner
Changing the behavior of online tables instead of adding a new table state seems reasonable. One possible way to do this is that all tablets in an online table have a default goal state of hosted. A user can somehow define ranges of an online table to load tablets ondemand as needed. Could add a

Re: Dynamic Scaling of Accumulo

2023-03-28 Thread Christopher
I think we should deprecate support for offline table scanning, since it shouldn't be needed with the availability of ScanServers. Any MapReduce that previously relied on scanning offline tables could be made to use that instead. I agree there is a need to have an immutable table state, for which

Re: Dynamic Scaling of Accumulo

2023-03-28 Thread Drew Farris
On Mon, Mar 27, 2023 at 2:16 PM Keith Turner wrote:
> One realization that came out examining the different table states is
> that export table currently relies on the fact that offline tables
> will not delete files. If we enable compactions on offline tables
> then that could cause files to be

Re: Dynamic Scaling of Accumulo

2023-03-27 Thread Keith Turner
On Fri, Mar 24, 2023 at 9:27 AM Drew Farris wrote:
>
> I'll echo that the bulk import to offline tables is a useful feature and it
> would be great to maintain this if we can.
>
> In the import table use case, for example, keeping the table offline allows
> us to perform external validation on the

Re: Dynamic Scaling of Accumulo

2023-03-24 Thread Drew Farris
I'll echo that the bulk import to offline tables is a useful feature and it would be great to maintain this if we can. In the import table use case, for example, keeping the table offline allows us to perform external validation on the metadata table that all expected rfiles have been imported pri

Re: Dynamic Scaling of Accumulo

2023-03-23 Thread Christopher
In that case, I think it's probably sufficient to let the users know the risks of bulk importing and never bringing it online for compactions. It seems like that's a risk some users might be okay with for their use case.
On Thu, Mar 23, 2023, 19:38 Dave Marion wrote:
> Yes, if the table is never

Re: Dynamic Scaling of Accumulo

2023-03-23 Thread Dave Marion
Yes, if the table is never brought online. I believe that Keith said that the table could still be scanned when offline with existing MapReduce code or the OfflineScanner, which presents an issue that is not currently handled. I think we discussed today that the same thing could be achieved with ta

Re: Dynamic Scaling of Accumulo

2023-03-23 Thread Christopher
What do you mean by "when not used in this manner"? What other way is there to use that feature? Do you mean simply never being brought online? Would it be possible to support (external) compactions for an offline table? I feel like that's a pretty useful feature to revert, and would want to cons

Re: Dynamic Scaling of Accumulo

2023-03-23 Thread Dave Marion
Keith and I had a discussion today (that included some user input) regarding table operations with the new OnDemand table concept. I have put the notes up on the wiki at: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=247828052. One thing that came out of that is that we may want

Re: Dynamic Scaling of Accumulo

2023-03-20 Thread Dave Marion
Following up on this. Discussion and design documents are up on the wiki[1]. There is a GitHub project[2] for planning out some of the tasks, which are then turned into issues. Some of the issues have draft PRs submitted for them. [1] https://cwiki.apache.org/confluence/display/ACCUMULO/Elasticity

Dynamic Scaling of Accumulo

2023-02-22 Thread Dave Marion
Except for the new bulk import code, Accumulo requires that tables are in an online state to work with them (ingest, scan, compact, split, etc.). In some cases this could become cost prohibitive and resource inefficient as resources necessary to keep the tables online might be unused. I'd like to p