Re: Bug in either InMemoryMap or NativeMap

Keith Turner Fri, 19 Feb 2016 15:09:49 -0800

On Fri, Feb 19, 2016 at 5:59 PM, Dan Blum <[email protected]> wrote:

> We are already using logical time. I can definitely change to using
> multiple mutations or more likely sum up the values myself and make a
> single put call.
>


That sounds like a good way to go.  I created an issue.

https://issues.apache.org/jira/browse/ACCUMULO-4148
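
For anyone landing on this thread later, the "sum up the values myself" approach could look roughly like the sketch below. It is plain Java with no Accumulo dependency. The class name PreAggregate, the "cf:cq" string keys, and the use of long increments are assumptions made for illustration, and how each total is encoded into a Value depends on how the Combiner on the column is configured.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PreAggregate {

  /**
   * Sums all pending increments per column so that each column ends up
   * with exactly one total, and therefore needs only one put() call.
   * The "cf:cq" string keys are a hypothetical stand-in for real columns.
   */
  public static Map<String, Long> sumPerColumn(String[] columns, long[] increments) {
    Map<String, Long> totals = new LinkedHashMap<>();
    for (int i = 0; i < columns.length; i++) {
      totals.merge(columns[i], increments[i], Long::sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    Map<String, Long> totals = sumPerColumn(
        new String[] {"cf1:cq1", "cf1:cq1", "cf1:cq2"},
        new long[] {3, 4, 5});
    // With Accumulo on the classpath, each entry would become a single
    // Mutation.put(cf, cq, value) call instead of several puts to the
    // same column within one Mutation.
    totals.forEach((column, sum) -> System.out.println(column + " -> " + sum));
  }
}
```

Because every column then appears at most once per Mutation, the result no longer depends on whether the tablet server uses the native map or the Java map.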


>
>
> *From:* Keith Turner [mailto:[email protected]]
> *Sent:* Friday, February 19, 2016 5:57 PM
> *To:* [email protected]
>
> *Subject:* Re: Bug in either InMemoryMap or NativeMap
>
>
>
>
>
>
>
> On Fri, Feb 19, 2016 at 5:14 PM, Dan Blum <[email protected]> wrote:
>
> Yes, please open an issue for this.
>
>
>
> In the meantime, as a workaround is it safe to assign an arbitrary
> increasing timestamp when calling Mutation.put()? That seems the simplest
> way to get the ColumnUpdates to be treated properly.
>
>
>
> Seems like that would work, but then you may have to keep track of the
> next timestamp across processes.
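
A minimal sketch of the explicit-timestamp workaround within a single process follows. The class name IncreasingTimestamps is hypothetical, and the sketch does not address the cross-process coordination mentioned above. Each value it returns would be passed as the timestamp argument of Mutation.put(cf, cq, timestamp, value).

```java
import java.util.concurrent.atomic.AtomicLong;

public class IncreasingTimestamps {

  // Last timestamp handed out by this process.
  private final AtomicLong last = new AtomicLong(0);

  /**
   * Returns a timestamp that is strictly greater than any value returned
   * earlier by this instance. It follows the wall clock when possible and
   * falls back to "previous + 1" when calls land in the same millisecond.
   */
  public long next() {
    return last.updateAndGet(prev -> Math.max(prev + 1, System.currentTimeMillis()));
  }

  public static void main(String[] args) {
    IncreasingTimestamps timestamps = new IncreasingTimestamps();
    long first = timestamps.next();
    long second = timestamps.next();
    // Two puts to the same column would carry distinct timestamps.
    System.out.println(second > first);
  }
}
```

Two writers running in separate JVMs can still hand out the same value, which is why this only helps when all puts to a given column come from one process.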
>
> A possible alternative is to configure the table to use logical time and
> write multiple mutations.  Logical time ensures every mutation is assigned
> a unique timestamp. The following program is an example of this.
>
>     String table = getUniqueNames(1)[0];
>     Connector c = getConnector();
>     c.tableOperations().create(table,
>         new NewTableConfiguration().setTimeType(TimeType.LOGICAL).withoutDefaultIterators());
>
>     BatchWriterConfig config = new BatchWriterConfig();
>     BatchWriter writer = c.createBatchWriter(table, config);
>
>     Mutation m = new Mutation("row");
>     m.put("cf1", "cq1", new Value("abc".getBytes()));
>     writer.addMutation(m);
>     m = new Mutation("row");
>     m.put("cf1", "cq1", new Value("xyz".getBytes()));
>     writer.addMutation(m);
>     writer.close();
>
>     Scanner scanner = c.createScanner(table, Authorizations.EMPTY);
>     for (Entry<Key,Value> entry : scanner) {
>       System.out.println(entry);
>     }
>
>
> This program prints
>
>   row cf1:cq1 [] 2 false=xyz
>   row cf1:cq1 [] 1 false=abc
>
>
>
> Accumulo assigned the timestamps 1 and 2. In this case Accumulo will
> keep track of the next timestamp for you.
>
>
>
> If you do not use logical time, then the two mutations would likely get
> the same timestamp because they arrived in the same millisecond.
>
>
>
> *From:* Keith Turner [mailto:[email protected]]
> *Sent:* Friday, February 19, 2016 5:11 PM
> *To:* [email protected]
> *Cc:* Jonathan Lasko; Maxwell Jordan; [email protected]
> *Subject:* Re: Bug in either InMemoryMap or NativeMap
>
>
>
>
>
>
>
> On Fri, Feb 19, 2016 at 3:34 PM, Dan Blum <[email protected]> wrote:
>
> (Resend: I forgot to actually subscribe before sending originally.)
>
> I noticed a difference in behavior between our cluster and our tests
> running on MiniCluster: when multiple put() calls are made to a Mutation
> with the same CF, CQ, and CV and no explicit timestamp, on a live cluster
> only the last one is written, whereas in Mini all of them are.
>
> Of course, in most cases it wouldn't matter, but if there is a Combiner
> set on the column (which is the case I am dealing with) then it does.
>
> I believe the difference in behavior is due to code in NativeMap._mutate
> and InMemoryMap.DefaultMap.mutate. In the former if there are multiple
> ColumnUpdates in a Mutation they all get written with the same
> mutationCount value; I haven't looked at the C++ map code but I assume
> that this means that entries with the same CF/CQ/CV/timestamp will
> overwrite each other. In contrast, in DefaultMap multiple ColumnUpdates
> are stored with an incrementing kvCount, so the keys will necessarily be
> distinct.
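
The behavior described above can be modeled with a toy sorted map. This is only a sketch and not Accumulo's actual code: MemKeyModel, MemKey, and the comparator ordering are assumptions made for illustration, with the count field standing in for mutationCount or kvCount as the final tie-breaker in the key.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class MemKeyModel {

  /** Simplified in-memory key: column, timestamp, and a per-entry counter. */
  static final class MemKey {
    final String column;
    final long timestamp;
    final int count;

    MemKey(String column, long timestamp, int count) {
      this.column = column;
      this.timestamp = timestamp;
      this.count = count;
    }
  }

  // Assumed ordering: column ascending, then newest timestamp first,
  // then highest count first as the last tie-breaker.
  static final Comparator<MemKey> ORDER = (a, b) -> {
    int c = a.column.compareTo(b.column);
    if (c != 0) return c;
    c = Long.compare(b.timestamp, a.timestamp);
    if (c != 0) return c;
    return Integer.compare(b.count, a.count);
  };

  /**
   * Stores several updates to the same column with the same timestamp and
   * returns how many entries survive in the map.
   */
  public static int entriesKept(int updates, boolean incrementPerUpdate) {
    TreeMap<MemKey, String> map = new TreeMap<>(ORDER);
    int count = 0;
    for (int i = 0; i < updates; i++) {
      // Shared count: bumped once for the whole mutation.
      // Incrementing count: bumped for every column update.
      if (incrementPerUpdate || i == 0) {
        count++;
      }
      map.put(new MemKey("cf1:cq1", 100L, count), "value" + i);
    }
    return map.size();
  }

  public static void main(String[] args) {
    System.out.println(entriesKept(3, false)); // shared count: keys collide
    System.out.println(entriesKept(3, true));  // incrementing count: keys stay distinct
  }
}
```

With a shared count the three keys compare as equal, so each put replaces the previous value and only the last one remains. With an incrementing count all three entries are kept, which is what a Combiner needs in order to see every value.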
>
>
>
> You made this issue easy to track down.
>
>
>
> This seems like a bug with the native map.  The DefaultMap code allocates
> a unique int for each key/value in the mutation.
>
>
>
> https://github.com/apache/accumulo/blob/rel/1.6.5/server/tserver/src/main/java/org/apache/accumulo/tserver/InMemoryMap.java#L476
>
>
> It seems like the native map code should increment like the DefaultMap
> code does.  Specifically it seems like the following code should increment
> mutationCount (coordinating with the code that calls it)
>
>
> https://github.com/apache/accumulo/blob/rel/1.6.5/server/tserver/src/main/java/org/apache/accumulo/tserver/NativeMap.java#L532
>
>
>
> Would you like to open an issue in Jira?
>
>
>
>
> My main question is: which of these is the intended behavior? We'll
> obviously need to change our code to work with NativeMap's current
> implementation regardless (since we don't want to use the Java maps on a
> live cluster), but it would be useful to know if that change is temporary
> or permanent.
>
> My secondary question is whether there is any trick to getting native maps
> to work in MiniCluster, which would be very helpful for our testing. I
> changed the configuration XML we use and I can see that it picks up the
> change - server.Accumulo logs "tserver.memory.maps.native.enabled = true,"
> but NativeMap never logs that it tries to load the library so the setting
> seems to be dropped somewhere.
>
>
>
>
>
