On Sep 18, 2013, at 8:20 AM, Alan Bateman <[email protected]> wrote:
> On 15/09/2013 17:27, Paul Sandoz wrote:
>> Hi,
>>
>> http://cr.openjdk.java.net/~psandoz/tl/JDK-8024341-pattern-splitAsStream/webrev/
>>
>> This fixes an issue with Pattern.splitAsStream reporting empty trailing
>> elements and aligns with the functionality of Pattern.split(CharSequence
>> input).
>>
>> The matching iterator passed to the stream was updated to aggressively
>> consume and keep a count of a sequence of empty matching elements such that
>> those elements can either be reported if not trailing, or discarded if
>> trailing.
>>
>> Paul.
> It make sense to adjust the spec to have it consistent with
> split(CharSequence).
>
> On the implementation then I had to read it a few times to understand how
> emptyElementCount is used. I wonder if it could be done in a simpler way, say
> just setting a flag when current reaches input.length? Maybe you have tried
> this already.
>

The problem is that when an empty matching element is encountered we don't
know whether it is trailing or not. This can only be determined when, later
on, a non-empty matching element is encountered or there are no further
matches. Thus we need to aggressively consume empty matching elements and
retain how many have been encountered in case we need to report them. For
example, here is a particular test exercising this:

    description = "Many repeated separators before last match";
    input = "fooooo:";
    pattern = Pattern.compile("o");
    expected = new ArrayList<>();
    expected.add("f");
    expected.add("");
    expected.add("");
    expected.add("");
    expected.add("");
    expected.add(":");
    // At this point we know the previously encountered matching empty
    // elements need to be reported and not discarded

I don't think it is practically possible in general to derive the number of
empty elements from a start and end index, since we don't easily know the
lengths of the strings matched by the pattern in the input.

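To make the counting approach concrete, here is a minimal standalone sketch.
It is not the code in the webrev: the class SplitSketch and the helpers split
and emit are hypothetical names made up for illustration, and it does not
attempt to handle zero-width matches. It only shows why empty elements have
to be deferred and counted until a non-empty element (or the end of the
input) settles whether they are trailing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitSketch {

    // Hypothetical helper, NOT the JDK implementation. Splits input around
    // matches of pattern and drops trailing empty elements, as
    // Pattern.split(CharSequence) does. Zero-width matches are not handled.
    static List<String> split(Pattern pattern, CharSequence input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        int current = 0;            // index where the next element starts
        int emptyElementCount = 0;  // empty elements seen but not yet reported
        boolean matched = false;

        while (matcher.find()) {
            matched = true;
            String element = input.subSequence(current, matcher.start()).toString();
            current = matcher.end();
            emptyElementCount = emit(result, element, emptyElementCount);
        }
        if (!matched) {
            // No separator at all: the whole input is the single element.
            result.add(input.toString());
            return result;
        }
        // Remainder after the last match. If it is empty, it and any pending
        // empties are trailing, so they are never flushed.
        String tail = input.subSequence(current, input.length()).toString();
        emit(result, tail, emptyElementCount);
        return result;
    }

    // Defers empty elements; flushes the deferred ones only once a non-empty
    // element proves they were not trailing. Returns the new pending count.
    private static int emit(List<String> out, String element, int pendingEmpties) {
        if (element.isEmpty()) {
            return pendingEmpties + 1;
        }
        for (int i = 0; i < pendingEmpties; i++) {
            out.add("");
        }
        out.add(element);
        return 0;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("o");
        for (String input : new String[] {"fooooo:", "fooooo"}) {
            List<String> sketch = split(pattern, input);
            List<String> viaSplit = Arrays.asList(pattern.split(input));
            List<String> viaStream =
                    pattern.splitAsStream(input).collect(Collectors.toList());
            System.out.println(input + " -> " + sketch
                    + " split=" + sketch.equals(viaSplit)
                    + " stream=" + sketch.equals(viaStream));
        }
    }
}
```

With "fooooo:" the four pending empties are flushed when ":" arrives; with
"fooooo" nothing non-empty follows, so they are dropped as trailing.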
> On the test then you probably should include 8016846 in @bug tag as otherwise
> it looks like it was added specifically for 8024341.
>

Thanks, updated.

I wish there was a way to automate this by adding bug ids as meta-data to
files in the repository. Any commit with tests would automatically update the
test meta-data with the corresponding bug id.

Paul.
