Hi Vadim,

What do you mean by "copied benchmarks"? What changed since the previous
iteration, and why are the results so different?

As for the duplicated loop, you don't need it. BinaryOutputStream lets you
write a value at a particular position (even before already written data).
So you can reserve 4 bytes for the length, remember that position, calculate
the length while encoding and writing the bytes, and then write the length
into the reserved slot.
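
To illustrate the idea, here is a minimal standalone sketch. It uses
java.nio.ByteBuffer as a stand-in for BinaryOutputStream, so the class name
LengthBackpatchSketch, the little-endian byte order and the worst-case buffer
sizing are assumptions made for the example, not Ignite code. The encoding
branches are the same ones as in your strToUtf8BytesDirect.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch: ByteBuffer stands in for BinaryOutputStream.
final class LengthBackpatchSketch {
    /** Writes [4-byte length][encoded bytes] with a single pass over the string. */
    static byte[] writeString(String val) {
        // Worst case is 3 bytes per char, plus 4 bytes for the length prefix.
        ByteBuffer buf = ByteBuffer.allocate(4 + val.length() * 3)
            .order(ByteOrder.LITTLE_ENDIAN); // Byte order is an assumption here.

        int lenPos = buf.position(); // Remember where the length belongs.
        buf.putInt(0);               // Reserve 4 bytes with a placeholder.

        for (int i = 0; i < val.length(); i++) {
            int c = val.charAt(i);

            if (c >= 0x0001 && c <= 0x007F)
                buf.put((byte)c);
            else if (c > 0x07FF) {
                buf.put((byte)(0xE0 | ((c >> 12) & 0x0F)));
                buf.put((byte)(0x80 | ((c >> 6) & 0x3F)));
                buf.put((byte)(0x80 | (c & 0x3F)));
            }
            else {
                buf.put((byte)(0xC0 | ((c >> 6) & 0x1F)));
                buf.put((byte)(0x80 | (c & 0x3F)));
            }
        }

        // The length falls out of the positions, so no second loop is needed.
        int utfLen = buf.position() - lenPos - 4;

        // Absolute put: patches the reserved slot without moving the position.
        buf.putInt(lenPos, utfLen);

        return Arrays.copyOf(buf.array(), buf.position());
    }
}
```

Calling LengthBackpatchSketch.writeString(message) returns the length prefix
followed by the encoded bytes. With the real stream the equivalent would be
whatever positional write BinaryOutputStream offers.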

-Val

On Fri, Mar 3, 2017 at 12:45 AM, Вадим Опольский <vaopols...@gmail.com>
wrote:

> Valentin,
>
> What do you think about the duplicated loop in strToBinaryOutputStream?
>
> How can StrLen be calculated for outBinaryHeap without this loop?
>
> public class BinaryUtilsNew extends BinaryUtils {
>
>     public static int getStrLen(String val) {
>         int strLen = val.length();
>         int utfLen = 0;
>         int c;
>
>         // Determine length of resulting byte array.
>         for (int cnt = 0; cnt < strLen; cnt++) {
>             c = val.charAt(cnt);
>
>             if (c >= 0x0001 && c <= 0x007F)
>                 utfLen++;
>             else if (c > 0x07FF)
>                 utfLen += 3;
>             else
>                 utfLen += 2;
>         }
>
>         return utfLen;
>     }
>
>     public static void strToUtf8BytesDirect(BinaryOutputStream outBinaryHeap, String val) {
>         int strLen = val.length();
>         int c, cnt;
>
>         int position = 0;
>
>         outBinaryHeap.unsafeEnsure(1 + 4);
>
>         outBinaryHeap.unsafeWriteByte(GridBinaryMarshaller.STRING);
>         outBinaryHeap.unsafeWriteInt(getStrLen(val));
>
>         for (cnt = 0; cnt < strLen; cnt++) {
>             c = val.charAt(cnt);
>
>             if (c >= 0x0001 && c <= 0x007F)
>                 outBinaryHeap.writeByte((byte) c);
>             else if (c > 0x07FF) {
>                 outBinaryHeap.writeByte((byte)(0xE0 | ((c >> 12) & 0x0F)));
>                 outBinaryHeap.writeByte((byte)(0x80 | ((c >> 6) & 0x3F)));
>                 outBinaryHeap.writeByte((byte)(0x80 | (c & 0x3F)));
>             }
>             else {
>                 outBinaryHeap.writeByte((byte)(0xC0 | ((c >> 6) & 0x1F)));
>                 outBinaryHeap.writeByte((byte)(0x80 | (c  & 0x3F)));
>             }
>         }
>     }
>
>
> Vadim
>
>
>
> 2017-03-03 2:00 GMT+03:00 Valentin Kulichenko <
> valentin.kuliche...@gmail.com>:
>
>> Vadim,
>>
>> Looks better now. Can you also try to modify the benchmark so that
>> marshaller and writer are created outside of the measured method? I.e. the
>> benchmark methods should be as simple as this:
>>
>>     @Benchmark
>>     public void binaryHeapOutputStreamDirect() throws Exception {
>>         writer.doWriteStringDirect(message);
>>     }
>>
>>     @Benchmark
>>     public void binaryHeapOutputStreamInDirect() throws Exception {
>>         writer.doWriteString(message);
>>     }
>>
>> In any case, do I understand correctly that it didn't actually make any
>> performance difference? If so, I think we can close the ticket.
>>
>> Vova, can you also take a look and provide your thoughts?
>>
>> -Val
>>
>> On Thu, Mar 2, 2017 at 1:27 PM, Вадим Опольский <vaopols...@gmail.com>
>> wrote:
>>
>>> Hi Valentin!
>>>
>>> I've created:
>>>
>>> new method strToUtf8BytesDirect in BinaryUtilsNew
>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryUtilsNew.java
>>>
>>> new method doWriteStringDirect in BinaryWriterExImplNew
>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryWriterExImplNew.java
>>>
>>> benchmarks for BinaryWriterExImpl doWriteString and BinaryWriterExImplNew
>>> doWriteStringDirect
>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/ExampleTest.java
>>>
>>> This is a result of comparing:
>>>
>>> Benchmark                                   Mode  Cnt        Score       Error  Units
>>> ExampleTest.binaryHeapOutputStreamDirect    avgt   50  1128448,743 ± 13536,689  ns/op
>>> ExampleTest.binaryHeapOutputStreamInDirect  avgt   50  1127270,695 ± 17309,256  ns/op
>>>
>>> Vadim
>>>
>>> 2017-03-02 1:02 GMT+03:00 Valentin Kulichenko <
>>> valentin.kuliche...@gmail.com>:
>>>
>>>> Hi Vadim,
>>>>
>>>> We're getting closer :) I would actually like to see the test for the
>>>> actual implementation of the BinaryWriterExImpl#doWriteString method.
>>>> The logic in binaryHeapOutputInDirect() confuses me a bit, and I'm not
>>>> sure the comparison is valid.
>>>>
>>>> Can you please do the following:
>>>>
>>>> 1. Create new BinaryUtils#strToUtf8BytesDirect method, copy-paste the
>>>> code from existing BinaryUtils#strToUtf8Bytes and modify it so that it
>>>> takes BinaryOutputStream as an argument and writes to it directly. Do not
>>>> create stream inside this method, as it's the same as creating new array.
>>>> 2. Create new BinaryWriterExImpl#doWriteStringDirect, copy-paste the
>>>> code from existing BinaryWriterExImpl#doWriteString and modify it so
>>>> that it uses BinaryUtils#strToUtf8BytesDirect and doesn't
>>>> call out.writeByteArray.
>>>> 3. Create benchmark for BinaryWriterExImpl#doWriteString method. I.e.,
>>>> create an instance of BinaryWriterExImpl and call doWriteString() in
>>>> benchmark method.
>>>> 4. Similarly, create benchmark for
>>>> BinaryWriterExImpl#doWriteStringDirect.
>>>> 5. Compare results.
>>>>
>>>> This will give us a clear picture of how these two approaches perform.
>>>> Your current results are actually promising, but I would like to confirm
>>>> them.
>>>>
>>>> -Val
>>>>
>>>> On Wed, Mar 1, 2017 at 6:17 AM, Вадим Опольский <vaopols...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Valentin!
>>>>>
>>>>> Thank you for the comments.
>>>>>
>>>>> Here is a new method which writes directly to BinaryOutputStream
>>>>> instead of an intermediate array:
>>>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryUtilsNew.java
>>>>>
>>>>> Here is the benchmark:
>>>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/MyBenchmark.java
>>>>>
>>>>> Unit test:
>>>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryOutputStreamTest.java
>>>>>
>>>>> Statistics
>>>>> https://github.com/javaller/MyBenchmark/blob/master/out_01_03_17.txt
>>>>>
>>>>> Benchmark                                 Mode  Cnt    Score   Error  Units
>>>>> MyBenchmark.binaryHeapOutputInDirect      avgt   50  111,337 ± 0,742  ns/op
>>>>> MyBenchmark.binaryHeapOutputStreamDirect  avgt   50   23,847 ± 0,303  ns/op
>>>>>
>>>>>
>>>>> Vadim
>>>>>
>>>>>
>>>>> 2017-02-28 4:29 GMT+03:00 Valentin Kulichenko <
>>>>> valentin.kuliche...@gmail.com>:
>>>>>
>>>>>> Hi Vadim,
>>>>>>
>>>>>> Looks like you accidentally removed the dev list from the thread, adding
>>>>>> it back.
>>>>>>
>>>>>> I think there is still a misunderstanding. What I propose is to modify
>>>>>> BinaryUtils#strToUtf8Bytes so that it writes directly to
>>>>>> BinaryOutputStream instead of an intermediate array. This should decrease
>>>>>> memory consumption and can also increase performance, as we will avoid
>>>>>> the 'writeByteArray' step at the end.
>>>>>>
>>>>>> Does it make sense to you?
>>>>>>
>>>>>> -Val
>>>>>>
>>>>>> On Mon, Feb 27, 2017 at 6:55 AM, Вадим Опольский <
>>>>>> vaopols...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, Valentin!
>>>>>>>
>>>>>>> What do you think about using the methods of BinaryOutputStream:
>>>>>>>
>>>>>>> 1) writeByteArray(byte[] val)
>>>>>>> 2) writeCharArray(char[] val)
>>>>>>> 3) write (byte[] arr, int off, int len)
>>>>>>>
>>>>>>> String val = "Test";
>>>>>>> out.writeByteArray(val.getBytes(StandardCharsets.UTF_8));
>>>>>>>
>>>>>>> String val = "Test";
>>>>>>> out.writeCharArray(val.toCharArray());
>>>>>>>
>>>>>>> String val = "Test";
>>>>>>> InputStream stream = new ByteArrayInputStream(
>>>>>>>     val.getBytes(StandardCharsets.UTF_8));
>>>>>>> byte[] buffer = new byte[1024];
>>>>>>> int len;
>>>>>>> // Copy the stream chunk by chunk, writing only the bytes actually read.
>>>>>>> while ((len = stream.read(buffer)) != -1) {
>>>>>>>     out.write(buffer, 0, len);
>>>>>>> }
>>>>>>>
>>>>>>> What else can we use ?
>>>>>>>
>>>>>>> Vadim
>>>>>>>
>>>>>>>
>>>>>>> 2017-02-25 2:21 GMT+03:00 Valentin Kulichenko <
>>>>>>> valentin.kuliche...@gmail.com>:
>>>>>>>
>>>>>>>> Hi Vadim,
>>>>>>>>
>>>>>>>> Which method implements the approach described in the ticket? From
>>>>>>>> what I see, all writeToStringX versions still encode into an
>>>>>>>> intermediate array and then call out.writeByteArray. What we need to
>>>>>>>> test is the approach where bytes are written directly into the stream
>>>>>>>> during encoding. The encoding algorithm itself should stay the same for
>>>>>>>> now, otherwise we will not know how to interpret the result.
>>>>>>>>
>>>>>>>> It looks like there is some misunderstanding here, so please let me
>>>>>>>> know if anything is still unclear. I will be happy to answer your
>>>>>>>> questions.
>>>>>>>>
>>>>>>>> -Val
>>>>>>>>
>>>>>>>> On Wed, Feb 22, 2017 at 7:22 PM, Valentin Kulichenko <
>>>>>>>> valentin.kuliche...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Vadim,
>>>>>>>>>
>>>>>>>>> Thanks, I will review this week.
>>>>>>>>>
>>>>>>>>> -Val
>>>>>>>>>
>>>>>>>>> On Wed, Feb 22, 2017 at 2:28 AM, Вадим Опольский <
>>>>>>>>> vaopols...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Valentin!
>>>>>>>>>>
>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-13
>>>>>>>>>>
>>>>>>>>>> I created BinaryWriterExImplNew (which extends BinaryWriterExImpl) and
>>>>>>>>>> added new methods with the changes described in the ticket:
>>>>>>>>>>
>>>>>>>>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryWriterExImplNew.java
>>>>>>>>>>
>>>>>>>>>> I created a benchmark for BinaryWriterExImplNew:
>>>>>>>>>>
>>>>>>>>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/ExampleTest.java
>>>>>>>>>>
>>>>>>>>>> I ran the benchmark and compared the results:
>>>>>>>>>>
>>>>>>>>>> https://github.com/javaller/MyBenchmark/blob/master/totalstat.txt
>>>>>>>>>>
>>>>>>>>>> # Run complete. Total time: 00:10:24
>>>>>>>>>> Benchmark                                    Mode  Cnt        Score       Error  Units
>>>>>>>>>> ExampleTest.binaryHeapOutputStream1          avgt   50  1114999,207 ± 16756,776  ns/op
>>>>>>>>>> ExampleTest.binaryHeapOutputStream2          avgt   50  1118149,320 ± 17515,961  ns/op
>>>>>>>>>> ExampleTest.binaryHeapOutputStream3          avgt   50  1113678,657 ± 17652,314  ns/op
>>>>>>>>>> ExampleTest.binaryHeapOutputStream4          avgt   50  1112415,051 ± 18273,874  ns/op
>>>>>>>>>> ExampleTest.binaryHeapOutputStream5          avgt   50  1111366,583 ± 18282,829  ns/op
>>>>>>>>>> ExampleTest.binaryHeapOutputStreamACSII      avgt   50  1112079,667 ± 16659,532  ns/op
>>>>>>>>>> ExampleTest.binaryHeapOutputStreamUTFCustom  avgt   50  1114949,759 ± 16809,669  ns/op
>>>>>>>>>> ExampleTest.binaryHeapOutputStreamUTFNIO     avgt   50  1121462,325 ± 19836,466  ns/op
>>>>>>>>>>
>>>>>>>>>> Is it OK? What's the next step? Do I have to move this
>>>>>>>>>> JMH benchmark to the Ignite project?
>>>>>>>>>>
>>>>>>>>>> Vadim Opolski
>>>>>>>>>>
>>>>>>>>>> 2017-02-21 1:06 GMT+03:00 Valentin Kulichenko <
>>>>>>>>>> valentin.kuliche...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Hi Vadim,
>>>>>>>>>>>
>>>>>>>>>>> I'm not sure I understand your benchmarks and how they verify the
>>>>>>>>>>> optimization discussed here. Basically, here is what needs to be done:
>>>>>>>>>>>
>>>>>>>>>>> 1. Create a benchmark for BinaryWriterExImpl#doWriteString
>>>>>>>>>>> method.
>>>>>>>>>>> 2. Run the benchmark with current implementation.
>>>>>>>>>>> 3. Make the change described in the ticket.
>>>>>>>>>>> 4. Run the benchmark with these changes.
>>>>>>>>>>> 5. Compare results.
>>>>>>>>>>>
>>>>>>>>>>> Makes sense? Let me know if anything is unclear.
>>>>>>>>>>>
>>>>>>>>>>> -Val
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 20, 2017 at 8:51 AM, Вадим Опольский <
>>>>>>>>>>> vaopols...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello everybody!
>>>>>>>>>>>>
>>>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-13
>>>>>>>>>>>>
>>>>>>>>>>>> Valentin, I have just finished the benchmark (with JMH):
>>>>>>>>>>>> https://github.com/javaller/MyBenchmark.git
>>>>>>>>>>>>
>>>>>>>>>>>> It collects data on how long serialization takes.
>>>>>>>>>>>>
>>>>>>>>>>>> For instance:
>>>>>>>>>>>> https://github.com/javaller/MyBenchmark/blob/master/out200217.txt
>>>>>>>>>>>>
>>>>>>>>>>>> To run it, do the following:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) clone it - git clone https://github.com/javaller/MyBenchmark.git
>>>>>>>>>>>>
>>>>>>>>>>>> 2) install it - mvn install
>>>>>>>>>>>>
>>>>>>>>>>>> 3) run benchmarks - java -Xms1024m -Xmx4096m -jar target\benchmarks.jar
>>>>>>>>>>>>
>>>>>>>>>>>> Vadim Opolski
>>>>>>>>>>>>
>>>>>>>>>>>> 2017-02-15 0:52 GMT+03:00 Valentin Kulichenko <
>>>>>>>>>>>> valentin.kuliche...@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Vladimir,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think we misunderstood each other. My understanding of this
>>>>>>>>>>>>> optimization is the following.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently string serialization is done in two steps (see
>>>>>>>>>>>>> BinaryWriterExImpl#doWriteString):
>>>>>>>>>>>>>
>>>>>>>>>>>>> // Encode string into byte array.
>>>>>>>>>>>>> strArr = BinaryUtils.strToUtf8Bytes(val);
>>>>>>>>>>>>> // Write byte array into stream.
>>>>>>>>>>>>> out.writeByteArray(strArr);
>>>>>>>>>>>>>
>>>>>>>>>>>>> What this ticket suggests is to write directly into the stream
>>>>>>>>>>>>> while the string is encoded, without an intermediate array. This
>>>>>>>>>>>>> both reduces memory consumption and eliminates the array copy step.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I updated the ticket and added this explanation there.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Vadim, can you create a micro benchmark and check if it gives
>>>>>>>>>>>>> any improvement?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Val
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Feb 12, 2017 at 10:38 PM, Vladimir Ozerov <
>>>>>>>>>>>>> voze...@gridgain.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is hard to say whether it makes sense or not. No doubt, it
>>>>>>>>>>>>>> could speed up the marshalling process at the cost of 2x the
>>>>>>>>>>>>>> memory required for strings. From my previous experience with
>>>>>>>>>>>>>> marshalling micro-optimizations, we will hardly ever notice a
>>>>>>>>>>>>>> speedup in a distributed environment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But there is another side: it could speed up our queries,
>>>>>>>>>>>>>> because we will not have to unmarshal a string on every field
>>>>>>>>>>>>>> access. So I would try to make this optimization optional and
>>>>>>>>>>>>>> then measure query performance with classes having lots of
>>>>>>>>>>>>>> strings. It could give us interesting results.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Feb 13, 2017 at 5:37 AM, Valentin Kulichenko <
>>>>>>>>>>>>>> valentin.kuliche...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Vladimir,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you please take a look and provide your thoughts? Can
>>>>>>>>>>>>>>> this be applied to the binary marshaller? From what I recall, it
>>>>>>>>>>>>>>> serializes strings a bit differently from the optimized
>>>>>>>>>>>>>>> marshaller, so I'm not sure.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Val
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Feb 10, 2017 at 5:16 PM, Dmitriy Setrakyan <
>>>>>>>>>>>>>>> dsetrak...@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Feb 9, 2017 at 11:26 PM, Valentin Kulichenko <
>>>>>>>>>>>>>>>> valentin.kuliche...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > Hi Vadim,
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > I don't think it makes much sense to invest into
>>>>>>>>>>>>>>>> OptimizedMarshaller.
>>>>>>>>>>>>>>>> > However, I would check if this optimization is applicable
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> > BinaryMarshaller, and if yes, implement it.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Val, in this case can you please update the ticket?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > -Val
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > On Thu, Feb 9, 2017 at 11:05 PM, Вадим Опольский <
>>>>>>>>>>>>>>>> vaopols...@gmail.com>
>>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > > Dear sirs!
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > I want to resolve issue IGNITE-13 -
>>>>>>>>>>>>>>>> > > https://issues.apache.org/jira/browse/IGNITE-13
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > Is it still relevant?
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > Vadim Opolski
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
