The general way this is done is to extract the keys needed for the sorting
into an index, sort that index, and then either use the index as an access
mechanism to reach the unmodified bulk data, or do a single-pass
reordering.
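
A minimal sketch of that idea in Go (the record type, field names, and key
type below are made-up placeholders, not anything from the original data):

package main

import (
	"fmt"
	"sort"
)

// record stands in for the expensive-to-compare bulk data.
type record struct {
	payload string // imagine a large blob here
	key     int    // the field the sort is actually based on
}

func main() {
	data := []record{{"c", 3}, {"a", 1}, {"b", 2}}

	// Build an index of positions and sort the index by the extracted keys;
	// the bulk data itself never moves during the sort.
	idx := make([]int, len(data))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool {
		return data[idx[a]].key < data[idx[b]].key
	})

	// Option 1: read data[idx[0]], data[idx[1]], ... in sorted order.
	// Option 2: do a single-pass reordering into a new slice.
	sorted := make([]record, len(data))
	for out, in := range idx {
		sorted[out] = data[in]
	}
	fmt.Println(sorted)
}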

On Thu, Nov 30, 2017 at 8:36 AM, Subramanian K <subub...@gmail.com> wrote:

> Thanks all. I did some measurements on the comparison function "Less": the
> huge file holds JSON data, so each comparison had to unmarshal the JSON and
> then compare the obtained values. Unmarshalling took 98% of the time in this
> comparison. I am looking to avoid the unmarshalling and find an alternative
> way to do this.
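>
> One way might be to unmarshal each record once up front into just the sort
> key, so Less compares plain values. A sketch of that, with a made-up "ts"
> field standing in for the real key:
>
> package main
>
> import (
> 	"encoding/json"
> 	"fmt"
> 	"log"
> 	"sort"
> )
>
> func main() {
> 	// Each element stands for one JSON record from the big file.
> 	lines := [][]byte{
> 		[]byte(`{"ts": 30, "body": "c"}`),
> 		[]byte(`{"ts": 10, "body": "a"}`),
> 		[]byte(`{"ts": 20, "body": "b"}`),
> 	}
>
> 	// Decode each record exactly once, keeping only the sort key.
> 	type item struct {
> 		raw []byte // original JSON, untouched
> 		key int64  // pre-extracted sort key
> 	}
> 	items := make([]item, 0, len(lines))
> 	for _, line := range lines {
> 		var v struct {
> 			TS int64 `json:"ts"`
> 		}
> 		if err := json.Unmarshal(line, &v); err != nil {
> 			log.Fatal(err)
> 		}
> 		items = append(items, item{raw: line, key: v.TS})
> 	}
>
> 	// Less now compares integers instead of re-unmarshalling JSON.
> 	sort.Slice(items, func(i, j int) bool { return items[i].key < items[j].key })
> 	for _, it := range items {
> 		fmt.Println(string(it.raw))
> 	}
> }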
>
> Thanks all for your time and suggestions.
>
> Regards,
> Subramanian. K
>
>
> On Thursday, 30 November 2017 19:53:01 UTC+5:30, Slawomir Pryczek wrote:
>>
>> It should be very simple if you have an additional 2GB of memory. You
>> divide the data into X parts, where X is a power of 2 and no greater than
>> the number of cores available, e.g. for 2000MB that can be 8 parts of
>> 250MB each. Then you sort the parts in parallel using the built-in sort
>> function, and at the end you merge the sorted arrays by scanning them and
>> picking the smallest element as you move forward: 8x250MB => 4x500MB =>
>> 2x1000MB => 1x2000MB.
>>
>> It should take about 15-20 minutes to write, and with 8 threads the sort
>> would probably take 5-6 minutes instead of half an hour.
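>>
>> A rough sketch of that chunked approach, using the built-in sort on each
>> part and then pairwise merges (the data and chunk count are toy values):
>>
>> package main
>>
>> import (
>> 	"fmt"
>> 	"sort"
>> 	"sync"
>> )
>>
>> // mergeTwo merges two already-sorted slices into one sorted slice.
>> func mergeTwo(a, b []int) []int {
>> 	out := make([]int, 0, len(a)+len(b))
>> 	i, j := 0, 0
>> 	for i < len(a) && j < len(b) {
>> 		if a[i] <= b[j] {
>> 			out = append(out, a[i])
>> 			i++
>> 		} else {
>> 			out = append(out, b[j])
>> 			j++
>> 		}
>> 	}
>> 	out = append(out, a[i:]...)
>> 	return append(out, b[j:]...)
>> }
>>
>> func main() {
>> 	data := []int{9, 3, 7, 1, 8, 2, 6, 4, 5, 0}
>> 	const parts = 4 // e.g. 8 chunks of 250MB for 2000MB on 8 cores
>>
>> 	// Sort each chunk in parallel with the built-in sort.
>> 	size := (len(data) + parts - 1) / parts
>> 	chunks := make([][]int, 0, parts)
>> 	var wg sync.WaitGroup
>> 	for lo := 0; lo < len(data); lo += size {
>> 		hi := lo + size
>> 		if hi > len(data) {
>> 			hi = len(data)
>> 		}
>> 		c := data[lo:hi]
>> 		chunks = append(chunks, c)
>> 		wg.Add(1)
>> 		go func(c []int) {
>> 			defer wg.Done()
>> 			sort.Ints(c)
>> 		}(c)
>> 	}
>> 	wg.Wait()
>>
>> 	// Pairwise merges: 8 => 4 => 2 => 1 sorted arrays.
>> 	for len(chunks) > 1 {
>> 		next := make([][]int, 0, (len(chunks)+1)/2)
>> 		for i := 0; i+1 < len(chunks); i += 2 {
>> 			next = append(next, mergeTwo(chunks[i], chunks[i+1]))
>> 		}
>> 		if len(chunks)%2 == 1 {
>> 			next = append(next, chunks[len(chunks)-1])
>> 		}
>> 		chunks = next
>> 	}
>> 	fmt.Println(chunks[0])
>> }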
>>
>> In the merge phase you can also parallelize the process further by taking
>> the average of the middle elements, finding the last position where the
>> element is less than this value, and breaking each array at that point
>> into 2 separate arrays. E.g. if you're doing the last step and the element
>> in Arr1 at position 500000 is 91821 while at the same position in Arr2 you
>> have 1782713, the average is 937267; you use binary search to find the
>> position of 937267 (or whatever is closest to it) in both arrays. Then you
>> can break the arrays into Arr11 + Arr12 / Arr21 + Arr22 and merge Arr11
>> with Arr21 and Arr12 with Arr22 concurrently. But that would probably take
>> more time to write and is not necessarily worth the effort.
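>>
>> If you wanted to try it anyway, a compact sketch of that split, reusing
>> mergeTwo and the imports from the sketch above (the pivot is simplified to
>> the middle element of a rather than an average; assumes a is non-empty):
>>
>> // mergeSplit merges two sorted slices by splitting both around a pivot
>> // found with binary search, merging the two halves concurrently, and
>> // concatenating the results.
>> func mergeSplit(a, b []int) []int {
>> 	mid := len(a) / 2
>> 	cut := sort.SearchInts(b, a[mid]) // position of the pivot (or closest) in b
>>
>> 	var lo, hi []int
>> 	var wg sync.WaitGroup
>> 	wg.Add(2)
>> 	go func() { defer wg.Done(); lo = mergeTwo(a[:mid], b[:cut]) }()
>> 	go func() { defer wg.Done(); hi = mergeTwo(a[mid:], b[cut:]) }()
>> 	wg.Wait()
>> 	return append(lo, hi...)
>> }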
>>
>>
>>
>> On Wednesday, 29 November 2017 at 15:19:13 UTC+1, Subramanian K wrote:
>>>
>>> Hi
>>>
>>> I am using the native sort provided as part of the standard package, and
>>> it takes ~45sec to process 48MB of slice data.
>>> Running 2GB of data takes a really long time, so I am trying to split it
>>> into buckets, sort them concurrently, and finally collate the results of
>>> all these small sorted buckets.
>>>
>>> Do we have any sort package which can sort huge data swiftly?
>>>
>>> Regards,
>>> Subu. K
>>>



-- 
Michael T. Jones
michael.jo...@gmail.com
