> On 9/05/2024, at 11:58 PM, Vladimir Dergachev <volo...@mindspring.com> wrote:
>
>
>
> On Thu, 9 May 2024, Sameh Abdulah wrote:
>
>> Hi,
>>
>> I need to serialize and save a 20K x 20K matrix as a binary file. This
>> process is significantly slower in R than in Python (about 4x slower).
>>
>> I'm not sure about the best approach to optimizing the code below. Is it
>> possible to parallelize the serialization function to improve performance?
>
> Parallelization should not help - a single CPU thread should be able to
> saturate your disk or your network, assuming you have a typical computer.
>
> The problem is possibly the conversion to text; writing it as binary should
> be much faster.
>
FWIW serialize() is binary so there is no conversion to text:
> serialize(1:10+0L, NULL)
[1] 58 0a 00 00 00 03 00 04 02 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0d 00 00 00 0a 00 00 00 01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
[51] 05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00 09 00 00 00 0a
It uses the machine representation of the numbers rather than text, so it is
actually not as bad as it sounds.
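
If you only need the raw numbers and can keep track of the dimensions
yourself, writeBin() skips even serialize()'s header and writes the doubles
in native byte order. A minimal sketch (file and variable names are
placeholders):

    m <- matrix(runif(9), 3, 3)          # stand-in for the 20K x 20K matrix
    con <- file("m.bin", "wb")
    writeBin(as.vector(m), con)          # raw doubles, native byte order
    close(con)

    con <- file("m.bin", "rb")
    m2 <- matrix(readBin(con, "double", n = 9), nrow = 3)
    close(con)

(Older versions of R capped a single writeBin() call at 2 GB, so a 20K x 20K
double matrix, about 3.2 GB, may need to be written in chunks there.)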
One aspect I forgot to mention in the earlier thread is that if you don't need
to exchange the serialized objects between machines with different endianness,
then avoiding the byte swap makes it faster. By default serialize() writes the
big-endian XDR format, so e.g. on Intel (which is little-endian and thus needs
swapping):
> a=1:1e8/2
> system.time(serialize(a, NULL))
   user  system elapsed
  2.123   0.468   2.661
> system.time(serialize(a, NULL, xdr=FALSE))
   user  system elapsed
  0.393   0.348   0.742
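
For writing to disk rather than to memory, the same flag works on a binary
file connection; a minimal sketch (file name and matrix m are placeholders,
with compression skipped to keep the write fast):

    con <- file("matrix.bin", "wb")      # plain binary connection, no compression
    serialize(m, con, xdr = FALSE)       # native byte order, no swap
    close(con)

    con <- file("matrix.bin", "rb")
    m2 <- unserialize(con)               # dim() and all attributes round-trip
    close(con)

As above, an xdr=FALSE stream is only exchangeable between machines of the
same endianness.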
Cheers,
Simon