I was using the Git version and running various sketches. I thought the
slowness came from Python, but I was able to scan through the same data,
calculating the same statistics, with the Java library in roughly
3 minutes.

Any idea why there's such a big difference between the two languages?

- Andy

On Fri, Jun 26, 2020, 21:02 Jon Malkin <jmal...@apache.org> wrote:

> I haven't done long-running Python tests recently, but I haven't seen that.
>
> Are you using a release version of the library or did you check out from
> git? And which sketch or sketches are you using?
>
> I've compiled the library in debug mode (gotta modify setup.py to force
> that) and run python via gdb but that's not gonna work nicely on 1.6gb of
> data. It's sloooooooowwwwwww.
>
>   jon
>
>
> On Fri, Jun 26, 2020, 4:39 PM Andy Dang <nam...@gmail.com> wrote:
>
>> Hi all,
>>
>> I've been trying to integrate DataSketches into our ecosystem - really
>> great work!
>>
>> However, when I ran various sketches in Python on the raw CSV of the
>> Lending Club data from Kaggle (1.6GB in size), I noticed that after a
>> while the process crashes with a mysterious segfault on my macOS
>> (Catalina) machine.
>> My Clang version:
>>
>> ➜  Workspace c++ --version
>> Apple clang version 11.0.0 (clang-1100.0.33.17)
>> Target: x86_64-apple-darwin19.5.0
>> Thread model: posix
>> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>>
>> ➜  Workspace gcc --version
>> Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
>> Apple clang version 11.0.0 (clang-1100.0.33.17)
>> Target: x86_64-apple-darwin19.5.0
>> Thread model: posix
>> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>>
>> Replacing this with the Miniconda cxx toolchain solves the problem.
>>
>> I'll put together a script along with the data for reproducibility, but
>> before that, I wonder if anyone has come across this issue?
>>
>> Cheers!
>> - Andy
>>
>
