I haven't run any long-running Python tests recently, but I haven't seen that.

Are you using a release version of the library, or did you check it out from
git? And which sketch or sketches are you using?

I've compiled the library in debug mode (gotta modify setup.py to force
that) and run Python under gdb, but that's not gonna work nicely on 1.6 GB of
data. It's sloooooooowwwwwww.
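
A lighter-weight option than a full gdb session is Python's standard `faulthandler` module, which dumps the Python-level traceback when the interpreter receives a fatal signal such as SIGSEGV. The sketch below is only a suggestion and is not confirmed to pinpoint this particular crash; the helper name `enable_crash_traces` is hypothetical, and the workload itself is left as a placeholder.

```python
import faulthandler
import sys


def enable_crash_traces(stream=None):
    """Enable faulthandler so a fatal signal dumps Python tracebacks.

    `stream` must be a real file object with a fileno(); defaults to stderr.
    Returns True if the handler is active afterwards.
    """
    if stream is None:
        stream = sys.stderr
    # all_threads=True prints the stack of every Python thread, not just
    # the one that crashed.
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_crash_traces()
    # Placeholder: run the sketch updates over the CSV here. On a segfault,
    # the Python line that was executing is written to stderr before exit.
```

The same effect is available without code changes by running the script with `python -X faulthandler`. This only shows where in the Python code the crash happened; the native stack still needs gdb or lldb.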

  jon


On Fri, Jun 26, 2020, 4:39 PM Andy Dang <nam...@gmail.com> wrote:

> Hi all,
>
> I've been trying to integrate Datasketches into our ecosystem - really
> great work!
>
> However, when I ran various sketches in Python on the raw CSV of the
> Lending Club data from Kaggle (1.6 GB in size), I noticed that after a
> while the process crashes with a mysterious segfault on my Mac (macOS
> Catalina).
> My Clang version:
>
> ➜  Workspace c++ --version
> Apple clang version 11.0.0 (clang-1100.0.33.17)
> Target: x86_64-apple-darwin19.5.0
> Thread model: posix
> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>
> ➜  Workspace gcc --version
> Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
> Apple clang version 11.0.0 (clang-1100.0.33.17)
> Target: x86_64-apple-darwin19.5.0
> Thread model: posix
> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>
> Replacing this toolchain with the Miniconda cxx toolchain solves the problem.
>
> I'll put together a script along with the data for reproducibility, but in
> the meantime I wonder if anyone has come across this issue before?
>
> Cheers!
> - Andy
>
