I was using the Git version and running various sketches. I thought the slowness came from Python, but I was able to scan the same data and compute the same statistics with the Java library in roughly 3 minutes.
Any idea why there's such a big difference between the two languages?

- Andy

On Fri, Jun 26, 2020, 21:02 Jon Malkin <jmal...@apache.org> wrote:

> I haven't done long running python tests recently but I haven't seen that.
>
> Are you using a release version of the library or did you check out from
> git? And which sketch or sketches are you using?
>
> I've compiled the library in debug mode (gotta modify setup.py to force
> that) and run python via gdb but that's not gonna work nicely on 1.6gb of
> data. It's sloooooooowwwwwww.
>
> jon
>
>
> On Fri, Jun 26, 2020, 4:39 PM Andy Dang <nam...@gmail.com> wrote:
>
>> Hi all,
>>
>> I've been trying to integrate Datasketches into our ecosystem - really
>> great work!
>>
>> However, when I tried to run various sketches with the Lending Club data
>> from Kaggle (1.6GB in size) on the raw CSV data in Python, I noticed
>> after a while that the process would crash with a mysterious segfault
>> on my Mac (macOS Catalina).
>>
>> My Clang version:
>>
>> ➜ Workspace c++ --version
>>
>> Apple clang version 11.0.0 (clang-1100.0.33.17)
>>
>> Target: x86_64-apple-darwin19.5.0
>>
>> Thread model: posix
>>
>> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>>
>> ➜ Workspace gcc --version
>>
>> Configured with: --prefix=/Library/Developer/CommandLineTools/usr
>> --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
>>
>> Apple clang version 11.0.0 (clang-1100.0.33.17)
>>
>> Target: x86_64-apple-darwin19.5.0
>>
>> Thread model: posix
>>
>> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>>
>> Replacing this with the Miniconda cxx toolchain solves the problem.
>>
>> I'll get a script along with the data for reproducibility, but before
>> that I wonder if anyone has come across this issue before?
>>
>> Cheers!
>> - Andy
>>
>
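
For anyone trying to reproduce this, here is a minimal sketch of what a reproduction harness might look like. It is not the script mentioned above. It uses only the Python standard library: `faulthandler` prints the Python-level traceback if the process segfaults, which is much cheaper than running under gdb, and the CSV is streamed one row at a time so the 1.6GB file never has to fit in memory. The names `numeric_values`, `feed`, and `main` are hypothetical, and the `update` callable is a stand-in for whatever sketch update method is under test.

```python
import csv
import faulthandler
import sys


def numeric_values(path, column):
    """Yield one CSV column as floats, skipping blank and non-numeric cells."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            cell = (row.get(column) or "").strip()
            try:
                yield float(cell)
            except ValueError:
                continue  # empty or non-numeric cell


def feed(path, column, update, report_every=1_000_000):
    """Stream one column into `update`; return how many values were fed."""
    count = 0
    for value in numeric_values(path, column):
        update(value)  # assumption: the sketch's update method goes here
        count += 1
        if count % report_every == 0:
            # Progress on stderr helps narrow down where a crash happens.
            print(f"{count} values processed", file=sys.stderr, flush=True)
    return count


def main(argv):
    if len(argv) != 3:
        print("usage: repro.py CSV_PATH COLUMN", file=sys.stderr)
        return
    # Dump the Python traceback on SIGSEGV and other fatal signals.
    faulthandler.enable()
    values = []  # stand-in sink; swap in the sketch under test
    print(feed(argv[1], argv[2], values.append))


if __name__ == "__main__":
    main(sys.argv)
```

Running the same harness once with the Apple toolchain build and once with the Miniconda build should show whether the crash is tied to a particular row count or column.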