You mean you were calling the Java library from Python? Our testing has generally shown C++ to be faster.
This is still too vague for me to be able to say much. There's no specific git version (tag or hash), no code, and no data.

  jon

On Mon, Jun 29, 2020 at 9:08 AM Andy Dang <nam...@gmail.com> wrote:

> I was using the Git version and was running with various sketches. I
> thought the slowness was from Python, but I was able to scan through the
> same data calculating the same statistics with the Java library in roughly
> 3 minutes.
>
> Any idea why there's such a big difference between the two languages?
>
> - Andy
>
> On Fri, Jun 26, 2020, 21:02 Jon Malkin <jmal...@apache.org> wrote:
>
>> I haven't done long running python tests recently but I haven't seen that.
>>
>> Are you using a release version of the library or did you check out
>> from git? And which sketch or sketches are you using?
>>
>> I've compiled the library in debug mode (gotta modify setup.py to force
>> that) and run python via gdb but that's not gonna work nicely on 1.6gb of
>> data. It's sloooooooowwwwwww.
>>
>> jon
>>
>>
>> On Fri, Jun 26, 2020, 4:39 PM Andy Dang <nam...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I've been trying to integrate Datasketches into our ecosystem - really
>>> great work!
>>>
>>> However, when I tried to run various sketches in Python on the raw CSV
>>> of the Lending Club data from Kaggle (1.6GB in size), I noticed after a
>>> while that the process would crash with a mysterious segfault on my
>>> macOS (Catalina).
>>>
>>> My Clang version:
>>>
>>> ➜ Workspace c++ --version
>>> Apple clang version 11.0.0 (clang-1100.0.33.17)
>>> Target: x86_64-apple-darwin19.5.0
>>> Thread model: posix
>>> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>>>
>>> ➜ Workspace gcc --version
>>> Configured with: --prefix=/Library/Developer/CommandLineTools/usr
>>> --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
>>> Apple clang version 11.0.0 (clang-1100.0.33.17)
>>> Target: x86_64-apple-darwin19.5.0
>>> Thread model: posix
>>> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>>>
>>> Replacing this with Miniconda cxx toolchain solves the problem.
>>>
>>> I'll get a script along with the data for reproducibility, but before
>>> that I wonder if anyone has come across this issue before?
>>>
>>> Cheers!
>>> - Andy
>>>