Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-11 Thread via GitHub
parthchandra commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2798109113 > > Wow that's phenomenal! Are you able to share some (vague if necessary) descriptions of your workload, cluster hardware, storage source, and what sort of tuning (if

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-10 Thread via GitHub
mbutrovich commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2786513619 > > > Unfortunately can't see any visible gains of working with comet vs bare Spark 😕 > > > > > > Were you able to confirm that queries were running in Comet?

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-10 Thread via GitHub
andygrove commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2786444191 > Unfortunately can't see any visible gains of working with comet vs bare Spark 😕 Were you able to confirm that queries were running in Comet? -- This is an auto

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-10 Thread via GitHub
mbutrovich commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2794162991 > Do you want me to share here or under separate thread? Maybe in a new discussion you could share your experience: https://github.com/apache/datafusion-comet/disc

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-10 Thread via GitHub
mixermt commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2794100210 > Wow that's phenomenal! Are you able to share some (vague if necessary) descriptions of your workload, cluster hardware, storage source, and what sort of tuning (if an

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-08 Thread via GitHub
mixermt commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2786476327 > > Unfortunately can't see any visible gains of working with comet vs bare Spark 😕 > > Were you able to confirm that queries were running in Comet? After test

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-07 Thread via GitHub
mixermt commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2782887646 Closing the issue; everything works without visible issues after compiling Comet with `-Ctarget-cpu=broadwell`. Unfortunately can't see any visible gains of working wi

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-07 Thread via GitHub
mixermt closed issue #1598: Spark executor fail to start occasionally with SIGILL URL: https://github.com/apache/datafusion-comet/issues/1598

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-05 Thread via GitHub
mbutrovich commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2775854523 So the machines that crash are Broadwell, Comet was compiled on Skylake. There's a decent difference in CPU features there. I'd be curious to know what `rustc --print ta

[I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-05 Thread via GitHub
mixermt opened a new issue, #1598: URL: https://github.com/apache/datafusion-comet/issues/1598 ### Describe the bug Hi, We experience occasional failures of Spark executors with the following ```│ # A fatal error has been detected by the Java Runtime Environment:

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-04 Thread via GitHub
mbutrovich commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2776011568 A release build will by default set target-cpu=native: https://github.com/apache/datafusion-comet/blob/c5e78b6b59778f0429f0fc8157c6a959bfd9d4c3/Makefile#L101 which
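
As an illustration of why a `target-cpu=native` build can fail on a different machine, the sketch below compares the features a binary was compiled to assume (`cfg!(target_feature = ...)`) with what the running CPU reports (`is_x86_feature_detected!`). It is not taken from Comet; the function names and the short feature list are assumptions chosen for the example, and the list does not claim to cover the actual Skylake/Broadwell differences.

```rust
/// Features the compiler was allowed to use unconditionally in this binary.
/// With `-Ctarget-cpu=native` this reflects the build machine, not the host.
fn compiled_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse4.2", cfg!(target_feature = "sse4.2")),
        ("avx2", cfg!(target_feature = "avx2")),
        ("fma", cfg!(target_feature = "fma")),
        ("bmi2", cfg!(target_feature = "bmi2")),
    ]
}

/// Features the CPU we are running on actually reports.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn host_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse4.2", std::is_x86_feature_detected!("sse4.2")),
        ("avx2", std::is_x86_feature_detected!("avx2")),
        ("fma", std::is_x86_feature_detected!("fma")),
        ("bmi2", std::is_x86_feature_detected!("bmi2")),
    ]
}

/// On non-x86 targets there is nothing to report in this sketch.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn host_features() -> Vec<(&'static str, bool)> {
    Vec::new()
}

/// Names of features assumed at compile time but not available on the host.
/// Any entry here means the binary may execute an illegal instruction.
fn unsupported(
    compiled: &[(&'static str, bool)],
    host: &[(&'static str, bool)],
) -> Vec<&'static str> {
    compiled
        .iter()
        .filter(|(name, on)| *on && !host.iter().any(|(h, ok)| h == name && *ok))
        .map(|(name, _)| *name)
        .collect()
}

fn main() {
    let missing = unsupported(&compiled_features(), &host_features());
    if missing.is_empty() {
        println!("host supports every feature this binary assumes");
    } else {
        println!("host lacks features assumed at compile time: {:?}", missing);
    }
}
```

A check like this only helps if it runs before any code that uses the assumed instructions, so in practice building for the oldest CPU in the cluster is the simpler route.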

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-03 Thread via GitHub
mbutrovich commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2776293399 Just `-Ctarget-cpu=broadwell`. Sorry I should have been more specific.

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-03 Thread via GitHub
mixermt commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2776287792 While it's possible to execute on specific machines, I prefer not to break our K8s resource pool. So in order to compile for Broadwell, should I just concat it to target

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-03 Thread via GitHub
mixermt commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2775954347 This is the output of `rustc --print target-cpus`. Just to note, I've built Comet with default params using the basic `make release-nogit PROFILES="-Pspark-3.5 -Pscala

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-03 Thread via GitHub
mixermt commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2774725949 Comet was built on a server with the following lscpu output ``` Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s):

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-03 Thread via GitHub
mixermt commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2774706637 We compiled Comet ourselves (Spark 3.5, Scala 2.12) without any visible issues. We have a wide variety of servers in our K8s cluster (all kinds of Dell's PowerEdge

Re: [I] Spark executor fail to start occasionally with SIGILL [datafusion-comet]

2025-04-02 Thread via GitHub
mbutrovich commented on issue #1598: URL: https://github.com/apache/datafusion-comet/issues/1598#issuecomment-2772708589 Interesting. That's in the Rust stdlib, I think. The fact that it's trying to hash something makes me wonder if it tried to use an instruction (e.g., AVX2) that the syst
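
One way to test the suspicion that an executor node lacks an instruction set such as AVX2 is to ask the CPU directly at runtime. The sketch below uses the standard library's `is_x86_feature_detected!` macro. The helper name `detected_features` and the choice of features to probe are assumptions for illustration, not something taken from the issue.

```rust
/// Probe a handful of x86 features on the machine this program runs on.
/// The macro needs string literals, so each feature is listed explicitly.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detected_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse4.2", std::is_x86_feature_detected!("sse4.2")),
        ("avx", std::is_x86_feature_detected!("avx")),
        ("avx2", std::is_x86_feature_detected!("avx2")),
        ("fma", std::is_x86_feature_detected!("fma")),
        ("bmi2", std::is_x86_feature_detected!("bmi2")),
        ("avx512f", std::is_x86_feature_detected!("avx512f")),
    ]
}

/// Non-x86 targets have none of these features; return an empty list.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detected_features() -> Vec<(&'static str, bool)> {
    Vec::new()
}

fn main() {
    // Print one line per feature so output from different nodes can be diffed.
    for (name, present) in detected_features() {
        println!("{name}: {}", if present { "yes" } else { "no" });
    }
}
```

Running the same binary on the build server and on a node where executors crash, then comparing the output, would show whether the two machines differ in any of the probed features. The probe itself should be built without `-Ctarget-cpu=native` so that it can start on every node.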