FlatMapGroupsWithStateFunction is called thrice - Production use case.

2021-03-10 Thread Kuttaiah Robin
Hello, I have a use case where I need to read events (non-correlated) from a source Kafka topic, then correlate them and push them forward to another target topic. I use Spark Structured Streaming with FlatMapGroupsWithStateFunction along with GroupStateTimeout.ProcessingTimeTimeout(). After each timeout

Re: [spark-core] docker-image-tool.sh question...

2021-03-10 Thread Muthu Jayakumar
Hello Attila, Thank you for verifying this for me. I was looking at Step 1/18 : ARG java_image_tag=11-jre-slim and presumed that the docker image is built using JRE 11. I can confirm that, (1) $ docker image history 3ef86250a35b IMAGE CREATED CREATED BY SIZE CO

Re: compile spark 3.1.1 error

2021-03-10 Thread jiahong li
Maybe it is my environment that causes it. jiahong li wrote on Thu, Mar 11, 2021 at 11:14 AM: > It is not the cause; when I set -Phadoop-2.7 instead of > -Dhadoop.version=2.6.0-cdh5.13.1, the same errors come out. > > Attila Zsolt Piros wrote on Wed, Mar 10, 2021 at 8:56 PM: > >> I see, this must be because of the hadoop version you are sele

Re: compile spark 3.1.1 error

2021-03-10 Thread jiahong li
It is not the cause; when I set -Phadoop-2.7 instead of -Dhadoop.version=2.6.0-cdh5.13.1, the same errors come out. Attila Zsolt Piros wrote on Wed, Mar 10, 2021 at 8:56 PM: > I see, this must be because of the hadoop version you are selecting by using > "-Dhadoop.version=2.6.0-cdh5.13.1". > Spark 3.1.1 only supports h

Re: [spark-core] docker-image-tool.sh question...

2021-03-10 Thread Attila Zsolt Piros
Hi Muthu! I tried it and on my side it is working just fine: $ ./bin/docker-image-tool.sh -r docker.io/sample-spark -b java_image_tag=8-jre-slim -t 3.1.1 build Sending build context to Docker daemon 228.3MB Step 1/18 : ARG java_image_tag=11-jre-slim Step 2/18 : FROM openjdk:${java_image_tag} *8-jr
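The commands below mirror what this thread quotes: build the Spark image with the `java_image_tag` build arg overridden (the `ARG java_image_tag=11-jre-slim` line in the Dockerfile is only the default, which `-b` replaces), then verify which JRE actually ended up inside. The repository name and tag are the ones from the thread; the verification commands are an assumed, common way to check, not taken from the original mails.

```shell
# Build the Spark image, overriding the default java_image_tag (11-jre-slim):
./bin/docker-image-tool.sh -r docker.io/sample-spark -b java_image_tag=8-jre-slim -t 3.1.1 build

# Inspect the layer history to see which base image was used:
docker image history docker.io/sample-spark/spark:3.1.1

# Or check the Java version inside the image directly:
docker run --rm docker.io/sample-spark/spark:3.1.1 java -version
```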

Re: How to control count / size of output files for

2021-03-10 Thread m li
Hi, thank you, the suggestion is very good; there is no need to use "repartitionByRange". However, one small doubt: if the output files are required to be globally ordered, "repartition" will disrupt the order of the data, while the result of using "coalesce" is correct. Best Regards, m li
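For the globally ordered case raised above, a hedged sketch of one standard approach: `repartitionByRange` assigns non-overlapping key ranges to partitions, and `sortWithinPartitions` then orders rows inside each partition, so the output files, taken in partition order, are globally sorted. The column name, paths, and partition count are illustrative assumptions, and this is not the poster's code.

```scala
import org.apache.spark.sql.SparkSession

object OrderedOutput {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ordered-output").getOrCreate()
    val df = spark.read.parquet("/data/input") // illustrative path

    // Non-overlapping ranges per partition + per-partition sort
    // => files are globally ordered on "key" when read in partition order.
    df.repartitionByRange(8, df("key"))
      .sortWithinPartitions("key")
      .write.parquet("/data/output-sorted")
  }
}
```

By contrast, a plain `repartition` uses hash or round-robin shuffling, which is why it disrupts any pre-existing order.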

Re: [jira] [Commented] (SPARK-34648) Reading Parquet Files in Spark Extremely Slow for Large Number of Files?

2021-03-10 Thread Kent Yao
Hi Pankaj, Have you tried spark.sql.parquet.respectSummaryFiles=true? Bests, Kent Yao @ Data Science Center, Hangzhou Research Institute, NetEase Corp
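The setting Kent Yao suggests can be applied in `spark-defaults.conf` or per job. When enabled, Spark assumes the Parquet part-files are consistent with the `_metadata` / `_common_metadata` summary files, so schema merging can skip reading every part-file footer, which is what makes jobs over very many files faster. How to set it (config fragment only, no behavior change is implied beyond that assumption):

```properties
# spark-defaults.conf
spark.sql.parquet.respectSummaryFiles  true
```

Or per job: `spark-submit --conf spark.sql.parquet.respectSummaryFiles=true ...`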

Re: [jira] [Commented] (SPARK-34648) Reading Parquet Files in Spark Extremely Slow for Large Number of Files?

2021-03-10 Thread 钟雨
Hi Pankaj, Can you show your detailed code and Job/Stage info? Which stage is slow? Pankaj Bhootra wrote on Wed, Mar 10, 2021 at 12:32 PM: > Hi, > > Could someone please revert on this? > > > Thanks > Pankaj Bhootra > > > On Sun, 7 Mar 2021, 01:22 Pankaj Bhootra, wrote: > >> Hello Team >> >> I am new to Spark

Re: compile spark 3.1.1 error

2021-03-10 Thread Attila Zsolt Piros
I see, this must be because of the Hadoop version you are selecting by using "-Dhadoop.version=2.6.0-cdh5.13.1". Spark 3.1.1 only supports hadoop-2.7 and hadoop-3.2; at least these two can be given via profiles: -Phadoop-2.7 and -Phadoop-3.2 (the default). On Wed, Mar 10, 2021 at 12:26 PM jiahong li
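Following Attila's advice, the build command quoted later in this thread would select a supported Hadoop profile instead of pinning the unsupported CDH version. These invocations reuse the flags from the thread; only the profile replaces `-Dhadoop.version=2.6.0-cdh5.13.1`:

```shell
# Build against the supported Hadoop 2.7 profile:
./dev/make-distribution.sh --name custom-spark --pip --tgz \
  -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 -DskipTests

# Or the default Hadoop 3.2 profile:
./dev/make-distribution.sh --name custom-spark --pip --tgz \
  -Phive -Phive-thriftserver -Pyarn -Phadoop-3.2 -DskipTests
```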

Re: compile spark 3.1.1 error

2021-03-10 Thread jiahong li
I use ./build/mvn to compile, and after executing the command ./build/zinc-0.3.15/bin/zinc -shutdown and then a command like this: ./dev/make-distribution.sh --name custom-spark --pip --tgz -Phive -Phive-thriftserver -Pyarn -Dhadoop.version=2.6.0-cdh5.13.1 -DskipTests the same error appears. and execute com

Re: compile spark 3.1.1 error

2021-03-10 Thread Attila Zsolt Piros
Hi! Are you compiling Spark itself? Do you use "./build/mvn" from the project root? If you compiled another version of Spark before and the Scala version there was different, then zinc/nailgun could have cached the old classes, which can cause similar troubles. In that case this could help: ./build/zin

compile spark 3.1.1 error

2021-03-10 Thread jiahong li
Hi everybody, when I compile Spark 3.1.1 from tag v3.1.1, I encounter an error like this: [INFO] --- scala-maven-plugin:4.3.0:compile (scala-compile-first) @ spark-core_2.12 --- [INFO] Using incremental compilation using Mixed compile order [INFO] Compiler bridge file: .sbt/1.0/zinc/org.scala-sbt/org.s