First, I believe you mean the Dataset API rather than the DataFrame API.
You can easily add the partition index as a new column to your DataFrame using
spark_partition_id().
Then a normal mapPartitions should work fine (i.e. you should create the
appropriate case class which includes the partition id), along the lines of the sketch below.
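Something roughly like this (a minimal sketch; the Record and RecordWithPartition case classes and the sample data are just placeholders for your own types):

import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.spark_partition_id

// Placeholder case classes; substitute your own schema.
case class Record(id: Long, value: String)
case class RecordWithPartition(id: Long, value: String, partitionId: Int)

val spark = SparkSession.builder().appName("partition-id-sketch").getOrCreate()
import spark.implicits._

val ds: Dataset[Record] = Seq(Record(1L, "a"), Record(2L, "b")).toDS()

// Attach the physical partition index as a regular column, then go back
// to a typed Dataset so mapPartitions sees it as a field.
val withPid: Dataset[RecordWithPartition] = ds
  .withColumn("partitionId", spark_partition_id())
  .as[RecordWithPartition]

// Each task now knows which partition it is processing.
val summaries = withPid.mapPartitions { iter =>
  iter.map(r => s"partition ${r.partitionId}: id=${r.id}")
}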
Hello All,
I am using Spark 2.0.2 and spark-streaming-kafka-0-10_2.11.
I am setting enable.auto.commit to false and want to manually commit the
offsets after my output operation has succeeded, so when an exception is
raised during processing I do not want the offsets to be committed. B
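The commit-after-output pattern I am trying to follow looks roughly like this (a sketch; the broker address, topic, group id and the output step itself are placeholders):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("manual-offset-commit")
val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-consumer-group",
  "auto.offset.reset" -> "earliest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("my-topic"), kafkaParams))

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // Output operation goes here; if it throws, the commit below is never
  // reached, so the batch is reprocessed from the uncommitted offsets.
  rdd.foreach(record => println(record.value))

  // Commit only after the output has succeeded. Note that commitAsync is
  // asynchronous: the commit is actually sent on a later batch.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}

ssc.start()
ssc.awaitTermination()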
Hi everybody
I'm trying to connect Spark to Hive.
Hive uses Derby Server for metastore_db.
$SPARK_HOME/conf/hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://derby:1527/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.ap
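On the Spark side, a minimal sanity check looks roughly like this (a sketch, assuming the hive-site.xml above sits in $SPARK_HOME/conf/ and the Derby client JDBC driver jar is on Spark's classpath; the app name is arbitrary):

import org.apache.spark.sql.SparkSession

// enableHiveSupport makes Spark use the Hive metastore configured
// in hive-site.xml instead of the default embedded catalog.
val spark = SparkSession.builder()
  .appName("hive-metastore-check")
  .enableHiveSupport()
  .getOrCreate()

// Quick check that the external metastore is reachable.
spark.sql("SHOW DATABASES").show()
spark.sql("SHOW TABLES").show()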
Uh, believe me, there are lots of people on this list who will send you code
snippets if you ask... 😀
Yes, that is what Steve pointed out, also suggesting that for that simple
exercise you should perform all operations on a Spark standalone setup instead
(or alternatively use an NFS mount on the cluster).
I'd agree with his suggestion.
Hi Marco,
For the first time in several years, FOR THE VERY FIRST TIME, I am seeing
someone actually executing code and providing a response. It feels wonderful
that at least someone took the trouble to respond by executing code and did
not just filter out each and every technical detail to brood only
Hi list,
I have Spark 2.2.0 in standalone mode and Python 3.6. It is a very small
testing cluster with two nodes.
I am running (trying to run) a streaming job that simply reads from Kafka,
applies an ML model and stores the result back into Kafka.
The job is run with the following parameters:
"--conf spark.cores.max=2 --