RE: mapPartitioningWithIndex in Dataframe

2017-08-05 Thread Mendelson, Assaf
First, I believe you mean the Dataset API rather than the DataFrame API. You can easily add the partition index as a new column to your dataframe using spark_partition_id(). Then a normal mapPartitions should work fine (i.e. you should create the appropriate case class which includes the partiti…
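A minimal sketch of the suggested pattern in Scala (the case class Rec, the app name and the doubling step are illustrative, not from the thread):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.spark_partition_id

    // Hypothetical case class: the payload plus its partition index
    case class Rec(value: Long, pid: Int)

    object PartitionIndexSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("partition-index").getOrCreate()
        import spark.implicits._

        val ds = spark.range(0, 100)
          .withColumn("pid", spark_partition_id()) // tag every row with its partition index
          .select($"id".as("value"), $"pid")
          .as[Rec]

        // A plain mapPartitions now sees the index on every record,
        // emulating mapPartitionsWithIndex from the RDD API
        val out = ds.mapPartitions(iter => iter.map(r => (r.pid, r.value * 2)))
        out.show()
        spark.stop()
      }
    }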

kafka setting enable.auto.commit to false is being overridden and I lose data. please help!

2017-08-05 Thread shyla deshpande
Hello All, I am using Spark 2.0.2 and spark-streaming-kafka-0-10_2.11. I am setting enable.auto.commit to false and want to manually commit the offsets after my output operation is successful, so when an exception is raised during the processing I do not want the offsets to be committed. B…
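The documented manual-commit pattern for spark-streaming-kafka-0-10 captures the offset ranges before running the output operation and commits them only afterwards, so a thrown exception skips the commit; a minimal sketch (broker address, topic and group id are placeholders):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, KafkaUtils}

    object ManualCommitSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("manual-commit"), Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "localhost:9092",          // placeholder
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "example-group",           // placeholder
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        stream.foreachRDD { rdd =>
          // grab the offsets before doing anything else with the RDD
          val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
          // ... output operation goes here; if it throws, commitAsync below
          // is never reached, so the offsets stay uncommitted
          stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }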

Trying to connect Spark 1.6 to Hive

2017-08-05 Thread toletum
Hi everybody, I'm trying to connect Spark to Hive. Hive uses Derby Server for metastore_db. $SPARK_HOME/conf/hive-site.xml: javax.jdo.option.ConnectionURL = jdbc:derby://derby:1527/metastore_db;create=true (JDBC connect string for a JDBC metastore), javax.jdo.option.ConnectionDriverName = org.ap…
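The configuration above reads like a flattened hive-site.xml; a minimal sketch of the same properties as XML (the driver class is an assumption based on the standard Derby network client, since the message is cut off at "org.ap"):

    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:derby://derby:1527/metastore_db;create=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <!-- assumption: the usual client driver for a networked Derby server -->
        <value>org.apache.derby.jdbc.ClientDriver</value>
      </property>
    </configuration>

With a networked Derby, the client JAR (derbyclient.jar) also has to be on Spark's classpath.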

Re: SPARK Issue in Standalone cluster

2017-08-05 Thread Marco Mistroni
Uh, believe me, there are lots of people on this list who will send you code snippets if you ask... 😀 Yes, that is what Steve pointed out, suggesting also that for that simple exercise you should perform all operations on a Spark standalone instance instead (or, alternatively, use an NFS on the cluster). I'd agree with his sugg…

Re: SPARK Issue in Standalone cluster

2017-08-05 Thread Gourav Sengupta
Hi Marco, for the first time in several years, FOR THE VERY FIRST TIME, I am seeing someone actually executing code and providing a response. It feels wonderful that at least someone considered responding by executing the code and did not just filter out each and every technical detail to brood only…

PySpark Streaming keeps dying

2017-08-05 Thread Riccardo Ferrari
Hi list, I have Spark 2.2.0 in standalone mode and Python 3.6. It is a very small testing cluster with two nodes. I am running (trying to run) a streaming job that simply reads from Kafka, applies an ML model and stores the result back into Kafka. The job is run with the following parameters: "--conf spark.cores.max=2 --…
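A hypothetical spark-submit invocation consistent with the quoted settings (the master URL, the Kafka package coordinates and the script name are all assumptions; the original message is cut off after spark.cores.max=2):

    spark-submit \
      --master spark://master:7077 \
      --conf spark.cores.max=2 \
      --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 \
      streaming_job.py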