Spark SQL DataFrame resulting from an except() is unusable

2017-01-31 Thread Vinayak Joshi5
With Spark 2.x, I construct a DataFrame from a sample libsvm file:

scala> val higgsDF = spark.read.format("libsvm").load("higgs.libsvm")
higgsDF: org.apache.spark.sql.DataFrame = [label: double, features: vector]

Then I build a new DataFrame that involves an except():

scala> val train_df = higg…
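The archive cuts the snippet off; a minimal sketch of the reported pattern, assuming the training set is built by subtracting a sampled test split via except() (the sample() split and the final count() action are illustrative assumptions, not from the original message):

// Hypothetical reconstruction of the truncated session.
val higgsDF = spark.read.format("libsvm").load("higgs.libsvm")
val test_df = higgsDF.sample(withReplacement = false, fraction = 0.1, seed = 42)
val train_df = higgsDF.except(test_df) // reportedly yields an unusable DataFrame
train_df.count()                       // any action on train_df would surface the failure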

[SQL][ML] Pipeline performance regression between 1.6 and 2.x

2017-01-31 Thread Maciej Szymkiewicz
Hi everyone, while experimenting with ML pipelines I am seeing a significant performance regression when switching from 1.6.x to 2.x.

import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}

val df = (1 to 40).foldLe…
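The benchmark is truncated at the foldLeft; a sketch of the shape it likely takes, building 40 generated string columns and one StringIndexer/OneHotEncoder pair per column (the row count, data, and assembler stage are assumptions for illustration):

import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
import org.apache.spark.sql.functions.rand

// Build a DataFrame with 40 low-cardinality string columns.
val df = (1 to 40).foldLeft(spark.range(1000).toDF("id")) { (d, i) =>
  d.withColumn(s"col$i", (rand(i) * 10).cast("int").cast("string"))
}

// One StringIndexer + OneHotEncoder pair per column.
val indexersAndEncoders: Array[PipelineStage] = (1 to 40).flatMap { i =>
  Seq[PipelineStage](
    new StringIndexer().setInputCol(s"col$i").setOutputCol(s"col${i}_idx"),
    new OneHotEncoder().setInputCol(s"col${i}_idx").setOutputCol(s"col${i}_vec")
  )
}.toArray

val assembler = new VectorAssembler()
  .setInputCols((1 to 40).map(i => s"col${i}_vec").toArray)
  .setOutputCol("features")

// Timing this fit is where the 1.6 vs 2.x gap would show up.
val model = new Pipeline().setStages(indexersAndEncoders :+ assembler).fit(df)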

Unique Partition Id per partition

2017-01-31 Thread Chawla,Sumit
Hi All, I have an RDD which I partition based on some key, and then run sc.runJob for each partition. Inside this function, I assign each partition a unique key using the following:

"%s_%s" % (id(part), int(round(time.time())))

This is to make sure that each partition produces separate bookkeeping st…

Re: Error Saving Dataframe to Hive with Spark 2.0.0

2017-01-31 Thread Michael Allman
That's understandable. Maybe I can help. :) What happens if you set `HIVE_TABLE_NAME = "default.employees"`? Also, does that table exist before you call `filtered_output_timestamp.write.mode("append").saveAsTable(HIVE_TABLE_NAME)`?

Cheers, Michael

> On Jan 29, 2017, at 9:52 PM, Chetan Khatri…
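For reference, the pattern under discussion looks roughly like this in Scala (the thread's snippet appears to be PySpark; the session setup and the stand-in DataFrame here are illustrative assumptions):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-append-check")
  .enableHiveSupport() // required for saveAsTable to hit the Hive metastore
  .getOrCreate()

// Stand-in for the thread's filtered_output_timestamp DataFrame.
val filteredOutputTimestamp = spark.range(10).toDF("id")

// Database-qualified table name, as suggested above.
val hiveTableName = "default.employees"
filteredOutputTimestamp.write.mode("append").saveAsTable(hiveTableName)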

Re: Unique Partition Id per partition

2017-01-31 Thread Michael Allman
Hi Sumit, can you use http://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=rdd#pyspark.RDD.mapPartitionsWithIndex to solve your problem?

Michael

> On Jan 31, 2017,…
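A minimal sketch of that suggestion, shown in Scala for consistency with the rest of this digest even though the thread uses PySpark (the RDD contents, key format, and an ambient spark-shell `sc` are assumptions): the partition index is stable and unique across the job, unlike object identity plus a timestamp.

// Tag each element with a key derived from the partition index.
val rdd = sc.parallelize(1 to 100, numSlices = 4)
val tagged = rdd.mapPartitionsWithIndex { (idx, iter) =>
  val partitionKey = s"part_$idx" // unique and deterministic per partition
  iter.map(x => (partitionKey, x))
}
tagged.take(3) // e.g. (part_0, 1), (part_0, 2), (part_0, 3)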

Call for abstracts open for Dataworks & Hadoop Summit San Jose

2017-01-31 Thread Alan Gates
The Dataworks & Hadoop Summit will be in San Jose, June 13-15, 2017. The call for abstracts closes February 10. You can submit an abstract at http://tinyurl.com/dwsj17CFA. There are tracks for Hadoop, data processing and warehousing, governance and security, IoT and streaming, cloud and operati…

Structured Streaming Source error

2017-01-31 Thread Sam Elamin
Hi Folks, I am getting a weird error when trying to write a BigQuery Structured Streaming source.

Error: java.lang.AbstractMethodError: com.samelamin.spark.bigquery.streaming.BigQuerySource.commit(Lorg/apache/spark/sql/execution/streaming/Offset;)V
at org.apache.spark.sql.execution.stream…

Re: Structured Streaming Source error

2017-01-31 Thread Shixiong(Ryan) Zhu
You used one Spark version to compile your code but a newer version to run it. As the Source APIs are not stable, Spark doesn't guarantee that they are binary compatible.

On Tue, Jan 31, 2017 at 1:39 PM, Sam Elamin wrote:
> Hi Folks
>
> I am getting a weird error when trying to write a…
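A minimal way to guard against that mismatch in an sbt build: compile against the exact Spark version the cluster runs and mark it "provided" so the cluster's jars are what actually execute (the version numbers below are illustrative, not from the thread).

// build.sbt (sketch): keep the compile-time Spark version identical
// to the runtime one.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.1.0" % "provided"
)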

Re: Structured Streaming Source error

2017-01-31 Thread Sam Elamin
Ha Ryan, you're everywhere, JIRA and mailing list alike. I thought multitasking was a myth! Thanks for your help; I was indeed using different versions! Regards, Sam

On Tue, Jan 31, 2017 at 9:48 PM, Shixiong(Ryan) Zhu wrote:
> You used one Spark version to compile your code but a newer version
> to run it. As…

Re: MLlib mission and goals

2017-01-31 Thread Seth Hendrickson
I agree with what Sean said about not supporting arbitrarily many algorithms. I think the goal of MLlib should be to support only core algorithms for machine learning. Ideally Spark ML provides a relatively small set of algorithms that are heavily optimized, and also provides a framework that makes…