Hi,
I am exploring Spark Structured Streaming.
When I turned to the internet to understand it, I found that it is mostly
integrated with Kafka or other streaming tools like Kinesis.
I couldn't find where we can use the Spark Streaming API for Twitter
streaming data.
Would be grateful if anybody has used i
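Twitter ingestion is not bundled with Spark itself; the DStream-based connector
spark-streaming-twitter is maintained under Apache Bahir, and there is no
built-in Structured Streaming source for Twitter. A minimal sketch, assuming
that dependency is on the classpath and the twitter4j.oauth.* system properties
are set:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

// OAuth credentials are read from the twitter4j.oauth.* system properties
val conf = new SparkConf().setAppName("TwitterStreamSketch")
val ssc = new StreamingContext(conf, Seconds(10))

// None => build the authorization from system properties
val tweets = TwitterUtils.createStream(ssc, None)
tweets.map(status => status.getText).print()

ssc.start()
ssc.awaitTermination()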
Spark/Hive converts a decimal to a null value if we specify a precision
greater than the precision available in the file. The example below gives the
details; I am not sure why it is converting to null.
Note: You need to trim the string before casting it to decimal.
Table data with col1 and col2 columns:
val r = sqlCo
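A minimal sketch of the trim-then-cast step mentioned above, assuming an
existing sqlContext; the table and column names and the decimal(18,2)
precision are placeholders:

// trim before casting, as noted above; precision and scale are illustrative
val r = sqlContext.sql(
  "select cast(trim(col1) as decimal(18,2)) as col1_dec, col2 from some_table")
r.show()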
Hi,
I am trying something like this:
val sesDS: Dataset[XXX] = hiveContext.sql(select).as[XXX]
The select statement is something like this: "select * from sometable
DISTRIBUTE BY col1, col2, col3"
Then comes groupByKey:
val gpbyDS = sesDS.groupByKey(x => (x.col1, x.col2, x.col3))
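A minimal sketch of the whole pattern, assuming an existing hiveContext and a
hypothetical case class XXX; the fields and the count aggregation are only
illustrative:

import org.apache.spark.sql.Dataset
import hiveContext.implicits._

// hypothetical schema, for illustration only
case class XXX(col1: String, col2: String, col3: String, value: Long)

val sesDS: Dataset[XXX] =
  hiveContext.sql("select * from sometable DISTRIBUTE BY col1, col2, col3").as[XXX]

val gpbyDS = sesDS.groupByKey(x => (x.col1, x.col2, x.col3))

// e.g. count rows per key; any mapGroups/reduceGroups logic would go here
val counts = gpbyDS.mapGroups { case (key, rows) => (key, rows.size) }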
Hi Experts,
I am trying to convert a string holding a decimal value to a decimal in Spark
SQL and load it into Hive/SQL Server.
In Hive, instead of being converted to decimal, all my values are coming as
null.
In SQL Server, instead of proper decimals, the values are coming without precision.
Can you please l
Hi Michelle,
Your original usage of ParamGridBuilder was not quite right: `addGrid`
expects (some parameter, array of values for that parameter). If you want
to do a grid search with different regularization values, you would do the
following:
val paramMaps = new ParamGridBuilder().addGrid(logis
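A minimal sketch of the completed pattern, using a placeholder
LogisticRegression named lr and illustrative regularization values:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.tuning.ParamGridBuilder

val lr = new LogisticRegression()

// one grid dimension: regParam paired with an array of candidate values
val paramMaps = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1, 1.0))
  .build()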
Hi,
Start with spark.executor.memory set to 2g. You may also
give spark.yarn.executor.memoryOverhead a try.
See https://spark.apache.org/docs/latest/configuration.html and
https://spark.apache.org/docs/latest/running-on-yarn.html for more in-depth
information.
Regards,
Jacek Laskowski
https://a
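For concreteness, a minimal sketch of setting those two options on a SparkConf
before the context is created; the values are only starting points:

import org.apache.spark.SparkConf

// spark.yarn.executor.memoryOverhead is given in MB here; tune for the workload
val conf = new SparkConf()
  .set("spark.executor.memory", "2g")
  .set("spark.yarn.executor.memoryOverhead", "512")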
I am running Zeppelin on EMR with the default settings. I am getting the
following error; restarting the Zeppelin application fixes the problem.
What default settings do I need to override to fix this error?
org.apache.spark.SparkException: Job aborted due to stage failure: Task 71
Hello everyone,
We are running Spark Streaming jobs in Spark 2.1 in cluster mode in YARN. We
have an RDD (3GB) that we periodically (every 30min) refresh by reading from
HDFS. Namely, we create a DataFrame df using sqlContext.read.parquet,
and then we create RDD rdd = df.as[T].rdd. The firs
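A minimal sketch of the refresh step described above; the HDFS path and the
case class T are placeholders:

import org.apache.spark.rdd.RDD
import sqlContext.implicits._

// placeholder schema, for illustration only
case class T(id: Long, attribute: String)

val df = sqlContext.read.parquet("hdfs:///path/to/snapshot")
val rdd: RDD[T] = df.as[T].rdd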
Oh sorry,
I mean just using the Kafka Streams API to do the aggregation, without depending on Spark.
446463...@qq.com
From: 郭鹏飞
Sent: 2018-01-30 23:06
To: 446463...@qq.com
Cc: user
Subject: Re: use kafka streams API aggregate ?
Hi,
I did this today too.
Check your Kafka version, then follow one of the guides below:
http://spark.apache.org/docs/latest/streaming-kafka-0-8-integration.html
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
http://spark.apache.org/docs/latest/streaming-kafka-integ
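For the 0-10 integration, a minimal sketch of a direct stream; the broker
address, group id, and topic name are placeholders:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

val ssc = new StreamingContext(new SparkConf().setAppName("KafkaDirectSketch"), Seconds(10))

// placeholder connection settings
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("some-topic"), kafkaParams)
)

stream.map(record => (record.key, record.value)).print()

ssc.start()
ssc.awaitTermination()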
I tried to use One-vs-Rest in Spark ML with a Pipeline and CrossValidator for
multinomial logistic regression.
It came out with empty coefficients. I figured out it was the setting of
ParamGridBuilder. Can anyone help me understand how the parameter
setting affects the CrossValidator pro
Hi,
I am new to Kafka.
Today I am using the Kafka Streams API for real-time data processing,
and I have no idea how to approach it.
Can someone help me?
446463...@qq.com
Any help would be much appreciated :)
On Mon, Jan 29, 2018 at 6:25 PM, ayan guha wrote:
> Hi
>
> I want to write something in Structured streaming:
>
> 1. I have a dataset which has 3 columns: id, last_update_timestamp,
> attribute
> 2. I am receiving the data through Kinesis
>
> I want to dedup
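A minimal sketch of streaming de-duplication on such a dataset, following the
Structured Streaming dropDuplicates-with-watermark pattern; the rate source
below is only a stand-in for the Kinesis input, the 10-minute watermark is
illustrative, and an existing SparkSession named spark is assumed:

// stand-in source producing id / last_update_timestamp / attribute columns
val input = spark.readStream
  .format("rate")
  .load()
  .selectExpr("value as id", "timestamp as last_update_timestamp", "'a' as attribute")

// drop duplicates of the key within the watermark window
val deduped = input
  .withWatermark("last_update_timestamp", "10 minutes")
  .dropDuplicates("id", "last_update_timestamp")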
Hi All,
One of the requirements in our project is to integrate data from Chinese
social media platforms - WeChat, Weibo, and Baidu (Search).
Has anyone done this using Spark? Are there any pre-built connectors
available, and what challenges are involved in setting them up?
I am sure there could be multiple cha
Hi there,
As far as I know, when *spark.sql.adaptive.enabled* is set to true, the
number of post-shuffle partitions should change with the map output size.
But in my application there is a stage reading 900GB of shuffled files with
only 200 partitions (which is the default value of
*spark.sql.shuf
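For reference, a minimal sketch of the Spark 2.x adaptive-execution settings
that control the post-shuffle partition count; the target size is illustrative
and an existing SparkSession named spark is assumed:

spark.conf.set("spark.sql.adaptive.enabled", "true")
// target bytes per post-shuffle partition (64 MB here)
spark.conf.set("spark.sql.adaptive.shuffle.targetPostShuffleInputSize",
  (64L * 1024 * 1024).toString)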
Hi,
Is there any available PySpark ML or MLlib API for grid search similar to
GridSearchCV from sklearn's model_selection?
I found this - https://spark.apache.org/docs/2.2.0/ml-tuning.html, but it
covers cross-validation and train-validation splits for hyperparameter tuning,
not pure grid search.
Any help?
Thank