A slight advantage of Java is also the tooling that exists around it - better
support by build tools and plugins, advanced static code analysis (security,
bugs, performance), etc.
> On 8. Jun 2017, at 08:20, Mich Talebzadeh wrote:
>
> What I like about Scala is that it is less ceremonial compar
I did add mode -> DROPMALFORMED, but it still couldn't ignore it because the
error is raised from the CSV library that Spark uses.
On Thu, Jun 8, 2017 at 12:11 PM Jörn Franke wrote:
> The CSV data source allows you to skip invalid lines - this should also
> include lines that have more than max
What I like about Scala is that it is less ceremonial compared to Java.
Java users claim that Scala is built on Java, so error tracking is very
difficult. Also, Scala sits on top of Java, and that makes it virtually
dependent on Java.
For me the advantage of Scala is its simplicity and compactness
Hi,
I have been trying to distribute Kafka topics among different instances of the
same consumer group. I am using the KafkaDirectStream API for creating DStreams.
After the second consumer group comes up, Kafka does a partition rebalance and
then the Spark driver of the first consumer dies with the following
Hi Guys
Quick one: how does Spark deal with (i.e. create partitions for) large files
sitting on NFS, assuming all executors can see the file in exactly the same way?
ie, when I run
r = sc.textFile("file://my/file")
what happens if the file is on NFS?
is there any difference from
r = sc.textFile("hdfs://my
The CSV data source allows you to skip invalid lines - this should also include
lines that have more than maxColumns. Choose mode "DROPMALFORMED"
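For illustration, a read along these lines might look like the sketch below (the
file path, header setting and column limit are placeholders, not from this thread):

// spark: an existing SparkSession
val df = spark.read
  .option("mode", "DROPMALFORMED")   // silently drop lines the parser cannot handle
  .option("maxColumns", "20480")     // parser column limit (placeholder value)
  .option("header", "true")          // placeholder setting
  .csv("/path/to/input.csv")         // placeholder path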
> On 8. Jun 2017, at 03:04, Chanh Le wrote:
>
> Hi Takeshi, Jörn Franke,
>
> The problem is even I increase the maxColumns it still have some lines
did you include the proper scala-reflect dependency?
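If it helps, a declaration of that kind might look roughly like this in sbt (an
assumption on my part, not taken from your build):

// keep scala-reflect in lockstep with the project's Scala version (2.11.8 here)
libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaVersion.value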
On Wed, May 31, 2017 at 1:01 AM, krishmah wrote:
> I am currently using Spark 2.0.1 with Scala 2.11.8. However, the same code
> works with Scala 2.10.6. Please advise if I am missing something
>
> import org.apache.spark.sql.functions.udf
>
> val g
I think you need to get the logger within the lambda; otherwise it's the
logger on the driver side, which can't work.
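A minimal sketch of what I mean, assuming SLF4J-style logging and a
foreachPartition-style lambda (names are illustrative):

// rdd: an existing RDD
rdd.foreachPartition { partition =>
  // obtain the logger inside the closure, so it is created on the executor
  // rather than serialized from the driver
  val log = org.slf4j.LoggerFactory.getLogger("executor.side.logger")
  partition.foreach { record =>
    log.info(s"processing $record")
  }
}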
On Wed, May 31, 2017 at 4:48 PM, Paolo Patierno wrote:
> No, it's running in standalone mode as a Docker image on Kubernetes.
>
>
> The only way I found was to access "stderr" file creat
We use AsyncHttpClient (from the Java world) and simply call future.get as a
synchronous call.
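Roughly like the sketch below (the endpoint, payload and client setup are
placeholders, not our actual code):

import org.asynchttpclient.Dsl.asyncHttpClient

// rdd: an existing RDD of records to push
rdd.foreachPartition { records =>
  val client = asyncHttpClient()   // one client per partition
  try {
    records.foreach { record =>
      // execute() is asynchronous; calling get() on the future blocks,
      // which turns it into a synchronous call
      val response = client.preparePost("http://example.com/ingest") // placeholder URL
        .setBody(record.toString)
        .execute()
        .get()
      // response.getStatusCode could be checked here
    }
  } finally {
    client.close()
  }
}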
On Thu, Jun 1, 2017 at 4:08 AM, vimal dinakaran wrote:
> Hi,
> In our application pipeline we need to push the data from Spark Streaming
> to an HTTP server.
>
> I would like to have a http client with be
1. Could you give the job, stage & task status from the Spark UI? I found it
extremely useful for performance tuning.
2. Use model.transform for predictions (see the sketch below). Usually we have
a pipeline for preparing training data, and using the same pipeline to transform
the data you want to predict could give us the predictio
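A rough sketch of point 2, with hypothetical feature columns (not from your job):

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler

// trainingData / newData: existing DataFrames with columns f1, f2, f3, label (placeholders)
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2", "f3"))
  .setOutputCol("features")
val lr = new LogisticRegression().setLabelCol("label").setFeaturesCol("features")

// fit the whole preparation + estimator pipeline once on the training data ...
val model = new Pipeline().setStages(Array(assembler, lr)).fit(trainingData)

// ... then reuse it, so new data is prepared exactly the same way before prediction
val predictions = model.transform(newData)
predictions.select("prediction", "probability").show(false)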
I'd suggest scripts like JS, Groovy, etc. To my understanding, the service
loader mechanism isn't a good fit for runtime reloading.
On Wed, Jun 7, 2017 at 4:55 PM, Jonnas Li(Contractor) <
zhongshuang...@envisioncn.com> wrote:
> To be more explicit, I used mapwithState() in my application, just li
If you use StringIndexer to categorize the data, IndexToString can convert
it back.
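A small sketch of that round trip (column names are placeholders):

import org.apache.spark.ml.feature.{IndexToString, StringIndexer}

// df: an existing DataFrame with a string column "category" (placeholder name)
val indexerModel = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .fit(df)
val indexed = indexerModel.transform(df)

// map the numeric indices back to the original string labels
val converter = new IndexToString()
  .setInputCol("categoryIndex")
  .setOutputCol("originalCategory")
  .setLabels(indexerModel.labels)
converter.transform(indexed)
  .select("category", "categoryIndex", "originalCategory")
  .show()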
On Wed, Jun 7, 2017 at 6:14 PM, kundan kumar wrote:
> Hi Yan,
>
> This doesn't work.
>
> thanks,
> kundan
>
> On Wed, Jun 7, 2017 at 2:53 PM, 颜发才(Yan Facai)
> wrote:
>
>> Hi, kumar.
>>
>> How about removing the `
Hi Takeshi, Jörn Franke,
The problem is that even if I increase maxColumns, some lines still have more
columns than the limit I set, and that costs a lot of memory.
So I just want to skip any line that has more columns than the maxColumns I set.
Regards,
Chanh
On Thu, Jun 8, 2017 at 12:48 AM Tak
A lot depends on your context as well. If I'm using Spark _for analysis_, I
frequently use python; it's a starting point, from which I can then
leverage pandas, matplotlib/seaborn, and other powerful tools available on
top of python.
If the Spark outputs are the ends themselves, rather than the me
Mich,
We use Scala for a large project. On our team we've set a few standards to
ensure readability (we try to avoid excessive use of tuples, use named
functions, etc.). Given these constraints, I find Scala to be very
readable, and far easier to use than Java. The Lambda functionality of
Java p
Is it not enough to set `maxColumns` in CSV options?
https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L116
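Setting it would look roughly like this (path and limit are placeholders):

// spark: an existing SparkSession; the default maxColumns is 20480
val df = spark.read
  .option("maxColumns", "100000")
  .csv("/path/to/input.csv")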
// maropu
On Wed, Jun 7, 2017 at 9:45 AM, Jörn Franke wrote:
> Spark CSV data source should be able
I think this is a religious question ;-)
Java is often underestimated, because people are not aware of its lambda
functionality, which makes the code very readable. Scala - it depends on who
programs it. People coming from a normal Java background write Java-like code
in Scala, which might not be
Spark CSV data source should be able
> On 7. Jun 2017, at 17:50, Chanh Le wrote:
>
> Hi everyone,
> I am using Spark 2.1.1 to read csv files and convert to avro files.
> One problem that I am facing is if one row of csv file has more columns than
> maxColumns (default is 20480). The process of
From: kundan kumar [mailto:iitr.kun...@gmail.com]
Sent: Wednesday, June 7, 2017 5:15 AM
To: 颜发才(Yan Facai)
Cc: spark users
Subject: Re: Convert the feature vector to raw data
Hi Yan,
This doesn't work.
thanks,
kundan
On Wed, Jun 7, 2017 at 2
From: 颜发才(Yan Facai) [mailto:facai@gmail.com]
Sent: Wednesday, June 7, 2017 4:24 AM
To: kundan kumar
Cc: spark users
Subject: Re: Convert the feature vector to raw data
Hi, kumar.
How about removing the `select` in your code?
namely,
Dataset resul
From: kundan kumar [mailto:iitr.kun...@gmail.com]
Sent: Wednesday, June 7, 2017 4:01 AM
To: spark users
Subject: Convert the feature vector to raw data
I am using
Dataset result = model.transform(testData).select("prob
Hi everyone,
I am using Spark 2.1.1 to read csv files and convert to avro files.
One problem that I am facing is that if one row of a csv file has more columns
than maxColumns (default is 20480), the parsing process stops.
Internal state when error was thrown: line=1, column=3, record=0,
charIndex=
I changed the data structure to scala.collection.immutable.Set and I still
see the same issue. My key is a String. I do the following in my reduce
and invReduce:
visitorSet1 ++ visitorSet2.toTraversable
visitorSet1 -- visitorSet2.toTraversable
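For context, the windowed reduce looks roughly like the sketch below (the key
type, stream name and durations are placeholders, not my actual job):

import org.apache.spark.streaming.Seconds

// pairStream: an existing DStream[(String, Set[String])] of (key, visitor-id set) pairs
val windowedVisitors = pairStream.reduceByKeyAndWindow(
  (a: Set[String], b: Set[String]) => a ++ b,  // reduce: union sets entering the window
  (a: Set[String], b: Set[String]) => a -- b,  // invReduce: remove sets leaving the window
  Seconds(600),                                // window duration (placeholder)
  Seconds(60)                                  // slide duration (placeholder)
)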
On Tue, Jun 6, 2017 at 8:22 PM, Tathagata Das
wrote:
Hi,
I am a fan of Scala and functional programming, hence I prefer Scala.
I had a discussion with a hardcore Java programmer and a data scientist who
prefers Python.
Their view is that in collaborative work using Scala it is almost impossible
to understand someone else's Scala code.
Thanks Doc, I saw this on another board yesterday, so I tried it by first
going to the directory where I've stored winutils.exe and then, as an admin,
running the command that you suggested, and I get this exception when
checking the permissions:
C:\winutils\bin>winutils.exe ls -F C:\tmp\hiv
Hi Curtis,
I believe on Windows the following command needs to be executed (you will
need winutils installed):
D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
On 6 June 2017 at 09:45, Curtis Burkhalter
wrote:
> Hello all,
>
> I'm new to Spark and I'm trying to interact with it using Pyspark.
No, I don't.
On Wed, Jun 7, 2017 at 16:42, Jean Georges Perrin wrote:
> Do you have some other security in place like Kerberos or impersonation?
> It may affect your access.
>
>
> jg
>
>
> On Jun 7, 2017, at 02:15, Patrik Medvedev
> wrote:
>
> Hello guys,
>
> I need to execute hive queries on remote hi
Do you have some other security in place like Kerberos or impersonation? It may
affect your access.
jg
> On Jun 7, 2017, at 02:15, Patrik Medvedev wrote:
>
> Hello guys,
>
> I need to execute hive queries on remote hive server from spark, but for some
> reasons i receive only column names(
Hi Yan,
This doesn't work.
thanks,
kundan
On Wed, Jun 7, 2017 at 2:53 PM, 颜发才(Yan Facai) wrote:
> Hi, kumar.
>
> How about removing the `select` in your code?
> namely,
>
> Dataset<Row> result = model.transform(testData);
> result.show(1000, false);
>
>
>
>
> On Wed, Jun 7, 2017 at 5:00 PM, kundan k
Hi, kumar.
How about removing the `select` in your code?
namely,
Dataset<Row> result = model.transform(testData);
result.show(1000, false);
On Wed, Jun 7, 2017 at 5:00 PM, kundan kumar wrote:
> I am using
>
> Dataset<Row> result = model.transform(testData).select("probability",
> "label", "features");
Hello guys,
I need to execute Hive queries on a remote Hive server from Spark, but for
some reason I receive only column names (without data).
The data is available in the table; I checked it via HUE and a Java JDBC connection.
Here is my code example:
val test = spark.read
.option("url", "jdbc:hive2://
I am using
Dataset<Row> result = model.transform(testData).select("probability",
"label", "features");
result.show(1000, false);
In this case the feature vector is being printed as output. Is there a way
that my original raw data gets printed instead of the feature vector OR is
there a way to reverse
To be more explicit, I used mapWithState() in my application, just like this:
stream = KafkaUtils.createStream(..)
mappedStream = stream.mapPartitionsToPair(..)
stateStream = mappedStream.mapWithState(MyUpdateFunc(..))
stateStream.foreachRDD(..)
I call the jar in MyUpdateFunc(), and the jar reload
Agreed with Ayan.
Essentially, an edge node is a physical host or VM that is used by the
application to run the job. The users or service users start the process
from the edge node. Edge nodes are added to the cluster for each environment,
for example DEV/TEST/UAT, etc.
An edge node normally has all compatible binaries in