Re: Load multiple CSV from different paths

2017-07-05 Thread Didac Gil
> In Spark 1.6.x I think this may work with spark-csv <https://github.com/databricks/spark-csv>:
> spark.read.format("com.databricks.spark.csv").option("header", "false").schema(custom_schema).option('delimiter', …

Load multiple CSV from different paths

2017-07-05 Thread Didac Gil
….option('delimiter', '\t').option('mode', 'DROPMALFORMED').load(paths.split(',')) However, even though it is mentioned that this approach would work in Spark 2.x, I can't find an implementation of load that accepts an Array[String] as an input.
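For reference, Spark 2.x does accept several paths, but as varargs rather than an Array[String]: both DataFrameReader.load(paths: String*) and DataFrameReader.csv(paths: String*) take repeated arguments, so an array can be splatted with ": _*". A minimal sketch; the schema and paths below are placeholders:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val spark = SparkSession.builder().appName("multi-csv").master("local[*]").getOrCreate()

    // Placeholder schema and comma-separated list of input paths
    val custom_schema = StructType(Seq(StructField("value", StringType)))
    val paths = "/data/a.csv,/data/b.csv"

    // csv(paths: String*) takes varargs, so an Array[String] is passed with ": _*"
    val df = spark.read
      .option("header", "false")
      .option("delimiter", "\t")
      .option("mode", "DROPMALFORMED")
      .schema(custom_schema)
      .csv(paths.split(","): _*)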

Re: Analysis Exception after join

2017-07-03 Thread Didac Gil
> … Inf_period#1039, infectedFamily#1355L, infectedWorker#1385L]
> +- Aggregate [S_ID#1903L], [S_ID#1903L, count(1) AS infectedStreet#1415L]
Does anyone have a clue about it? Thanks!

Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread Didac Gil
> ….load()
> val ds1 = ds.select($"value")
> val query = ds1.writeStream.outputMode("append").format("console").start()
> query.awaitTermination()
> There are no errors when I execute this code; however, I don't see any data being printed out to the console. When I run m…
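For context, a minimal self-contained Spark 2.1 job of this shape might look like the sketch below. The socket source, host, and port are assumptions (feed it with nc -lk 9999); the excerpt above is truncated, so the original source may differ:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("console-sink")
      .master("local[2]")   // local mode, for testing
      .getOrCreate()

    import spark.implicits._

    // Socket source: every line sent to the socket becomes a row in column "value"
    val ds = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val query = ds.select($"value")
      .writeStream
      .outputMode("append")
      .format("console")
      .start()

    query.awaitTermination()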

Re: How can i merge multiple rows to one row in sparksql or hivesql?

2017-05-15 Thread Didac Gil
> …as follows:
> user_id1 feature1 feature2 feature3 feature4 feature5 ... feature100
> Is there a more efficient way than a join? Thanks!
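Though the thread is truncated, the question appears to be about collapsing many (user, feature) rows into one wide row per user. One common approach, offered here as a sketch rather than the answer given in the thread, is groupBy plus pivot, which avoids repeated self-joins; all column names and data below are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.first

    val spark = SparkSession.builder().appName("pivot-example").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical long-format input: one row per (user_id, feature_name, feature_value)
    val long = Seq(
      ("user_id1", "feature1", 0.5),
      ("user_id1", "feature2", 1.2),
      ("user_id2", "feature1", 0.9)
    ).toDF("user_id", "feature_name", "feature_value")

    // groupBy + pivot turns the feature names into columns, one wide row per user
    val wide = long.groupBy("user_id")
      .pivot("feature_name")
      .agg(first("feature_value"))

    wide.show()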

Re: Dataframes na fill with empty list

2017-04-11 Thread Didac Gil
> …coalesce in a SQL expression, but I'm not having any luck there either.
> Obviously, I can do a null check on the fields downstream; however, it is not in the spirit of Scala to pass nulls around, so I wanted to see if I was missing another approach first.
> Thanks!
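One workaround, sketched here under the assumption that the column is an array of strings (the thread is truncated, so the exact types are unknown), is to coalesce the column with an empty array literal; the explicit cast is needed because array() with no arguments is typed as array<null>:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{array, coalesce, col}

    val spark = SparkSession.builder().appName("fill-empty-list").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical DataFrame with a nullable array column "tags"
    val df = Seq(
      ("a", Seq("x", "y")),
      ("b", null)
    ).toDF("id", "tags")

    // coalesce returns its first non-null argument, so null arrays become empty ones
    val filled = df.withColumn("tags", coalesce(col("tags"), array().cast("array<string>")))
    filled.show()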

Re: kafka and spark integration

2017-03-22 Thread Didac Gil
Spark can be both a consumer and a producer from the Kafka point of view. You can create a Kafka client in Spark that subscribes to a topic and reads its feed, and you can process data in Spark and create a producer that sends that data into a topic. So, Spark sits next to Kafka, and you can use Kaf…
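To illustrate the consumer side, here is a minimal direct-stream sketch against the spark-streaming-kafka-0-10 connector; the broker address, group id, and topic name are all placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val conf = new SparkConf().setAppName("kafka-consumer").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",          // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "spark-example",                    // placeholder group id
      "auto.offset.reset" -> "latest"
    )

    // Subscribe to a (placeholder) topic and print the values of incoming records
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("my-topic"), kafkaParams))

    stream.map(record => record.value).print()

    ssc.start()
    ssc.awaitTermination()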

Re: Surprised!!!!! Spark-shell showing inconsistent results

2017-02-02 Thread Didac Gil
Is 1570 the value of Col1? If so, you have ordered by that column and selected only the first item. It seems that both results have the same Col1 value, so either of them would be a correct answer to return. Right?
> On 2 Feb 2017, at 11:03, Alex wrote:
> Hi, As shown below, the same query when…
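In other words, with ties on the ordering column, the first row is not deterministic; adding a secondary sort key pins it down. A sketch with hypothetical data and column names:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("tie-break").master("local[*]").getOrCreate()
    import spark.implicits._

    // Two rows tied on col1: without a tiebreaker, either may come first
    Seq((1570, "a"), (1570, "b")).toDF("col1", "id").createOrReplaceTempView("my_table")

    // Adding a secondary sort key makes the returned row deterministic
    val top = spark.sql("SELECT * FROM my_table ORDER BY col1 DESC, id ASC LIMIT 1")
    top.show()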

Re: Dataframe fails to save to MySQL table in spark app, but succeeds in spark shell

2017-01-26 Thread Didac Gil
Are you sure that "age" is a numeric field? Even if it is numeric, you could pass the "44" in quotes: INSERT INTO your_table ("user","age","state") VALUES ('user3','44','CT'). Are you sure there are no other fields that are specified as NOT NULL and for which you did not provide a value (besides user, a…
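For reference, writing a DataFrame to MySQL through Spark's JDBC writer looks roughly like the sketch below; the URL, credentials, and table name are placeholders, and the DataFrame's columns must line up with the table's schema, including any NOT NULL constraints:

    import java.util.Properties
    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("jdbc-write").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("user3", 44, "CT")).toDF("user", "age", "state")

    val props = new Properties()
    props.setProperty("user", "dbuser")       // placeholder credentials
    props.setProperty("password", "secret")
    props.setProperty("driver", "com.mysql.jdbc.Driver")

    // Appends rows to an existing table; column names must match its schema
    df.write
      .mode(SaveMode.Append)
      .jdbc("jdbc:mysql://localhost:3306/mydb", "your_table", props)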

[no subject]

2016-11-28 Thread Didac Gil
Any suggestions for using something like OneHotEncoder and StringIndexer on an InputDStream? I could try to combine an indexer fitted on a static Parquet file, but I want to use the OneHotEncoder approach on streaming data coming from a socket. Thanks! Dídac Gil de la Iglesia
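One possible pattern, sketched here as an assumption rather than an answer from the thread: fit the StringIndexer once on static data so the category-to-index mapping is fixed, then apply the fitted model (plus the stateless OneHotEncoder) to each micro-batch inside foreachRDD. The Parquet path, column names, host, and port are placeholders:

    import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val spark = SparkSession.builder().appName("stream-ohe").master("local[2]").getOrCreate()
    import spark.implicits._

    val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

    // Fit the indexer once, on static data, before the stream starts (placeholder path)
    val static = spark.read.parquet("/path/to/history.parquet")
    val indexerModel = new StringIndexer()
      .setInputCol("category")
      .setOutputCol("categoryIndex")
      .setHandleInvalid("skip")   // drop categories never seen in the static data
      .fit(static)

    // OneHotEncoder is a stateless transformer, so it needs no fitting
    val encoder = new OneHotEncoder()
      .setInputCol("categoryIndex")
      .setOutputCol("categoryVec")

    // Apply both transformers to every micro-batch from the socket
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.foreachRDD { rdd =>
      val batch = rdd.toDF("category")
      encoder.transform(indexerModel.transform(batch)).show()
    }

    ssc.start()
    ssc.awaitTermination()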