How to Fill Sparse Data With the Previous Non-Empty Value in a Spark Dataset

2017-06-28 Thread Carlo Allocca
Dear All, I am trying to propagate the last valid observation (i.e. not null) to the null values in a dataset. Below I report my partial solution: Dataset tmp800 = tmp700.select("uuid", "eventTime", "Washer_rinseCycles"); WindowSpec wspec = Window.partitionBy(tmp800.col("uuid")).o…
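For reference, the usual forward-fill pattern in Spark SQL, sketched below with the uuid/eventTime/Washer_rinseCycles names from the snippet above (a sketch only, assuming Spark 2.0+, not the thread's confirmed solution): order the window by event time and take last(..., ignoreNulls = true) over all preceding rows.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.last;

// Frame: every row from the start of the partition up to the current row.
WindowSpec wspec = Window.partitionBy(col("uuid"))
        .orderBy(col("eventTime"))
        .rowsBetween(Long.MIN_VALUE, 0);

// last(..., ignoreNulls = true) returns the most recent non-null value in
// the frame, i.e. it propagates the last valid observation forward.
Dataset<Row> filled = tmp800.withColumn("Washer_rinseCycles",
        last(col("Washer_rinseCycles"), true).over(wspec));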

How to Fill Sparse Data With the Previous Non-Empty Value in a Spark Dataset

2017-06-25 Thread Carlo . Allocca
Dear All, I need to apply a dataset transformation to replace null values with the previous non-null value. As an example, I report the following, from:

id | col1
---|-----
1  | null
1  | null
2  | 4
2  | null
2  | null
3  | 5
3  | null
3  | null

to:

id | col1
---|-----
1  | null
1  | null
2  | 4
2  | …

Re: using spark-xml_2.10 to extract data from XML file

2017-02-15 Thread Carlo . Allocca
The question is: how can I get it right to use String rowTag = "xocs:doc"; and get the right values for ….abstract.ce:para, etc.? What am I doing wrong? Many Thanks in advance. Best Regards, Carlo On 14 Feb 2017, at 17:35, carlo allocca <ca6...@open.ac.uk> wrote: Dear All, I would…

Re: using spark-xml_2.10 to extract data from XML file

2017-02-14 Thread Carlo . Allocca
String rowTag = "xocs:doc"; and get the right values for ….abstract.ce:para, etc.? What am I doing wrong? Many Thanks in advance. Best Regards, Carlo On 14 Feb 2017, at 17:35, carlo allocca <ca6...@open.ac.uk> wrote: Dear All, I would like to ask for your help with the foll…

Re: using spark-xml_2.10 to extract data from XML file

2017-02-14 Thread Carlo . Allocca
Dear All, I would like to ask for your help with the following issue when using spark-xml_2.10. Given an XML file with the following structure:

xocs:doc
 |-- xocs:item: struct (nullable = true)
 |    |-- bibrecord: struct (nullable = true)
 |    |    |-- head: struct (nullable = true)
 |    |    |…
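A minimal reading sketch for this kind of file (assuming the Databricks spark-xml data source; the file path and the selected nested column are illustrative): rowTag names the element that becomes one row, and nested elements surface as struct columns reachable with dot notation.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().master("local[*]").getOrCreate();

Dataset<Row> docs = spark.read()
        .format("com.databricks.spark.xml")
        .option("rowTag", "xocs:doc")   // one <xocs:doc> element per row
        .load("/path/to/docs.xml");     // illustrative path

docs.printSchema();                     // should show the struct tree above
docs.select(docs.col("xocs:item.bibrecord.head")).show(false);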

using spark-xml_2.10 to extract data from XML file

2017-02-13 Thread Carlo . Allocca
Dear All, I am using spark-xml_2.10 to parse and extract some data from XML files. I am getting null values even though the XML files actually contain values. …

Re: LinearRegressionWithSGD and Rank Features By Importance

2016-11-09 Thread Carlo . Allocca
Hi Masood, Thanks for the answer. Sure, I will do as suggested. Many Thanks, Best Regards, Carlo On 8 Nov 2016, at 17:19, Masood Krohy <masood.kr...@intact.net> wrote: labels…

Re: LinearRegressionWithSGD and Rank Features By Importance

2016-11-08 Thread Carlo . Allocca
Thanks in advance. Best Regards, Carlo On 7 Nov 2016, at 17:14, carlo allocca <ca6...@open.ac.uk> wrote: I found it, just google http://sebastianraschka.com/Articles/2014_about_feature_scaling.html Thanks. Carlo On 7 Nov 2016, at 17:12, carlo allocca <ca6...@open.ac.uk>…

Re: LinearRegressionWithSGD and Rank Features By Importance

2016-11-07 Thread Carlo . Allocca
I found it, just google http://sebastianraschka.com/Articles/2014_about_feature_scaling.html Thanks. Carlo On 7 Nov 2016, at 17:12, carlo allocca <ca6...@open.ac.uk> wrote: Hi Masood, Thank you very much for your insight. I am going to scale all my features as you described. A…

Re: LinearRegressionWithSGD and Rank Features By Importance

2016-11-07 Thread Carlo . Allocca
Hi Masood, Thank you very much for your insight. I am going to scale all my features as you described. As I am a beginner, is there any paper/book that explains the suggested approaches? I would love to read it. Many Thanks, Best Regards, Carlo On 7 Nov 2016, at 16:27, Masood Krohy…

Re: LinearRegressionWithSGD and Rank Features By Importance

2016-11-04 Thread Carlo . Allocca
Hi Robin, On 4 Nov 2016, at 09:19, Robin East <robin.e...@xense.co.uk> wrote: Hi, do you mean the test of significance that you usually get with R output? Yes, exactly. I don't think there is anything implemented in the standard MLlib libraries; however, I believe that the sparkR version…

Re: LinearRegressionWithSGD and Rank Features By Importance

2016-11-04 Thread Carlo . Allocca
Hi Mohit, Thank you for your reply. OK, it means coefficients with a high score are more important than those with a low score… Many Thanks, Best Regards, Carlo > On 3 Nov 2016, at 20:41, Mohit Jaggi wrote: > > For linear regression, it should be fairly easy. Just sort the coefficients :) >

LinearRegressionWithSGD and Rank Features By Importance

2016-11-03 Thread Carlo . Allocca
Hi All, I am using SPARK and in particular the MLlib library. import org.apache.spark.mllib.regression.LabeledPoint; import org.apache.spark.mllib.regression.LinearRegressionModel; import org.apache.spark.mllib.regression.LinearRegressionWithSGD; For my problem I am using LinearRegressionWithSGD…
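To make the advice in the replies above concrete, here is a rough sketch (not code from the thread; the file path and iteration count are placeholders): standardize the features first, then compare the absolute weights.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.feature.StandardScaler;
import org.apache.spark.mllib.feature.StandardScalerModel;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.regression.LinearRegressionModel;
import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
import org.apache.spark.mllib.util.MLUtils;

JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setMaster("local[*]").setAppName("rank-features"));
JavaRDD<LabeledPoint> data =
        MLUtils.loadLibSVMFile(sc.sc(), "data/sample.txt").toJavaRDD(); // placeholder path

// Standardize the features so that coefficient magnitudes become comparable.
StandardScalerModel scaler = new StandardScaler(true, true)
        .fit(data.map(LabeledPoint::features).rdd());
JavaRDD<LabeledPoint> scaled = data.map(p ->
        new LabeledPoint(p.label(), scaler.transform(p.features())));

LinearRegressionModel model =
        LinearRegressionWithSGD.train(scaled.rdd(), 100); // placeholder iterations

// With standardized inputs, |weights[i]| is a rough proxy for the importance
// of feature i; MLlib offers no R-style significance tests here.
double[] weights = model.weights().toArray();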

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Thanks Marcelo. Problem solved. Best, Carlo Hi Marcelo, Thank you for your help. Problem solved as you suggested. Best Regards, Carlo > On 5 Aug 2016, at 18:34, Marcelo Vanzin wrote: > > On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca wrote: >> >> org.apache.spark >>…

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
I have also executed: mvn dependency:tree | grep log

[INFO] |  |  +- com.esotericsoftware:minlog:jar:1.3.0:compile
[INFO] +- log4j:log4j:jar:1.2.17:compile
[INFO] +- org.slf4j:slf4j-log4j12:jar:1.7.16:compile
[INFO] |  |  +- commons-logging:commons-logging:jar:1.1.3:compile

and the POM reports…

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Please Sean, could you detail the version mismatch? Many thanks, Carlo On 5 Aug 2016, at 18:11, Sean Owen <so...@cloudera.com> wrote: You also seem to have a version mismatch here.

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Hi Ted, Thanks for the prompt answer. It is not yet clear to me what I should do. How should I fix it? Many thanks, Carlo On 5 Aug 2016, at 17:58, Ted Yu <yuzhih...@gmail.com> wrote: private[spark] trait Logging {

ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Dear All, I would like to ask for your help with the following issue: java.lang.ClassNotFoundException: org.apache.spark.Logging. I checked, and the class Logging is not present. Moreover, the line of code where the exception is thrown is: final org.apache.spark.mllib.regression.LinearRegressionModel…
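Background that ties the replies above together: org.apache.spark.Logging is private[spark] from Spark 2.0 on (see Ted's quote), so this ClassNotFoundException usually means some artifact compiled against Spark 1.x is on the classpath. A hedged POM sketch of the usual fix, pinning every Spark artifact to one version (the property name is illustrative):

<properties>
  <spark.version>2.0.0</spark.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>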

Re: Dataset and JavaRDD: how to eliminate the header.

2016-08-03 Thread Carlo . Allocca
On 3 Aug 2016, at 22:01, Mich Talebzadeh <mich.talebza...@gmail.com> wrote: ok, in other words the result set of joining two datasets ends up inconsistent, as a header from one DS is joined with a row from another DS? I am not 100% sure I got this point. Let me check if I…

Re: Dataset and JavaRDD: how to eliminate the header.

2016-08-03 Thread Carlo . Allocca
Hi Mich, Thanks again. My issue is not when I read the csv from a file; it is when the Dataset is the output of some join operations. Any help on that? Many Thanks, Best, Carlo On 3 Aug 2016, at 21:43, Mich Talebzadeh <mich.talebza...@gmail.com> wrote: hm, odd. Otherwise you ca…

Re: Dataset and JavaRDD: how to eliminate the header.

2016-08-03 Thread Carlo . Allocca
One more: it seems that the steps == Step 1: transform the Dataset into JavaRDD: JavaRDD dataPointsWithHeader = dataset1_Join_dataset2.toJavaRDD(); and List someRows = dataPointsWithHeader.collect(); someRows.forEach(System.out::println); do not print the header. So, could I assume…

Re: Dataset and JavaRDD: how to eliminate the header.

2016-08-03 Thread Carlo . Allocca
Thanks Mich. Yes, I know both headers (categoryRankSchema, categorySchema), as expressed below: this.dataset1 = d1_DFR.schema(categoryRankSchema).csv(categoryrankFilePath); this.dataset2 = d2_DFR.schema(categorySchema).csv(categoryFilePath); Can you use filter to get rid of the…

Re: Dataset and JavaRDD: how to eliminate the header.

2016-08-03 Thread Carlo . Allocca
Hi Aseem, Thank you very much for your help. Please allow me to be more specific about my case (to some extent I already do what you suggested): Let us imagine that I have two csv datasets d1 and d2. I generate the Dataset as in the following: == Reading d1: sparkSession = spark; options =…

Dataset and JavaRDD: how to eliminate the header.

2016-08-03 Thread Carlo . Allocca
Hi All, I would like to apply a regression to my data. One of the workflow steps is to prepare my data as a JavaRDD, starting from a Dataset with its header. So, what I did was the following: == Step 1: transform the Dataset into JavaRDD: JavaRDD dataPointsWithHeader = modelDS.toJavaRDD();…
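Two usual ways to keep the header out of the JavaRDD, sketched with an illustrative path and column name (not code from the thread): consume the header at read time, or filter the stray header row out afterwards.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().master("local[*]").getOrCreate();

// Option 1: header=true turns the first line into column names, so it
// never reaches toJavaRDD().
Dataset<Row> modelDS = spark.read()
        .option("header", "true")
        .csv("model.csv");               // illustrative path
JavaRDD<Row> dataPoints = modelDS.toJavaRDD();

// Option 2: drop a stray header row by matching a known header value.
JavaRDD<Row> noHeader = dataPoints.filter(
        row -> !"id".equals(String.valueOf(row.get(0)))); // "id" is illustrative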

Re: converting a Dataset into JavaRDD

2016-08-03 Thread Carlo . Allocca
Problem solved: the import of org.apache.spark.api.java.function.Function was missing. Thanks. Carlo On 3 Aug 2016, at 12:14, Carlo.Allocca <carlo.allo...@open.ac.uk> wrote: Hi All, I am trying to convert a Dataset into JavaRDD in order to apply a linear regression. I am using spark-co…

converting a Dataset into JavaRDD

2016-08-03 Thread Carlo . Allocca
Hi All, I am trying to convert a Dataset into JavaRDD in order to apply a linear regression. I am using spark-core_2.10, version 2.0.0, with Java 1.8. My current approach is: == Step 1: convert the Dataset into JavaRDD: JavaRDD dataPoints = modelDS.toJavaRDD(); == Step 2: convert JavaRDD int…
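For reference, a self-contained sketch of the two steps (the column layout, label in column 0 with numeric features after it, is an assumption):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function; // the import that was missing
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

static JavaRDD<LabeledPoint> toLabeledPoints(Dataset<Row> modelDS) {
    // Step 1: Dataset<Row> -> JavaRDD<Row>
    JavaRDD<Row> rows = modelDS.toJavaRDD();
    // Step 2: JavaRDD<Row> -> JavaRDD<LabeledPoint>
    return rows.map((Function<Row, LabeledPoint>) row -> {
        double label = row.getDouble(0);          // assumed label column
        double[] features = new double[row.size() - 1];
        for (int i = 1; i < row.size(); i++) {
            features[i - 1] = row.getDouble(i);   // assumed numeric columns
        }
        return new LabeledPoint(label, Vectors.dense(features));
    });
}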

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
Solved!! The solution is using date_format with the "u" option. Thank you very much. Best, Carlo On 28 Jul 2016, at 18:59, carlo allocca <ca6...@open.ac.uk> wrote: Hi Mark, Thanks for the suggestion. I changed the maven entries as follows: spark-core_2.10 2.0.0 and…
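For the record, "u" is the SimpleDateFormat pattern letter for the day-of-week number (1 = Monday), so the computation stays in built-in Spark SQL functions; a rough sketch with assumed dataset and column names:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.date_format;

// d3 and eventTime are assumed names; "u" yields the strings "1".."7".
Dataset<Row> withDow = d3.withColumn("dayOfWeek",
        date_format(col("eventTime"), "u"));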

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
…1:14, Carlo.Allocca <carlo.allo...@open.ac.uk> wrote: I have also found the following two related links: 1) https://github.com/apache/spark/commit/947b9020b0d621bc97661a0a056297e6889936d3 2) https://github.com/apache/spark/pull/12433 which both explain why it happens but noth…

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
…commit/947b9020b0d621bc97661a0a056297e6889936d3 2) https://github.com/apache/spark/pull/12433, which both explain why it happens but say nothing about how to solve it. Do you have any suggestion/recommendation? Many thanks. Carlo On 28 Jul 2016, at 11:06, carlo allocca <ca6...@open.ac.uk> wrote: Hi Rui, Th…

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
…/recommendation? Many thanks. Carlo On 28 Jul 2016, at 11:06, carlo allocca <ca6...@open.ac.uk> wrote: Hi Rui, Thanks for the prompt reply. No, I am not using Mesos. OK. I am writing code to build a suitable dataset for my needs, as in the following: == Session configuration: SparkS…

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
Hi Rui, Thanks for the prompt reply. No, I am not using Mesos. OK. I am writing code to build a suitable dataset for my needs, as in the following: == Session configuration: SparkSession spark = SparkSession .builder() .master("local[6]") //…

SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
Hi All, I am running SPARK locally, and when running d3=join(d1,d2) and d5=join(d3,d4) I get the following exception: "org.apache.spark.SparkException: Exception thrown in awaitResult". Googling for it, the closest answer I found is the one reported at https://issues.apache.org/jira/browse/SPARK…
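Not the fix this thread eventually confirmed, but the two settings most often suggested for awaitResult failures on join-heavy local jobs are sketched here (values are illustrative): raise the broadcast timeout, or disable automatic broadcast joins so Spark falls back to sort-merge joins.

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .master("local[6]")
        .config("spark.sql.broadcastTimeout", "1200")          // seconds; default is 300
        .config("spark.sql.autoBroadcastJoinThreshold", "-1")  // -1 disables broadcast joins
        .getOrCreate();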

SPARK UDF related issue

2016-07-25 Thread Carlo . Allocca
Hi All, I am using SPARK 2.0 and I have the following issue: I am able to run steps 1-5 (see below) but not step 6, which uses a UDF. Actually, steps 1-5 take a few seconds, whereas step 6 looks like it never ends. Is there anything wrong? How should I address it? Any sugge…
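For reference, a minimal Spark 2.0 Java UDF registration sketch (the UDF name, its logic, and the column names are made up for illustration, since step 6 is truncated here):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

// Register a null-safe UDF, then invoke it by name on a column.
spark.udf().register("firstTen",
        (UDF1<String, String>) s ->
                s == null ? null : s.substring(0, Math.min(10, s.length())),
        DataTypes.StringType);

Dataset<Row> out = d3.withColumn("prefix", callUDF("firstTen", col("eventTime")));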

SPARK SQL and join pipeline issue

2016-07-25 Thread Carlo . Allocca
Dear All, I have the following question: I am using SPARK SQL 2.0 and, in particular, I am doing some joins in a pipeline of the following pattern (d3 = d1 join d2, d4 = d5 join d6, d7 = d3 join d4). When running my code, I realised that the building of d7 generates an issue, as reported belo…
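A sketch of that pipeline with explicit join conditions (the key names are assumptions): dropping the duplicated key column after each join helps, because ambiguous columns are a frequent cause of failures in chained joins.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Drop each right-hand key after joining so the next join sees one "id".
Dataset<Row> d3 = d1.join(d2, d1.col("id").equalTo(d2.col("id"))).drop(d2.col("id"));
Dataset<Row> d4 = d5.join(d6, d5.col("id").equalTo(d6.col("id"))).drop(d6.col("id"));
Dataset<Row> d7 = d3.join(d4, d3.col("id").equalTo(d4.col("id"))).drop(d4.col("id"));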