Re: Use Spark extension points to implement row-level security

2018-08-18 Thread Richard Siebeling
…ated through the constructor and not using the Scala getOrCreate() method (I've sent an email regarding this). But other than that, it works.

Use Spark extension points to implement row-level security

2018-08-16 Thread Richard Siebeling
Hi, I'd like to implement some kind of row-level security and am thinking of adding additional filters to the logical plan, possibly using the Spark extension points. Would this be feasible, for example using injectResolutionRule? Thanks in advance, Richard
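
For reference, a minimal sketch of what an injectResolutionRule-based filter could look like (not from the thread; the rule name, the tenant_id column, and the literal tenant value are illustrative assumptions, and TreeNodeTag assumes Spark 3.x). The rule runs inside the analyzer, so the UnresolvedAttribute is resolved after the Filter is added:

    import org.apache.spark.sql.SparkSessionExtensions
    import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
    import org.apache.spark.sql.catalyst.expressions.{EqualTo, Literal}
    import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
    import org.apache.spark.sql.catalyst.rules.Rule
    import org.apache.spark.sql.catalyst.trees.TreeNodeTag
    import org.apache.spark.sql.execution.datasources.LogicalRelation

    object RowSecurityRule extends Rule[LogicalPlan] {
      private val secured = TreeNodeTag[Boolean]("rowSecurityApplied")

      override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
        // wrap each scanned relation in a Filter; the tag stops the analyzer
        // from wrapping the same relation again on its next fixed-point pass
        case rel: LogicalRelation if rel.getTagValue(secured).isEmpty =>
          rel.setTagValue(secured, true)
          Filter(EqualTo(UnresolvedAttribute("tenant_id"), Literal("acme")), rel)
      }
    }

    class RowSecurityExtensions extends (SparkSessionExtensions => Unit) {
      override def apply(ext: SparkSessionExtensions): Unit =
        ext.injectResolutionRule(_ => RowSecurityRule)
    }

The extensions class would then be registered through the spark.sql.extensions configuration (fully qualified class name).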

Determine Cook's distance / influential data points

2017-12-13 Thread Richard Siebeling
Hi, would it be possible to determine Cook's distance using Spark? Thanks, Richard
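
As far as I know there is no built-in Cook's distance in Spark MLlib, but for a linear model it can be computed from residuals and leverages. A hedged sketch: it assumes data holds rows of (feature values with a leading 1.0 for the intercept, residual from an already fitted model) and that the feature count p is small enough to reduce X^T X onto the driver:

    import breeze.linalg.{inv, DenseVector => BDV}
    import org.apache.spark.rdd.RDD

    def cooksDistances(data: RDD[(Array[Double], Double)]): RDD[Double] = {
      val n = data.count()
      val p = data.first()._1.length
      // X^T X as a sum of outer products; p x p, so small enough to reduce locally
      val xtx = data.map { case (x, _) => { val v = BDV(x); v * v.t } }.reduce(_ + _)
      val xtxInv = inv(xtx)
      val mse = data.map { case (_, e) => e * e }.sum() / (n - p)
      data.map { case (x, e) =>
        val v = BDV(x)
        val h = v.t * (xtxInv * v)                      // leverage h_ii
        (e * e / (p * mse)) * h / math.pow(1 - h, 2)    // Cook's distance D_i
      }
    }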

Re: Handling skewed data

2017-04-19 Thread Richard Siebeling
I'm also interested in this, does anyone know? On 17 April 2017 at 17:17, Vishnu Viswanath wrote: > Hello All, does anyone know if the skew handling code mentioned in this talk https://www.youtube.com/watch?v=bhYV0JOPd9Y was added to Spark? If so, can I know where to look for more info…
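
Whether or not that code landed in Spark, the usual manual workaround is key salting. A sketch (the types and the salt count are illustrative): spread a hot key over N sub-keys before the shuffle, aggregate, then strip the salt and aggregate again:

    import scala.util.Random
    import org.apache.spark.rdd.RDD

    def saltedReduceByKey(rdd: RDD[(String, Long)], salts: Int): RDD[(String, Long)] =
      rdd.map { case (k, v) => ((k, Random.nextInt(salts)), v) } // add a random salt
        .reduceByKey(_ + _)                                      // partial aggregate
        .map { case ((k, _), v) => (k, v) }                      // strip the salt
        .reduceByKey(_ + _)                                      // final aggregate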

Re: Fast write datastore...

2017-03-15 Thread Richard Siebeling
Maybe Apache Ignite fits your requirements. On 15 March 2017 at 08:44, vincent gromakowski <vincent.gromakow...@gmail.com> wrote: > Hi, if queries are static and filters are on the same columns, Cassandra is a good option. > On 15 March 2017, 7:04 AM, "muthu" wrote: > Hello there…

Re: Continuous or Categorical

2017-03-01 Thread Richard Siebeling
I think it's difficult to determine with certainty whether a variable is continuous or categorical. What to do when the values are numbers like 1, 2, 2, 3, 4, 5? These values could be either continuous or categorical. However, you could perform some checks: - are there any decimal values -> it will pr…
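
The checks could look something like this (the threshold is an illustrative assumption; a small sample like the one above stays ambiguous either way):

    // heuristic: decimal values, or a high distinct-to-total ratio,
    // suggest a continuous variable rather than a categorical one
    def looksContinuous(values: Seq[Double], distinctRatioThreshold: Double = 0.2): Boolean = {
      val hasDecimals = values.exists(v => v != math.floor(v))
      val distinctRatio = values.distinct.size.toDouble / values.size
      hasDecimals || distinctRatio > distinctRatioThreshold
    }

    looksContinuous(Seq(1.5, 2.3, 7.7))   // true: decimal values present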

Re: is it possible to read .mdb file in spark

2017-01-26 Thread Richard Siebeling
Hi, haven't used it, but Jackcess should do the trick: http://jackcess.sourceforge.net/ Kind regards, Richard. 2017-01-25 11:47 GMT+01:00 Selvam Raman wrote: …
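
A minimal sketch of reading an .mdb table with Jackcess and handing the rows to Spark (the file name and table name are placeholders):

    import java.io.File
    import scala.collection.JavaConverters._
    import com.healthmarketscience.jackcess.DatabaseBuilder

    val db = DatabaseBuilder.open(new File("data.mdb"))
    try {
      val table = db.getTable("Customers")
      val rows = table.asScala.map(_.asScala.toMap).toList  // column name -> value
      val rdd = sc.parallelize(rows)                        // continue in Spark
    } finally db.close()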

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-09 Thread Richard Siebeling
…under changes of behaviour or changes in the build process or something like that. Kind regards, Richard. On 9 January 2017 at 22:55, Richard Siebeling wrote: > Hi, I'm setting up Apache Spark 2.1.0 on Mesos and I am getting a "Could not parse Master URL: 'me…

Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-09 Thread Richard Siebeling
Hi, I'm setting up Apache Spark 2.1.0 on Mesos and I am getting a "Could not parse Master URL: 'mesos://xx.xx.xxx.xxx:5050'" error. Mesos is running fine (both the master and the slave; it's a single-machine configuration). I really don't understand why this is happening since the same configurati…

Re: How to stop a running job

2016-10-06 Thread Richard Siebeling
> If running in client mode, just kill the job. If running in cluster mode, the Spark Dispatcher exposes an HTTP API for killing jobs. I don't think this is externally documented, so you might have to check the code to find this endpoint. If you run…

How to stop a running job

2016-10-05 Thread Richard Siebeling
Hi, how can I stop a long-running job? We have Spark running in Mesos coarse-grained mode. Suppose the user starts a long-running job, makes a mistake, changes a transformation and runs the job again. In this case I'd like to cancel the first job and after that start the second job. It would…
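
Within one application the standard mechanism for this is job groups (the group ids, descriptions and the expensiveTransform function below are placeholders); the cancel call would come from another thread, e.g. the handler that receives the user's corrected job:

    // run the first job under a named group
    sc.setJobGroup("user-42-job-1", "first long-running job", interruptOnCancel = true)
    val firstResult = rdd.map(expensiveTransform).collect()

    // later, from another thread: cancel it and start the corrected job
    sc.cancelJobGroup("user-42-job-1")
    sc.setJobGroup("user-42-job-2", "corrected job", interruptOnCancel = true)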

Re: building Spark 2.1 vs Java 1.8 on Ubuntu 16/06

2016-10-05 Thread Richard Siebeling
Sorry, now with the link included, see http://spark.apache.org/docs/latest/building-spark.html

Re: building Spark 2.1 vs Java 1.8 on Ubuntu 16/06

2016-10-05 Thread Richard Siebeling
Hi, did you set the following option: export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"? Kind regards, Richard. On Tue, Oct 4, 2016 at 10:21 PM, Marco Mistroni wrote: > Hi all, my mvn build of Spark 2.1 using Java 1.8 is spinning out of memory with an error saying it cannot allocate…

Re: Best way to calculate intermediate column statistics

2016-08-25 Thread Richard Siebeling

Re: Best way to calculate intermediate column statistics

2016-08-24 Thread Richard Siebeling
…result just after the calculation? Then you may aggregate statistics from the cached dataframe. This way it won't hit performance too much. Regards, Bedrytski Aliaksandr, sp...@bedryt.ski. On Wed…

Best way to calculate intermediate column statistics

2016-08-24 Thread Richard Siebeling
Hi, what is the best way to calculate intermediate column statistics, like the number of empty values and the number of distinct values of each column in a dataset, when aggregating or filtering data, next to the actual result of the aggregate or the filtered data? We are developing an application in w…
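
A sketch of the cache-then-aggregate approach discussed in the replies above (the column names are placeholders): cache the intermediate DataFrame once, so the extra statistics pass does not recompute the whole pipeline:

    import org.apache.spark.sql.functions.{col, count, countDistinct, lit, sum, when}

    val step = df.filter(col("amount") > 0).cache()           // intermediate result

    val result = step.groupBy("category").agg(sum("amount"))  // the actual output

    val stats = step.agg(                                     // side statistics
      count(lit(1)).as("rows"),
      sum(when(col("state").isNull || col("state") === "", 1).otherwise(0)).as("state_empty"),
      countDistinct("state").as("state_distinct")
    )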

Re: Spark 2.0 - make-distribution fails while regular build succeeded

2016-08-04 Thread Richard Siebeling
Fixed! After adding the option -DskipTests everything builds OK. Thanks Sean for your help. On Thu, Aug 4, 2016 at 8:18 PM, Richard Siebeling wrote: > I don't see any other errors, these are the last lines of the make-distribution log. Above these lines there are no errors.

Re: Spark 2.0 - make-distribution fails while regular build succeeded

2016-08-04 Thread Richard Siebeling
…On Thu, Aug 4, 2016 at 6:30 PM, Sean Owen wrote: > That message is a warning, not an error. It is just because you're cross-compiling with Java 8. If something failed, it was elsewhere. > On Thu, Aug 4, 2016, 07:09 Richard Siebeling wrote: >> Hi, spark 2.0…

Spark 2.0 - make-distribution fails while regular build succeeded

2016-08-03 Thread Richard Siebeling
Hi, Spark 2.0 with MapR Hadoop libraries was successfully built using the following command: ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0-mapr-1602 -DskipTests clean package. However, when I then try to build a runnable distribution using the following command: ./dev/make-distribution.sh --…

Errors when running SparkPi on a clean Spark 1.6.1 on Mesos

2016-05-15 Thread Richard Siebeling
…wrote the following: > On Sun, May 15, 2016 at 5:50 PM, Richard Siebeling wrote: > I'm getting the following errors running SparkPi on a clean, just-compiled and checked Mesos 0.29.0 installation with Spark 1.6.1: 16/05/15 23:05:52 ERROR Task…

Re: Errors when running SparkPi on a clean Spark 1.6.1 on Mesos

2016-05-15 Thread Richard Siebeling
By the way, this is on a single-node cluster. On Sunday 15 May 2016, Richard Siebeling wrote the following: > Hi, I'm getting the following errors running SparkPi on a clean, just-compiled and checked Mesos 0.29.0 installation with Spark 1.6.1…

Errors when running SparkPi on a clean Spark 1.6.1 on Mesos

2016-05-15 Thread Richard Siebeling
Hi, I'm getting the following errors running SparkPi on a clean, just-compiled and checked Mesos 0.29.0 installation with Spark 1.6.1: 16/05/15 23:05:52 ERROR TaskSchedulerImpl: Lost executor e23f2d53-22c5-40f0-918d-0d73805fdfec-S0/0 on xxx: Remote RPC client disassociated. Likely due to containers…

Re: Split columns in RDD

2016-01-19 Thread Richard Siebeling
…stringList = inputString.split(",")
    (stringList, stringList.size)
    }

If you then wanted to find out how many state columns you should have in your table, you could use a normal reduce (with a filter beforehand to…

Re: Split columns in RDD

2016-01-19 Thread Richard Siebeling
…the driver. Regards, Sab. On 19-Jan-2016 8:48 pm, "Richard Siebeling" wrote: > Hi, what is the most efficient way to split columns and know how many columns are created. Here is the curr…

Split columns in RDD

2016-01-19 Thread Richard Siebeling
Hi, what is the most efficient way to split columns and know how many columns are created? Here is the current RDD:

    ID   STATE
    1    TX, NY, FL
    2    CA, OH

This is the preferred output:

    ID   STATE_1   STATE_2 …
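
One way to do it, as a sketch (mirroring the shape of the reply above): split once, take the maximum per-row count to learn how many STATE_n columns are needed, then pad the short rows. Only a single Int comes back to the driver:

    val rows = sc.parallelize(Seq((1, "TX, NY, FL"), (2, "CA, OH")))

    val split = rows.mapValues(_.split(",").map(_.trim)).cache()
    val nCols = split.map(_._2.length).max()          // 3 for the data above

    val padded = split.mapValues(_.padTo(nCols, ""))  // (1, [TX, NY, FL]), (2, [CA, OH, ""])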

Stacking transformations and using intermediate results in the next transformation

2016-01-15 Thread Richard Siebeling
Hi, we're stacking multiple RDD operations on top of each other. For example, as a source we have an RDD[List[String]] like ["a", "b, c", "d"] and ["a", "d, a", "d"]. In the first step we split the second column into two columns, in the next step we filter the data on column 3 = "c", and in the final step we're…
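
A sketch of that stacking (assuming each step only needs a small scalar from the previous step, here the row width after the split):

    val source = sc.parallelize(Seq(
      List("a", "b, c", "d"),
      List("a", "d, a", "d")
    ))

    // step 1: split the second column into separate columns
    val step1 = source.map(r => r.take(1) ++ r(1).split(",").map(_.trim) ++ r.drop(2)).cache()

    // intermediate result used by the next step: the new width
    val width = step1.map(_.length).max()

    // step 2: filter on the third column of the widened rows
    val step2 = step1.filter(_.lift(2).contains("c"))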

Re: ROSE: Spark + R on the JVM.

2016-01-13 Thread Richard Siebeling
Hi David, the use case is that we're building a data processing system with an intuitive user interface where Spark is used as the data processing framework. We would like to provide an HTML user interface to R where the user types or copy-pastes his R code; the system should then send this R code…

Re: ROSE: Spark + R on the JVM.

2016-01-12 Thread Richard Siebeling
Hi, this looks great and seems to be very usable. Would it be possible to access the session API from within ROSE, to get for example the images that are generated by R / openCPU and the output that R logs to stdout? Thanks in advance, Richard. On Tue, Jan 12, 2016 at 10:16 PM, Vijay Kir…

Re: combining operations elegantly

2014-03-24 Thread Richard Siebeling
…example with 2 columns, where I do a conditional count for the first column and a simple sum for the second:

    scala> sc.parallelize((1 to 10).zip(11 to 20)).map { case (x, y) => (
         |   if (x > 5) 1 else 0,
         |   y
         | )}.reduce(_ + _)
    res3: (In…

Re: combining operations elegantly

2014-03-23 Thread Richard Siebeling
Hi Koert, Patrick, do you already have an elegant solution to combine multiple operations on a single RDD? Say, for example, that I want to do a sum over one column, and a count and an average over another column. Thanks in advance, Richard. On Mon, Mar 17, 2014 at 8:20 AM, Richard Siebeling wrote…
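
One hedged sketch of doing all three in a single pass with aggregate (assuming rows of (colA, colB) as Doubles; the sample data is illustrative):

    val rdd = sc.parallelize(Seq((1.0, 10.0), (2.0, 20.0), (3.0, 30.0)))

    val (sumA, cnt, sumB) = rdd.aggregate((0.0, 0L, 0.0))(
      (acc, row) => (acc._1 + row._1, acc._2 + 1L, acc._3 + row._2), // fold one row in
      (a, b) => (a._1 + b._1, a._2 + b._2, a._3 + b._3)              // merge partitions
    )
    val avgB = sumB / cnt   // average over the second column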

Re: combining operations elegantly

2014-03-17 Thread Richard Siebeling
Patrick, Koert, I'm also very interested in these examples, could you please post them if you find them? Thanks in advance, Richard. On Thu, Mar 13, 2014 at 9:39 PM, Koert Kuipers wrote: > Not that long ago there was a nice example on here about how to combine multiple operations on a single…