Re: Spark Streaming - Twitter on Python current status

2016-05-30 Thread Reynold Xin
I think your understanding is correct. There will be external libraries that allow you to use the Twitter streaming DStream API even in 2.0, though. On Sat, May 28, 2016 at 8:37 AM, Ricardo Almeida < ricardo.alme...@actnowib.com> wrote: > As far as I could understand… > 1. Using Python (PySpark…

Re: NLP & Constraint Programming

2016-05-30 Thread Marcin Tustin
Hi Ralph, You could look at https://spark-packages.org/ and see if there's anything you want on there, and if not, release your packages there. Constraint programming might benefit from integration into Spark, though. Marcin On Mon, May 30, 2016 at 7:12 AM, Debusmann, Ralph wrote: > Hi, …

Re: Secondary Indexing?

2016-05-30 Thread Michael Segel
I have to clarify something… In SparkSQL, we can query against both immutable existing RDDs and Hive/HBase/MapR-DB, which are mutable. So we have to keep this in mind while we are talking about secondary indexing. (It's not just RDDs.) I think the only advantage to being immutable is that once…
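[Editor's sketch, not part of the original thread.] The advantage Michael alludes to can be shown with a toy example in plain Python (not a Spark API): over an immutable dataset, a secondary index can be built once and never needs invalidating, since the rows underneath it cannot change. The field names here are illustrative.

```python
def build_index(rows, key):
    """Map each value of `key` to the positions of the rows holding it."""
    index = {}
    for pos, row in enumerate(rows):
        index.setdefault(row[key], []).append(pos)
    return index

# Immutable dataset: build the index once, reuse it for every query.
rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Lisbon"},
    {"id": 3, "city": "Oslo"},
]
city_index = build_index(rows, "city")

# Index lookup instead of a full scan over `rows`:
hits = [rows[p] for p in city_index.get("Oslo", [])]
```

With a mutable source (Hive/HBase), every write would have to update or invalidate `city_index`, which is exactly the bookkeeping that makes secondary indexing harder there.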

Secondary Indexing?

2016-05-30 Thread Michael Segel
I'm not sure where to post this since it's a bit of a philosophical question in terms of design and vision for Spark. If we look at SparkSQL and performance… where does secondary indexing fit in? The reason this is a bit awkward is that if you view Spark as querying RDDs, which are temporary, i…

NLP & Constraint Programming

2016-05-30 Thread Debusmann, Ralph
Hi, I am still a Spark newbie who'd like to contribute. There are two topics which I am most interested in: 1) Deep NLP (Syntactic/Semantic analysis) 2) Constraint Programming For both, I see no built-in support in Spark yet. Or is there? Cheers, Ralph
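[Editor's sketch, not part of the original thread.] For readers unfamiliar with the second topic: constraint programming searches for variable assignments satisfying a set of constraints. A minimal backtracking solver can be sketched in a few lines of plain Python (the names `solve`/`all_diff` are illustrative, not any existing Spark or CP library API):

```python
def solve(domains, constraints, assignment=None):
    """Backtracking search: try values from each variable's domain,
    pruning any partial assignment that violates a constraint."""
    assignment = assignment or {}
    if len(assignment) == len(domains):
        return assignment
    var = next(v for v in domains if v not in assignment)
    for value in domains[var]:
        candidate = {**assignment, var: value}
        if all(check(candidate) for check in constraints):
            result = solve(domains, constraints, candidate)
            if result is not None:
                return result
    return None

def all_diff(a, b):
    """Constraint: a != b; trivially satisfied while either is unassigned."""
    def check(asg):
        return a not in asg or b not in asg or asg[a] != asg[b]
    return check

domains = {"x": [1, 2], "y": [1, 2], "z": [1, 2, 3]}
constraints = [all_diff("x", "y"), all_diff("y", "z"), all_diff("x", "z")]
solution = solve(domains, constraints)  # {'x': 1, 'y': 2, 'z': 3}
```

Distributing such a search (e.g. one subtree per Spark partition) is one plausible shape for the integration Marcin suggests in his reply.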

Re: NegativeArraySizeException / segfault

2016-05-30 Thread Jiří Syrový
I think I saw this one already, as the first indication that something is wrong, and it was related to https://issues.apache.org/jira/browse/SPARK-13516 2016-05-28 1:34 GMT+02:00 Koert Kuipers: > it seemed to be related to an Aggregator, so for tests we replaced it with > an ordinary Dataset.reduce…
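[Editor's sketch, not part of the original thread.] The workaround Koert describes, swapping a custom Aggregator for an ordinary reduce, amounts to pairwise-merging row objects with a user function. A plain-Python stand-in (hypothetical record fields; this deliberately avoids the generated-code path that SPARK-13516 concerns):

```python
from functools import reduce

# Each record plays the role of a typed Dataset row.
records = [{"key": "a", "count": 2}, {"key": "a", "count": 5}]

def merge(left, right):
    # Pairwise combine, as Dataset.reduce applies a binary function
    # to row objects instead of going through an Aggregator's buffer.
    return {"key": left["key"], "count": left["count"] + right["count"]}

total = reduce(merge, records)  # {'key': 'a', 'count': 7}
```

The trade-off is that a reduce materializes whole row objects at each step, whereas an Aggregator works on an intermediate buffer; for a test-only workaround that is usually acceptable.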