Re: ability to provide custom serializers

2016-12-02 Thread Michael Armbrust
I would love to see something like this. The closest related ticket is probably https://issues.apache.org/jira/browse/SPARK-7768 (though maybe there are enough people using UDTs in their current form that we should just make a new ticket). A few thoughts: - even if you can do implicit search, we …
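
For context, a minimal sketch (not from the original message) of the fallback that exists today: putting a Kryo-based Encoder into implicit scope for a field type Spark cannot encode natively. The Foo case class is assumed from the original message in this thread; everything else is illustrative.

    import java.time.Instant
    import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

    // Hypothetical domain object whose field type has no built-in Encoder.
    case class Foo(timestamp: Instant)

    object KryoEncoderSketch extends App {
      val spark = SparkSession.builder().master("local[*]").appName("kryo-encoder").getOrCreate()

      // Blanket fallback: the whole object is serialized by Kryo into a single
      // binary column, so individual fields are not queryable as columns.
      implicit val fooEncoder: Encoder[Foo] = Encoders.kryo[Foo]

      val ds = spark.createDataset(Seq(Foo(Instant.now())))
      ds.printSchema() // prints a single "value: binary" column

      spark.stop()
    }

This works, but it loses the columnar representation, which is presumably why first-class custom serializers are being discussed.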

ability to provide custom serializers

2016-12-02 Thread Erik LaBianca
Hi All, Apologies in advance for any confusing terminology, I’m still pretty new to Spark. I’ve got a bunch of Scala case class “domain objects” from an existing application. Many of them contain simple but unsupported-by-Spark types, such as case class Foo(timestamp: java.time.Instant…
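
A minimal sketch of the situation being described (assumed, not quoted from the message): a domain object with a java.time.Instant field cannot be turned into a Dataset directly, and the manual workaround is to map it onto a parallel class that uses a supported type such as java.sql.Timestamp.

    import java.sql.Timestamp
    import java.time.Instant
    import org.apache.spark.sql.SparkSession

    // Existing domain object; Instant has no implicit Encoder in Spark 2.0.
    case class Foo(timestamp: Instant)
    // Parallel class using a type Spark SQL does support.
    case class FooRow(timestamp: Timestamp)

    object DomainObjectSketch extends App {
      val spark = SparkSession.builder().master("local[*]").appName("domain-objects").getOrCreate()
      import spark.implicits._

      val foos = Seq(Foo(Instant.now()))

      // foos.toDS() would fail to compile here ("Unable to find encoder ..."),
      // so each object is converted by hand before building the Dataset.
      val ds = foos.map(f => FooRow(Timestamp.from(f.timestamp))).toDS()
      ds.printSchema()

      spark.stop()
    }

The boilerplate of keeping two parallel class hierarchies in sync is essentially the pain point of the thread.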

Issues using Hive JDBC

2016-12-02 Thread Jim Hughes
Hi all, I'm investigating adding geospatial user-defined functions and types to Spark SQL in Spark 2.0.x. That is going rather well; I've seen how to add geospatial UDTs and UDFs (and even UDAFs!). As part of the investigation, I tried out the Thrift JDBC server, and I have encountered two g…
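
As background for the UDF part (not code from the message), registering a Scala function with Spark SQL so it can be called by name from SQL looks roughly like this; the geometry test is a placeholder, not a real geospatial implementation:

    import org.apache.spark.sql.SparkSession

    object GeoUdfSketch extends App {
      val spark = SparkSession.builder().master("local[*]").appName("geo-udf").getOrCreate()

      // Placeholder predicate standing in for a real WKT/geometry containment test.
      val stContains: (String, String) => Boolean =
        (container, contained) => container.contains(contained)

      // Once registered, the function is callable by name from SQL in this session.
      spark.udf.register("st_contains", stContains)

      spark.sql("SELECT st_contains('POLYGON(...)', 'POINT(...)') AS result").show()

      spark.stop()
    }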

Re: Flink event session window in Spark

2016-12-02 Thread Miguel Morales
Although this may not be natively supported, you can mimic this behavior by using a micro-batch interval of 1 minute. Then, in your updateStateByKey function, check how long the session has been running. If it's longer than 10 minutes, return an empty key so that it's removed from the stream. On Fri, Dec 2, …
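
A rough sketch of the approach described above (illustrative, not from the original message): keep per-key state in updateStateByKey, refresh a last-seen timestamp on every batch that contains events, and return None once the key has been idle longer than the timeout so its state is dropped.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SessionTimeoutSketch {
      // Per-key state: when the session started and when an event was last seen.
      case class SessionState(startedAt: Long, lastSeenAt: Long)

      val sessionTimeoutMs: Long = 10 * 60 * 1000L // 10 minutes

      // Returning None drops the key's state, i.e. the session is closed.
      def updateSession(events: Seq[Long], state: Option[SessionState]): Option[SessionState] = {
        val now = System.currentTimeMillis()
        val current = state.getOrElse(SessionState(now, now))
        if (events.nonEmpty) Some(current.copy(lastSeenAt = now))
        else if (now - current.lastSeenAt > sessionTimeoutMs) None
        else Some(current)
      }

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[2]").setAppName("session-timeout-sketch")
        // 1-minute micro-batches, as suggested above.
        val ssc = new StreamingContext(conf, Seconds(60))
        ssc.checkpoint("/tmp/session-timeout-checkpoint")

        // Hypothetical input: one user id per line from a socket.
        val events = ssc.socketTextStream("localhost", 9999).map(line => (line.trim, 1L))

        val sessions = events.updateStateByKey(updateSession _)
        sessions.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

Note that with this scheme a session can only close on a batch boundary, so the effective timeout resolution is the 1-minute batch interval.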

SPARK-18689: A proposal for priority based app scheduling utilizing linux cgroups.

2016-12-02 Thread Hegner, Travis
Hello, I've just created a JIRA to open up discussion of a new feature that I'd like to propose. https://issues.apache.org/jira/browse/SPARK-18689 I'd love to get some feedback on the idea. I know that normally anything related to scheduling or queuing automatically throws up the "hard to…

Re: [SPARK-17845] [SQL][PYTHON] More self-evident window function frame boundary API

2016-12-02 Thread Maciej Szymkiewicz
Sure, here you are: https://issues.apache.org/jira/browse/SPARK-18690 To be fair I am not fully convinced it is worth it. On 12/02/2016 12:51 AM, Reynold Xin wrote: > Can you submit a pull request with test cases based on that change? > > > On Dec 1, 2016, 9:39 AM -0800, Maciej Szymkiewicz wrote: …

Re: Flink event session window in Spark

2016-12-02 Thread Michael Armbrust
Here is the JIRA for adding this feature: https://issues.apache.org/jira/browse/SPARK-10816 On Fri, Dec 2, 2016 at 11:20 AM, Fritz Budiyanto wrote: > Hi All, > > I need help on how to implement Flink event session window in Spark. Is > this possible? > > For instance, I wanted to create a session…

Flink event session window in Spark

2016-12-02 Thread Fritz Budiyanto
Hi All, I need help on how to implement a Flink-style event session window in Spark. Is this possible? For instance, I wanted to create a session window with a timeout of 10 minutes (see Flink snippet below). Continuous events will keep the session window alive. If there is no activity for 10 minutes, the …
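
The Flink snippet referenced above is cut off in this preview; below is a minimal sketch of what an event-time session window with a 10-minute gap typically looks like in Flink's Scala API (input and names are illustrative; the timestamp/watermark assignment a real job needs is omitted).

    import org.apache.flink.streaming.api.TimeCharacteristic
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows
    import org.apache.flink.streaming.api.windowing.time.Time

    object FlinkSessionWindowSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

        // Hypothetical input: (userId, count) pairs; a real job would also
        // call assignTimestampsAndWatermarks on this stream.
        val events: DataStream[(String, Int)] =
          env.fromElements(("user-1", 1), ("user-1", 1), ("user-2", 1))

        // The session stays open while events keep arriving for a key and
        // closes after a 10-minute gap with no activity.
        val sessions = events
          .keyBy(_._1)
          .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
          .sum(1)

        sessions.print()
        env.execute("flink-session-window-sketch")
      }
    }

Nothing equivalent exists in Spark as of 2.0; that gap appears to be what SPARK-10816 (linked in the reply above) tracks.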

Re: getting PRs into the spark hive dependency

2016-12-02 Thread Reynold Xin
The ThriftHttpCLIService.java code is actually in Spark. That pull request is basically a no-op. Overall we are moving away from the Hive dependency by implementing almost everything in Spark, so the need to change that repo is getting smaller and smaller. On Fri, Dec 2, 2016 at 10:03 AM, Marcelo Vanzin wrote: …

Re: getting PRs into the spark hive dependency

2016-12-02 Thread Marcelo Vanzin
I believe the latest one is actually in Josh's repository. Which kinda raises a more interesting question: should we create a repository managed by the Spark project, using the Apache infrastructure, to handle that fork? It seems not very optimal to have this lie in some random person's GitHub account…

getting PRs into the spark hive dependency

2016-12-02 Thread Steve Loughran
What's the process for PR review for the Hive JAR? I ask as I've had a PR fixing a Kerberos problem outstanding for a while, without much response: https://github.com/pwendell/hive/pull/2 I'm now looking at the one line it would take for the JAR to consider Hadoop 3.x compatible at the API…