Distributed Anomaly Detection using MIDAS

2020-06-27 Thread Shivin Srivastava
Hi All, I have recently been exploring MIDAS, an algorithm for streaming anomaly detection. A production-level, parallel and distributed implementation of MIDAS should be quite useful to the industry. I feel that Spark is very well suited for this. If anyone is interested to contribute/collabor

Re: Distributed Anomaly Detection using MIDAS

2020-06-27 Thread Aditya Addepalli
Hi Shivin, I'm interested in collaborating with you on this project. I have been using PySpark for a while now and am quite familiar with it. Do you have a plan for how to proceed? Thanks, Aditya On Sat, 27 Jun, 2020, 2:58 pm Shivin Srivastava, wrote: > Hi All, > > I have recently been explori

Contract for PartitionReader/InputPartition for ColumnarBatch?

2020-06-27 Thread Micah Kornfield
Hello spark-dev, Looking at ColumnarBatch [1] it seems to indicate a single object is meant to be used for the entire loading process. Does this imply that Spark assumes the ColumnarBatch and any direct references to ColumnarBatch (e.g. UTF8Strings) returned by InputPartitionReader/PartitionReade

Re: [DISCUSS][SPIP] Graceful Decommissioning

2020-06-27 Thread Holden Karau
There have been some comments & a few additions in the doc, but it seems like the folks taking a look generally agree on the design. If there are no other issues I will bring this to a vote late next week. On Thu, Jun 25, 2020 at 7:43 PM Holden Karau wrote: > Thanks for looping in more folks :) > >

UnknownSource NullPointerException in CodeGen. with Custom Strategy

2020-06-27 Thread Nasrulla Khan Haris
Hi Spark Developers, I am encountering this NullPointerException while reading a Parquet file on a multi-node cluster. However, when running the Spark job locally on a single node (development environment), I do not encounter this error. I would appreciate your input. Thanks in advance, NKH pqjah.dx.internal.cloud

RE: UnknownSource NullPointerException in CodeGen. with Custom Strategy

2020-06-27 Thread Nasrulla Khan Haris
Stack trace with WSCG (whole-stage code generation) disabled: scala> df29.groupBy("LastName").count().show() 20/06/28 06:20:55 WARN TaskSetManager: Lost task 1.0 in stage 2.0 (TID 8, wn5-nkhwes.zhqzi2stszlevpekfsrlmpqjah.dx.internal.cloudapp.net, executor 4): java.lang.NullPointerException at org.apache.spark.sql.cat