Re: HashingTFModel/IDFModel in Structured Streaming

2017-11-16 Thread Davis Varghese
> PR for it.
> If anyone can have a look and suggest any changes it would be really appreciated.
>
> Thank you.
>
> 2017-11-15 1:11 GMT+00:00 Bago Amirbekian:
>> There is a known issue with VectorAssembler which causes it to fail in streaming if any of

Re: HashingTFModel/IDFModel in Structured Streaming

2017-11-15 Thread Davis Varghese
Since we are on Spark 2.2, I backported/fixed it. Here is the diff file comparing against https://github.com/apache/spark/blob/73fe1d8087cfc2d59ac5b9af48b4cf5f5b86f920/mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala

24c24
< import org.apache.spark.ml.param.{Param, ParamMap, P
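
For context on the transformer being backported: upstream, VectorSizeHint ships with Spark 2.3 and later, and it records a fixed size in a vector column's metadata. The sketch below shows one way it might be placed ahead of a VectorAssembler. It is not the poster's backport; the column names ("tfidf", "num", "features") and the size of 1000 are assumptions made for illustration.

```scala
import org.apache.spark.ml.PipelineStage
import org.apache.spark.ml.feature.{VectorAssembler, VectorSizeHint}

object SizeHintSketch {
  // Declare the size of the (hypothetical) "tfidf" vector column up front.
  // The size must match what the upstream stage really produces.
  val sizeHint: VectorSizeHint = new VectorSizeHint()
    .setInputCol("tfidf")
    .setSize(1000)
    .setHandleInvalid("error") // fail on rows whose vector has a different size

  // Combine the sized vector column with a plain numeric column.
  val assembler: VectorAssembler = new VectorAssembler()
    .setInputCols(Array("tfidf", "num"))
    .setOutputCol("features")

  // Stages in the order they would appear in a Pipeline.
  val stages: Array[PipelineStage] = Array(sizeHint, assembler)
}
```

On Spark 2.2 this class is not available out of the box, which is presumably why a backport was needed here.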

Re: HashingTFModel/IDFModel in Structured Streaming

2017-11-12 Thread Davis Varghese
Bago, I am finally able to create one that fails consistently. I think the issue is caused by the VectorAssembler in the model. In the new code, I have 2 features (1 text and 1 number), and I have to run them through a VectorAssembler before passing them to LogisticRegression. Code and test data below import
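
The code from this message is cut off in the listing. The following is not that code, only a minimal sketch of the pipeline shape described above: one text feature turned into a TF-IDF vector, one numeric feature, both combined by a VectorAssembler and fed to LogisticRegression. The column names, the sample rows, and the numFeatures value are all hypothetical.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object AssemblerPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("assembler-pipeline-sketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical static training data: text feature, numeric feature, label.
    val training = Seq(
      ("spark is great", 3.0, 1.0),
      ("this is terrible", 1.0, 0.0),
      ("really great work", 5.0, 1.0),
      ("terrible and slow", 2.0, 0.0)
    ).toDF("text", "num", "label")

    // Text feature: tokenize, hash to term frequencies, rescale with IDF.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF()
      .setInputCol("words").setOutputCol("tf").setNumFeatures(1000)
    val idf = new IDF().setInputCol("tf").setOutputCol("tfidf")

    // Combine the TF-IDF vector with the numeric column into one feature vector.
    val assembler = new VectorAssembler()
      .setInputCols(Array("tfidf", "num"))
      .setOutputCol("features")

    val lr = new LogisticRegression()
      .setFeaturesCol("features").setLabelCol("label")

    val pipeline = new Pipeline()
      .setStages(Array(tokenizer, hashingTF, idf, assembler, lr))

    // Fit on the static data and score the same rows in batch mode.
    val model = pipeline.fit(training)
    model.transform(training).select("text", "num", "prediction").show(false)

    spark.stop()
  }
}
```

This runs in batch mode only; the failure discussed in the thread concerns applying such a fitted model to a streaming DataFrame.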

Re: HashingTFModel/IDFModel in Structured Streaming

2017-11-09 Thread Davis Varghese
Bago, The code I wrote does not reproduce the issue. In our case, the ML pipeline is built from a UI, in a particular fashion, so that a user can create a pipeline behind the scenes using drag and drop. I have yet to dig deeper to recreate it as standalone code. Meanwhile I am sharing

Re: HashingTFModel/IDFModel in Structured Streaming

2017-11-01 Thread Davis Varghese
Sure. I will get one over the weekend.

HashingTFModel/IDFModel in Structured Streaming

2017-10-16 Thread Davis Varghese
I have built an ML pipeline model on static Twitter data for sentiment analysis. When I use the model on a structured stream, it always throws "Queries with streaming sources must be executed with writeStream.start()". This particular model doesn't contain any documented "unsupported" operations.
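
As a rough illustration of the setup described, a fitted pipeline model can be applied to a streaming DataFrame along the lines below. This is a sketch under assumptions, not the poster's code: the model path, the input directory, and the single "text" column are hypothetical. Spark generally raises the quoted error when a batch-style action runs against a streaming DataFrame, so the sketch keeps everything lazy until writeStream.start().

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object StreamingScoringSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-scoring-sketch")
      .master("local[2]")
      .getOrCreate()

    // Hypothetical location of a pipeline model fitted earlier on static data.
    val model = PipelineModel.load("/tmp/sentiment-model")

    // File-based streaming sources need an explicit schema.
    val schema = StructType(Seq(StructField("text", StringType, nullable = true)))

    // Hypothetical directory that receives new JSON files of tweets.
    val tweets = spark.readStream.schema(schema).json("/tmp/tweets-in")

    // transform() only builds the streaming plan; nothing executes yet.
    val scored = model.transform(tweets)

    // The query starts only here, via writeStream.start().
    val query = scored.select("text", "prediction")
      .writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

If a stage inside the model triggers a batch action during transform(), the same error can surface even though the calling code looks like this, which is the situation the rest of the thread investigates.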