Hi Guys,
We are fitting a logistic regression model using the following code.
val chiSqSelector = new ChiSqSelector()
  .setNumTopFeatures(10)
  .setFeaturesCol("VECTOR_1")
  .setLabelCol("TARGET")
  .setOutputCol("selectedFeatures")

val assembler = new VectorAssembler()
  .setInputCols(Array("FEATURES", "selectedFeatures"))
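The original message is cut off here. Assuming the intent is to feed the assembled vector into a LogisticRegression stage, a minimal sketch of the full pipeline might look like this (the Pipeline wiring, the `assembledFeatures` output column, and the `trainingDf` DataFrame are assumptions; only the column names VECTOR_1, FEATURES, and TARGET come from the snippet):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{ChiSqSelector, VectorAssembler}

// Select the 10 most predictive features from VECTOR_1 (chi-squared test).
val selector = new ChiSqSelector()
  .setNumTopFeatures(10)
  .setFeaturesCol("VECTOR_1")
  .setLabelCol("TARGET")
  .setOutputCol("selectedFeatures")

// Combine the original FEATURES vector with the selected features.
val assembler = new VectorAssembler()
  .setInputCols(Array("FEATURES", "selectedFeatures"))
  .setOutputCol("assembledFeatures")

val lr = new LogisticRegression()
  .setFeaturesCol("assembledFeatures")
  .setLabelCol("TARGET")

// Chain the stages so the selector runs before the assembler and the model.
// trainingDf (assumed) is a DataFrame with VECTOR_1, FEATURES, and TARGET columns.
val model = new Pipeline()
  .setStages(Array(selector, assembler, lr))
  .fit(trainingDf)
```

A Pipeline keeps the selector's fitted feature indices bundled with the model, so the same transformation is applied consistently at prediction time.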
Hello,
I am working with Spark SQL to query a Hive managed table (in ORC format).
My data is organized by partitions, and I have been asked to set indexes for
every 50,000 rows by setting ('orc.row.index.stride'='5').
Let's say that after evaluating a partition there are around 50 files in which
the data is
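For reference, `orc.row.index.stride` is the number of rows covered by each row-group index entry (ORC's default is 10,000), so a value of '5' indexes every 5 rows; indexing every 50,000 rows would need '50000'. It is normally set as a table property at creation time; a sketch, with a hypothetical table and placeholder columns:

```scala
// Hypothetical table 'events'; the stride value here matches the stated
// goal of one index entry per 50,000 rows.
spark.sql("""
  CREATE TABLE events (id BIGINT, payload STRING)
  PARTITIONED BY (dt STRING)
  STORED AS ORC
  TBLPROPERTIES ('orc.row.index.stride' = '50000')
""")
```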
Thanks a lot! You are right!
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
Strangely, this is working only for a very small dataset of rows; for very
large datasets the partitioning apparently does not work. Is there a limit
to the number of columns or rows when repartitioning by multiple
columns?
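There is no documented row or column limit, but note that `repartition` hashes the column combination into a fixed number of partitions (defaulting to `spark.sql.shuffle.partitions`, 200), which can look like "not working" on large data. A sketch, with placeholder DataFrame and column names:

```scala
import org.apache.spark.sql.functions.col

// Hash-partition df (placeholder) on the pair (colA, colB) into 400 buckets;
// without the explicit count, spark.sql.shuffle.partitions is used.
val repartitioned = df.repartition(400, col("colA"), col("colB"))

// To partition the data on disk by multiple columns, partitionBy on write
// creates one directory per distinct (colA, colB) value pair.
df.write.partitionBy("colA", "colB").orc("/tmp/output")
```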
regards,
Imran
On Wed, Oct 18, 2017 at 11:00 AM, Imran Rajjad wrote: