Hi All,
Given an Avro Schema object, is there a way to get a StructType that represents
the schema in Java?
Thanks!
I'm looking to run a job that involves a zillion files in a format called
CDF, a NASA standard. There are a number of libraries out there that can
read CDFs, but most of them are not high quality compared to the official
NASA one, which has Java bindings (via JNI). It's a little clumsy, but I
have
Hi all,
I am trying to run a sample decision tree, following the examples here (for
MLlib):
https://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-classifier
The example seems to use a VectorIndexer, but I am missing something.
How does the featureIndexer know which
Maybe I should consider something like Impala?
On Fri, Dec 15, 2017 at 11:32, Julien CHAMP wrote:
> Hi Spark Community members!
>
> I want to do several (from 1 to 10) aggregate functions using window
> functions on something like 100 columns.
>
> Instead of doing several passes on the data
Hi Jeremy,
Just out of curiosity: you do know that this is a SPARK user group?
Regards,
Gourav
On Thu, Dec 14, 2017 at 7:03 PM, Jeremy Kelley wrote:
> We have a largeish Kinesis stream with about 25k events per second and
> each record is around 142k. I have tried multiple cluster sizes, mu
Hi all,
I am trying to compile my UDF with the Janino compiler and then register
it in Spark and use it afterwards. Here is the code:
String s = " \n" +
"public class MyUDF implements org.apache.spark.sql.api.java.UDF1 {\n" +
"@Override\n" +
"public St
Hi Spark Community members!
I want to do several (from 1 to 10) aggregate functions using window
functions on something like 100 columns.
Instead of doing several passes on the data to compute each aggregate
function, is there a way to do this efficiently?
Currently it seems that doing
val
Hi,
We have a batch processing application that reads log files over multiple
days, does transformations and aggregations on them using Spark and saves
various intermediate outputs to Parquet. These jobs take many hours to run.
This pipeline is deployed at many customer sites with some site speci