A slight advantage of Java is also the tooling that exists around it - better
support by build tools and plugins, advanced static code analysis (security,
bugs, performance), etc.
> On 8. Jun 2017, at 08:20, Mich Talebzadeh wrote:
>
> What I like about Scala is that it is less ceremonial compar
I did add mode -> DROPMALFORMED, but it still couldn't ignore it because the
error is raised from the CSV library that Spark uses.
On Thu, Jun 8, 2017 at 12:11 PM Jörn Franke wrote:
> The CSV data source allows you to skip invalid lines - this should also
> include lines that have more than max
What I like about Scala is that it is less ceremonial compared to Java.
Java users claim that Scala is built on Java, so error tracking is very
difficult. Also, Scala sits on top of Java, and that makes it virtually
dependent on Java.
For me the advantage of Scala is its simplicity and compactness
Hi,
I have been trying to distribute Kafka topics among different instances of the
same consumer group. I am using the KafkaDirectStream API for creating DStreams.
After the second consumer group comes up, Kafka does a partition rebalance and
then the Spark driver of the first consumer dies with the following
Hi Guys
Quick one: how does Spark deal with (i.e. create partitions for) large files
sitting on NFS, assuming all executors can see the file in exactly the same way?
ie, when I run
r = sc.textFile("file://my/file")
what happens if the file is on NFS?
is there any difference from
r = sc.textFile("hdfs://my
The CSV data source allows you to skip invalid lines - this should also include
lines that have more than maxColumns. Choose mode "DROPMALFORMED"
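For illustration, a read along these lines might look like the sketch below (the
file path, header setting and column limit are placeholders, not from this thread):

// spark: an existing SparkSession
val df = spark.read
  .option("mode", "DROPMALFORMED")   // silently drop lines the parser cannot handle
  .option("maxColumns", "20480")     // parser column limit (placeholder value)
  .option("header", "true")          // placeholder setting
  .csv("/path/to/input.csv")         // placeholder path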
> On 8. Jun 2017, at 03:04, Chanh Le wrote:
>
> Hi Takeshi, Jörn Franke,
>
> The problem is even I increase the maxColumns it still have some lines
did you include the proper scala-reflect dependency?
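If it helps, a declaration of that kind might look roughly like this in sbt (an
assumption on my part, not taken from your build):

// keep scala-reflect in lockstep with the project's Scala version (2.11.8 here)
libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaVersion.value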
On Wed, May 31, 2017 at 1:01 AM, krishmah wrote:
> I am currently using Spark 2.0.1 with Scala 2.11.8. However, the same code
> works with Scala 2.10.6. Please advise if I am missing something
>
> import org.apache.spark.sql.functions.udf
>
> val g
I think you need to get the logger within the lambda; otherwise it's the
logger on the driver side, which can't work.
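A minimal sketch of what I mean, assuming SLF4J-style logging and a
foreachPartition-style lambda (names are illustrative):

// rdd: an existing RDD
rdd.foreachPartition { partition =>
  // obtain the logger inside the closure, so it is created on the executor
  // rather than serialized from the driver
  val log = org.slf4j.LoggerFactory.getLogger("executor.side.logger")
  partition.foreach { record =>
    log.info(s"processing $record")
  }
}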
On Wed, May 31, 2017 at 4:48 PM, Paolo Patierno wrote:
> No, it's running in standalone mode as a Docker image on Kubernetes.
>
>
> The only way I found was to access "stderr" file creat
We use AsyncHttpClient (from the Java world) and simply call future.get as a
synchronous call.
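Roughly like the sketch below (the endpoint, payload and client setup are
placeholders, not our actual code):

import org.asynchttpclient.Dsl.asyncHttpClient

// rdd: an existing RDD of records to push
rdd.foreachPartition { records =>
  val client = asyncHttpClient()   // one client per partition
  try {
    records.foreach { record =>
      // execute() is asynchronous; calling get() on the future blocks,
      // which turns it into a synchronous call
      val response = client.preparePost("http://example.com/ingest") // placeholder URL
        .setBody(record.toString)
        .execute()
        .get()
      // response.getStatusCode could be checked here
    }
  } finally {
    client.close()
  }
}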
On Thu, Jun 1, 2017 at 4:08 AM, vimal dinakaran wrote:
> Hi,
> In our application pipeline we need to push the data from Spark Streaming
> to an HTTP server.
>
> I would like to have a http client with be
1. Could you give the job, stage & task status from the Spark UI? I found it
extremely useful for performance tuning.
2. Use model.transform for predictions (see the sketch below). Usually we have
a pipeline for preparing training data, and using the same pipeline to transform
the data you want to predict could give us the predictio
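A rough sketch of point 2, with hypothetical feature columns (not from your job):

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler

// trainingData / newData: existing DataFrames with columns f1, f2, f3, label (placeholders)
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2", "f3"))
  .setOutputCol("features")
val lr = new LogisticRegression().setLabelCol("label").setFeaturesCol("features")

// fit the whole preparation + estimator pipeline once on the training data ...
val model = new Pipeline().setStages(Array(assembler, lr)).fit(trainingData)

// ... then reuse it, so new data is prepared exactly the same way before prediction
val predictions = model.transform(newData)
predictions.select("prediction", "probability").show(false)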
I'd suggest scripts like JS, Groovy, etc. To my understanding, the service
loader mechanism isn't a good fit for runtime reloading.
On Wed, Jun 7, 2017 at 4:55 PM, Jonnas Li(Contractor) <
zhongshuang...@envisioncn.com> wrote:
> To be more explicit, I used mapwithState() in my application, just li
If you use StringIndexer to categorize the data, IndexToString can convert
it back.
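A small sketch of that round trip (column names are placeholders):

import org.apache.spark.ml.feature.{IndexToString, StringIndexer}

// df: an existing DataFrame with a string column "category" (placeholder name)
val indexerModel = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .fit(df)
val indexed = indexerModel.transform(df)

// map the numeric indices back to the original string labels
val converter = new IndexToString()
  .setInputCol("categoryIndex")
  .setOutputCol("originalCategory")
  .setLabels(indexerModel.labels)
converter.transform(indexed)
  .select("category", "categoryIndex", "originalCategory")
  .show()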
On Wed, Jun 7, 2017 at 6:14 PM, kundan kumar wrote:
> Hi Yan,
>
> This doesn't work.
>
> thanks,
> kundan
>
> On Wed, Jun 7, 2017 at 2:53 PM, 颜发才(Yan Facai)
> wrote:
>
>> Hi, kumar.
>>
>> How about removing the `
Hi Takeshi, Jörn Franke,
The problem is that even if I increase maxColumns, some lines still have more
columns than the limit I set, and that costs a lot of memory.
So I just want to skip any line that has more columns than the maxColumns I set.
Regards,
Chanh
On Thu, Jun 8, 2017 at 12:48 AM Tak
A lot depends on your context as well. If I'm using Spark _for analysis_, I
frequently use python; it's a starting point, from which I can then
leverage pandas, matplotlib/seaborn, and other powerful tools available on
top of python.
If the Spark outputs are the ends themselves, rather than the me
Mich,
We use Scala for a large project. On our team we've set a few standards to
ensure readability (we try to avoid excessive use of tuples, use named
functions, etc.). Given these constraints, I find Scala to be very
readable, and far easier to use than Java. The Lambda functionality of
Java p
Is it not enough to set `maxColumns` in CSV options?
https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L116
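Setting it would look roughly like this (path and limit are placeholders):

// spark: an existing SparkSession; the default maxColumns is 20480
val df = spark.read
  .option("maxColumns", "100000")
  .csv("/path/to/input.csv")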
// maropu
On Wed, Jun 7, 2017 at 9:45 AM, Jörn Franke wrote:
> Spark CSV data source should be able
I think this is a religious question ;-)
Java is often underestimated, because people are not aware of its lambda
functionality, which makes the code very readable. Scala - it depends on who
programs it. People coming from a normal Java background write Java-like code
in Scala, which might not be
Spark CSV data source should be able
> On 7. Jun 2017, at 17:50, Chanh Le wrote:
>
> Hi everyone,
> I am using Spark 2.1.1 to read csv files and convert to avro files.
> One problem that I am facing is if one row of csv file has more columns than
> maxColumns (default is 20480). The process of
From: kundan kumar [mailto:iitr.kun...@gmail.com]
Sent: Wednesday, June 7, 2017 5:15 AM
To: 颜发才(Yan Facai)
Cc: spark users
Subject: Re: Convert the feature vector to raw data
Hi Yan,
This doesn't work.
thanks,
kundan
On Wed, Jun 7, 2017 at 2
From: 颜发才(Yan Facai) [mailto:facai@gmail.com]
Sent: Wednesday, June 7, 2017 4:24 AM
To: kundan kumar
Cc: spark users
Subject: Re: Convert the feature vector to raw data
Hi, kumar.
How about removing the `select` in your code?
namely,
Dataset resul
From: kundan kumar [mailto:iitr.kun...@gmail.com]
Sent: Wednesday, June 7, 2017 4:01 AM
To: spark users
Subject: Convert the feature vector to raw data
I am using
Dataset result = model.transform(testData).select("prob
Hi everyone,
I am using Spark 2.1.1 to read csv files and convert to avro files.
One problem that I am facing is that if one row of a csv file has more columns
than maxColumns (default is 20480), the parsing process stops.
Internal state when error was thrown: line=1, column=3, record=0,
charIndex=
I changed the data structure to scala.collection.immutable.Set and I still
see the same issue. My key is a String. I do the following in my reduce
and invReduce:
visitorSet1 ++ visitorSet2.toTraversable
visitorSet1 -- visitorSet2.toTraversable
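For context, the windowed reduce looks roughly like the sketch below (the key
type, stream name and durations are placeholders, not my actual job):

import org.apache.spark.streaming.Seconds

// pairStream: an existing DStream[(String, Set[String])] of (key, visitor-id set) pairs
val windowedVisitors = pairStream.reduceByKeyAndWindow(
  (a: Set[String], b: Set[String]) => a ++ b,  // reduce: union sets entering the window
  (a: Set[String], b: Set[String]) => a -- b,  // invReduce: remove sets leaving the window
  Seconds(600),                                // window duration (placeholder)
  Seconds(60)                                  // slide duration (placeholder)
)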
On Tue, Jun 6, 2017 at 8:22 PM, Tathagata Das
wrote:
Hi,
I am a fan of Scala and functional programming, hence I prefer Scala.
I had a discussion with a hardcore Java programmer and a data scientist who
prefers Python.
Their view is that in collaborative work using Scala it is almost impossible
to understand someone else's Scala code.
Thanks Doc, I saw this on another board yesterday, so I tried it by first
going to the directory where I've stored winutils.exe and then, as an admin,
running the command that you suggested, and I get this exception when
checking the permissions:
C:\winutils\bin>winutils.exe ls -F C:\tmp\hiv
Hi Curtis,
I believe on Windows the following command needs to be executed (you will
need winutils installed):
D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
On 6 June 2017 at 09:45, Curtis Burkhalter
wrote:
> Hello all,
>
> I'm new to Spark and I'm trying to interact with it using Pyspark.
No, I don't.
On Wed, Jun 7, 2017 at 16:42, Jean Georges Perrin wrote:
> Do you have some other security in place like Kerberos or impersonation?
> It may affect your access.
>
>
> jg
>
>
> On Jun 7, 2017, at 02:15, Patrik Medvedev
> wrote:
>
> Hello guys,
>
> I need to execute hive queries on remote hi
Do you have some other security in place like Kerberos or impersonation? It may
affect your access.
jg
> On Jun 7, 2017, at 02:15, Patrik Medvedev wrote:
>
> Hello guys,
>
> I need to execute hive queries on remote hive server from spark, but for some
> reasons i receive only column names(
Hi Yan,
This doesn't work.
thanks,
kundan
On Wed, Jun 7, 2017 at 2:53 PM, 颜发才(Yan Facai) wrote:
> Hi, kumar.
>
> How about removing the `select` in your code?
> namely,
>
> Dataset<Row> result = model.transform(testData);
> result.show(1000, false);
>
>
>
>
> On Wed, Jun 7, 2017 at 5:00 PM, kundan k
Hi, kumar.
How about removing the `select` in your code?
namely,
Dataset<Row> result = model.transform(testData);
result.show(1000, false);
On Wed, Jun 7, 2017 at 5:00 PM, kundan kumar wrote:
> I am using
>
> Dataset<Row> result = model.transform(testData).select("probability",
> "label", "features");
Hello guys,
I need to execute Hive queries on a remote Hive server from Spark, but for
some reason I receive only column names (without data).
The data is available in the table; I checked it via HUE and a Java JDBC connection.
Here is my code example:
val test = spark.read
.option("url", "jdbc:hive2://
I am using
Dataset<Row> result = model.transform(testData).select("probability",
"label", "features");
result.show(1000, false);
In this case the feature vector is being printed as output. Is there a way
that my original raw data gets printed instead of the feature vector OR is
there a way to reverse
To be more explicit, I used mapWithState() in my application, just like this:
stream = KafkaUtils.createStream(..)
mappedStream = stream.mapPartitionsToPair(..)
stateStream = mappedStream.mapWithState(MyUpdateFunc(..))
stateStream.foreachRDD(..)
I call the jar in MyUpdateFunc(), and the jar reload
Agreed with Ayan.
Essentially, an edge node is a physical host or VM that is used by the
application to run the job. The users or service users start the process
from the edge node. Edge nodes are added to the cluster for each environment,
for example DEV/TEST/UAT, etc.
An edge node normally has all compatible binaries in