Currently I am using Spark 0.9 on my data, and I wrote code in Java for
Spark SQL. Now I want to use Spark 1.4, so how do I do that and what changes do
I have to make for the tables? I have a .sql file, a pom file, and a .py file. I am
using S3 for storage.
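A minimal sketch of the Spark 1.4 entry points (shown in Scala; the equivalent Java
API exists), assuming a hypothetical JSON dataset on S3 and a hypothetical table
name. In 1.4 the old SchemaRDD-based API is replaced by SQLContext and DataFrame:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  object MigrationSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("migration-sketch"))
      val sqlContext = new SQLContext(sc)

      // The DataFrame reader/writer API replaces manual RDD-to-table plumbing.
      val df = sqlContext.read.json("s3n://my-bucket/events")  // hypothetical S3 path
      df.registerTempTable("events")                           // hypothetical table name

      val result = sqlContext.sql("SELECT COUNT(*) FROM events")
      result.show()
    }
  }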
Hi Akhil,
Think of the scenario as running a piece of code in normal Java with
multiple threads. Let's say there are 4 threads spawned by a Java process to
handle reading from a database, some processing, and storing back to the database. In
this process, while a thread is performing a database I/O, the CPU co
Last time I checked, Camus doesn't support storing data as parquet, which
is a deal breaker for me. Otherwise it works well for my Kafka topics with
low data volume.
I am currently using Spark Streaming to ingest data, generate semi-real-time
stats and publish them to a dashboard, and dump the full dataset
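A minimal sketch of dumping each streaming batch as Parquet from Spark Streaming,
assuming Spark 1.4+ (DataFrame reader/writer API); the socket source and S3 output
path are hypothetical placeholders:

  import org.apache.spark.sql.SQLContext
  import org.apache.spark.streaming.StreamingContext

  // Dump each non-empty batch as a Parquet directory keyed by batch time.
  def dumpAsParquet(ssc: StreamingContext): Unit = {
    val sqlContext = new SQLContext(ssc.sparkContext)
    val lines = ssc.socketTextStream("localhost", 9999)  // hypothetical source of JSON strings
    lines.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        sqlContext.read.json(rdd)                         // infer schema from the JSON strings
          .write.parquet(s"s3n://my-bucket/dumps/batch-${time.milliseconds}")  // hypothetical path
      }
    }
  }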
Reposting my question from SO:
http://stackoverflow.com/questions/32161865/elasticsearch-analyze-not-compatible-with-spark-in-python
I'm using the elasticsearch-py client within PySpark using Python 3 and I'm
running into a problem using the analyze() function with ES in conjunction
with an RDD. I
Hi All,
We have a Spark standalone cluster running 1.4.1 and we are setting
spark.io.compression.codec to lzf.
I have a long-running interactive application which behaves normally,
but after a few days I get the following exception in multiple jobs. Any
ideas on what could be causing this?
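For reference, a sketch of setting the codec programmatically (the same key can also
go in spark-defaults.conf); in 1.4.x the supported values include lzf, lz4 and snappy:

  import org.apache.spark.{SparkConf, SparkContext}

  // Hypothetical app name; only the codec setting matters here.
  val conf = new SparkConf()
    .setAppName("lzf-codec-example")
    .set("spark.io.compression.codec", "lzf")
  val sc = new SparkContext(conf)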
https://www.youtube.com/watch?v=umDr0mPuyQc
On Sat, Aug 22, 2015 at 8:01 AM, Ted Yu wrote:
> See http://spark.apache.org/community.html
>
> Cheers
>
> On Sat, Aug 22, 2015 at 2:51 AM, Lars Hermes <
> li...@hermes-it-consulting.de> wrote:
>
>> subscribe
>>
To be perfectly clear, the direct Kafka stream will also recover from any
failures, because it does the simplest thing possible: fail the task and
let Spark retry it.
If you're consistently having socket-closed problems on one task after
another, there's probably something else going on in your e
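A small sketch of the retry knob involved: spark.task.maxFailures (default 4)
controls how many times a failed task is re-attempted before the job is aborted.
The value 8 below is only illustrative:

  import org.apache.spark.SparkConf

  // Give transient Kafka socket errors more chances to succeed on retry.
  val conf = new SparkConf()
    .setAppName("kafka-direct-retries")   // hypothetical app name
    .set("spark.task.maxFailures", "8")   // illustrative value, not a recommendation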
I think you can also give this consumer a try in your environment:
http://spark-packages.org/package/dibbhatt/kafka-spark-consumer
This has been running fine for topics with a large number of
Kafka partitions (> 200) like yours without any issue; no issue with
connections as this consumer re-use
When trying the consumer without external connections, or with a low
number of external connections, it works fine,
so my doubt is how the socket got closed:
15/08/21 08:54:54 ERROR executor.Executor: Exception in task 262.0 in
stage 130.0 (TID 16332)
java.io.EOFException: Received -1 when reading from
Can you try some other consumer and see if the issue still exists?
On Aug 22, 2015 12:47 AM, "Shushant Arora"
wrote:
> Exception comes when client has so many connections to some another
> external server also.
> So I think Exception is coming because of client side issue only- server
> side ther
Thanks Akhil. Does this mean that the executor running in the VM can spawn
two concurrent jobs on the same core? If this is the case, this is what we
are looking for. Also, which version of Spark is this flag in?
Thanks,
Sateesh
On Sat, Aug 22, 2015 at 1:44 AM, Akhil Das
wrote:
> You can look a
Interesting. TD, can you please throw some light on why this is and point
to the relevant code in the Spark repo? It will help in a better understanding
of the things that can affect a long-running streaming job.
On Aug 21, 2015 1:44 PM, "Tathagata Das" wrote:
> Could you periodically (say every 10 mins
In Spark 1.4, there was considerable refactoring around the interaction with
Hive, such as SPARK-7491.
It would not be straightforward to port ORC support to 1.3.
FYI
On Fri, Aug 21, 2015 at 10:21 PM, dong.yajun wrote:
> hi Ted,
>
> thanks for your reply, are there any other way to do this with sp
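For reference, a minimal sketch of ORC I/O as it looks in Spark 1.4, where ORC goes
through the DataFrame reader/writer and requires HiveContext (the support lives in
the spark-hive module); the paths are hypothetical:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  val sc = new SparkContext(new SparkConf().setAppName("orc-sketch"))
  val hiveContext = new HiveContext(sc)

  // Read and write ORC via the generic format("orc") reader/writer.
  val df = hiveContext.read.format("orc").load("s3n://my-bucket/input-orc")
  df.write.format("orc").save("s3n://my-bucket/output-orc")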
See http://spark.apache.org/community.html
Cheers
On Sat, Aug 22, 2015 at 2:51 AM, Lars Hermes
wrote:
> subscribe
>
subscribe
When trying the consumer without external connections, or with a low number of
external connections, it works fine,
so my doubt is how the socket got closed:
java.io.EOFException: Received -1 when reading from channel, socket
has likely been closed.
On Sat, Aug 22, 2015 at 7:24 PM, Akhil Das
wrote:
Hmm, for a single-core VM you will have to run it in local mode (specifying
master=local[4]). The flag is available in all versions of Spark, I
guess.
On Aug 22, 2015 5:04 AM, "Sateesh Kavuri" wrote:
> Thanks Akhil. Does this mean that the executor running in the VM can spawn
> two concurrent jo
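A minimal sketch of what that looks like when building the context: local[4] runs
the driver and executor in one JVM with 4 worker threads, so up to 4 tasks can run
concurrently even on a single-core VM (the threads share that core):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("local-concurrency-example")  // hypothetical app name
    .setMaster("local[4]")                    // 4 task threads in local mode
  val sc = new SparkContext(conf)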
Hi Rishitesh,
We are not using any RDDs to parallelize the processing, and the whole
algorithm runs on a single core (and in a single thread). The parallelism
is done at the user level.
The disk read can be started as a separate I/O, but then the executor will not be
able to take up more jobs, since th
1. How do I work with partitions in Spark Streaming from Kafka?
2. How do I create partitions in Spark Streaming from Kafka?
When I send messages from a Kafka topic having three partitions,
Spark will listen to the messages when I say KafkaUtils.createStream or
createDirectStream with local[4].
Now I want
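A rough sketch of the direct approach (Spark 1.3+), where each Kafka partition of
the topic becomes one RDD partition in every batch, so a topic with three partitions
yields three partitions per batch; the broker address and topic name are hypothetical:

  import kafka.serializer.StringDecoder
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  val conf = new SparkConf().setMaster("local[4]").setAppName("kafka-direct-sketch")
  val ssc = new StreamingContext(conf, Seconds(10))

  val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")  // hypothetical broker
  val topics = Set("my-topic")                                       // hypothetical topic

  // One RDD partition per Kafka partition in each batch.
  val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, topics)

  stream.foreachRDD { rdd =>
    println(s"partitions in this batch: ${rdd.partitions.length}")
  }

  ssc.start()
  ssc.awaitTermination()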
The exception comes when the client also has many connections to some other
external server.
So I think the exception is coming from a client-side issue only; on the
server side there is no issue.
I want to understand whether the executor (simple consumer) is not making a new
connection to the Kafka broker at the start of each tas
Hi All,
Currently using DSE 4.7 and Spark version 1.2.2.
Regards,
Satish
On Fri, Aug 21, 2015 at 7:30 PM, java8964 wrote:
> What version of Spark you are using, or comes with DSE 4.7?
>
> We just cannot reproduce it in Spark.
>
> yzhang@localhost>$ more test.spark
> val pairs = sc.makeRDD(Seq((0