Hello,
when I set spark.executor.cores to e.g. 8 and spark.executor.memory
to 8GB, it can allocate more executors with fewer cores each for my app, but
each executor still gets 8GB RAM.
That is a problem because I can allocate more memory across the cluster than
expected; the worst case is 8x 1-core executors,
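To make the arithmetic concrete (a minimal sketch with the numbers from above;
nothing here is a fix, it just spells out the settings in question):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.cores", "8")    // what I ask for per executor
      .set("spark.executor.memory", "8g")  // granted per executor, however many cores it actually gets
    // Expected: 1 executor x 8 cores x 8g = 8g of cluster memory.
    // Worst case observed: 8 executors x 1 core x 8g = 64g of cluster memory.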
Hi,
I understand it simply as: they will provide some lower-latency interface,
probably using JDBC, so that 3rd-party BI tools can integrate and query
streams as if they were static datasets. If the BI tool repeats the query, it
will get updated results. I don't know if BI tools are already heading towards
I have to ask my colleague whether there is any specific error, but I think
it just doesn't see the files.
Petr
On Thu, Apr 21, 2016 at 11:54 AM, Petr Novak wrote:
> Hello,
> Impala (v2.1.0, Spark 1.6.0) can't read partitioned Parquet files saved
> from DF.partitionBy (using Python).
Hello,
Impala (v2.1.0, Spark 1.6.0) can't read partitioned Parquet files saved
from DF.partitionBy (using Python). Is there any known reason, some config?
Or should it generally work, meaning it is likely something wrong solely on
our side?
Many thanks,
Petr
Hi all,
I believe the documentation used to say that Standalone mode is not for
production. Either I'm wrong or it has already been removed.
For a small cluster of 5-10 nodes, is Standalone recommended for production?
I would like to go with Mesos, but the question is whether there is any
real add-o
How does Arrow interact with Tungsten and its binary in-memory format? Spark
will still have to convert between them. I assume they use similar
concepts/layouts, hence the conversion can likely be quite efficient.
Or is there a chance that the current Tungsten in-memory format would be
replaced by Arrow
Hi all,
based on the documentation:
"Spark 1.6.0 is designed for use with Mesos 0.21.0 and does not require any
special patches of Mesos."
We are considering Mesos for our use case, but this concerns me a lot. Mesos
is currently at v0.27, which we need for its Volumes feature, but Spark locks
us to 0.21 only
You can have offsetRanges on workers, e.g.:
object Something {
  var offsetRanges = Array[OffsetRange]()

  def create[F: ClassTag](stream: InputDStream[Array[Byte]])
                         (implicit codec: Codec[F]): DStream[F] = {
    stream transform { rdd =>
      offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
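For completeness, a sketch of how the captured ranges can then be consumed on
the driver in foreachRDD, following the pattern from the Kafka direct stream
docs (MyEvent, rawStream and the save step are just placeholders):

    val decoded = Something.create[MyEvent](rawStream)

    decoded.foreachRDD { rdd =>
      // The transform above runs on the driver for each batch, so by the time
      // this body executes, Something.offsetRanges holds the ranges of the
      // current batch.
      Something.offsetRanges.foreach { o =>
        println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
      }
      // e.g. derive a filename from the ranges and save rdd
    }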
I'm sorry. Both approaches actually work. It was something else wrong with
my cluster. Petr
On Fri, Sep 25, 2015 at 4:53 PM, Petr Novak wrote:
> Setting it programmatically doesn't work either:
> sparkConf.setIfMissing("class", "...Main")
>
> In my curr
Setting it programmatically doesn't work either:
sparkConf.setIfMissing("class", "...Main")
In my current setup, moving main to another package requires propagating the
change to deploy scripts. It doesn't matter, I will find some other way. Petr
On Fri, Sep 25, 2015
Otherwise it seems it tries to load from a checkpoint which I have deleted
and which cannot be found. Or it should work and I have something else wrong.
The documentation doesn't mention the option with a jar manifest, so I assume
it doesn't work this way.
Many thanks,
Petr
, 2015 at 12:14 PM, Petr Novak wrote:
> Many thanks Cody, it explains quite a bit.
>
> I had couple of problems with checkpointing and graceful shutdown moving
> from working code in Spark 1.3.0 to 1.5.0. Having InterruptedExceptions,
> KafkaDirectStream couldn't initialize,
Hi,
I have 2 streams and checkpointing, with code based on the documentation. One
stream transforms data from Kafka and saves it to a Parquet file. The other
uses the same stream and does updateStateByKey to compute some aggregations.
There is no gracefulShutdown.
Both use roughly this code t
If you need to understand what the magic Product is, then google Algebraic
Data Types and learn it together with what a Sum type is. One option is
http://www.stephanboyer.com/post/18/algebraic-data-types
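A tiny Scala illustration of the idea (my own example, not from the article):
a sum type enumerates the alternatives, and each alternative is a product of
its fields:

    // Sum type: a Shape is either a Circle or a Rectangle.
    sealed trait Shape
    // Product types: each case class is the product of its fields.
    case class Circle(radius: Double) extends Shape
    case class Rectangle(width: Double, height: Double) extends Shape

    // Pattern matching covers exactly the alternatives of the sum.
    def area(s: Shape): Double = s match {
      case Circle(r)       => math.Pi * r * r
      case Rectangle(w, h) => w * h
    }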
Enjoy,
Petr
On Wed, Sep 23, 2015 at 9:07 AM, Petr Novak wrote:
> I'm unsu
I'm unsure if it is completely equivalent to a case class, whether it has
some limitations compared to a case class, or whether it needs some more
methods implemented.
Petr
On Wed, Sep 23, 2015 at 9:04 AM, Petr Novak wrote:
> You can implement your own case class supporting more than 22 fields.
You can implement your own case class supporting more than 22 fields. It is
something like:
class MyRecord(val val1: String, val val2: String, ... more than 22 fields,
    in this case e.g. 26)
  extends Product with Serializable {
  def canEqual(that: Any): Boolean = that.isInstanceOf[MyRecord]
  def prod
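In case the snippet gets cut off, here is a minimal self-contained sketch with
only 3 fields, just to show the shape of it; Product only needs canEqual,
productArity and productElement:

    class MyRecord(val val1: String, val val2: String, val val3: String)
      extends Product with Serializable {

      def canEqual(that: Any): Boolean = that.isInstanceOf[MyRecord]

      // Number of fields.
      def productArity: Int = 3

      // Field access by index, in declaration order.
      def productElement(n: Int): Any = n match {
        case 0 => val1
        case 1 => val2
        case 2 => val3
        case _ => throw new IndexOutOfBoundsException(n.toString)
      }
    }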
AM, Petr Novak wrote:
> Ahh the problem probably is async ingestion to Spark receiver buffers,
> hence WAL is required I would say.
>
> Petr
>
> On Tue, Sep 22, 2015 at 10:53 AM, Petr Novak wrote:
>
>> If MQTT can be configured with long enough timeout for ACK and can bu
Ahh, the problem probably is the async ingestion into Spark receiver buffers,
hence a WAL is required, I would say.
Petr
On Tue, Sep 22, 2015 at 10:53 AM, Petr Novak wrote:
> If MQTT can be configured with long enough timeout for ACK and can buffer
> enough events while waiting for Spark Job r
restart Spark job.
On Tue, Sep 22, 2015 at 10:53 AM, Petr Novak wrote:
> Ahh the problem probably is async ingestion to Spark receiver buffers,
> hence WAL is required I would say.
>
> On Tue, Sep 22, 2015 at 10:52 AM, Petr Novak wrote:
>
>> If MQTT can be configured with lo
And probably the original source code
https://gist.github.com/koen-dejonghe/39c10357607c698c0b04
On Tue, Sep 22, 2015 at 10:37 AM, Petr Novak wrote:
> To complete design pattern:
>
> http://stackoverflow.com/questions/30450763/spark-streaming-and-connection-pool-implementation
>
&
u wrote:
>>>>
>>>>> You can use broadcast variable for passing connection information.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Sep 21, 2015, at 4:27 AM, Priya Ch
>>>>> wrote:
>>>>>
>
We have tried it on another cluster installation with the same effect.
Petr
On Mon, Sep 21, 2015 at 10:45 AM, Petr Novak wrote:
> It might be connected with my problems with gracefulShutdown in Spark
>> 1.5.0 2.11
>> https://mail.google.com/mail/#search/petr/14fb6bd5166f9395
>
Nice, thanks.
So the note in the build instructions for 2.11 is obsolete? Or are there
still some limitations?
http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211
On Fri, Sep 11, 2015 at 2:19 PM, Petr Novak wrote:
> Nice, thanks.
>
> So the note in build instru
Great work.
On Fri, Sep 18, 2015 at 6:51 PM, Harish Butani
wrote:
> Hi,
>
> I have just posted a Blog on this:
> https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani
>
> regards,
> Harish Butani.
>
> On Tue, Sep 1, 2015 at 11:46 PM, Paolo Platter
> wr
add @transient?
On Mon, Sep 21, 2015 at 11:36 AM, Petr Novak wrote:
> add @transient?
>
> On Mon, Sep 21, 2015 at 11:27 AM, Priya Ch
> wrote:
>
>> Hello All,
>>
>> How can i pass sparkContext as a parameter to a method in an object.
>> Be
on, Sep 21, 2015 at 11:26 AM, Petr Novak wrote:
> I should read my posts at least once to avoid so many typos. Hopefully you
> are brave enough to read through.
>
> Petr
>
> On Mon, Sep 21, 2015 at 11:23 AM, Petr Novak wrote:
>
>> I think you would have to persist events
I should read my posts at least once to avoid so many typos. Hopefully you
are brave enough to read through.
Petr
On Mon, Sep 21, 2015 at 11:23 AM, Petr Novak wrote:
> I think you would have to persist events somehow if you don't want to miss
> them. I don't see any other opti
I think you would have to persist events somehow if you don't want to miss
them. I don't see any other option there. Either in MQTT, if it is supported
there, or by routing them through Kafka.
There is the WriteAheadLog in Spark, but you would have to decouple MQTT
stream reading and processing into 2 separate
g on Spark
provided version is fine for us for now.
Regards,
Petr
On Mon, Sep 21, 2015 at 11:06 AM, Petr Novak wrote:
> Internally Spark is using json4s and jackson parser v3.2.10, AFAIK. So if
> you are using Scala they should be available without adding dependencies.
> There is v3.2.11 al
Internally Spark uses json4s with the jackson parser, v3.2.10, AFAIK. So if
you are using Scala, they should be available without adding dependencies.
There is v3.2.11 already available, but adding it to my app was causing a
NoSuchMethod exception, so I would have to shade it. I'm simply staying on
v3.2.10 f
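For reference, a minimal usage sketch against the bundled json4s-jackson
3.2.10 (the Event case class and the JSON string are just made-up examples):

    import org.json4s._
    import org.json4s.jackson.JsonMethods._

    implicit val formats = DefaultFormats

    case class Event(id: Int, name: String)

    val json  = parse("""{"id": 1, "name": "test"}""")
    val event = json.extract[Event]        // Event(1, "test")
    val id    = (json \ "id").extract[Int] // 1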
n?
I would just need confirmation from the community that checkpointing and
graceful shutdown actually work with KafkaDirectStream on 1.5.0, so that I
can look for the problem on my side.
Many thanks,
Petr
On Sun, Sep 20, 2015 at 12:58 PM, Petr Novak wrote:
> Hi Michal,
> yes, it is there
val topics="first"
shouldn't it be val topics = Set("first") ?
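i.e. something like this (a sketch with a placeholder broker and an existing
ssc), since createDirectStream expects a Set of topic names:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092") // placeholder broker
    val topics = Set("first")                                         // a Set, not a String

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)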
On Sun, Sep 20, 2015 at 1:01 PM, Petr Novak wrote:
> val topics="first"
>
> shouldn't it be val topics = Set("first") ?
>
> On Sat, Sep 19, 2015 at 10:07 PM,
shutdown hook
Thanks,
Petr
On Sat, Sep 19, 2015 at 4:01 AM, Michal Čizmazia wrote:
> Hi Petr, after Ctrl+C can you see the following message in the logs?
>
> Invoking stop(stopGracefully=false)
>
> Details:
> https://github.com/apache/spark/pull/6307
>
>
> On 18 Septembe
It might be connected with my problems with gracefulShutdown in Spark 1.5.0
2.11
https://mail.google.com/mail/#search/petr/14fb6bd5166f9395
Maybe Ctrl+C corrupts checkpoints and breaks gracefulShutdown?
Petr
On Fri, Sep 18, 2015 at 4:10 PM, Petr Novak wrote:
> ...to ensure it is not someth
.spark.SparkContext)
[2015-09-11 22:33:05,899] INFO Shutdown hook called
(org.apache.spark.util.ShutdownHookManager)
[2015-09-11 22:33:05,899] INFO Deleting directory
/dfs/spark/tmp/spark-b466fc2e-9ab8-4783-87c2-485bac5c3cd6
(org.apache.spark.util.ShutdownHookManager)
Thanks,
Petr
On Mon, Sep
...to ensure it is not something wrong on my cluster.
On Fri, Sep 18, 2015 at 4:09 PM, Petr Novak wrote:
> I have tried it on Spark 1.3.0 2.10 and it works. The same code doesn't on
> Spark 1.5.0 2.11. It would be nice if anybody could try on another
> installation to ensure i
I have tried it on Spark 1.3.0 2.10 and it works. The same code doesn't work
on Spark 1.5.0 2.11. It would be nice if anybody could try it on another
installation to ensure it is something wrong on my cluster.
Many thanks,
Petr
On Fri, Sep 18, 2015 at 4:07 PM, Petr Novak wrote:
> This one is g
This one is generated, I suppose, after Ctrl+C
15/09/18 14:38:25 INFO Worker: Asked to kill executor
app-20150918143823-0001/0
15/09/18 14:38:25 INFO Worker: Asked to kill executor
app-20150918143823-0001/0
15/09/18 14:38:25 DEBUG
AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handle
Hi all,
it throws FileBasedWriteAheadLogReader: Error reading next item, EOF reached
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at
org.apache.spark.streaming.util.FileBasedWriteAheadLogReader.hasNext(FileBasedWriteAheadLogReader.scala:47)
The WAL is not enabled
Does it still apply for 1.5.0?
What actual limitations does it imply when I switch to 2.11? No JDBC
Thriftserver? No JDBC DataSource? No JdbcRDD (which is already obsolete, I
believe)? Some more?
Which library is the blocker for upgrading the JDBC component to 2.11?
Is there any estimate when it could be a
Hello,
my Spark streaming v1.3.0 code uses
sys.ShutdownHookThread {
ssc.stop(stopSparkContext = true, stopGracefully = true)
}
to allow stopping it with Ctrl+C on the command line. It returned to the
command line after it finished the batch, but it doesn't with v1.4.0-v1.5.0.
Was the behaviour or required cod
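In case it helps: if I understand the 1.4+ behaviour correctly (not verified,
so take it as a sketch), there is also a config-based route where Spark
installs its own shutdown hook and stops gracefully on Ctrl+C:

    // Instead of registering your own sys.ShutdownHookThread:
    sparkConf.set("spark.streaming.stopGracefullyOnShutdown", "true")
    // Spark's shutdown hook then calls ssc.stop(stopGracefully = true) itself.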
Hello,
sqlContext.parquetFile(dir)
throws the exception "Unable to instantiate
org.apache.hadoop.hive.metastore.HiveMetaStoreClient".
The strange thing is that on the second attempt to open the file it is
successful:
try {
  sqlContext.parquetFile(dir)
} catch {
  case e: Exception => sqlCont
The same as
https://mail.google.com/mail/#label/Spark%2Fuser/14f64c75c15f5ccd
Please follow the discussion there.
On Tue, Aug 25, 2015 at 5:02 PM, Petr Novak wrote:
> Hi all,
> when I read parquet files with "required" fields aka nullable=false they
> are read correctl
Hi all,
when I read Parquet files with "required" fields, aka nullable=false, they
are read correctly. Then I save them (df.write.parquet), and when I read them
again, all my fields are saved and read as optional, aka nullable=true. This
means I suddenly have files with incompatible schemas. This happens on 1.3.0
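A quick way to see the effect (a sketch; inputDir/outputDir are placeholders,
and it assumes the 1.4+ read/write API) is to print the nullability flags
before and after the round trip:

    val df = sqlContext.read.parquet(inputDir)       // original files: nullable = false
    df.schema.fields.foreach(f => println(s"${f.name}: nullable=${f.nullable}"))

    df.write.parquet(outputDir)

    val reread = sqlContext.read.parquet(outputDir)  // after the round trip: nullable = true
    reread.schema.fields.foreach(f => println(s"${f.name}: nullable=${f.nullable}"))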
ogger.info(...)
Iterator.empty
} foreach {
(_: Nothing) =>
}
}
}
Many thanks for any advice, I'm sure it's a noob question.
Petr
On Mon, Aug 17, 2015 at 1:12 PM, Petr Novak wrote:
> Or can I generally create new RDD from transformation and enrich its
> partitio
Or can I generally create a new RDD from a transformation and enrich its
partitions with some metadata, so that I could copy OffsetRanges into my new
RDD in the DStream?
On Mon, Aug 17, 2015 at 1:08 PM, Petr Novak wrote:
> Hi all,
> I need to transform KafkaRDD into a new stream of deserialize
Hi all,
I need to transform a KafkaRDD into a new stream of deserialized case
classes. I want to use the new stream both to save it to a file and to
perform additional transformations on it.
To save it, I want to use offsets in the filenames, hence I need OffsetRanges
in the transformed RDD. But KafkaRDD is private
Hello,
I would like to switch from Scala 2.10 to 2.11 for Spark app development.
It seems that the only thing blocking me is the missing
spark-streaming-kafka_2.11 Maven package. Is there any plan to add it, or am
I just blind?
Many thanks,
Vladimir
Thank you. HADOOP_CONF_DIR was missing.
On Wed, Sep 24, 2014 at 4:48 PM, Matt Narrell
wrote:
> Yes, this works. Make sure you have HADOOP_CONF_DIR set on your Spark
> machines
>
> mn
>
> On Sep 24, 2014, at 5:35 AM, Petr Novak wrote:
>
> Hello,
> if our Hadoop
Hello,
if our Hadoop cluster is configured with HA and "fs.defaultFS" points to a
nameservice instead of a namenode hostname - hdfs:/// - then our Spark job
fails with an exception. Is there anything to configure, or is it not
implemented?
Exception in thread "main" org.apache.spark.SparkException: Job
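For reference, these are the HA properties a correct HADOOP_CONF_DIR
(core-site.xml/hdfs-site.xml) would carry; setting them programmatically as
below is only a sketch, with "mycluster" and the namenode hosts as
placeholders:

    val hc = sc.hadoopConfiguration
    hc.set("fs.defaultFS", "hdfs://mycluster")
    hc.set("dfs.nameservices", "mycluster")
    hc.set("dfs.ha.namenodes.mycluster", "nn1,nn2")
    hc.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020")
    hc.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020")
    hc.set("dfs.client.failover.proxy.provider.mycluster",
      "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")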