but should work for Hive tables).
>
> Michael
>
> On Tue, Dec 1, 2015 at 10:55 AM, Krzysztof Zarzycki wrote:
>
>> Hi there,
>> Do you know how easily I can get a list of all files of a Hive table?
>>
>> What I want to achieve is to get all files that
Hi there,
Do you know how easily I can get a list of all files of a Hive table?
What I want to achieve is to get all files that are underneath a Parquet
table, and then read them using the sparksql-protobuf [1] library (a really
handy library!) and its helper class ProtoParquetRDD:
val protobufsRdd = new ProtoParquetRDD(s
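A minimal sketch of one way to get at those files: list the table's warehouse directory recursively with the Hadoop FileSystem API. The location, the .parquet filter and the MyMessage class below are illustrative assumptions (in practice the location would come from the metastore, e.g. DESCRIBE FORMATTED my_table), and the commented-out ProtoParquetRDD call just follows the shape shown in the sparksql-protobuf README.

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable.ListBuffer

val sc = new SparkContext(new SparkConf().setAppName("list-table-files"))
val tableDir = new Path("hdfs:///user/hive/warehouse/my_table")  // assumed table location

val fs = FileSystem.get(sc.hadoopConfiguration)
val files = ListBuffer[String]()
val it = fs.listFiles(tableDir, true)   // true = recurse into partition directories
while (it.hasNext) {
  val p = it.next().getPath.toString
  if (p.endsWith(".parquet")) files += p    // keep only the data files
}
files.foreach(println)

// Each path could then be handed to ProtoParquetRDD, e.g. (hypothetical message class,
// constructor shape as in the sparksql-protobuf README):
// val protobufsRdd = new ProtoParquetRDD(sc, files.head, classOf[MyMessage])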
Hi there,
I have a serious problem in my Hadoop cluster: the YARN Timeline Server
generates very high load (800% CPU) when there are 8 Spark Streaming jobs
running in parallel.
I'm discussing this problem on the Hadoop group in parallel:
http://mail-archives.apache.org/mod_mbox/hadoop-user/201509.mbox/%3CC
ice
>> on this, from people who implemented anything on this.
>>
>> On Fri, Sep 18, 2015 at 2:35 AM, Krzysztof Zarzycki wrote:
>>
>>> Hi there Spark Community,
>>> I would like to ask you for an advice: I'm running Spark Streaming
I'm also interested in this feature. Did you guys find any information
about how to use Hive Streaming with Spark Streaming?
Thanks,
Krzysiek
2015-07-17 20:16 GMT+02:00 unk1102 :
> Hi, I have a similar use case. Did you find a solution to this problem of
> loading DStreams into Hive using Spark St
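For reference, a rough sketch of the plain route people usually take (not the transactional Hive Streaming API): write each micro-batch into an existing Hive table with foreachRDD. The HiveContext, the DStream[String] of JSON and the table name "events" are assumptions.

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.streaming.dstream.DStream

def appendToHive(stream: DStream[String], hiveContext: HiveContext): Unit = {
  stream.foreachRDD { rdd =>
    if (!rdd.isEmpty()) {
      val df = hiveContext.read.json(rdd)   // infer the schema from the JSON strings
      df.write.insertInto("events")         // appends into the existing Hive table
    }
  }
}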
Hi there Spark Community,
I would like to ask you for advice: I'm running Spark Streaming jobs in
production. Sometimes these jobs fail, and I would like to get an email
notification about it. Do you know how I can set up Spark to notify me by
email if my job fails? Or do I have to use external moni
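As far as I know nothing built into Spark sends mail on its own; external monitoring of the YARN application state is the more common production answer. A minimal sketch of the low-tech alternative is to catch fatal errors in the driver and mail yourself before re-throwing. The SMTP relay, the addresses and the `ssc` StreamingContext are assumptions, and this needs the JavaMail (javax.mail) dependency.

import java.util.Properties
import javax.mail.{Message, Session, Transport}
import javax.mail.internet.{InternetAddress, MimeMessage}

// Hypothetical helper: send a plain-text mail through an assumed SMTP relay.
def notifyFailure(jobName: String, error: Throwable): Unit = {
  val props = new Properties()
  props.put("mail.smtp.host", "smtp.example.com")              // assumed relay
  val session = Session.getInstance(props)
  val msg = new MimeMessage(session)
  msg.setFrom(new InternetAddress("spark-jobs@example.com"))   // assumed sender
  msg.setRecipients(Message.RecipientType.TO, "oncall@example.com")
  msg.setSubject(s"Spark Streaming job failed: $jobName")
  msg.setText(Option(error.getMessage).getOrElse(error.toString))
  Transport.send(msg)
}

// In the driver, around the usual start/awaitTermination:
try {
  ssc.start()
  ssc.awaitTermination()
} catch {
  case e: Throwable =>
    notifyFailure("my-streaming-job", e)
    throw e
}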
works that might do it more conveniently (Samza, Flink, or the just-being-designed
Kafka-Streams
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+processor+client>
)
Thanks, Dibyendu, for your note; I will strongly consider it when falling
back to a receiver-based approach.
Cheers,
Kr
Thanks, guys, for your answers. I put my answers inline, below.
Cheers,
Krzysztof Zarzycki
2015-09-10 15:39 GMT+02:00 Cody Koeninger :
> The Kafka direct stream meets those requirements. You don't need
> checkpointing for exactly-once. Indeed, unless your output operations are
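For reference, a minimal sketch of the direct stream, assuming the Spark 1.x spark-streaming-kafka artifact, an existing StreamingContext `ssc`, and an illustrative broker and topic; the comment marks where idempotent or transactional output handling comes in.

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")   // assumed broker
val topics = Set("events")                                         // assumed topic

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // write the batch out idempotently (or transactionally together with the offsets),
  // then persist offsetRanges so a restarted/upgraded job can resume from them
}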
xamples of manually managing ZK offsets?
Thanks,
Krzysztof
2015-09-10 12:22 GMT+02:00 Akhil Das :
> This consumer pretty much covers all those scenarios you listed:
> github.com/dibbhatt/kafka-spark-consumer. Give it a try.
>
> Thanks
> Best Regards
>
> On Thu, Sep 10, 2015
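If you do manage ZK offsets by hand, a rough sketch with Apache Curator follows; the znode layout, quorum address and text encoding are assumptions (Spark itself doesn't prescribe any of this), and commitOffsets would be called after each batch's output succeeds.

import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.retry.ExponentialBackoffRetry
import org.apache.spark.streaming.kafka.OffsetRange

// Hypothetical layout: /consumers/<group>/offsets/<topic>/<partition> -> offset as text
val zk = CuratorFrameworkFactory.newClient("zk1:2181", new ExponentialBackoffRetry(1000, 3))
zk.start()

def commitOffsets(group: String, ranges: Array[OffsetRange]): Unit = {
  ranges.foreach { r =>
    val path = s"/consumers/$group/offsets/${r.topic}/${r.partition}"
    val data = r.untilOffset.toString.getBytes("UTF-8")
    if (zk.checkExists().forPath(path) == null)
      zk.create().creatingParentsIfNeeded().forPath(path, data)
    else
      zk.setData().forPath(path, data)
  }
}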
e able to upgrade code & not lose Kafka offsets?
Thank you a lot for your answers,
Krzysztof Zarzycki
one can help? Of course the original problem stays
open.
Thanks!
Krzysiek
2015-08-09 14:19 GMT+02:00 Krzysztof Zarzycki :
> Hi there,
> I have a problem with a Spark Streaming job running on Spark 1.4.1 that
> appends to a Parquet table.
>
> My job receives json strings and creates Jso
Hi there,
I have a problem with a Spark Streaming job running on Spark 1.4.1 that
appends to a Parquet table.
My job receives JSON strings and creates a JSON RDD out of them. The JSONs
might come in different shapes, as most of the fields are optional, but they
never have conflicting schemas.
Next, for e
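For context, roughly how such a job looks on Spark 1.4: read each batch of JSON strings into a DataFrame and append it as Parquet. The DStream[String] and the output path are assumptions.

import org.apache.spark.sql.{SQLContext, SaveMode}
import org.apache.spark.streaming.dstream.DStream

def writeBatches(jsonStream: DStream[String], sqlContext: SQLContext): Unit = {
  jsonStream.foreachRDD { rdd =>
    if (!rdd.isEmpty()) {
      val df = sqlContext.read.json(rdd)   // schema is inferred per batch, so it may vary
      df.write.mode(SaveMode.Append).parquet("hdfs:///data/events_parquet")  // assumed path
    }
  }
}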
Hi everyone,
I have a pretty challenging problem with reading/writing multiple Parquet
files with streaming, but let me first introduce my data flow:
I have a lot of JSON events streaming into my platform. All of them have the
same structure, but the fields are mostly optional. Some of the fields are
arrays wit
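Since the fields are mostly optional, reading such Parquet files back usually needs schema merging. A small sketch, assuming a SQLContext and an illustrative path, and assuming Spark 1.5+ where the mergeSchema read option is available (merging all footers has a cost when there are many files).

val events = sqlContext.read
  .option("mergeSchema", "true")              // union the possibly differing file schemas
  .parquet("hdfs:///data/events_parquet")     // assumed path
events.printSchema()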
This is a common use of Spark Streaming +
> Cassandra/HBase.
>
> Regarding the performance of updateStateByKey, we are aware of the
> limitations, and we will improve it soon :)
>
> TD
>
>
> On Tue, Apr 14, 2015 at 12:34 PM, Krzysztof Zarzycki wrote:
>
>> H
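A minimal sketch of the external-state alternative TD mentions, using the spark-cassandra-connector; the keyspace, table and column names are assumptions, and `updates` stands for a DStream[(String, Long)] of (key, new value) pairs.

import com.datastax.spark.connector._   // adds saveToCassandra on RDDs

// Assumes a Cassandra table events_state(key text PRIMARY KEY, value bigint) exists.
updates.foreachRDD { rdd =>
  rdd.saveToCassandra("mykeyspace", "events_state", SomeColumns("key", "value"))
}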
Hey guys, could you please help me with a question I asked on
Stack Overflow:
https://stackoverflow.com/questions/29635681/is-it-feasible-to-keep-millions-of-keys-in-state-of-spark-streaming-job-for-two
? I'll be really grateful for your help!
I'm also pasting the question below:
I'm trying to so
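For reference, the updateStateByKey shape being discussed, assuming a checkpointed StreamingContext `ssc` and an illustrative DStream[(String, Long)] named `pairs` of per-key counts.

// State update function: fold the batch's new values into the running total.
def updateCount(newValues: Seq[Long], state: Option[Long]): Option[Long] =
  Some(newValues.sum + state.getOrElse(0L))

ssc.checkpoint("hdfs:///checkpoints/state-job")    // assumed dir; required for stateful ops
val totals = pairs.updateStateByKey(updateCount)   // keeps one Long of state per key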