I don't think Structured Streaming currently supports a "partitioned"
watermark. Also, why do the consumption rates of the different partitions
vary? If the handling logic is quite different, using different topics is a
better approach.
On Fri, Sep 1, 2017 at 4:59 PM, 张万新 wrote:
> Thanks, it's true that a looser watermark guarantees that more data will not be dropped
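
If you do split the data across topics, each topic can be read as its own
stream with its own watermark. A minimal sketch, assuming the
spark-sql-kafka-0-10 package is on the classpath; the broker address and
topic names are hypothetical:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("per-topic-watermarks").getOrCreate()

// Each topic becomes a separate stream, so each stream tracks its own watermark.
def streamFor(topic: String) =
  spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092") // assumption: your broker list
    .option("subscribe", topic)
    .load()                                   // the Kafka source exposes a `timestamp` column
    .withWatermark("timestamp", "10 minutes")

val fastStream = streamFor("fast-topic")      // hypothetical topic names
val slowStream = streamFor("slow-topic")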
You may try foreachPartition.
On Fri, Sep 1, 2017 at 10:54 PM, asethia wrote:
> Hi,
>
> I have a list of person records in the following format:
>
> case class Person(fName:String, city:String)
>
> val l = List(Person("A", "City1"), Person("B", "City2"), Person("C", "City1"))
>
> val rdd: RDD[Person] = sc.parallelize(l)
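
A minimal sketch of that suggestion, runnable in spark-shell (where sc is
predefined); the println stands in for whatever external sink you actually
write to:

import org.apache.spark.rdd.RDD

case class Person(fName: String, city: String)

val l = List(Person("A", "City1"), Person("B", "City2"), Person("C", "City1"))
val rdd: RDD[Person] = sc.parallelize(l)
val grouped: RDD[(String, Iterable[Person])] = rdd.groupBy(_.city)

// One iterator per partition: open any per-partition resources (connections,
// writers) once here, then handle each (city, records) group.
grouped.foreachPartition { iter =>
  iter.foreach { case (city, people) =>
    println(s"$city -> ${people.map(_.fName).mkString(", ")}") // stand-in for a real sink
  }
}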
My setup: Spark 2.1 on a 3-node YARN cluster with 160 GB. Dynamic allocation
is turned on, with spark.executor.memory=6G and spark.executor.cores=6.
First, I read the Hive tables orders (329 MB) and lineitems (1.43 GB) and do
a left outer join.
Next, I apply 7 different filter conditions based on ...
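
For reference, a hedged sketch of that pipeline, assuming `spark` is the
session in spark-shell; the tables look TPC-H-like, but the join key and the
sample filter below are assumptions:

import org.apache.spark.sql.functions.col

val orders = spark.table("orders")       // ~329 MB
val lineitems = spark.table("lineitems") // ~1.43 GB

val joined = orders.join(
  lineitems,
  orders("o_orderkey") === lineitems("l_orderkey"), // assumed join key
  "left_outer")

// Each of the 7 filters would be applied along these lines (condition is hypothetical):
val filtered = joined.filter(col("l_quantity") > 10)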
AFAIK, one of the sides must be JDBC.
On Fri, 1 Sep 2017 at 10:37 pm, HARSH TAKKAR wrote:
> Hi,
>
> I have just started using Spark session with Hive enabled, but I am
> facing an issue while updating the Hive warehouse directory after Spark
> session creation.
>
> Use case: I want to read data from Hive on one cluster and write to Hive
> on another cluster.
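
If JDBC is the route, here is a heavily hedged sketch of the read side; the
HiveServer2 URL, table name, and driver class are all assumptions, and
Spark's JDBC source can hit dialect/quoting quirks against Hive:

val remote = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://remote-cluster:10000/default") // assumed HiveServer2 endpoint
  .option("dbtable", "source_table")                          // assumed table
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .load()

remote.write.saveAsTable("local_copy") // lands in the local cluster's warehouse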
Is the watermark always set using processing time, event time, or both?
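
For what it's worth, withWatermark in Structured Streaming is declared on an
event-time column (any TimestampType column you pick); processing time only
enters through triggers. A minimal sketch using the built-in rate source
(available from Spark 2.2):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder().appName("watermark-demo").getOrCreate()

// The rate source emits a `timestamp` column we can treat as event time.
val events = spark.readStream.format("rate").option("rowsPerSecond", "10").load()

val counts = events
  .withWatermark("timestamp", "10 minutes")            // watermark tracks this column
  .groupBy(window(events("timestamp"), "5 minutes"))
  .count()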
Any ideas @Tathagata? I'd be happy to contribute a patch if you can point me in
the right direction.
From: Karthik Palaniappan
Sent: Friday, August 25, 2017 9:15 AM
To: Akhil Das
Cc: user@spark.apache.org; t...@databricks.com
Subject: RE: [Spark Streaming] Streami
Thanks for the info
On Fri, Sep 1, 2017 at 12:06 PM, Nick Pentreath wrote:
> No, unfortunately not - as I recall, storageLevel accesses some private
> methods to get the result.
>
> On Fri, 1 Sep 2017 at 17:55, Nathan Kronenfeld wrote:
>
>> Ah, in 2.1.0.
>>
>> I'm in 2.0.1 at the moment... is there any way that works that far back?
No, unfortunately not - as I recall, storageLevel accesses some private
methods to get the result.
On Fri, 1 Sep 2017 at 17:55, Nathan Kronenfeld wrote:
> Ah, in 2.1.0.
>
> I'm in 2.0.1 at the moment... is there any way that works that far back?
>
> On Fri, Sep 1, 2017 at 11:46 AM, Nick Pentreath wrote:
Ah, in 2.1.0.
I'm in 2.0.1 at the moment... is there any way that works that far back?
On Fri, Sep 1, 2017 at 11:46 AM, Nick Pentreath wrote:
> Dataset does have storageLevel, so you can use isCached = (storageLevel !=
> StorageLevel.NONE) as a test.
>
> Arguably isCached could be added to Dataset too; it shouldn't be a
> controversial change.
Dataset does have storageLevel, so you can use isCached = (storageLevel !=
StorageLevel.NONE) as a test.
Arguably isCached could be added to Dataset too; it shouldn't be a
controversial change.
On Fri, 1 Sep 2017 at 17:31, Nathan Kronenfeld wrote:
> I'm currently porting some of our code from RDDs
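
A minimal sketch of that check; Dataset.storageLevel is public from Spark
2.1.0, so this won't compile on 2.0.x:

import org.apache.spark.sql.Dataset
import org.apache.spark.storage.StorageLevel

// Cached means any storage level other than NONE.
def isCached(ds: Dataset[_]): Boolean = ds.storageLevel != StorageLevel.NONE

On 2.0.x, one possible workaround is to register the Dataset as a temp view
and ask the catalog, e.g. ds.createOrReplaceTempView("v") followed by
spark.catalog.isCached("v"), though I haven't verified that the plan
matching behaves the same that far back.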
I'm currently porting some of our code from RDDs to Datasets.
With RDDs it's pretty easy to figure out whether they are cached or not.
I notice that the catalog has a function for determining this on Datasets
too, but it's private[sql]. Is there any reason for it not to be public?
Is there any way at ...
Hi,
I have a list of person records in the following format:
case class Person(fName: String, city: String)
val l = List(Person("A", "City1"), Person("B", "City2"), Person("C", "City1"))
val rdd: RDD[Person] = sc.parallelize(l)
val groupBy: RDD[(String, Iterable[Person])] = rdd.groupBy(_.city)
I would like to save ...
Hi,
I have just started using Spark session with Hive enabled, but I am facing
an issue while updating the Hive warehouse directory after Spark session
creation.
Use case: I want to read data from Hive on one cluster and write to Hive on
another cluster.
Please suggest if this can be done.
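
In Spark 2.x, spark.sql.warehouse.dir is read when the SparkSession is
created and is effectively fixed for that session, so it (and any metastore
setting) has to go on the builder up front. A minimal sketch; the warehouse
path and metastore URI are assumptions:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cross-cluster-hive")
  .config("spark.sql.warehouse.dir", "hdfs://clusterA/user/hive/warehouse") // assumed path
  .config("hive.metastore.uris", "thrift://clusterA-metastore:9083")        // assumed URI
  .enableHiveSupport()
  .getOrCreate()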
Thanks, it's true that a looser watermark guarantees that more data will not
be dropped, but at the same time more state needs to be kept. I was just
wondering whether something like Flink's Kafka-partition-aware watermark
might be a better solution in Structured Streaming.
On Thu, Aug 31, 2017 at 9:13 AM, Tathagata Das wrote:
> Why not set the