Re: Syndicate Apache Spark Twitter to Mastodon?

2022-11-30 Thread Dmitry
ffiliated > friends moving to Mastodon (personally I now do both). > > On Wed, Nov 30, 2022 at 3:17 PM Dmitry wrote: > >> Hello, >> Are there any long-term statistics on the number of developers who moved to >> Mastodon and their activity? >> >> I believe t

Re: Syndicate Apache Spark Twitter to Mastodon?

2022-11-30 Thread Dmitry
Hello, Are there any long-term statistics on the number of developers who moved to Mastodon and their activity? I believe most devs are still using Twitter. Thu, Dec 1, 2022, 01:35 Holden Karau : > Do we want to start syndicating Apache Spark Twitter to a Mastodon > instance? It seems like

There is no way to force partition discovery if _spark_metadata exists

2019-01-16 Thread Dmitry
Hello, I have a two-stage processing pipeline: 1. A Spark streaming job receives data from Kafka and saves it as partitioned ORC. 2. A Spark ETL job runs once per day and compacts each partition (I have two variables for partitioning, dt=20180529/location=mumbai) ( merge small files to bigg
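Not from the thread itself, but as background for readers: the dt=20180529/location=mumbai layout mentioned above is Hive-style partitioning, which Spark's partition discovery turns into partition columns by parsing the directory names. A minimal pure-Python sketch of that parsing (the function name is hypothetical, for illustration only):

```python
# Hypothetical sketch of Hive-style partition-path parsing, the mechanism
# behind Spark's partition discovery. Segments of the form key=value become
# partition column/value pairs; other segments are ignored.
def parse_partition_path(path):
    """Extract partition column -> value pairs from a directory path."""
    parts = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

print(parse_partition_path("warehouse/events/dt=20180529/location=mumbai"))
```

In real Spark the discovery also infers column types and is skipped when a `_spark_metadata` directory (written by a streaming sink) takes precedence, which is exactly the problem the post describes.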

Re: Easy way to get offset metadata with Spark Streaming API

2017-09-14 Thread Dmitry Naumenko
Nice, thanks again Michael for helping out. Dmitry 2017-09-14 21:37 GMT+03:00 Michael Armbrust : > Yep, that is correct. You can also use the query ID which is a GUID that > is stored in the checkpoint and preserved across restarts if you want to > distinguish the batches from

Re: Easy way to get offset metadata with Spark Streaming API

2017-09-14 Thread Dmitry Naumenko
hId. Dmitry 2017-09-13 22:12 GMT+03:00 Michael Armbrust : > I think the right way to look at this is the batchId is just a proxy for > offsets that is agnostic to what type of source you are reading from (or > how many sources there are). We might call into a custom sink with the > sa

Re: Easy way to get offset metadata with Spark Streaming API

2017-09-13 Thread Dmitry Naumenko
ly, just ignore intermediate data, re-read from Kafka and re-try processing and load)? Dmitry 2017-09-12 22:43 GMT+03:00 Michael Armbrust : > In the checkpoint directory there is a file /offsets/$batchId that holds > the offsets serialized as JSON. I would not consider this a public stable > AP
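For context on the /offsets/$batchId file Michael mentions: it is an internal checkpoint artifact, not a stable public API, so the layout sketched below is an assumption that can vary by Spark version. In the versions I have seen, the file holds a version marker line, a metadata JSON line, and then one JSON line of offsets per source. A hedged parsing sketch (function name illustrative):

```python
import json

# Hedged sketch: parsing an offsets/$batchId file from a Structured
# Streaming checkpoint. Assumed layout (NOT a stable public API, as the
# thread notes): a version line such as "v1", a metadata JSON line, then
# one JSON line of offsets per source ("-" for a source with no offset).
def parse_offset_log(text):
    lines = text.strip().splitlines()
    version = lines[0]
    metadata = json.loads(lines[1])
    offsets = [json.loads(line) for line in lines[2:] if line != "-"]
    return version, metadata, offsets

sample = "\n".join([
    "v1",
    '{"batchWatermarkMs":0,"batchTimestampMs":1505250000000}',
    '{"my-topic":{"0":1234,"1":5678}}',  # Kafka-style topic/partition offsets
])
version, meta, offsets = parse_offset_log(sample)
```

Because the format is internal, reading it directly is fine for debugging but risky for production recovery logic, which is the thrust of the discussion above.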

Re: Easy way to get offset metadata with Spark Streaming API

2017-09-12 Thread Dmitry Naumenko
Thanks for the response, Michael. > You should still be able to get exactly-once processing by using the batchId that is passed to the Sink. Could you explain this in more detail, please? Is there some kind of offset manager API that works as a get-offset-by-batchId lookup table? Dmitry 2017-09
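The exactly-once idea Michael refers to can be illustrated without Spark: a sink records the highest batchId it has committed and silently skips any batch it sees again after a restart, making replays idempotent. The class and method names below are purely illustrative, not Spark's Sink API:

```python
# Hedged sketch of batchId-based idempotence in a streaming sink.
# After a failure, the engine may replay the last batch with the same
# batchId; the sink detects this and skips the duplicate write.
class IdempotentSink:
    def __init__(self):
        self.last_committed = -1  # no batch committed yet
        self.rows = []            # stand-in for the external store

    def add_batch(self, batch_id, data):
        if batch_id <= self.last_committed:
            return False          # replayed batch: already written, skip
        self.rows.extend(data)    # "write" the batch
        self.last_committed = batch_id
        return True

sink = IdempotentSink()
sink.add_batch(0, ["a"])
sink.add_batch(0, ["a"])  # replay after simulated restart is ignored
sink.add_batch(1, ["b"])
```

A real sink would persist `last_committed` transactionally with the data (e.g. in the same database transaction), otherwise a crash between the write and the bookkeeping reintroduces duplicates.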

Re: Easy way to get offset metadata with Spark Streaming API

2017-09-12 Thread Dmitry Naumenko
Thanks, Cody. Unfortunately, it seems there is no active development on it right now. Maybe I can step in and help somehow? Dmitry 2017-09-11 21:01 GMT+03:00 Cody Koeninger : > https://issues-test.apache.org/jira/browse/SPARK-18258 > > On Mon, Sep 11, 2017 at 7:15 AM, Dmitry

Easy way to get offset metadata with Spark Streaming API

2017-09-11 Thread Dmitry Naumenko
have offsets", so why isn't it part of the public API? What do you think about supporting it? Dmitry

Re: [VOTE] Release Apache Spark 2.0.0 (RC2)

2016-07-11 Thread Dmitry Zhukov
Sorry for bringing this topic up. Any updates here? Really looking forward to the upcoming RC. Thanks! On Wed, Jul 6, 2016 at 6:19 PM, Ted Yu wrote: > Running the following command: > build/mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Psparkr > -Dhadoop.version=2.7.0 package > > T

Re: branch-2.0 is now 2.0.1-SNAPSHOT?

2016-07-11 Thread Dmitry Zhukov
So, as I understand it, the correct git branch to Maven version mapping should be the following: branch-2.0 -> 2.0.0-SNAPSHOT, master -> 2.1.0-SNAPSHOT; but currently it is branch-2.0 -> 2.0.1-SNAPSHOT, master -> 2.0.0-SNAPSHOT. We are starting to play with Spark 2.0 at TransferWise and find the versio

SPARK-15465 - AnalysisException: cannot cast StructType to VectorUDT

2016-07-11 Thread Dmitry Zhukov
Hi! I want to bring up this Spark 2.0 issue here: https://issues.apache.org/jira/browse/SPARK-15465. It looks quite major (I would even say critical) to me. Should it be fixed within the RC? I would also like to contribute myself but am struggling to find a place to start... Thanks! -- Dmitry

Re: [discuss] dropping Python 2.6 support

2016-01-10 Thread Dmitry Kniazev
st in the same environment. For example, we use virtualenv to run Spark with Python 2.7 and do not touch system Python 2.6. Thank you, Dmitry 09.01.2016, 06:36, "Sasha Kacanski" : > +1 > Companies that use stock python in redhat 2.6 will need to upgrade or install > fresh vers

[no subject]

2015-11-26 Thread Dmitry Tolpeko