Using Self signed root ca for https connection in eleasticsearchIO

2020-03-22 Thread Mohil Khare
e Letsencrypt root certificater for internal services as it expires every three months. Is there a way, just like kafkaIO, to use selfsigned root certificate with elasticsearchIO? Or is there a way to update java cacerts on worker VMs where beam job is running? Looking forward for some suggesti

Keeping keys in a state for a very very long time (keys expiry unknown)

2020-04-06 Thread Mohil Khare
Hello all, We are attempting a implement a use case where beam (java sdk) reads two kind of records from data stream like Kafka: 1. Records of type A containing key and corresponding metadata. 2. Records of type B containing the same key, but no metadata. Beam then needs to fill metadata for recor

Re: Using Self signed root ca for https connection in eleasticsearchIO

2020-04-06 Thread Mohil Khare
Any update on this? Shall I open a jira for this support ? Thanks and regards Mohil On Sun, Mar 22, 2020 at 9:36 PM Mohil Khare wrote: > Hi, > This is Mohil from Prosimo, a small bay area based stealth mode startup. > We use Beam (on version 2.19) with google dataflow in our

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

2020-04-06 Thread Mohil Khare
re the value again in key > StateA-metadata. > > Cheers > > On Tue, 7 Apr 2020 at 08:03, Mohil Khare wrote: > >> Hello all, >> We are attempting a implement a use case where beam (java sdk) reads two >> kind of records from data stream like Kafka: >> >&

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

2020-04-06 Thread Mohil Khare
bound. > > Sorry, its not a yes / no answer :-) > > On Tue, 7 Apr 2020 at 09:03, Mohil Khare wrote: > >> Thanks a lot Reza for your quick response. Yeah saving the data in an >> external system after timer expiry makes sense. >> So do you suggest using a global wind

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

2020-04-06 Thread Mohil Khare
ion/patterns/overview/ > > This would make a great one! > > On Tue, 7 Apr 2020 at 09:12, Mohil Khare wrote: > >> No ... that's a valid answer. Since I wanted to have a long window size >> per key and since we can't use state with session windows, I am using a >>

Re: Using Self signed root ca for https connection in eleasticsearchIO

2020-04-09 Thread Mohil Khare
--filesToStage on > the command line or setFilesToStage in Java code. Am I correct that this > symptom is confirmed? > > Kenn > > On Mon, Apr 6, 2020 at 5:04 PM Mohil Khare wrote: > >> Any update on this? Shall I open a jira for this support ? >> >> Thanks and

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

2020-05-17 Thread Mohil Khare
Hi Reza and others, As suggested, I have opened https://issues.apache.org/jira/browse/BEAM-10019 which I think might be a good addition to beam pipeline patterns. Thanks Mohil On Mon, Apr 6, 2020 at 6:28 PM Mohil Khare wrote: > Sure thing.. I would love to contribute. > > Thank

Re: Using Self signed root ca for https connection in eleasticsearchIO

2020-05-17 Thread Mohil Khare
st store. Just make sure it's properly set > up in your META-INF/services. > > This is supported by Dataflow and all PortableRunners that use a separate > process/container for the worker. > > 1: > https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/a

Timers exception on "Job Drain" while using stateful beam processing in global window

2020-05-17 Thread Mohil Khare
Hello, I have a use case where I have two sets of PCollections (RecordA and RecordB) coming from a real time streaming source like Kafka. Both Records are correlated with a common key, let's say KEY. The purpose is to enrich RecordA with RecordB's data for which I am using CoGbByKey. Since Recor

Re: Timers exception on "Job Drain" while using stateful beam processing in global window

2020-05-18 Thread Mohil Khare
"MyKey" won't last for more than 60-90 secs*), and my use case seems to be working fine. Even DRAIN is working successfully. Thanks Mohil On Sun, May 17, 2020 at 3:37 PM Mohil Khare wrote: > Hello, > > I have a use case where I have two sets of PCollections (RecordA and

Re: Timers exception on "Job Drain" while using stateful beam processing in global window

2020-05-18 Thread Mohil Khare
hrown inside the Dataflow runner. Are you able to file a JIRA for this bug? > > On Mon, May 18, 2020 at 10:44 AM Robert Bradshaw > wrote: > >> Glad you were able to get this working; thanks for following up. >> >> On Mon, May 18, 2020 at 10:35 AM Mohil Khare wrote:

Re: Timers exception on "Job Drain" while using stateful beam processing in global window

2020-05-20 Thread Mohil Khare
Hi Reuven, all, I have opened following jira to track this issue: https://issues.apache.org/jira/browse/BEAM-10053 Thanks and Regards Mohil On Mon, May 18, 2020 at 12:28 PM Mohil Khare wrote: > Hi Reuven, > > Thanks for your reply. Well, I haven't filed JIRA yet. But if it look

Need suggestion/help for use case (usage of the side input pattern and sliding window)

2020-05-27 Thread Mohil Khare
Hi everyone, I need a suggestion regarding usage of the side input pattern and sliding window, especially while replaying old kafka logs/offsets. FYI: I am running beam 2.19 on google dataflow. I have a use case where I read a continuous stream of data from Kafka and need to calculate one score (

Re: Need suggestion/help for use case (usage of the side input pattern and sliding window)

2020-05-30 Thread Mohil Khare
Hello all, Any suggestions? Where am I going wrong or is there any better way of achieving this so that I can do replay as well ? Thanks Mohil On Wed, May 27, 2020 at 11:40 AM Mohil Khare wrote: > Hi everyone, > I need a suggestion regarding usage of the side input pattern and s

KafkaIO write in case on topic name present in PCollection

2020-06-01 Thread Mohil Khare
Hello everyone, Does anyone know if it is possible to provide a topic name embedded in a PCollection object to kafkaIO while writing ? We have a use case where we have a team specific kafka topic for eg teamA_topicname, teamB_topicname. >From beam, we create PCollection> and we need to send thi

Re: Need suggestion/help for use case (usage of the side input pattern and sliding window)

2020-06-01 Thread Mohil Khare
ces > to the end of windows timestamp + 360 days before something is output from > the grouping aggregation/available at the side input. > > > On Sat, May 30, 2020 at 12:17 PM Mohil Khare wrote: > >> Hello all, >> >> Any suggestions? Where am I going wrong or is

Re: KafkaIO write in case on topic name present in PCollection

2020-06-02 Thread Mohil Khare
quot;results”) // default sink topic > .withKeySerializer(...) > .withValueSerializer(...)); > > > > On 2 Jun 2020, at 03:27, Mohil Khare wrote: > > Hello everyone, > > Does anyone know if it is possible to provide a topic name embedded in a > PCollecti

Re: Need suggestion/help for use case (usage of the side input pattern and sliding window)

2020-06-03 Thread Mohil Khare
. > > 1: > https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-side-input-using-windowing > > On Mon, Jun 1, 2020 at 8:27 PM Mohil Khare wrote: > >> Thanks Luke for your reply. >> I see. I am trying to recall why I added allowedLateness as 360 days. >&g

Re: Getting PC: @ 0x7f7490e08602 (unknown) raise and Check failure stack trace: ***

2020-06-25 Thread Mohil Khare
BTW, just to make sure that there is no issue with any individual PTransform, I enabled each one of them one by one and the pipeline started successfully. Issue happens as soon as I enable more than one new aforementioned PTransform. Thanks and regards Mohil On Thu, Jun 25, 2020 at 1:26 AM Mohil

Re: Getting PC: @ 0x7f7490e08602 (unknown) raise and Check failure stack trace: ***

2020-06-25 Thread Mohil Khare
Mohil Khare wrote: > Hi Luke, > Thanks for your response, I tried looking at worker logs using the logging > service of GCP and unable to get a clear picture. Not sure if its due to > memory pressure or low number of harness threads. > Attaching a few more screenshots of crash logs

Re: Getting PC: @ 0x7f7490e08602 (unknown) raise and Check failure stack trace: ***

2020-06-25 Thread Mohil Khare
ch transforms and it was working fine. Yesterday I added a few more and started seeing crashes. If I enable just one of the newly added PCollectionView transforms (keeping old 3-4 intact), then everything works fine. Moment I enable another new transform, a crash happens. Hope this provides some

Re: Getting PC: @ 0x7f7490e08602 (unknown) raise and Check failure stack trace: ***

2020-06-25 Thread Mohil Khare
evaluating to make sure that there is no performance degradation by doing so. Thanks and regards Mohil On Thu, Jun 25, 2020 at 11:44 AM Mohil Khare wrote: > Hi Luke, > > Let me give you some more details about the code. > As I mentioned before, I am using java sdk 2.19.0 on dataflow.

Re: Getting PC: @ 0x7f7490e08602 (unknown) raise and Check failure stack trace: ***

2020-06-26 Thread Mohil Khare
it generated tags that weren't unique enough. > > I would open up a case with Dataflow support with all the information you > have provided here. > > 1: https://issues.apache.org/jira/browse/BEAM-4549 > 2: https://issues.apache.org/jira/browse/BEAM-4534 > > On Thu, Jun

Re: Getting PC: @ 0x7f7490e08602 (unknown) raise and Check failure stack trace: ***

2020-06-26 Thread Mohil Khare
Sure not a problem. I will open one. Thanks Mohil On Fri, Jun 26, 2020 at 10:55 AM Luke Cwik wrote: > Sorry, I didn't open a support case. You should open the support case. > > On Fri, Jun 26, 2020 at 10:41 AM Mohil Khare wrote: > >> Thanks a lot Luke for following u

Re: Getting PC: @ 0x7f7490e08602 (unknown) raise and Check failure stack trace: ***

2020-06-26 Thread Mohil Khare
I have opened following ticket: https://issues.apache.org/jira/browse/BEAM-10339 Please let me know if there some other place where I need to open a support ticket. Thank you Mohil On Fri, Jun 26, 2020 at 11:13 AM Mohil Khare wrote: > Sure not a problem. I will open one. > > Thank

Unable to read value from state/Unable to fetch data due to token mismatch for key

2020-07-08 Thread Mohil Khare
Hello, I am using beam java sdk 2.19.0 (with enableStreamingEngine set as true) and very heavily use stateful beam processing model. However, sometimes I am seeing the following exception while reading value from state for a key (Please note : here my key is a POJO where fields create a kind of co

Re: Unable to read value from state/Unable to fetch data due to token mismatch for key

2020-07-09 Thread Mohil Khare
source of work rebalancing), so the in-progress work item on that > worker failed. However the work item will be processed successfully on the > new worker that owns it. This should not cause a persistent failure. > > On Wed, Jul 8, 2020 at 9:53 PM Mohil Khare wrote: > >> Hello,

Carry forward state information from one window to next

2020-07-10 Thread Mohil Khare
Hello, I am using beam on dataflow with java sdk 2.19.0. I have a use case where I need to collect some messages in a short window of few seconds to 1 minute, update state (stateful processing of beam) and carry forward this state information to next window and use this to initialize state of next

Re: FileIO write to new folder every hour.

2020-07-10 Thread Mohil Khare
Hello Julius, Well I do something similar using FileIO.Write.FileNaming i,e, https://beam.apache.org/releases/javadoc/2.4.0/org/apache/beam/sdk/io/FileIO.Write.FileNaming.html You can do something like following: .apply(FileIO.*write*() .via(ParquetIO.*sink*(*getOutput_schema*()))

Re: FileIO write to new folder every hour.

2020-07-10 Thread Mohil Khare
dow; return String.format( %s/file.parquet", subpath, * DATE_FORMAT.print(intervalWindow.start()*); } } Regards Mohil On Fri, Jul 10, 2020 at 3:39 PM Mohil Khare wrote: > Hello Julius, > > Well I do something similar using FileIO

On DRAIN: Attempt to deliver a timer to a DoFn, but timers are not supported in Dataflow.

2020-07-24 Thread Mohil Khare
Hello, I am on java sdk 2.19 and using dataflow for beam job. I use Timers for my stateful transformations, but recently I started seeing the following exception on DRAINING a job. It used to work fine and not sure what changed. java.lang.UnsupportedOperationException: 1. 1. atorg.apach

Re: On DRAIN: Attempt to deliver a timer to a DoFn, but timers are not supported in Dataflow.

2020-07-24 Thread Mohil Khare
07-24 17:06:53.863 PDT Uncaught exception: On Fri, Jul 24, 2020 at 2:59 PM Mohil Khare wrote: > Hello, > > I am on java sdk 2.19 and using dataflow for beam job. I use Timers for my > stateful transformations, but recently I started seeing the following > exception on DRAINING

Re: On DRAIN: Attempt to deliver a timer to a DoFn, but timers are not supported in Dataflow.

2020-07-26 Thread Mohil Khare
timer to a DoFn, but timers are not supported in Dataflow). Do you know under what circumstances, My code might be throwing this ? Not sure if its some issue in 2.19 which might have been fixed now with 2.22 Thanks and Regards Mohil On Fri, Jul 24, 2020 at 5:21 PM Mohil Khare wrote

Re: On DRAIN: Attempt to deliver a timer to a DoFn, but timers are not supported in Dataflow.

2020-07-26 Thread Mohil Khare
and it seems be due to TimerType User Thanks Mohil On Sun, Jul 26, 2020 at 1:42 PM Mohil Khare wrote: > Hello, > > I was looking at source code of > https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dat

Exceptions: Attempt to deliver a timer to a DoFn, but timers are not supported in Dataflow.

2020-07-27 Thread Mohil Khare
1. at org.apache.beam.runners.dataflow.worker. StreamingDataflowWorker$7.run (StreamingDataflowWorker.java:1073) 2. at java.util.concurrent.ThreadPoolExecutor.runWorker ( ThreadPoolExecutor.java:1149) 3. at java.util.concurrent.ThreadPoolExecutor$Worker.run ( ThreadP

Re: Exceptions: Attempt to deliver a timer to a DoFn, but timers are not supported in Dataflow.

2020-07-27 Thread Mohil Khare
regards Mohil On Mon, Jul 27, 2020 at 10:50 AM Mohil Khare wrote: > Hello All, > > Any idea how to debug this and find out which stage, which DoFn or which > side input is causing the problem? > Do I need to override OnTimer with every DoFn to avoid this problem? > I thought

Re: Exceptions: Attempt to deliver a timer to a DoFn, but timers are not supported in Dataflow.

2020-07-29 Thread Mohil Khare
and some more > details. This looks related to > https://issues.apache.org/jira/browse/BEAM-6855 which claims to be > resolved in 2.17.0 > > Kenn > > On Mon, Jul 27, 2020 at 11:47 PM Mohil Khare wrote: > >> Hello all, >> >> I think I found the reason f

Re: Exceptions: Attempt to deliver a timer to a DoFn, but timers are not supported in Dataflow.

2020-08-04 Thread Mohil Khare
https://issues.apache.org/jira/browse/BEAM-6855 which claims to be >> resolved in 2.17.0 >> >> Kenn >> >> On Mon, Jul 27, 2020 at 11:47 PM Mohil Khare wrote: >> >>> Hello all, >>> >>> I think I found the reason for the issue. Since the e

Re: Exceptions: Attempt to deliver a timer to a DoFn, but timers are not supported in Dataflow.

2020-08-04 Thread Mohil Khare
Thanks a lot Luke.. Regards Mohil On Tue, Aug 4, 2020 at 12:01 PM Luke Cwik wrote: > BEAM-6855 is still open and I updated it linking to this thread that a > user is still being impacted. > > On Tue, Aug 4, 2020 at 10:20 AM Mohil Khare wrote: > >> yeah .. looks li

Intermittent Slowness with kafkaIO read

2020-08-05 Thread Mohil Khare
Hello, I am using Beam java Sdk 2.19 on dataflow. We have a system where log shipper continuously emit logs to kafka and beam read logs using KafkaIO. Sometime I am seeing slowness on kafkaIO read with one of the topics (probably during peak traffic period), where there is a 2-3 minutes between r

Re: Intermittent Slowness with kafkaIO read

2020-08-05 Thread Mohil Khare
sumerFactoryObj I setup ssl keystrokes Thanks and Regards Mohil On Wed, Aug 5, 2020 at 12:59 PM Mohil Khare wrote: > Hello, > > I am using Beam java Sdk 2.19 on dataflow. We have a system where log > shipper continuously emit logs to kafka and beam read logs using KafkaIO. > &

Re: Intermittent Slowness with kafkaIO read

2020-08-05 Thread Mohil Khare
I also have a question that if wrong windowing is messing up the received timestamp ?? Thanks and regards Mohil On Wed, Aug 5, 2020 at 1:19 PM Mohil Khare wrote: > Just to let you know, this is how I setup kafkaIO read: > > p > > .apply("Read_From_

Session Windowing followed by CoGroupByKey

2020-08-06 Thread Mohil Khare
Hello All, I need to seek advice whether Session Windowing followed by CoGroupByKey is a correct way to solve my use case or not and if YES, then whether I am using it correctly or not. Please note that I am using java sdk 2.19 on google dataflow I have two streams of data coming from two differe

Re: Session Windowing followed by CoGroupByKey

2020-08-07 Thread Mohil Khare
ely doing something wrong. Any suggestions? Thanks and regards Mohil On Thu, Aug 6, 2020 at 12:42 AM Mohil Khare wrote: > Hello All, > > I need to seek advice whether Session Windowing followed by CoGroupByKey > is a correct way to solve my use case or not and if YES, then whether

ElasticsearchIO: Connection closed and Cannot get Elasticsearch version exceptions

2020-08-09 Thread Mohil Khare
Hello All, I am using apache beam java sdk 2.19 and elastic search IO 6.x. I keep getting following exception while dumping streaming logs to ES: java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.IllegalArgumentException: Cannot get Elasticsearch version 1.

Removing Duplicates from Sliding Window

2020-08-18 Thread Mohil Khare
Hello All, Firstly I am using beam java sdk 2.23.0. I have a use case where I continuously read streaming data from Kafka and dump output to Elasticsearch after doing a bunch of PTransforms. One such transform depends on the number of requests we have seen so far in the last one hour (Last one h

Need Support for ElasticSearch 7.x for beam

2020-08-24 Thread Mohil Khare
Hello, Firstly I am on java sdk 2.23.0 and we heavily use Elasticsearch as one of our sinks. It's been a while since beam got upgraded to support elasticsearch version greater than 6.x. Elasticsearch has now moved on to 7.x and we want to use some of their new security features. I want to know w

Re: Need Support for ElasticSearch 7.x for beam

2020-08-24 Thread Mohil Khare
AM Kyle Weaver wrote: > This ticket indicates Elasticsearch 7.x has been supported since Beam > 2.19: https://issues.apache.org/jira/browse/BEAM-5192 > > Are there any specific features you need that aren't supported? > > On Mon, Aug 24, 2020 at 11:33 AM Mohil K

Issue with Checkpointing in dataflow using BigQuery

2020-09-20 Thread Mohil Khare
Hello team, I am using beam java sdk 2.23.0 on dataflow. I have a pipeline where I continuously read from Kafka and run through various transforms and finally emit output to various sinks like Elastic Search, Big Query, GCS buckets etc. There are few transforms where I maintain state of input KV

Re: Issue with Checkpointing in dataflow using BigQuery

2020-09-21 Thread Mohil Khare
ble/Cloud Firestore/...)[3]? > > 1: https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline > 2: > https://beam.apache.org/documentation/programming-guide/#provided-windowing-functions > 3: https://cloud.google.com/products/databases > > > On Sun, Sep 20, 2020 at 7

disable ssl check/set verify cert false for elasticsearchIO

2020-09-24 Thread Mohil Khare
Hello team and elasticSearchIO users, I am using beam java sdk 2.23.0 with elastic search as one of the sinks. Is there any way to turn off ssl check i.e. set cert verify false for https connection with elasticsearchIO ? I know using regular clients, you can do that. But can we achieve the same u

Re: disable ssl check/set verify cert false for elasticsearchIO

2020-09-25 Thread Mohil Khare
tern that is > already there in the code. > > On Thu, Sep 24, 2020 at 8:36 PM Mohil Khare wrote: > >> Hello team and elasticSearchIO users, >> >> I am using beam java sdk 2.23.0 with elastic search as one of the sinks. >> >> Is there any way to turn off ssl c

Re: disable ssl check/set verify cert false for elasticsearchIO

2020-09-25 Thread Mohil Khare
bution guide[2]. > > 1: https://gist.github.com/mikeapr4/3b2b5d05bc57640e77d0 > 2: https://beam.apache.org/contribute/ > > On Fri, Sep 25, 2020 at 9:17 AM Mohil Khare wrote: > >> Hi Luke, >> Yeah, I looked at withTrustSelfSignedCerts options, but after looking at &

Operation Timed out while ElasticsearchIO read

2021-03-07 Thread Mohil Khare
Hello ElasticSearchIO and beam users/developers, I am on Beam 2.23.0 and elasticsearch 6.8 I have been using elasticsearchIO.write() successfully. For the first time, I am trying to use elasticsearchIO.read because I have a use case where I want to read data from one elasticsearch cluster, modify

Re: Error During ElasticsearchIO read

2021-03-07 Thread Mohil Khare
derIterator.abort (ReadOperation.java:371 ) 5. at org.apache.beam.runners.dataflow.worker.util.common.worker. ReadOperation.abort (ReadOperation.java:256) 6. at org.apache.beam.runners.dataflow.worker.util.common.worker. MapTaskExecutor.execute (MapTaskExecutor.java:91) Any help would