e Let's Encrypt root certificate for internal services, as it
expires every three months. Is there a way, just like KafkaIO, to use a
self-signed root certificate with ElasticsearchIO? Or is there a way to
update the Java cacerts on the worker VMs where the Beam job is running?
Looking forward to some suggestions.
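One option worth trying (a sketch, not a confirmed answer): ElasticsearchIO's
ConnectionConfiguration can point at a custom keystore and accept self-signed
chains. The endpoint, index, and keystore path below are hypothetical, the
keystore must be readable on the worker VMs, and "logs" stands for a
PCollection<String> of JSON documents:

    // Sketch: trust an internal/self-signed root CA for ElasticsearchIO.
    ElasticsearchIO.ConnectionConfiguration connConfig =
        ElasticsearchIO.ConnectionConfiguration.create(
                new String[] {"https://internal-es:9200"}, "my-index", "_doc")
            .withKeystorePath("/opt/certs/truststore.jks") // contains the internal root CA
            .withKeystorePassword("changeit")
            .withTrustSelfSignedCerts(true); // accept a self-signed chain

    logs.apply(ElasticsearchIO.write().withConnectionConfiguration(connConfig));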
Hello all,
We are attempting to implement a use case where Beam (Java SDK) reads two
kinds of records from a data stream like Kafka:
1. Records of type A containing a key and corresponding metadata.
2. Records of type B containing the same key, but no metadata. Beam then
needs to fill metadata for recor
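A minimal sketch of one way to do this kind of enrichment with Beam state
(the Record type and its fields are hypothetical): remember the latest
type-A metadata per key, and fill it into type-B records as they arrive.

    import java.io.Serializable;
    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.state.StateSpec;
    import org.apache.beam.sdk.state.StateSpecs;
    import org.apache.beam.sdk.state.ValueState;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;

    // Hypothetical record type: type A carries metadata, type B does not.
    class Record implements Serializable {
      String key;
      boolean typeA;
      String metadata; // null for type B until enriched
    }

    class EnrichFn extends DoFn<KV<String, Record>, Record> {

      @StateId("metadata")
      private final StateSpec<ValueState<String>> metadataSpec =
          StateSpecs.value(StringUtf8Coder.of());

      @ProcessElement
      public void process(
          @Element KV<String, Record> element,
          @StateId("metadata") ValueState<String> metadata,
          OutputReceiver<Record> out) {
        Record record = element.getValue();
        if (record.typeA) {
          metadata.write(record.metadata); // remember this key's metadata
        } else {
          record.metadata = metadata.read(); // may be null if A hasn't arrived yet
          out.output(record);
        }
      }
    }

Type-B records arriving before their type-A record would need buffering
(e.g. a BagState) or a retry path; that detail is omitted here.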
Any update on this? Shall I open a JIRA for this support?
Thanks and regards
Mohil
On Sun, Mar 22, 2020 at 9:36 PM Mohil Khare wrote:
> Hi,
> This is Mohil from Prosimo, a small Bay Area-based stealth-mode startup.
> We use Beam (on version 2.19) with Google Dataflow in our
re the value again in key
> StateA-metadata.
>
> Cheers
>
> On Tue, 7 Apr 2020 at 08:03, Mohil Khare wrote:
>
>> Hello all,
>> We are attempting to implement a use case where Beam (Java SDK) reads two
>> kinds of records from a data stream like Kafka:
>>
>>
bound.
>
> Sorry, it's not a yes/no answer :-)
>
> On Tue, 7 Apr 2020 at 09:03, Mohil Khare wrote:
>
>> Thanks a lot, Reza, for your quick response. Yeah, saving the data in an
>> external system after timer expiry makes sense.
>> So do you suggest using a global wind
ion/patterns/overview/
>
> This would make a great one!
>
> On Tue, 7 Apr 2020 at 09:12, Mohil Khare wrote:
>
>> No ... that's a valid answer. Since I wanted to have a long window size
>> per key and since we can't use state with session windows, I am using a
>>
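For reference, a minimal sketch of the pattern being discussed: per-key state
in the global window plus a processing-time timer that flushes and clears a
key after a quiet period. The names and the 2-minute gap are made up, and the
input must first be windowed with Window.into(new GlobalWindows()), since
user state is scoped per key and window.

    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.coders.VarLongCoder;
    import org.apache.beam.sdk.state.*; // StateSpec(s), ValueState, Timer(Spec)s, TimeDomain
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;
    import org.joda.time.Duration;

    class SessionLikeFn extends DoFn<KV<String, String>, KV<String, Long>> {

      @StateId("count")
      private final StateSpec<ValueState<Long>> countSpec = StateSpecs.value(VarLongCoder.of());

      @StateId("key")
      private final StateSpec<ValueState<String>> keySpec = StateSpecs.value(StringUtf8Coder.of());

      @TimerId("expiry")
      private final TimerSpec expirySpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

      @ProcessElement
      public void process(
          @Element KV<String, String> element,
          @StateId("count") ValueState<Long> count,
          @StateId("key") ValueState<String> key,
          @TimerId("expiry") Timer expiry) {
        Long current = count.read();
        count.write((current == null ? 0L : current) + 1);
        key.write(element.getKey());
        // Reset the "session" expiry every time this key sees an element.
        expiry.offset(Duration.standardMinutes(2)).setRelative();
      }

      @OnTimer("expiry")
      public void onExpiry(
          @StateId("count") ValueState<Long> count,
          @StateId("key") ValueState<String> key,
          OutputReceiver<KV<String, Long>> out) {
        // The key has been quiet: emit (or persist externally) and clear state.
        Long current = count.read();
        out.output(KV.of(key.read(), current == null ? 0L : current));
        count.clear();
        key.clear();
      }
    }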
--filesToStage on
> the command line or setFilesToStage in Java code. Am I correct that this
> symptom is confirmed?
>
> Kenn
>
> On Mon, Apr 6, 2020 at 5:04 PM Mohil Khare wrote:
>
>> Any update on this? Shall I open a JIRA for this support?
>>
>> Thanks and
Hi Reza and others,
As suggested, I have opened
https://issues.apache.org/jira/browse/BEAM-10019, which
I think might be a good addition to Beam pipeline patterns.
Thanks
Mohil
On Mon, Apr 6, 2020 at 6:28 PM Mohil Khare wrote:
> Sure thing. I would love to contribute.
>
> Thank
st store. Just make sure it's properly set
> up in your META-INF/services.
>
> This is supported by Dataflow and all PortableRunners that use a separate
> process/container for the worker.
>
> 1:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/a
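If the hook being referenced is Beam's JvmInitializer, a minimal sketch looks
like the following (the truststore path is hypothetical and the file must
exist on the worker image or be staged with the job):

    import com.google.auto.service.AutoService;
    import org.apache.beam.sdk.harness.JvmInitializer;

    // Registered via ServiceLoader: @AutoService generates
    // META-INF/services/org.apache.beam.sdk.harness.JvmInitializer,
    // or you can write that file by hand.
    @AutoService(JvmInitializer.class)
    public class TruststoreInitializer implements JvmInitializer {
      @Override
      public void onStartup() {
        // Point the worker JVM at a truststore containing the internal root CA.
        System.setProperty("javax.net.ssl.trustStore", "/opt/certs/truststore.jks");
        System.setProperty("javax.net.ssl.trustStorePassword", "changeit");
      }
    }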
Hello,
I have a use case where I have two sets of PCollections (RecordA and
RecordB) coming from a real-time streaming source like Kafka.
Both records are correlated with a common key, let's say KEY.
The purpose is to enrich RecordA with RecordB's data, for which I am using
CoGroupByKey. Since Recor
"MyKey" won't last for more than 60-90 secs*), and my use case seems to
be working fine. Even DRAIN is working successfully.
Thanks
Mohil
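A sketch of the shape this join takes, assuming two already-keyed streams
(keyedA, keyedB), hypothetical RecordA/RecordB types, a 90-second session gap,
and a made-up enrich() helper:

    final TupleTag<RecordA> aTag = new TupleTag<>();
    final TupleTag<RecordB> bTag = new TupleTag<>();

    PCollection<KV<String, RecordA>> windowedA = keyedA.apply(
        Window.<KV<String, RecordA>>into(Sessions.withGapDuration(Duration.standardSeconds(90))));
    PCollection<KV<String, RecordB>> windowedB = keyedB.apply(
        Window.<KV<String, RecordB>>into(Sessions.withGapDuration(Duration.standardSeconds(90))));

    PCollection<KV<String, CoGbkResult>> joined =
        KeyedPCollectionTuple.of(aTag, windowedA)
            .and(bTag, windowedB)
            .apply(CoGroupByKey.create());

    joined.apply(ParDo.of(new DoFn<KV<String, CoGbkResult>, RecordA>() {
      @ProcessElement
      public void process(@Element KV<String, CoGbkResult> e, OutputReceiver<RecordA> out) {
        Iterable<RecordB> bs = e.getValue().getAll(bTag); // the metadata side
        for (RecordA a : e.getValue().getAll(aTag)) {
          out.output(enrich(a, bs)); // hypothetical enrichment helper
        }
      }
    }));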
On Sun, May 17, 2020 at 3:37 PM Mohil Khare wrote:
> Hello,
>
> I have a use case where I have two sets of PCollections (RecordA and
thrown inside the Dataflow runner. Are you able to file a JIRA for this bug?
>
> On Mon, May 18, 2020 at 10:44 AM Robert Bradshaw
> wrote:
>
>> Glad you were able to get this working; thanks for following up.
>>
>> On Mon, May 18, 2020 at 10:35 AM Mohil Khare wrote:
Hi Reuven, all,
I have opened the following JIRA to track this issue:
https://issues.apache.org/jira/browse/BEAM-10053
Thanks and Regards
Mohil
On Mon, May 18, 2020 at 12:28 PM Mohil Khare wrote:
> Hi Reuven,
>
> Thanks for your reply. Well, I haven't filed a JIRA yet. But if it look
Hi everyone,
I need a suggestion regarding usage of the side input pattern and sliding
windows, especially while replaying old Kafka logs/offsets.
FYI: I am running Beam 2.19 on Google Dataflow.
I have a use case where I read a continuous stream of data from Kafka and
need to calculate one score (
Hello all,
Any suggestions? Where am I going wrong, or is there a better way of
achieving this so that I can do replay as well?
Thanks
Mohil
On Wed, May 27, 2020 at 11:40 AM Mohil Khare wrote:
> Hi everyone,
> I need a suggestion regarding usage of the side input pattern and s
Hello everyone,
Does anyone know if it is possible to provide a topic name embedded in a
PCollection object to KafkaIO while writing?
We have a use case where we have team-specific Kafka topics, e.g.
teamA_topicname, teamB_topicname.
From Beam, we create PCollection<...> and we need to send
thi
ces
> to the end-of-window timestamp + 360 days before something is output from
> the grouping aggregation/available at the side input.
>
>
> On Sat, May 30, 2020 at 12:17 PM Mohil Khare wrote:
>
>> Hello all,
>>
>> Any suggestions? Where am I going wrong or is
"results") // default sink topic
> .withKeySerializer(...)
> .withValueSerializer(...));
>
>
>
> On 2 Jun 2020, at 03:27, Mohil Khare wrote:
>
> Hello everyone,
>
> Does anyone know if it is possible to provide a topic name embedded in a
> PCollecti
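If the write in the reply above is KafkaIO's writeRecords (a guess, since the
quote is truncated), the pattern is to wrap each element in a ProducerRecord
that names its own topic; broker address and topic names here are hypothetical:

    // Each element chooses its own topic, e.g. teamA_topicname vs teamB_topicname.
    PCollection<ProducerRecord<String, String>> records = input.apply(
        MapElements.via(
            new SimpleFunction<KV<String, String>, ProducerRecord<String, String>>() {
              @Override
              public ProducerRecord<String, String> apply(KV<String, String> kv) {
                String topic = kv.getKey() + "_topicname"; // derive topic from the team key
                return new ProducerRecord<>(topic, kv.getKey(), kv.getValue());
              }
            }));

    records.apply(
        KafkaIO.<String, String>writeRecords()
            .withBootstrapServers("broker-1:9092")
            .withTopic("results") // default topic when a record does not set one
            .withKeySerializer(StringSerializer.class)
            .withValueSerializer(StringSerializer.class));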
.
>
> 1:
> https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-side-input-using-windowing
>
> On Mon, Jun 1, 2020 at 8:27 PM Mohil Khare wrote:
>
>> Thanks, Luke, for your reply.
>> I see. I am trying to recall why I added allowedLateness as 360 days.
>>
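For reference, the linked pattern in a minimal form. The 5-minute interval and
the fetchLatestScores() helper are hypothetical, and the main stream would be
windowed into matching fixed windows so each window sees the corresponding
side-input value:

    // Slowly-updating side input: tick periodically, fetch the current value,
    // and expose one value per fixed window.
    PCollectionView<Map<String, Double>> scoresView = p
        .apply(GenerateSequence.from(0).withRate(1, Duration.standardMinutes(5)))
        .apply(ParDo.of(new DoFn<Long, Map<String, Double>>() {
          @ProcessElement
          public void process(OutputReceiver<Map<String, Double>> out) {
            out.output(fetchLatestScores()); // hypothetical lookup
          }
        }))
        .apply(Window.<Map<String, Double>>into(FixedWindows.of(Duration.standardMinutes(5))))
        .apply(View.asSingleton());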
BTW, just to make sure that there is no issue with any individual
PTransform, I enabled each one of them one by one and the pipeline started
successfully. The issue happens as soon as I enable more than one of the new
aforementioned PTransforms.
Thanks and regards
Mohil
On Thu, Jun 25, 2020 at 1:26 AM Mohil
Mohil Khare wrote:
> Hi Luke,
> Thanks for your response. I tried looking at worker logs using the logging
> service of GCP and was unable to get a clear picture. Not sure if it's due to
> memory pressure or a low number of harness threads.
> Attaching a few more screenshots of crash logs
ch transforms and it was working fine. Yesterday I added a
few more and started seeing crashes.
If I enable just one of the newly added PCollectionView transforms
(keeping the old 3-4 intact), then everything works fine. The moment I enable
another new transform, a crash happens.
Hope this provides some
evaluating to make sure that there is no performance degradation by
doing so.
Thanks and regards
Mohil
On Thu, Jun 25, 2020 at 11:44 AM Mohil Khare wrote:
> Hi Luke,
>
> Let me give you some more details about the code.
> As I mentioned before, I am using Java SDK 2.19.0 on Dataflow.
it generated tags that weren't unique enough.
>
> I would open up a case with Dataflow support with all the information you
> have provided here.
>
> 1: https://issues.apache.org/jira/browse/BEAM-4549
> 2: https://issues.apache.org/jira/browse/BEAM-4534
>
> On Thu, Jun
Sure, not a problem. I will open one.
Thanks
Mohil
On Fri, Jun 26, 2020 at 10:55 AM Luke Cwik wrote:
> Sorry, I didn't open a support case. You should open the support case.
>
> On Fri, Jun 26, 2020 at 10:41 AM Mohil Khare wrote:
>
>> Thanks a lot Luke for following u
I have opened the following ticket:
https://issues.apache.org/jira/browse/BEAM-10339
Please let me know if there is some other place where I need to open a
support ticket.
Thank you
Mohil
On Fri, Jun 26, 2020 at 11:13 AM Mohil Khare wrote:
> Sure, not a problem. I will open one.
>
> Thank
Hello,
I am using Beam Java SDK 2.19.0 (with enableStreamingEngine set to true)
and very heavily use the stateful Beam processing model.
However, sometimes I am seeing the following exception while reading a value
from state for a key (please note: here my key is a POJO whose fields
create a kind of co
source of work rebalancing), so the in-progress work item on that
> worker failed. However, the work item will be processed successfully on the
> new worker that owns it. This should not cause a persistent failure.
>
> On Wed, Jul 8, 2020 at 9:53 PM Mohil Khare wrote:
>
>> Hello,
Hello,
I am using Beam on Dataflow with Java SDK 2.19.0.
I have a use case where I need to collect some messages in a short window
of a few seconds to 1 minute, update state (Beam stateful processing), and
carry this state information forward to the next window, using it to
initialize the state of the next
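One note on this setup (an assumption about the pipeline, not a confirmed
fix): user state is scoped per key and window, so state written in one short
window is not visible in the next. A common workaround is to run the stateful
DoFn in the global window and manage interval boundaries with timers:

    // State in the global window survives across the short logical windows,
    // because the global window never closes. Event and MyStatefulFn are
    // hypothetical placeholders.
    input
        .apply("GlobalWindowForState",
            Window.<KV<String, Event>>into(new GlobalWindows()))
        .apply("CarryStateForward", ParDo.of(new MyStatefulFn()));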
Hello Julius,
Well, I do something similar using FileIO.Write.FileNaming, i.e.,
https://beam.apache.org/releases/javadoc/2.4.0/org/apache/beam/sdk/io/FileIO.Write.FileNaming.html
You can do something like the following:
.apply(FileIO.<GenericRecord>write()
    .via(ParquetIO.sink(getOutput_schema()))
    .withNaming(new FileIO.Write.FileNaming() {
      @Override
      public String getFilename(
          BoundedWindow window, PaneInfo pane, int numShards, int shardIndex,
          Compression compression) {
        IntervalWindow intervalWindow = (IntervalWindow) window;
        return String.format(
            "%s/%s/file.parquet",
            subpath,
            DATE_FORMAT.print(intervalWindow.start()));
      }
    }))
Regards
Mohil
On Fri, Jul 10, 2020 at 3:39 PM Mohil Khare wrote:
> Hello Julius,
>
> Well I do something similar using FileIO
Hello,
I am on Java SDK 2.19 and using Dataflow for my Beam job. I use timers for my
stateful transformations, but recently I started seeing the following
exception on DRAINING a job. It used to work fine, and I am not sure what changed.
java.lang.UnsupportedOperationException:
at org.apach
07-24 17:06:53.863 PDT
Uncaught exception:
On Fri, Jul 24, 2020 at 2:59 PM Mohil Khare wrote:
> Hello,
>
> I am on Java SDK 2.19 and using Dataflow for my Beam job. I use timers for my
> stateful transformations, but recently I started seeing the following
> exception on DRAINING
timer to a DoFn, but timers are not supported in Dataflow).
Do you know under what circumstances my code might be throwing this? Not
sure if it's some issue in 2.19 that might have been fixed now in 2.22.
Thanks and Regards
Mohil
On Fri, Jul 24, 2020 at 5:21 PM Mohil Khare wrote
and it seems to be due to TimerType User
Thanks
Mohil
On Sun, Jul 26, 2020 at 1:42 PM Mohil Khare wrote:
> Hello,
>
> I was looking at the source code of
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dat
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$7.run (StreamingDataflowWorker.java:1073)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadP
regards
Mohil
On Mon, Jul 27, 2020 at 10:50 AM Mohil Khare wrote:
> Hello All,
>
> Any idea how to debug this and find out which stage, which DoFn or which
> side input is causing the problem?
> Do I need to override OnTimer with every DoFn to avoid this problem?
> I thought
and some more
> details. This looks related to
> https://issues.apache.org/jira/browse/BEAM-6855 which claims to be
> resolved in 2.17.0
>
> Kenn
>
> On Mon, Jul 27, 2020 at 11:47 PM Mohil Khare wrote:
>
>> Hello all,
>>
>> I think I found the reason f
https://issues.apache.org/jira/browse/BEAM-6855 which claims to be
>> resolved in 2.17.0
>>
>> Kenn
>>
>> On Mon, Jul 27, 2020 at 11:47 PM Mohil Khare wrote:
>>
>>> Hello all,
>>>
>>> I think I found the reason for the issue. Since the e
Thanks a lot, Luke.
Regards
Mohil
On Tue, Aug 4, 2020 at 12:01 PM Luke Cwik wrote:
> BEAM-6855 is still open, and I updated it linking to this thread to note
> that a user is still being impacted.
>
> On Tue, Aug 4, 2020 at 10:20 AM Mohil Khare wrote:
>
>> yeah .. looks li
Hello,
I am using Beam Java SDK 2.19 on Dataflow. We have a system where a log
shipper continuously emits logs to Kafka and Beam reads logs using KafkaIO.
Sometimes I am seeing slowness on the KafkaIO read with one of the topics
(probably during peak traffic periods), where there is a 2-3 minute gap between
r
sumerFactoryObj I set up SSL keystores
Thanks and Regards
Mohil
On Wed, Aug 5, 2020 at 12:59 PM Mohil Khare wrote:
> Hello,
>
> I am using Beam Java SDK 2.19 on Dataflow. We have a system where a log
> shipper continuously emits logs to Kafka and Beam reads logs using KafkaIO.
>
>
I also have a question: could wrong windowing be messing up the received
timestamp?
Thanks and regards
Mohil
On Wed, Aug 5, 2020 at 1:19 PM Mohil Khare wrote:
> Just to let you know, this is how I set up the KafkaIO read:
>
> p
>
> .apply("Read_From_
Hello All,
I need advice on whether Session Windowing followed by CoGroupByKey is
the correct way to solve my use case, and if YES, whether I am
using it correctly.
Please note that I am using Java SDK 2.19 on Google Dataflow.
I have two streams of data coming from two differe
ely doing something wrong. Any suggestions?
Thanks and regards
Mohil
On Thu, Aug 6, 2020 at 12:42 AM Mohil Khare wrote:
> Hello All,
>
> I need advice on whether Session Windowing followed by CoGroupByKey
> is the correct way to solve my use case, and if YES, whether
Hello All,
I am using Apache Beam Java SDK 2.19 and Elasticsearch IO 6.x.
I keep getting the following exception while dumping streaming logs to ES:
java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException:
java.lang.IllegalArgumentException: Cannot get Elasticsearch version
Hello All,
Firstly, I am using Beam Java SDK 2.23.0.
I have a use case where I continuously read streaming data from Kafka and
dump output to Elasticsearch after doing a bunch of PTransforms.
One such transform depends on the number of requests we have seen so far in
the last one hour (last one h
Hello,
Firstly, I am on Java SDK 2.23.0 and we heavily use Elasticsearch as one of
our sinks.
It's been a while since Beam was upgraded to support Elasticsearch versions
greater than 6.x.
Elasticsearch has now moved on to 7.x, and we want to use some of their new
security features.
I want to know w
AM Kyle Weaver wrote:
> This ticket indicates Elasticsearch 7.x has been supported since Beam
> 2.19: https://issues.apache.org/jira/browse/BEAM-5192
>
> Are there any specific features you need that aren't supported?
>
> On Mon, Aug 24, 2020 at 11:33 AM Mohil K
Hello team,
I am using Beam Java SDK 2.23.0 on Dataflow.
I have a pipeline where I continuously read from Kafka, run through
various transforms, and finally emit output to various sinks like
Elasticsearch, BigQuery, GCS buckets, etc.
There are a few transforms where I maintain state of input KV
ble/Cloud Firestore/...)[3]?
>
> 1: https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline
> 2:
> https://beam.apache.org/documentation/programming-guide/#provided-windowing-functions
> 3: https://cloud.google.com/products/databases
>
>
> On Sun, Sep 20, 2020 at 7
Hello team and ElasticsearchIO users,
I am using Beam Java SDK 2.23.0 with Elasticsearch as one of the sinks.
Is there any way to turn off the SSL check, i.e. set cert verify to false,
for HTTPS connections with ElasticsearchIO? I know you can do that with
regular clients. But can we achieve the same u
tern that is
> already there in the code.
>
> On Thu, Sep 24, 2020 at 8:36 PM Mohil Khare wrote:
>
>> Hello team and ElasticsearchIO users,
>>
>> I am using Beam Java SDK 2.23.0 with Elasticsearch as one of the sinks.
>>
>> Is there any way to turn off the SSL c
bution guide[2].
>
> 1: https://gist.github.com/mikeapr4/3b2b5d05bc57640e77d0
> 2: https://beam.apache.org/contribute/
>
> On Fri, Sep 25, 2020 at 9:17 AM Mohil Khare wrote:
>
>> Hi Luke,
>> Yeah, I looked at the withTrustSelfSignedCerts option, but after looking at
>>
Hello ElasticSearchIO and beam users/developers,
I am on Beam 2.23.0 and Elasticsearch 6.8.
I have been using ElasticsearchIO.write() successfully.
For the first time, I am trying to use ElasticsearchIO.read() because I have
a use case where I want to read data from one Elasticsearch cluster,
modify
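For anyone following along, a minimal sketch of the read-modify-write shape
being described; cluster addresses, index names, and the transform() helper
are hypothetical:

    PCollection<String> docs = p.apply("ReadFromSourceES",
        ElasticsearchIO.read()
            .withConnectionConfiguration(
                ElasticsearchIO.ConnectionConfiguration.create(
                    new String[] {"https://source-es:9200"}, "source-index", "_doc"))
            .withQuery("{ \"query\": { \"match_all\": {} } }"));

    docs.apply("ModifyDocs",
            MapElements.into(TypeDescriptors.strings()).via(json -> transform(json)))
        .apply("WriteToDestES",
            ElasticsearchIO.write()
                .withConnectionConfiguration(
                    ElasticsearchIO.ConnectionConfiguration.create(
                        new String[] {"https://dest-es:9200"}, "dest-index", "_doc")));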
derIterator.abort (ReadOperation.java:371)
at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.abort (ReadOperation.java:256)
at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute (MapTaskExecutor.java:91)
Any help would