Colab vs Local IDE

2020-10-28 Thread Ramesh Mathikumar
Hi Team, Is there any difference in running the Spark or Flink runners from Colab vs local? The code runs with no issues in the Google Colab environment but it does not run on my local environment. This is for Windows. Steps: 1. Start Flink or Spark on local machine 2. Make sure Spark and Flink

Re: KafkaIO memory issue

2020-10-28 Thread Eleanore Jin
Hi Alex, Thanks for sharing this! I think my problem is that I did not reserve enough memory for JVM non-heap usage; by default Flink sets -Xms and -Xmx to the same value and I had allocated almost all the memory to the heap. After adding more memory, the memory usage seems stabilized. We do use global windo
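A rough illustration of the budgeting problem described above (this is not Flink's actual memory model, and the numbers are hypothetical): if nearly all of a container's memory goes to the heap, nothing is left for metaspace, network buffers, and native allocations such as Kafka client buffers.

```python
# Sketch only: shows why reserving non-heap memory first matters when
# -Xms/-Xmx are pinned to the same value. All figures are illustrative.

def heap_budget(total_mb, non_heap_mb):
    """Memory left for -Xms/-Xmx after reserving non-heap needs (in MiB)."""
    if non_heap_mb >= total_mb:
        raise ValueError("non-heap reservation exceeds total memory")
    return total_mb - non_heap_mb

# 4 GiB container, reserving ~1 GiB for metaspace, network buffers,
# and native allocations (e.g. Kafka client buffers):
print(heap_budget(4096, 1024))  # 3072
```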

Re: Colab vs Local IDE

2020-10-28 Thread Kyle Weaver
> Is there any difference in running the spark or Flink runners from Colab vs Local. Google Colab is hosted in a Linux virtual machine. Docker for Windows is missing some features, including host networking. > 4. python "filename.py" should run but getting raise grpc.FutureTimeoutError() Can you
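Before the SDK's gRPC channel hits `grpc.FutureTimeoutError`, it can help to verify the job endpoint is reachable at all. A minimal, hypothetical helper (not from the thread) using only the standard library; the address below is illustrative of a Docker machine IP, which on Docker for Windows is typically not `localhost`:

```python
# Assumption: the job server listens on a plain TCP port; a failed TCP
# connect here usually means the later gRPC handshake will time out too.
import socket

def endpoint_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Docker machine IP is illustrative; adjust for your setup.
print(endpoint_reachable("192.168.99.102", 8099, timeout=1.0))
```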

Re: Spark Portable Runner + Docker

2020-10-28 Thread Ramesh Mathikumar
Hi Alex -- Please see the details you are looking for. I am running a sample pipeline and my environment is this. python "SaiStudy - Apache-Beam-Spark.py" --runner=PortableRunner --job_endpoint=192.168.99.102:8099 My Spark is running on a Docker Container and I can see that the JobService is ru
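For reference, a sketch of the invocation above with one extra flag that is often needed in this setup (the flag beyond those quoted is an assumption: `--environment_type=LOOPBACK` avoids the SDK trying to launch a nested Docker worker when the job server itself already runs in a container):

```shell
# Hypothetical invocation; adjust the endpoint to your Docker machine's IP.
python "SaiStudy - Apache-Beam-Spark.py" \
  --runner=PortableRunner \
  --job_endpoint=192.168.99.102:8099 \
  --environment_type=LOOPBACK
```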

Re: Issues with python's external ReadFromPubSub

2020-10-28 Thread Sam Bourne
Yeah, I’m able to run that. The apache_beam.io.ReadFromPubSub transform works just fine, but only for the DirectRunner in Python. In Flink we’re using the Java implementation via an external transform, apache_beam.io.external.gcp.pubsub.ReadFromPubSub. Is there a different way to do this? On Wed, Oct 28, 2

Re: Issues with python's external ReadFromPubSub

2020-10-28 Thread Kyle Weaver
Are you able to run streaming word count on the same setup? On Tue, Oct 27, 2020 at 5:39 PM Sam Bourne wrote: > We updated from beam 2.18.0 to 2.24.0 and have been having issues using > the python ReadFromPubSub external transform in flink 1.10. It seems like > it starts up just fine, but it doe

Re: [DISCUSS] Update Kafka dependencies in Beam Java SDK

2020-10-28 Thread Alexey Romanenko
*raising this question, of course > On 28 Oct 2020, at 18:06, Alexey Romanenko wrote: > > tasing this question

Re: KafkaIO memory issue

2020-10-28 Thread Alexey Romanenko
I don’t think it’s a KafkaIO issue, since checkpoints are handled by the runner. Could it be similar to this issue? https://lists.apache.org/thread.html/r4a454a40197f2a59280ffeccfe44837ec072237aea56d50599f12184%40%3Cuser.beam.apache.org%3E Could you try a workaround with sliding windows proposed th
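For readers unfamiliar with the suggested workaround, here is a pure-Python illustration of sliding-window assignment in the style of Beam's SlidingWindows (window `size`, new window every `period`); this is a sketch of the semantics, not Beam's actual implementation:

```python
# An element with a given timestamp belongs to every window [s, s + size)
# whose start s is a multiple of `period` and satisfies s <= timestamp.

def sliding_windows(timestamp, size, period):
    """Return the sorted start times of windows containing `timestamp`."""
    last_start = timestamp - timestamp % period
    starts = []
    s = last_start
    while s > timestamp - size:
        starts.append(s)
        s -= period
    return sorted(starts)

print(sliding_windows(12, size=10, period=5))  # [5, 10]
```

So with 10-second windows every 5 seconds, an element at t=12 lands in windows [5, 15) and [10, 20), which bounds how long any one window's state must be retained.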

Re: [DISCUSS] Update Kafka dependencies in Beam Java SDK

2020-10-28 Thread Alexey Romanenko
Piotr, thank you for tasing this question. Let me ask some questions first. What would this dependency update give us? What are the pros and cons? Can users use recent versions of the Kafka client with the current implementation based on the ConsumerSpEL class? > On 22 Oct 2020, at 10:47, Piotr Szubersk

Re: Spark Portable Runner + Docker

2020-10-28 Thread Alexey Romanenko
Hi Ramesh, By “+ Docker” do you mean the Docker SDK Harness or running Spark in Docker? For the former, I believe it works fine. Could you share more details about the kind of error you are facing? > On 27 Oct 2020, at 21:10, Ramesh Mathikumar wrote: > > Hi Group -- Has anyone got this to work?

Re: Which Solr versions should be supported by Beam

2020-10-28 Thread Piotr Szuberski
So I think we can leave it as it is. The only problem would be if a user has a project using Beam and a different Solr dependency at once - then Beam would force them to use the version Beam does. Should I change the dependency to 'provided' to cover this case? Earlier we had 5.5.2 version as co
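For context, a sketch of what the 'provided' change would look like in Maven terms (the version shown is illustrative, not a recommendation): with 'provided' scope, the dependency is available at compile time but not forced onto the user's runtime classpath, so their own Solr version wins.

```xml
<!-- Sketch: a "provided" solr-solrj dependency; version is illustrative. -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>8.6.3</version>
  <scope>provided</scope>
</dependency>
```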

Re: Which Solr versions should be supported by Beam

2020-10-28 Thread Piotr Szuberski
Response from Solr: Generally speaking, SolrJ has been very compatible communicating to many backend Solr server versions. I wish we tracked problems about this specifically somewhere, but I don't think we do. I suggest simply using the latest SolrJ release. If you find issues, report them.