Debugging BigQuery queries with Beam

2024-12-16 Thread Lina Mårtensson via user
Hi! We use BigQuery a fair amount, both through Beam jobs and elsewhere, in both cases with Python. We often do some templatization and simple generation of the queries as well. When not using Beam we use the google.cloud.bigquery library, and when a query fails because there's something wrong with …

Re: [Proposal] Kaskada DSL and FnHarness for Temporal Queries

2023-06-12 Thread Ben Chambers
… of time. For instance, the query `Purchases.amount | sum(window = since(Login))` sums the amount spent since the last login. In user studies, we've heard that these make it much easier to compose queries analyzing the entire "journey" or "funnel" for each user. There …

Re: [Proposal] Kaskada DSL and FnHarness for Temporal Queries

2023-06-12 Thread Ryan Michael
… if the doc Ben linked to (or GenAI) is interesting to you and you'll be at the conference, I'd love to touch base in person! -Ryan On Mon, Jun 12, 2023 at 2:51 PM Ben Chambers wrote: > Hello Beam! > Kaskada has created a query language for expressing temporal queries, makin…

[Proposal] Kaskada DSL and FnHarness for Temporal Queries

2023-06-12 Thread Ben Chambers
Hello Beam! Kaskada has created a query language for expressing temporal queries, making it easy to work with multiple streams and perform temporally correct joins. We’re looking at taking our native, columnar execution engine and making it available as a PTransform and FnHarness for use with …

Re: [EXTERNAL] Re: Re: Q: Apache Beam IO ElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-24 Thread Evan Galpin
Is your pipeline bounded or unbounded? Are you hoping to run a job where the queries are streamed into some unbounded pipeline Source, and in response the pipeline would execute each query and proceed with any downstream data manipulation? If so, unfortunately the approach I described …
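
The preview cuts off before any alternative is spelled out. Purely as an illustrative sketch, and not something proposed in the messages shown here: when the queries arrive as elements of a PCollection, a plain DoFn that issues each search through the Elasticsearch low-level REST client is one workable pattern (the host and index below are placeholders).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

/** Runs one Elasticsearch search per input element, emitting the raw JSON response. */
class SearchPerQueryFn extends DoFn<String, String> {
  private transient RestClient client;

  @Setup
  public void setup() {
    // Placeholder cluster address.
    client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();
  }

  @ProcessElement
  public void processElement(@Element String queryJson, OutputReceiver<String> out)
      throws IOException {
    // Placeholder index; the query JSON arrives as the element itself.
    Request request = new Request("GET", "/my-index/_search");
    request.setJsonEntity(queryJson);
    Response response = client.performRequest(request);
    out.output(EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8));
  }

  @Teardown
  public void teardown() throws IOException {
    if (client != null) {
      client.close();
    }
  }
}
```

Applied as `queries.apply(ParDo.of(new SearchPerQueryFn()))` on a PCollection<String> of query documents, this sidesteps ElasticsearchIO.read()'s PBegin requirement, at the cost of managing the client, retries, and any result pagination yourself.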

Re: [EXTERNAL] Re: Re: Q: Apache Beam IO ElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-24 Thread Murphy, Sean P. via user
Thanks for the reply. This would need to build the queries at runtime. There are incoming patient clinics for which there would be a known quantity, but this could fluctuate from thousands to hundreds of thousands depending on the size of the study. From the approach you provided below, couldn’t …

Re: [EXTERNAL] Re: Q: Apache Beam IO ElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-24 Thread Evan Galpin
… That said, I understand that doing so is not always feasible depending on timelines, etc. Is your set of queries countable? Can they be known at pipeline compilation time? Not the most elegant solution, but you could potentially iterate over them if they can be known at compile time: List …
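
A minimal sketch of that compile-time iteration, assuming the query strings and connection settings are known when the pipeline is built (the addresses, index, and queries below are placeholders, not details from the thread): create one ElasticsearchIO.read() per query and flatten the results.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;

public class KnownQueriesPipeline {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // Placeholder connection settings.
    ElasticsearchIO.ConnectionConfiguration conn =
        ElasticsearchIO.ConnectionConfiguration.create(
            new String[] {"http://localhost:9200"}, "my-index", "_doc");

    // Queries known at pipeline-construction time.
    List<String> queries = Arrays.asList(
        "{\"query\": {\"match\": {\"clinic\": \"A\"}}}",
        "{\"query\": {\"match\": {\"clinic\": \"B\"}}}");

    // One read per query, each rooted at PBegin, collected into a PCollectionList.
    PCollectionList<String> perQuery = PCollectionList.empty(pipeline);
    int i = 0;
    for (String query : queries) {
      perQuery = perQuery.and(
          pipeline.apply("ReadQuery" + i++,
              ElasticsearchIO.read().withConnectionConfiguration(conn).withQuery(query)));
    }

    // Merge the per-query reads into a single PCollection of JSON documents.
    PCollection<String> allDocs = perQuery.apply(Flatten.pCollections());

    pipeline.run().waitUntilFinish();
  }
}
```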

Re: Q: Apache Beam IO ElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-20 Thread Evan Galpin
> … — Alexey > [1] https://github.com/apache/beam/blob/master/sdks/java/io/solr/src/main/java/org/apache/beam/sdk/io/solr/SolrIO.java > On 19 Apr 2023, at 19:05, Murphy, Sean P. via user wrote: > I'm running into an issue using the ElasticsearchIO.read() …

Re: Q: Apache Beam IO ElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-20 Thread Alexey Romanenko
> … ElasticsearchIO.read() to handle more than one instance of a query. My queries are being dynamically built as a PCollection based on an incoming group of values. I'm trying to see how to load the .withQuery() parameter which could provide this capability, or any approach that pro…

Re: [EXTERNAL] Re: Q: Apache Beam IO ElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-20 Thread Murphy, Sean P. via user
… Yes, unfortunately the ES IO connector is not built in a way that can work by taking inputs from a PCollection to …

Re: Q: Apache Beam IO ElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-19 Thread Evan Galpin
> …0176db0116221a1b739a3916da26d822f2f/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java#L873 > Does that make sense? > Cheers, > Shahar.

Re: Q: Apache Beam IO ElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-19 Thread Shahar Frank
… On Wed, 19 Apr 2023 at 18:05, Murphy, Sean P. via user wrote: > I'm running into an issue using the ElasticsearchIO.read() to handle more than one instance of a query. My queries are being dynamically built as a PCollec…

Q: Apache Beam IO ElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-19 Thread Murphy, Sean P. via user
I'm running into an issue using the ElasticsearchIO.read() to handle more than one instance of a query. My queries are being dynamically built as a PCollection based on an incoming group of values. I'm trying to see how to load the .withQuery() parameter which could provide this cap…

Queries

2022-07-23 Thread Udayarc Reddy
Hi Team, I have two queries. Our project uses a Dataflow pipeline (GCP) built in Python. *Query 1:* When I set the log level to ERROR, e.g. logging.getLogger().setLevel(logging.ERROR), I still see all log levels in GCP. If I set the log level to ERROR, I would like to …

[Question] Is Splittable DoFn the right choice for page-based queries?

2021-10-04 Thread Brian Rodriguez
Hello! I’m trying to implement an IO source for Google’s Firebase Auth Accounts. The Firebase API exposes a simple page-based querying interface: class ListUsersPage: MAX_LIST…

Re: Best approach for Sql queries in Beam

2020-05-21 Thread bharghavi vajrala
>> … join with different keys, group by based on a few columns. >> Existing solution: using a session window of 20 seconds having a different transform for every 2 queries and using the result. >> Below is the sample code: …

Re: Best approach for Sql queries in Beam

2020-05-21 Thread Brian Hulette
> … Oracle DB, data is pushed to Kafka) > SDK: Java > Runner: Flink > Problem: subscribe to 5 topics (tables), join with different keys, group by based on a few columns. > Existing solution: using a session window of 20 seconds having a different transform for every 2 queries and usi…

Best approach for Sql queries in Beam

2020-05-19 Thread bharghavi vajrala
… having a different transform for every 2 queries and using the result. Below is the sample code: Sessions sessionWindow = Sessions.withGapDuration(Duration.standardSeconds((long) Integer.parseInt("20"))); PCollection stream1 = PCollectionTuple…
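
For context, a hedged sketch of expressing that kind of join as a single Beam SQL transform over session-windowed inputs, rather than a separate transform per pair of queries; the table names, columns, join keys, and Row schemas are assumptions, not details taken from this thread.

```java
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.transforms.windowing.Sessions;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TupleTag;
import org.joda.time.Duration;

public class SessionSqlJoin {

  /** Joins two already-schematized Row streams inside 20-second session windows. */
  public static PCollection<Row> join(PCollection<Row> orders, PCollection<Row> customers) {
    // Both sides must be windowed the same way for the SQL join to line up.
    Window<Row> sessionWindow =
        Window.into(Sessions.withGapDuration(Duration.standardSeconds(20)));

    return PCollectionTuple
        .of(new TupleTag<>("orders"), orders.apply("WindowOrders", sessionWindow))
        .and(new TupleTag<>("customers"), customers.apply("WindowCustomers", sessionWindow))
        // TupleTag ids become the table names visible to the query.
        .apply(SqlTransform.query(
            "SELECT o.order_id, c.customer_name "
                + "FROM orders o JOIN customers c ON o.customer_id = c.customer_id"));
  }
}
```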

Inserting datastore projected queries in Postgresql

2018-11-13 Thread Jonathan Perron
Hello fellow Apache Beam users, I am trying to copy Datastore entities to a PostgreSQL instance. As I don't need every field, I performed projections following [this snippet][1]. I build the following query: public static Query DatastoreQuery() { Query.Builder query = Query.newBui…
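
A hedged sketch of the overall shape described here, without reproducing the truncated query above: build a Datastore projection query, read the entities with DatastoreIO, and write them to PostgreSQL with JdbcIO. The kind name, projected property, project id, JDBC URL, credentials, and target table are all placeholders.

```java
import com.google.datastore.v1.Entity;
import com.google.datastore.v1.KindExpression;
import com.google.datastore.v1.Projection;
import com.google.datastore.v1.PropertyReference;
import com.google.datastore.v1.Query;
import java.sql.PreparedStatement;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.values.PCollection;

public class DatastoreToPostgres {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // Projection query: fetch only the "userId" property of the placeholder "Session" kind.
    Query query =
        Query.newBuilder()
            .addKind(KindExpression.newBuilder().setName("Session"))
            .addProjection(
                Projection.newBuilder()
                    .setProperty(PropertyReference.newBuilder().setName("userId")))
            .build();

    PCollection<Entity> entities =
        pipeline.apply(
            DatastoreIO.v1().read().withProjectId("my-gcp-project").withQuery(query));

    // Write one row per entity; only the projected property is read back out.
    entities.apply(
        JdbcIO.<Entity>write()
            .withDataSourceConfiguration(
                JdbcIO.DataSourceConfiguration.create(
                        "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/analytics")
                    .withUsername("beam")
                    .withPassword("secret"))
            .withStatement("INSERT INTO sessions (user_id) VALUES (?)")
            .withPreparedStatementSetter(
                (Entity e, PreparedStatement stmt) ->
                    stmt.setString(1, e.getPropertiesMap().get("userId").getStringValue())));

    pipeline.run().waitUntilFinish();
  }
}
```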