If you just want to emulate pushing down a join, you can wrap the IN-list
query in a JDBCRelation directly:
scala> val r_df = spark.read.format("jdbc").option("url",
  "jdbc:h2:/tmp/testdb").option("dbtable", "R").load()
r_df: org.apache.spark.sql.DataFrame = [A: int]

scala> r_df.show
+
Great idea! If the developer docs are in github, then new contributors who
find errors or omissions can update the docs as an introduction to the PR
process.
Fred
On Wed, Oct 19, 2016 at 5:46 PM, Reynold Xin wrote:
> For the contributing guide I think it makes more sense to put it in
> apache/s
nd quality of service characteristics for multiple users. Then your
>> only latency concerns are event to update, not request to response.
>>
>> On Thu, Oct 13, 2016 at 10:39 AM, Fred Reiss
>> wrote:
>> > On Tue, Oct 11, 2016 at 11:02 AM, Shivaram Venkataraman
On Tue, Oct 11, 2016 at 11:02 AM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:
>
> >
> Could you expand a little bit more on stability ? Is it just bursty
> workloads in terms of peak vs. average throughput ? Also what level of
> latencies do you find users care about ? Is it on the o
On Tue, Oct 11, 2016 at 10:57 AM, Reynold Xin wrote:
>
> On Tue, Oct 11, 2016 at 10:55 AM, Michael Armbrust wrote:
>
>> *Complex event processing and state management:* Several groups I've
>>> talked to want to run a large number (tens or hundreds of thousands now,
>>> millions in the near fut
On Thu, Oct 6, 2016 at 12:37 PM, Michael Armbrust wrote:
>
> [snip!]
> Relatedly, I'm curious to hear more about the types of questions you are
> getting. I think the dev list is a good place to discuss applications and
> if/how structured streaming can handle them.
>
Details are difficult to s
Thanks for the thoughtful comments, Michael and Shivaram. From what I’ve
seen in this thread and on JIRA, it looks like the current plan with regard
to application-facing APIs for sinks is roughly:
1. Rewrite incremental query compilation for Structured Streaming.
2. Redesign Structured Streaming's
Congratulations, Xiao!
Fred
On Tuesday, October 4, 2016, Joseph Bradley wrote:
> Congrats!
>
> On Tue, Oct 4, 2016 at 4:09 PM, Kousuke Saruta wrote:
>
>> Congratulations Xiao!
>>
>> - Kousuke
>> On 2016/10/05 7:44, Bryan Cutler wrote:
>>
>> Congrats Xiao!
>>
>> On Tue, Oct 4, 2016 at 11:14 A
Also try doing a fresh clone of the git repository. I've seen some of those
rare failure modes corrupt parts of my local copy in the past.
FWIW the main branch as of yesterday afternoon is building fine in my
environment.
Fred
On Tue, Sep 13, 2016 at 6:29 PM, Jakob Odersky wrote:
> There are s
+1 to this request. I talked last week with a product group within IBM that
is struggling with the same issue. It's pretty common in data cleaning
applications for data in the early stages to have nested lists or sets with
inconsistent or incomplete schema information.
Fred
On Tue, Sep 13, 2016 at 8:0
The input directory does need to be visible from the driver process, since
FileStreamSource does its polling from the driver. FileStreamSource creates
a Dataset for each microbatch.
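To illustrate the point about driver-side polling, here is a minimal sketch (not from the original thread) of a file-based streaming read; the path and schema are made up for illustration:

```scala
// Sketch: FileStreamSource backs file-based streaming formats. The driver
// polls the input directory for new files on each microbatch, so the path
// must be visible from the driver process.
import org.apache.spark.sql.types._

val schema = new StructType().add("value", StringType)

val stream = spark.readStream
  .schema(schema)          // file sources require an explicit schema
  .format("text")
  .load("/tmp/inputDir")   // polled from the driver, not the executors
```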
I suppose the type-inference-time check for the presence of the input
directory could be moved to the FileStreamSour
I think that the community really needs some feedback on the progress of
this very important task. Many existing Spark Streaming applications can't
be ported to Structured Streaming without Kafka support.
Is there a design document somewhere? Or can someone from the Databricks
team break down the
ge in the future if we do async checkpointing of
> internal state.
>
> You are totally right that we should relay this info back to the source.
> Opening a JIRA sounds like a good first step.
>
> On Thu, Aug 4, 2016 at 4:38 PM, Fred Reiss wrote:
>
>> Hi,
>>
Hi,
I've been looking over the Source API in
org.apache.spark.sql.execution.streaming, and I'm at a loss for how the
current API can be implemented in a practical way. The API defines a single
getBatch() method for fetching records from the source, with the following
Scaladoc comments defining the
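For context, a paraphrased sketch of the Source trait under discussion, as it appeared in the Spark 2.0-era org.apache.spark.sql.execution.streaming package; this is reconstructed from memory, not verbatim, and exact signatures may differ by version:

```scala
// Paraphrased sketch of the internal streaming Source API (Spark 2.0 era).
trait Source {
  def schema: StructType        // schema of the records this source emits

  def getOffset: Option[Offset] // latest offset with available data, if any

  // Return all records between the two offsets as one batch DataFrame;
  // this single method is the fetch API the message above is discussing.
  def getBatch(start: Option[Offset], end: Offset): DataFrame

  def stop(): Unit              // release any held resources
}
```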