You could
df.filter(col("c") === "c1").write.partitionBy("c").save
It could hit some data skew problems, but it might work for you.
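Something like this, as a fuller sketch (paths and values below are made up, not from the thread):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("static-partition-write").getOrCreate()
import spark.implicits._

// Hypothetical input with a partition column "c".
val df = Seq(("a", "c1"), ("b", "c2")).toDF("value", "c")

// With dynamic partition overwrite (Spark 2.3+), overwrite mode replaces
// only the partitions actually present in the written data.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

// Keep only the rows for the static partition value c1, then let
// partitionBy() lay out the c=c1 directory.
df.filter(col("c") === "c1")
  .write
  .mode("overwrite")
  .partitionBy("c")
  .parquet("/tmp/static_partition_demo")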
From: Burak Yavuz
Sent: Tuesday, May 7, 2019 9:35:10 AM
To: Shubham Chaurasia
Cc: dev; user@spark.apache.org
Subject: Re: Static partitions in partitionBy()
I have a “directory” in S3 containing Parquet files created from Avro using the
AvroParquetWriter in the parquet-mr project.
I can load the contents of these files as a DataFrame using
val it = spark.read.parquet("s3a://coolbeth/file=testy")
but I have not found a way to define a permanent table over them.
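For what it's worth, a minimal sketch of one standard way to get a permanent table over existing Parquet files, assuming a Hive metastore is configured (the table name is made up, and this assumes the file=... directories are partition directories):

// External table pointing at the parent directory; Spark infers the
// schema from the Parquet footers.
spark.sql(
  """CREATE TABLE IF NOT EXISTS coolbeth
    |USING parquet
    |LOCATION 's3a://coolbeth/'""".stripMargin)

// Register any file=... partition directories found under the location.
spark.sql("MSCK REPAIR TABLE coolbeth")

val df = spark.table("coolbeth")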
It depends on the data source. Delta Lake (https://delta.io) allows you to
do it with the .option("replaceWhere", "c = c1"). With other file formats,
you can write directly into the partition directory (tablePath/c=c1), but
you lose atomicity.
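To make both routes concrete (paths here are made up, and the predicate value is quoted as a SQL string literal):

import org.apache.spark.sql.functions.col

// Delta Lake: atomically overwrite only the rows matching the predicate.
df.filter(col("c") === "c1")
  .write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "c = 'c1'")
  .save("s3a://bucket/tablePath")

// Plain file formats: write straight into the partition directory,
// dropping the partition column from the data. Not atomic.
df.filter(col("c") === "c1")
  .drop("c")
  .write
  .mode("overwrite")
  .parquet("s3a://bucket/tablePath/c=c1")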
On Tue, May 7, 2019, 6:36 AM Shubham Chaurasia
wrote:
Hey Spark Experts,
After listening to some of you, and to the presentations at Spark Summit in
SF, I am transitioning from DStreams to Structured Streaming; however, I am
seeing some weird results.
My use case is as follows: I am reading a stream from a Kafka topic,
transforming each message, and writing the result out.
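For concreteness, a bare-bones version of that pipeline (broker address, topic, sink format, and paths are all made up; this assumes the spark-sql-kafka-0-10 package is on the classpath):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-sss").getOrCreate()

// Read the topic as a stream.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()

// Transform each message (here: just cast the value to a string).
val messages = stream.selectExpr("CAST(value AS STRING) AS message")

// Write the transformed stream out.
val query = messages.writeStream
  .format("parquet")
  .option("path", "/tmp/out")
  .option("checkpointLocation", "/tmp/chk")
  .start()

query.awaitTermination()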
Hi All,
Is there a way I can provide static partitions in partitionBy()?
Like:
df.write.mode("overwrite").format("MyDataSource").partitionBy("c=c1").save
The above code gives the following error, as it tries to find a column named `c=c1` in df:
org.apache.spark.sql.AnalysisException: Partition column `c=c1` not found in schema ...
Hi all:
My Spark version is 2.3.2. I start the Thrift server with the default Spark
config. On the client side, I use a Java application to query results via JDBC.
The query application has plenty of statements to execute. The earlier
statements execute very quickly, but the later statements execute more and
more slowly.
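For reference, this is roughly the client pattern being described (host, port, table, and query are made up, and this assumes the Hive JDBC driver is on the classpath):

import java.sql.DriverManager

// One connection to the Thrift server, many statements over it.
val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "user", "")
val stmt = conn.createStatement()

for (i <- 1 to 100) {
  val rs = stmt.executeQuery(s"SELECT count(*) FROM some_table WHERE id = $i")
  while (rs.next()) println(rs.getLong(1))
  rs.close()
}

stmt.close()
conn.close()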
It would be a dream to have an easy-to-use dynamic metric system AND a
reliable counting system (accumulator-like) in Spark...
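For context, the closest thing today is the accumulator API, e.g. (names are illustrative):

// A driver-side counter updated from tasks. Updates are exactly-once only
// when made inside actions; retried transformations can double-count.
val counter = spark.sparkContext.longAccumulator("processedRows")
spark.range(0, 1000).foreach(_ => counter.add(1))
println(counter.value)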
Thanks
Roberto
On Tue, May 7, 2019 at 3:54 AM Saisai Shao wrote:
> I think the main reason why that was not merged is that Spark itself
> doesn't have such a requirement.
Hi,
I'm curious about "I found the bug code". Can you point me at it? Thanks.
Regards,
Jacek Laskowski
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https:
OK... I am sure it is a bug in Spark. I found the buggy code, but that code
was removed in 2.2.3, so I just upgraded Spark to fix the problem.