That doesn't necessarily look like a Spark-related issue. Your
terminal seems to be displaying the glyph as a question mark, perhaps
because the font lacks that symbol?
On Fri, Nov 9, 2018 at 7:17 PM lsn24 wrote:
>
> Hello,
>
> Per the documentation default character encoding of spark is UTF-8. B
Hello,
Per the documentation, the default character encoding of Spark is UTF-8. But
when I try to read non-ASCII characters, Spark tends to read them as question
marks. What am I doing wrong? Below is my syntax:
val ds = spark.read.textFile("a .bz2 file from hdfs");
ds.show();
The string "KøBENHAVN"
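A quick way to tell whether the data was decoded wrongly or the terminal simply cannot render it is to print the Unicode code points. The snippet below is a minimal sketch: the HDFS paths are placeholders, and the ISO-8859-1 fallback assumes the file was not actually written as UTF-8.

import java.nio.charset.StandardCharsets
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// Assumes an existing SparkSession named `spark`, as in the snippet above.
// Print each character with its code point: a correctly decoded ø shows U+00F8,
// while U+FFFD means the data itself was decoded to the replacement character
// and the terminal font is not the problem.
val ds = spark.read.textFile("hdfs:///path/to/file.bz2")  // placeholder path
ds.take(1).foreach { line =>
  println(line.map(c => f"$c U+${c.toInt}%04X").mkString(" "))
}

// If the file is actually Latin-1 rather than UTF-8, one workaround is to read
// the raw bytes with the RDD API and decode them explicitly.
val decoded = spark.sparkContext
  .hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///path/to/file.bz2")
  .map { case (_, text) =>
    new String(text.getBytes, 0, text.getLength, StandardCharsets.ISO_8859_1)
  }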
Great work Hyukjin! I'm not too familiar with R, but I'll take a look at
the PR.
Bryan
On Fri, Nov 9, 2018 at 9:19 AM Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
> Thanks Hyukjin! Very cool results
>
> Shivaram
> On Fri, Nov 9, 2018 at 10:58 AM Felix Cheung wrote:
> >
> > Very
Another solution to the decimal case is using the capability API: use a
capability to signal that the table knows about `supports-decimal`. So
before the decimal support check, it would check
`table.isSupported("type-capabilities")`.
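A rough sketch of what that check could look like; the trait and the way the default is handled here are illustrative only, not the actual DataSourceV2 API.

// Illustrative only: hypothetical capability check with a backward-compatible default.
trait Table {
  def isSupported(capability: String): Boolean
}

def decimalIsSupported(table: Table): Boolean = {
  if (table.isSupported("type-capabilities")) {
    // The source knows about fine-grained type capabilities, so ask explicitly.
    table.isSupported("supports-decimal")
  } else {
    // Sources that predate type capabilities keep the old behavior and are
    // assumed to support decimals.
    true
  }
}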
On Fri, Nov 9, 2018 at 12:45 PM Ryan Blue wrote:
> For that ca
For that case, I think we would have a property that defines whether
supports-decimal is assumed or checked with the capability.
Wouldn't we have this problem no matter what the capability API is? If we
used a trait to signal decimal support, then we would have to deal with
sources that were writt
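For comparison, a hypothetical trait-based version of the same signal, which runs into the same question for sources written before the trait existed:

// Illustrative only: a marker trait instead of a capability string.
trait SupportsDecimal

def decimalIsSupported(source: AnyRef): Boolean = source match {
  case _: SupportsDecimal => true
  // Sources written before the trait was added land here, so Spark cannot tell
  // whether they truly lack decimal support or simply predate the trait.
  case _ => false
}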
"If there is no way to report a feature (e.g., able to read missing as
null) then there is no way for Spark to take advantage of it in the first
place"
Consider this (just a hypothetical scenario): we add "supports-decimal"
in the future, because we see that a lot of data sources don't support decima
Do you have an example in mind where we might add a capability and break
old versions of data sources?
These are really for being able to tell what features a data source has. If
there is no way to report a feature (e.g., able to read missing as null)
then there is no way for Spark to take advanta
How do we deal with forward compatibility? Consider this: Spark adds a new
"property". An existing data source actually supports that property, but since
it was never explicitly declared, the new version of Spark would treat that
data source as not supporting the property and would thus throw an exception.
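To make the concern concrete, here is a hypothetical old source that only knows the capability strings of its time; the API and names are illustrative:

// Illustrative only: an old source reports false for a capability string that
// did not exist when it was written, even if it supports the behavior.
trait Table { def isSupported(capability: String): Boolean }

class OldSource extends Table {
  private val known = Set("supports-decimal")
  override def isSupported(capability: String): Boolean = known.contains(capability)
}

def requireNewProperty(table: Table): Unit = {
  if (!table.isSupported("new-property")) {
    throw new UnsupportedOperationException("Table does not support new-property")
  }
}
// requireNewProperty(new OldSource()) throws, which is the forward-compatibility concern.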
That sounds reasonable to me
On Fri, Nov 9, 2018 at 2:26 AM Anastasios Zouzias wrote:
>
> Hi all,
>
> I ran into the following situation with Spark Structured Streaming (SS) using
> Kafka.
>
> In a project that I work on, there is already a secured Kafka setup where ops
> can issue an SSL certifica
Thanks Hyukjin! Very cool results
Shivaram
On Fri, Nov 9, 2018 at 10:58 AM Felix Cheung wrote:
>
> Very cool!
>
>
>
> From: Hyukjin Kwon
> Sent: Thursday, November 8, 2018 10:29 AM
> To: dev
> Subject: Arrow optimization in conversion from R DataFrame to Spark Da
Right now, it is up to the source implementation to decide what to do. I
think path-based tables (with no metastore component) treat an append as an
implicit create.
If you're thinking that relying on sources to interpret SaveMode is bad for
consistent behavior, I agree. That's why the community a
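As a concrete (hypothetical) illustration of that path-based behavior with the Spark 2.x DataFrameWriter; the output path is a placeholder:

import org.apache.spark.sql.{SaveMode, SparkSession}

// Illustration only: with a path-based source such as Parquet, appending to a
// path that does not exist yet is typically treated as an implicit create.
val spark = SparkSession.builder().appName("savemode-example").getOrCreate()
val df = spark.range(10).toDF("id")
df.write.mode(SaveMode.Append).parquet("hdfs:///tmp/append_example")  // placeholder path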
I'd have two places. First, a class that defines properties supported and
identified by Spark, like the SQLConf definitions. Second, in documentation
for the v2 table API.
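As a sketch of the first option, something loosely modeled on the SQLConf definitions; all capability names here are illustrative:

// Illustrative only: a single class that defines and documents the capability
// strings Spark recognizes, similar in spirit to how SQLConf centralizes configs.
object TableCapabilities {
  // The source can be asked for columns it does not contain and returns nulls.
  val READ_MISSING_AS_NULL = "read-missing-columns-as-null"
  // The source declares fine-grained type capabilities such as decimal support.
  val TYPE_CAPABILITIES = "type-capabilities"
  val SUPPORTS_DECIMAL = "supports-decimal"

  // One place to validate and document capability strings.
  val all: Set[String] = Set(READ_MISSING_AS_NULL, TYPE_CAPABILITIES, SUPPORTS_DECIMAL)
}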
On Fri, Nov 9, 2018 at 9:00 AM Felix Cheung wrote:
> One question is where will the list of capability strings be defined?
>
One question is where will the list of capability strings be defined?
From: Ryan Blue
Sent: Thursday, November 8, 2018 2:09 PM
To: Reynold Xin
Cc: Spark Dev List
Subject: Re: DataSourceV2 capability API
Yes, we currently use traits that have methods. Something
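A simplified sketch of that existing trait-with-methods pattern, close to (but not exactly) the Spark 2.3/2.4 DataSourceV2 reader mix-ins:

import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType

// Simplified sketch of the mixin-trait style: a reader advertises a feature by
// mixing in a trait whose methods carry the feature's behavior.
trait DataSourceReader {
  def readSchema(): StructType
}

trait SupportsPushDownFilters extends DataSourceReader {
  // Accept filters, return the ones that could not be pushed down.
  def pushFilters(filters: Array[Filter]): Array[Filter]
  // Report the filters that were actually pushed.
  def pushedFilters(): Array[Filter]
}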
Very cool!
From: Hyukjin Kwon
Sent: Thursday, November 8, 2018 10:29 AM
To: dev
Subject: Arrow optimization in conversion from R DataFrame to Spark DataFrame
Hi all,
I am trying to introduce R Arrow optimization by reusing PySpark Arrow
optimization.
It boost
My Spark SQL job keeps failing with the error "Container exited with
a non-zero exit code 143".
I know that it is caused by the memory used exceeding the limit of
spark.yarn.executor.memoryOverhead. As shown below, the memory allocation
request failed at 18/11/08 17:36:05, then it RECEIVED
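A common mitigation (values are illustrative and workload-dependent) is to raise the off-heap headroom YARN grants each executor, set before the application starts:

import org.apache.spark.sql.SparkSession

// Illustrative values only; spark.yarn.executor.memoryOverhead is the older
// name of spark.executor.memoryOverhead and is given in MiB. These settings
// can also be passed with --conf on spark-submit.
val spark = SparkSession.builder()
  .appName("memory-overhead-example")
  .config("spark.yarn.executor.memoryOverhead", "2048")
  .config("spark.executor.memory", "6g")
  .getOrCreate()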
Thanks, this is great news.
Can you please let me know if dynamic resource allocation is available in
Spark 2.4?
I'm using Spark 2.3.2 on Kubernetes. Do I still need to provide executor
memory options as part of the spark-submit command, or will Spark manage the
required executor memory based on the Spark job's
Hi all,
I ran into the following situation with Spark Structured Streaming (SS) using
Kafka.
In a project that I work on, there is already a secured Kafka setup where
ops can issue an SSL certificate per "group.id", which should be predefined
(or, hopefully, at least its prefix should be predefined).
On the other
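For reference, a hedged sketch of pointing Structured Streaming at a TLS-secured Kafka cluster; the broker address, topic, and keystore paths are placeholders, and the kafka.-prefixed options are passed through to the underlying Kafka consumer.

// Assumes an existing SparkSession named `spark`; all values below are placeholders.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9093")
  .option("subscribe", "events")
  .option("kafka.security.protocol", "SSL")
  .option("kafka.ssl.truststore.location", "/path/to/truststore.jks")
  .option("kafka.ssl.truststore.password", "********")
  .option("kafka.ssl.keystore.location", "/path/to/keystore.jks")
  .option("kafka.ssl.keystore.password", "********")
  .load()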