Fantastic - glad to see that it's in the pipeline!
On Wed, Jan 7, 2015 at 11:27 AM, Michael Armbrust
wrote:
> I want to support this but we don't yet. Here is the JIRA:
> https://issues.apache.org/jira/browse/SPARK-3851
>
> On Tue, Jan 6, 2015 at 5:23 PM, Adam Gilmore
> wrote:
>
>> Anyone got
I want to support this but we don't yet. Here is the JIRA:
https://issues.apache.org/jira/browse/SPARK-3851
On Tue, Jan 6, 2015 at 5:23 PM, Adam Gilmore wrote:
> Anyone got any further thoughts on this? I saw the _metadata file seems
> to store the schema of every single part (i.e. file) in th
Anyone got any further thoughts on this? I saw the _metadata file seems to
store the schema of every single part (i.e. file) in the parquet directory,
so in theory it should be possible.
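To illustrate why per-part schemas would make this possible in theory, here is a minimal pure-Python sketch. It is not Spark's or Parquet's API: `merge_schemas`, `read_row`, the dict-based schema layout, and the column names are all hypothetical, chosen only to show the idea of taking the union of the part schemas and reading absent columns as nulls.

```python
from typing import Dict, List, Optional

# Hypothetical layout: column name -> type name, one dict per part file.
Schema = Dict[str, str]


def merge_schemas(part_schemas: List[Schema]) -> Schema:
    """Union the columns of all parts, keeping first-seen column order."""
    merged: Schema = {}
    for schema in part_schemas:
        for name, type_name in schema.items():
            # Refuse to merge if two parts disagree on a column's type.
            if name in merged and merged[name] != type_name:
                raise ValueError(f"conflicting types for column {name!r}")
            merged.setdefault(name, type_name)
    return merged


def read_row(row: Dict[str, object], merged: Schema) -> Dict[str, Optional[object]]:
    """Project a row onto the merged schema; absent columns become None."""
    return {name: row.get(name) for name in merged}


# Two parts written at different times; the second has an extra column.
part_a = {"id": "int", "name": "string"}
part_b = {"id": "int", "name": "string", "email": "string"}

merged = merge_schemas([part_a, part_b])
old_row = read_row({"id": 1, "name": "x"}, merged)
```

Rows from the older part simply come back with `None` in the column they never had, which is the behaviour one would hope for from a reader that understands both schemas.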
Effectively, our use case is that we have a stack of JSON that we receive
and we want to encode to Parquet for
I saw that in the source, which is why I was wondering.
I was mainly reading:
http://blog.cloudera.com/blog/2013/10/parquet-at-salesforce-com/
"A query that tries to parse the organizationId and userId from the 2
logTypes should be able to do so correctly, though they are positioned
differently
I must have missed something important here. Could you please provide more
detail on Parquet “schema versioning”? I wasn’t aware of this feature
(which sounds really useful).
In particular, are you referring to the following scenario:
1. Write some data whose schema is A to “t.parquet”, resulting in a file
There might be a shorter path to success; I'd be curious too. What I
was able to do is:
1. Create the RDD
2. Apply a schema that is 1 column wider
3. Register it as a table
4. Insert new data with 1 extra column
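The four steps above can be sketched in plain Python, as a rough model only. This does not use Spark: the `tables` dict stands in for the table catalog, `insert_into` is a hypothetical helper, and the column names and values are made up for illustration.

```python
# Step 1: the existing data, with its original two-column schema.
schema = ["id", "name"]
rows = [(1, "a"), (2, "b")]

# Step 2: apply a schema that is one column wider, padding old rows with None.
wide_schema = schema + ["email"]
wide_rows = [row + (None,) for row in rows]

# Step 3: "register" the widened data under a table name.
tables = {"t": {"schema": wide_schema, "rows": wide_rows}}


def insert_into(name, new_rows):
    """Append rows to a registered table, rejecting rows of the wrong width."""
    table = tables[name]
    width = len(table["schema"])
    for row in new_rows:
        if len(row) != width:
            raise ValueError(f"expected {width} columns, got {len(row)}")
    table["rows"].extend(new_rows)


# Step 4: insert new data that carries the extra column.
insert_into("t", [(3, "c", "c@example.com")])
```

The width check in `insert_into` mirrors why step 2 matters in this model: without widening the registered schema first, the three-column insert would be rejected.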
I believe you'd have to do step 2 -- if you're inserting into a schema, and
you have ex
Hi all,
I understand that parquet allows for schema versioning automatically in the
format; however, I'm not sure whether Spark supports this.
I'm saving a SchemaRDD to a parquet file, registering it as a table, then
doing an insertInto with a SchemaRDD with an extra column.
The second SchemaRDD