from:"Uwe L. Korn"

Re: parquet file in S3, is there a way to read a subset of all the columns in python

2018-10-11 Thread Uwe L. Korn

Hello Luke, this is only partly implemented. You can do this, and I have already done so, but sadly it is not in a perfect state. boto3 itself seems to lack a proper file-like class. You can get the contents of a file in S3 as https://botocore.amazonaws.com/v1/documentation/api/latest/referen

Re: parquet file in S3, is there a way to read a subset of all the columns in python

2018-10-12 Thread Uwe L. Korn

;foo') > stream = obj.get(Range='bytes=10-100')['Body'] print(stream.read())> > On Thu, Oct 11, 2018 at 2:22 PM Uwe L. Korn wrote:>> __ >> Hello Luke, >> >> this is only partly implemented. You can do this and I already did do >> this but
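The ranged read quoted above (`Range='bytes=10-100'`) is the building block for the missing file-like class. The following is a minimal sketch, not the code used in the thread: `RangeReadFile` and `fetch_range` are hypothetical names, and an in-memory byte string stands in for the S3 object so that the example runs without boto3.

```python
import io


class RangeReadFile(io.RawIOBase):
    """Read-only, seekable file object backed by ranged fetches.

    `fetch_range(start, end)` is a hypothetical callable returning the bytes
    from offset `start` up to and including offset `end`, mirroring the
    inclusive semantics of an HTTP `Range: bytes=start-end` header.
    """

    def __init__(self, fetch_range, size):
        super().__init__()
        self._fetch_range = fetch_range
        self._size = size
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def read(self, size=-1):
        if size is None or size < 0:
            size = self._size - self._pos
        end = min(self._pos + size, self._size)
        if end <= self._pos:
            return b""
        # Only the requested byte range is fetched, never the whole object.
        data = self._fetch_range(self._pos, end - 1)
        self._pos += len(data)
        return data

    def readinto(self, b):
        data = self.read(len(b))
        b[: len(data)] = data
        return len(data)


# Stand-in for a remote object. With boto3 the fetch could instead be
# (assumption, following the quoted snippet):
#   obj.get(Range=f"bytes={start}-{end}")["Body"].read()
blob = bytes(range(256)) * 4
calls = []


def fake_fetch(start, end):
    calls.append((start, end))
    return blob[start : end + 1]


f = RangeReadFile(fake_fetch, len(blob))
f.seek(-8, io.SEEK_END)   # readers of footer-based formats start at the end
tail = f.read(8)
f.seek(10)
chunk = f.read(91)        # issues a fetch for bytes 10-100
```

An object like this could presumably be handed to a reader that accepts Python file objects, for example `pyarrow.parquet.read_table(f, columns=[...])`, so that only the footer and the selected column chunks are transferred. Each `read` here is one request, so a real implementation would likely add buffering.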

Re: Joining Parquet & PostgreSQL

2018-11-16 Thread Uwe L. Korn

Hello Korry, the C(glib)-API calls the C++ functions in the background, so this is only another layer on top. The parquet::arrow C++ API is built in a way that it does not use C++ exceptions. Instead, if there is a failure, we return arrow::Status objects indicating this. Uwe On Fri, Nov 16,

Re: default memory pool in macOS

2019-03-21 Thread Uwe L. Korn

Hello Nirmala, it looks like you are not linking against libarrow.dylib. Can you share your compile command? You're most likely missing a -larrow in there. Uwe On Thu, Mar 21, 2019, at 8:44 AM, Nirmala S wrote: > Hi, > > I am a newbie to Arrow. I am trying to create a default memory pool and u

Re: ParquetDataset Filters Question

2019-05-23 Thread Uwe L. Korn

Hello Abe, I think the problem is that you mix two syntaxes. We either support a "list of tuples" or "list of lists of tuples". Furthermore, the correct DNF for your filter would be (A ⋀ B ⋀ C) ⋁ (A ⋀ B ⋀ D), thus you should use filters = [[("col", ">=", ""), ("col", "<=", "")
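To make the two accepted shapes concrete, here is a small pure-Python sketch of the boolean structure: the outer list is an OR over clauses, and each inner list is an AND over `(column, op, value)` tuples. It is an illustration only, not pyarrow's implementation; the helper names, the columns `day` and `kind`, and all values are hypothetical.

```python
import operator

# Comparison operators supported by this sketch.
OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def normalize(filters):
    """Promote a flat list of tuples to a single-clause list of lists."""
    if filters and isinstance(filters[0], tuple):
        return [filters]
    return filters


def matches(row, filters):
    """True if any clause (OR) has all of its predicates (AND) satisfied."""
    return any(
        all(OPS[op](row[col], value) for col, op, value in clause)
        for clause in normalize(filters)
    )


# (A AND B AND C) OR (A AND B AND D): A and B are repeated in every clause.
filters = [
    [("day", ">=", 1), ("day", "<=", 10), ("kind", "=", "x")],
    [("day", ">=", 1), ("day", "<=", 10), ("kind", "=", "y")],
]

rows = [
    {"day": 5, "kind": "x"},
    {"day": 5, "kind": "z"},
    {"day": 20, "kind": "y"},
    {"day": 10, "kind": "y"},
]
selected = [row for row in rows if matches(row, filters)]
```

The flat form `[("day", ">=", 1), ("day", "<=", 10)]` is treated as a single AND clause, which is why it cannot express the OR on its own.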

Re: Go / Python Sharing

2019-07-08 Thread Uwe L. Korn

Hello all, I've been using the in-process sharing method for quite some time for the Python<->Java interaction and I really like the ease of doing it all in the same process. Especially as this avoids any memory-copy or shared memory handling. This is really useful for the case where you only w

Re: Go / Python Sharing

2019-07-11 Thread Uwe L. Korn

Hello Miki, actually having the same byte alignment is something that we have written into the spec. So when there is a problem in the shared memory usage, we actually would have found a bug in one of the two implementations. Uwe On Thu, Jul 11, 2019, at 9:11 AM, Miki Tebeka wrote: > Hi, > >>

Re: python2.7 cannot install pyarrow

2019-07-11 Thread Uwe L. Korn

Hello, I guess you are on Windows? Then there is no pyarrow for Python 2.7, as Python 2.7 is built with a runtime that is too old to support C++11. Either upgrade to 3.6+ or use a Unix-based system. Uwe On Thu, Jul 11, 2019, at 9:14 AM, black wrote: > I have tried many ways but failed. I am in urgent

Re: python2.7 cannot install pyarrow

2019-07-11 Thread Uwe L. Korn

only way is update to the python3.7 on my computer. But I just learned that > my company's cluster environment is Linux system, and I think there is no > problem to install pyarrow on Linux python2.7, is that true? > > Kind Regards > Black > > > -- Ori

Re: PyArrow connect to Azure Data Lake Gen2

2019-07-17 Thread Uwe L. Korn

Hello, you don't need to go through HDFS or Java to access ADLS Gen 2. This is simply an improved API for Azure Storage Blob and thus you can use the blob APIs of https://azure-storage.readthedocs.io/ to access the relevant containers. I've previously used https://github.com/blue-yonder/storefact an

Re: Pyarrow installation

2020-01-07 Thread Uwe L. Korn

Hello Xiaoni, it seems that you have a conda environment and also installed pyarrow with conda before. Mixing pip and conda installs of pyarrow is not going to work. Please recreate your conda environment and install pyarrow from conda-forge using "conda config --add channels conda-forge && con

Re: Using 'zero copy' for interop with python from java

2020-06-08 Thread Uwe L. Korn

Yes, this is zero-copy and is probably the thing you are looking for. In future, it would be nice to migrate this codebase to the C-interface but for now, this should do the job. On Mon, Jun 8, 2020, at 5:24 AM, Micah Kornfield wrote: > Uwe wrote a blog post [1] on how to do this with PY4J a whi

Re: Errors linking with libarrow.so

2020-07-13 Thread Uwe L. Korn

If you are building against the wheel, you need to add `-D_GLIBCXX_USE_CXX11_ABI=0` to the compiler flags of the code you are compiling. ManylinuxX packages still use the old CXX ABI. Cheers Uwe On Mon, Jul 13, 2020, at 10:00 AM, Zhang, Zhang wrote: > I got many “undefined reference” errors whe

Re: Parquet with CMake and conda

2020-07-30 Thread Uwe L. Korn

Hello, you should add find_package(Parquet REQUIRED) target_link_libraries(parquet_test PRIVATE parquet_shared) to your CMake setup. This will also then link the Parquet libraries in addition to the Arrow libraries. You need to use the FindParquet.cmake that is included in the Arrow sources.

Re: Arrow in Kubernetes

2020-09-17 Thread Uwe L. Korn

Hello Raúl, this seems to be a question about memory mapped files in general in Kubernetes. We don't do anything special with regards to memory mapping in Arrow with them, so I think it is better to ask this question in a forum where people focus on Kubernetes not on Arrow. Best Uwe On Thu, S

Re: [C++][Python] Shared memory with Arrow ?

2020-09-21 Thread Uwe L. Korn

Hello Luis, As you already mentioned, mapped files, the Windows name for shared memory, need the size to be known ahead of time. This is the same on other operating systems, too. Flight will copy the data when transferring from one process to another. So there you will have the copy again. So to actua

Re: Creating and populating Arrow table directly?

2020-10-18 Thread Uwe L. Korn

Hello, You actually can use NumPy arrays to construct an Arrow array without the need to copy any data. The important aspect here is to treat these NumPy arrays simply as plain memory allocations. You use it to construct the separate memory buffers (i.e. the valid-bits and data buffers)
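As a rough illustration of what those two buffers contain, the sketch below builds them from plain memory using only the standard library. Note the substitution: `array.array` stands in for the NumPy arrays mentioned above so the example has no dependencies, and `pack_validity` is a hypothetical helper. It assumes the Arrow columnar layout for a nullable int64 array: a validity bitmap with one bit per element (least-significant bit first, 1 meaning valid) plus a contiguous buffer of 8-byte values.

```python
import array


def pack_validity(valid_flags):
    """Pack booleans into a bitmap, least-significant bit first per byte.

    Bit i is set to 1 when element i is valid (non-null).
    """
    bitmap = bytearray((len(valid_flags) + 7) // 8)
    for i, is_valid in enumerate(valid_flags):
        if is_valid:
            bitmap[i // 8] |= 1 << (i % 8)
    return bitmap


values = [1, None, 3, 4, None, 6, 7, 8, 9]

# Data buffer: one int64 slot per element; null slots hold an arbitrary value.
data = array.array("q", [0 if v is None else v for v in values])

# Validity buffer: one bit per element.
validity = pack_validity([v is not None for v in values])

# A memoryview exposes the same allocation without copying it.
data_view = memoryview(data)
data[0] = 42  # a write to the array is visible through the view
```

With pyarrow installed, buffers like these could presumably be wrapped with `pa.py_buffer(...)` and passed to `pa.Array.from_buffers(pa.int64(), len(values), [validity_buf, data_buf])`; that step is left out here to keep the sketch dependency-free.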

Re: What's the future of Plasma?

2020-11-02 Thread Uwe L. Korn

As long as nobody is stepping up to maintain it, its future will be the removal from the code base. If you rely on it, it would be a good choice for you to look through the issues and see whether you can contribute here. Uwe On Mon, Nov 2, 2020, at 10:20 AM, 梁彬彬 wrote: > As mentioned in > issu

Re: [C++] error when writing Timestamps in NANOS resolution using StreamWriter to parquet files

2020-12-09 Thread Uwe L. Korn

Hello Anders, you have the same time_type twice in your mail. I guess one of them should be different? Cheers Uwe On Wed, Dec 9, 2020, at 11:00 AM, anders johansson wrote: > Hi, > > I am trying to write time stamps in int64_t format representing time in UTC > normalized nanoseconds to a parqu

Re: [pyarrow] Pyarrow=2.0.0 without boost-cpp dependancy

2021-02-22 Thread Uwe L. Korn

Hello Alex, continuing this here instead of StackOverflow. Can you share your source of conda packages: Do you use conda-forge? Can you share the output of `conda list` of your current environment? What strikes me is that we have never built pyarrow on conda-forge against boost-cpp 1.73, only 1

Re: [pyarrow] Pyarrow=2.0.0 without boost-cpp dependancy

2021-02-22 Thread Uwe L. Korn

boost-cpp. > > Thanks again, > Alex > > On Mon, Feb 22, 2021 at 11:53 AM Uwe L. Korn wrote: >> __ >> Hello Alex, >> >> continuing this here instead of StackOverflow. Can you share your source of >> conda packages: Do you use conda-forge? Can you sh

Re: [DISCUSS] Apache Arrow Meetup in Europe

2025-03-10 Thread Uwe L. Korn

Hi JB, If there is interest in the area around Frankfurt/Mannheim/Karlsruhe (it is quite central in Europe, and Frankfurt Airport is well-connected), I can also connect you to the local PyData chapters and the non-profit Pioneers Hub. The only co-hosting opportunity I see this year is PyConDE i

Re: [DISCUSS] Apache Arrow Meetup in Europe

2025-03-11 Thread Uwe L. Korn

+1 sounds like a cool idea. If this is happening later this year, I would also like to attend. On Thu, Mar 6, 2025, at 3:28 PM, Matt Topol wrote: > +1 for this, hoping that I can get funding to attend too! Lol > > --Matt > > On Thu, Mar 6, 2025, 9:07 AM Antoine Pitrou wrote: > >> >> Hi JB, >> >>

23 matches
