Re: How to access the off-heap representation of cached data in Spark 2.0

2016-05-28 Thread Kazuaki Ishizaki
Hi, As I understand it, the data cached by df.cache() currently lives on the Java heap as a set of byte arrays; see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala#L58 . Data is accessed by using sun.misc.Unsafe AP

Re: How to access the off-heap representation of cached data in Spark 2.0

2016-05-28 Thread Jacek Laskowski
Hi Jim, There's no C++ API in Spark to access the off-heap data. Moreover, I think "off-heap" has an overloaded meaning in Spark: it refers both to Tungsten and to persisting your data off-heap (both are about memory, but they serve different purposes and involve client-facing and internal APIs). That's my limited understand

Spark Streaming - Twitter on Python current status

2016-05-28 Thread Ricardo Almeida
As far as I can tell... 1. On Spark 1.6.1, Twitter Streaming (TwitterUtils) and Custom Receivers are available only through the Scala and Java APIs, not through Python (PySpark); 2. Ma

Re: How to access the off-heap representation of cached data in Spark 2.0

2016-05-28 Thread jpivar...@gmail.com
Is this not the place to ask such questions? Where can I get a hint as to how to access the new off-heap cache, or C++ API, if it exists? I'm willing to do my own research, but I have to have a place to start. (In fact, this is the first step in that research.) Thanks, -- Jim