You can print whatever you want wherever you want; it's just a question of
whether it shows up in the driver's log or in the various executors' logs.
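For example (a minimal sketch, assuming an existing SparkContext named `sc`): a print executed in the driver program goes to the driver's stdout, while a print inside a closure that Spark ships to the cluster lands in the executors' logs:

```python
# Sketch only: assumes a SparkContext `sc` is already available,
# e.g. sc = SparkContext(appName="print-demo").
rdd = sc.parallelize(range(10), 2)

print("hello from the driver")  # appears in the driver's output

def log_element(x):
    # This closure runs on the executors, so the message shows up
    # in the executors' stdout logs, not on the driver.
    print("hello from an executor: %d" % x)

rdd.foreach(log_element)
```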
On Wed, Feb 17, 2016 at 5:50 AM, Cyril Scetbon wrote:
I don't think we can print an integer value in a Spark Streaming process, as
opposed to a Spark job. I think I can print the content of an RDD, but not debug
messages. Am I wrong?
Cyril Scetbon
> On Feb 17, 2016, at 12:51 AM, ayan guha wrote:
Hi
You can always use RDD properties, which already include partition information:
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/performance_optimization/how_many_partitions_does_an_rdd_have.html
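In a streaming job the same check can be done from inside foreachRDD. A minimal PySpark sketch (the topic name "my_topic" and the broker address are placeholders for your setup):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# Sketch only: assumes a reachable Kafka broker at localhost:9092
# and a topic named "my_topic" -- adjust to your environment.
sc = SparkContext(appName="partition-check")
ssc = StreamingContext(sc, 10)  # 10-second batches

stream = KafkaUtils.createDirectStream(
    ssc, ["my_topic"], {"metadata.broker.list": "localhost:9092"})

def show_partitions(rdd):
    # Logged on the driver: with the direct approach this should
    # match the number of partitions of the Kafka topic.
    print("partitions in this batch: %d" % rdd.getNumPartitions())

stream.foreachRDD(show_partitions)

ssc.start()
ssc.awaitTermination()
```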
On Wed, Feb 17, 2016 at 2:36 PM, Cyril Scetbon wrote:
Your understanding is the right one (having re-read the documentation). I'm still
wondering how I can verify that 5 partitions have been created. My job reads
from a Kafka topic that has 5 partitions and sends the data to E/S.
I can see that when there is one task to read from Kafka there ar…
I have a slightly different understanding.
The direct stream generates one RDD per batch; however, the number of partitions
in that RDD equals the number of partitions in the Kafka topic.
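One way to confirm that mapping in PySpark (available since Spark 1.5, if I recall the API correctly) is to inspect each batch RDD's Kafka offset ranges, one per topic partition. A sketch, assuming `stream` is the DStream returned by KafkaUtils.createDirectStream:

```python
# Sketch: `stream` is assumed to come straight from
# KafkaUtils.createDirectStream; offsetRanges() is only available on
# that original Kafka RDD, before any other transformation is applied.
def print_offset_ranges(rdd):
    # A 5-partition topic should print 5 lines per batch.
    for o in rdd.offsetRanges():
        print("topic=%s partition=%d from=%d until=%d"
              % (o.topic, o.partition, o.fromOffset, o.untilOffset))

stream.foreachRDD(print_offset_ranges)
```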
On Wed, Feb 17, 2016 at 12:18 PM, Cyril Scetbon wrote:
Hi guys,
I'm running some tests with Spark and Kafka using a Python script. I use the
second method, the one that doesn't need any receiver (the Direct Approach). It
should adapt the number of RDDs to the number of partitions in the topic, and
I'm trying to verify that. What's the easiest way to verify it? I also…