> On Thu, Sep 3, 2015 at 9:15 AM, Akhil Das
> wrote:
>
>> On an SSD you will get around 30-40 MB/s on a single machine (on 4 cores).
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Aug 31, 2015 at 3:13 PM, Deepesh Maheshwari <
>> deepesh.maheshwar...@gm
Hi,
I am using the below code to insert data into MongoDB from Spark.
JavaPairRDD<Object, BSONObject> rdd;
Configuration config = new Configuration();
config.set("mongo.output.uri", SparkProperties.MONGO_OUTPUT_URI);
config.set("mongo.output.format",
    "com.mongodb.hadoop.MongoOutputFormat");
rdd.saveAsNewAPIHadoopFile("file:///notapplicable",
    Object.class, BSONObject.class,
    MongoOutputFormat.class, config);
utput.uri", mongodbUri);
>
> JavaPairRDD<Object, BSONObject> bsonRatingsData =
>     sc.newAPIHadoopFile(
>         ratingsUri, BSONFileInputFormat.class, Object.class,
>         BSONObject.class, bsonDataConfig);
>
>
> Thanks
> Best Regards
>
> On Mon, Aug 31, 2015 at 12:59
Hi, I am trying to read MongoDB in Spark using newAPIHadoopRDD.
/* Code */
Configuration config = new Configuration();
config.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI);
config.set("mongo.input.query", "{host: 'abc.com'}");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
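A minimal sketch of the read this appears to be building toward, assuming the
mongo-hadoop connector is on the classpath (variable names are illustrative):

JavaPairRDD<Object, BSONObject> documents = sc.newAPIHadoopRDD(
    config,                                  // the Configuration prepared above
    com.mongodb.hadoop.MongoInputFormat.class,
    Object.class,                            // key class: the document _id
    BSONObject.class);                       // value class: the full document
System.out.println("documents read: " + documents.count());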
Hi Folks,
My Spark application interacts with Kafka to get data through the Java API.
I am using the Direct Approach (No Receivers), which uses Kafka's simple
consumer API to read data.
So, Kafka offsets need to be handled explicitly.
In case of a Spark failure, I need to save the offset state of Kafka.
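A minimal sketch of the documented pattern for capturing offsets from a direct
stream (the HasOffsetRanges cast comes from spark-streaming-kafka; saveOffset
is a placeholder for whatever durable storage you use):

import org.apache.spark.streaming.kafka.HasOffsetRanges;
import org.apache.spark.streaming.kafka.OffsetRange;

directKafkaStream.foreachRDD(rdd -> {
    // RDDs produced by the direct stream implement HasOffsetRanges
    OffsetRange[] offsets = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
    for (OffsetRange o : offsets) {
        saveOffset(o.topic(), o.partition(), o.untilOffset()); // placeholder
    }
    return null; // foreachRDD takes a Function<..., Void> in Spark 1.3
});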
Hi,
I have applied mapToPair and then a reduceByKey on a DStream to obtain a
JavaPairDStream.
I have to apply a flatMapToPair and reduceByKey on the DStream obtained
above.
But I do not see any logs from the reduceByKey operation.
Can anyone explain why this is happening?
Find my code below:
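A minimal sketch of the chain described above, with hypothetical String/Long
types (the original code was cut off in the archive). Note that DStream
transformations are lazy: nothing runs and no logs appear until an output
operation such as print() is registered and the context is started.

JavaPairDStream<String, Long> counts = lines
    .mapToPair(s -> new Tuple2<>(s, 1L))
    .reduceByKey((a, b) -> a + b);
JavaPairDStream<String, Long> regrouped = counts
    .flatMapToPair(t -> Arrays.asList(
        new Tuple2<>(t._1(), t._2())))   // Spark 1.x expects an Iterable here
    .reduceByKey((a, b) -> a + b);
regrouped.print(); // without an output operation, nothing executes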
Hi,
There are functions available to cache() or persist() an RDD in memory, but I
am reading data from Kafka in the form of a DStream, applying operations to
it, and I want to persist that DStream in memory for further use.
Please suggest a method by which I can persist a DStream in memory.
Regards,
Deepesh
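For reference, a minimal sketch: a DStream exposes persist()/cache() just like
an RDD, applying the storage level to every RDD the stream generates (the
stream name is illustrative):

import org.apache.spark.storage.StorageLevel;

JavaDStream<String> cached = lines.persist(StorageLevel.MEMORY_ONLY());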
Hi,
I am using the MongoDB-Hadoop connector to insert an RDD into MongoDB.
rdd.saveAsNewAPIHadoopFile("file:///notapplicable",
    Object.class, BSONObject.class,
    MongoOutputFormat.class, outputConfig);
But some operations require inserting the RDD data as an update operation in
Mongo instead of a plain insert.
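A hedged sketch of one way mongo-hadoop supports updates: emit
MongoUpdateWritable values instead of plain BSONObjects (the query and
modifier field names here are illustrative, not from the original code):

import com.mongodb.BasicDBObject;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.io.MongoUpdateWritable;

JavaPairRDD<Object, MongoUpdateWritable> updates = rdd.mapToPair(t ->
    new Tuple2<Object, MongoUpdateWritable>(null, new MongoUpdateWritable(
        new BasicDBObject("_id", t._1()),            // query (illustrative)
        new BasicDBObject("$set",
            new BasicDBObject("value", t._2())),     // modifier (illustrative)
        true,      // upsert
        false)));  // multi update
updates.saveAsNewAPIHadoopFile("file:///notapplicable",
    Object.class, MongoUpdateWritable.class,
    MongoOutputFormat.class, outputConfig);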
Hi,
I have successfully reduced my data and stored it in a JavaDStream.
Now, I want to save this data in MongoDB; for this I have used the BSONObject
type.
But when I try to save it, it throws an exception.
I also tried to save it just with *saveAsTextFile*, but I get the same
exception.
Error log: attached.
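A minimal sketch of the usual pattern, assuming the reduced stream is a
JavaPairDStream of (key, BSONObject): a DStream is saved per micro-batch
through foreachRDD (stream and config names are illustrative):

pairDStream.foreachRDD(rdd -> {
    rdd.saveAsNewAPIHadoopFile("file:///notapplicable",
        Object.class, BSONObject.class,
        MongoOutputFormat.class, outputConfig);
    return null; // foreachRDD takes a Function<..., Void> in Spark 1.x
});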
Hi,
The Spark job is executed only when you call the start() method of
JavaStreamingContext.
All the operations like map and flatMap are defined earlier, but even though
you put breakpoints in those functions, the breakpoints are never hit. So how
can I debug the Spark jobs?
JavaDStream<String> words = lines.flatMap(new
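One hedged suggestion: with the master set to local[*], the functions run
inside the driver JVM, so IDE breakpoints inside flatMap are hit once the
context is started and a batch arrives (a minimal sketch, names illustrative):

SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("debug");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
// ... define lines, words, and an output operation such as words.print() ...
jssc.start();            // the job graph executes only after start()
jssc.awaitTermination();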
Hi,
I am new to Apache Spark and am exploring Spark + Kafka integration to
process data using Spark, which I did earlier with MongoDB aggregation.
I am not able to figure out how to handle my use case.
Mongo document:
{
    "_id" : ObjectId("55bfb3285e90ecbfe37b25c3"),
    "url" : "http://www.z.com/ne
the hood. Spark doesn't necessarily use these
> anyway; it's from the Hadoop libs.
>
> On Tue, Aug 4, 2015 at 8:30 AM, Deepesh Maheshwari
> wrote:
> > Can you elaborate on the things this native library covers?
> > One you mentioned is accelerated compression.
you haven't installed and
> configured native libraries for things like accelerated compression,
> but it has no negative impact otherwise.
>
> On Tue, Aug 4, 2015 at 8:11 AM, Deepesh Maheshwari
> wrote:
> > Hi,
> >
> > When I run Spark locally on Windows it gives the below Hadoop library error.
Hi,
When I run Spark locally on Windows it gives the below Hadoop library error.
I am using the below Spark version.
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.4.1</version>
</dependency>
2015-08-04 12:22:23,463 WARN (org.apache.hadoop.util.NativeCodeLoader:62)
- Unable to load native-hadoop library for your platform... using builtin-java
classes where applicable
Hi,
I am trying to read data from Kafka and process it using Spark.
I have attached my source code and error log.
For integrating Kafka, I have added the dependency in pom.xml:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
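For reference, a hedged sketch of the direct-stream setup against these
versions, assuming the spark-streaming-kafka_2.10 artifact (broker address
and topic name are placeholders):

import kafka.serializer.StringDecoder;
import org.apache.spark.streaming.kafka.KafkaUtils;

Map<String, String> kafkaParams = new HashMap<>();
kafkaParams.put("metadata.broker.list", "localhost:9092"); // placeholder broker
Set<String> topics = Collections.singleton("mytopic");     // placeholder topic
JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
    jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
    kafkaParams, topics);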