Hi, thanks for your advice. I tried your method. I use Gradle to manage my Scala code, and 'com.snowplowanalytics:scala-maxmind-iplookups:0.2.0' was imported via Gradle.
spark version: 1.6.0
scala: 2.10.4
scala-maxmind-iplookups: 0.2.0

I ran my test and got the error below:

java.lang.NoClassDefFoundError: scala/collection/JavaConversions$JMapWrapperLike
        at com.snowplowanalytics.maxmind.iplookups.IpLookups$.apply(IpLookups.scala:53)

> On Feb 24, 2016, at 1:10 AM, romain sagean <romain.sag...@hupi.fr> wrote:
>
> I realize I forgot the sbt part:
>
> resolvers += "SnowPlow Repo" at "http://maven.snplow.com/releases/"
>
> libraryDependencies ++= Seq(
>   "org.apache.spark" %% "spark-core" % "1.3.0",
>   "com.snowplowanalytics" %% "scala-maxmind-iplookups" % "0.2.0"
> )
>
> Otherwise, to process streaming logs I use logstash with kafka as input. You
> can set kafka as output if you need to do some extra calculation with spark.
>
> On 23/02/2016 15:07, Romain Sagean wrote:
>> Hi,
>> I use maxmind geoip with spark (no streaming). To make it work you should
>> use mapPartitions. I don't know if something similar exists for spark
>> streaming.
>>
>> My code, for reference:
>>
>> def parseIP(ip: String, ipLookups: IpLookups): List[String] = {
>>   val lookupResult = ipLookups.performLookups(ip)
>>   val countryName = lookupResult._1.map(_.countryName).getOrElse("")
>>   val city = lookupResult._1.flatMap(_.city).getOrElse("")
>>   val latitude = lookupResult._1.map(_.latitude).getOrElse(None).toString
>>   val longitude = lookupResult._1.map(_.longitude).getOrElse(None).toString
>>   List(countryName, city, latitude, longitude)
>> }
>>
>> sc.addFile("/home/your_user/GeoLiteCity.dat")
>>
>> // load your data into the my_data rdd
>>
>> my_data.mapPartitions { rows =>
>>   val ipLookups = IpLookups(geoFile = Some(SparkFiles.get("GeoLiteCity.dat")))
>>   rows.map { row => row ::: parseIP(row(3), ipLookups) }
>> }
>>
>> On 23/02/2016 14:28, Zhun Shen wrote:
>>> Hi all,
>>>
>>> Currently, I send nginx logs to Kafka, and I want to use Spark Streaming to
>>> parse the logs and enrich the IP info with the geoip libs from Maxmind.
>>>
>>> I found this one: https://github.com/Sanoma-CDA/maxmind-geoip2-scala.git, but
>>> Spark Streaming threw an error saying the lib was not Serializable.
>>>
>>> Does anyone know a way to process the IP info in Spark Streaming? Many
>>> thanks.
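
A note on the NoClassDefFoundError above (my reading, not confirmed in the thread): sbt's %% operator appends the Scala binary version to the artifact name, but Gradle does not, so a Gradle coordinate without the suffix can resolve to a jar built against a different Scala version, and a missing scala.collection internal class is the classic symptom of that mismatch. With Scala 2.10, the Gradle dependency would be spelled with an explicit suffix:

    dependencies {
        // _2.10 matches the Scala binary version of the cluster
        compile 'com.snowplowanalytics:scala-maxmind-iplookups_2.10:0.2.0'
    }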
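
On the original Spark Streaming question: DStream also has a mapPartitions method in Spark 1.x, so Romain's trick of building the non-serializable IpLookups once per partition should carry over directly. A minimal sketch, assuming parseIP and the GeoLiteCity.dat setup from Romain's snippet; the ipStream value and the surrounding StreamingContext/Kafka wiring are hypothetical:

    import com.snowplowanalytics.maxmind.iplookups.IpLookups
    import org.apache.spark.SparkFiles
    import org.apache.spark.streaming.dstream.DStream

    // driver side, before starting the context: ship the database file
    // sc.addFile("/home/your_user/GeoLiteCity.dat")

    // ipStream: hypothetical DStream[String] of IP addresses read from Kafka
    def enrich(ipStream: DStream[String]): DStream[List[String]] =
      ipStream.mapPartitions { ips =>
        // construct IpLookups on the executor, once per partition, so the
        // non-serializable object never has to cross the wire
        val ipLookups = IpLookups(geoFile = Some(SparkFiles.get("GeoLiteCity.dat")))
        ips.map(ip => parseIP(ip, ipLookups))
      }

The same caveat as the RDD version applies: the lookup object is rebuilt for every partition of every batch, and, if I remember the 0.2.0 API right, IpLookups also exposes cache options (memCache/lruCache) that keep repeated IPs cheap.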