I'm using PySpark 1.3.0 and struggling with something that should be simple.
Basically, I'd like to run this:
site_logs.filter(lambda r: 'page_row' in r.request[:20])
meaning I want to keep only the rows whose request column contains 'page_row'
within its first 20 characters. The closest I've come is:
pages = site_logs.filter("request like '%page_row%'")
but that's missing the [:20] part. If I instead try the .like function from
the Column API:
birf.filter(birf.request.like('bi_page')).take(5)
I get:
Py4JJavaError: An error occurred while calling o71.filter.
: org.apache.spark.sql.AnalysisException: resolved attributes request missing from user_agent,status_code,log_year,bytes,log_month,request,referrer
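Separately from the AnalysisException, the `.like('bi_page')` call may not do what was intended. A SQL LIKE pattern has to match the entire value, so a pattern with no `%` only matches a request that is exactly that string. The `_` in 'bi_page' is also a single-character wildcard, not a literal underscore. The sketch below models standard LIKE semantics in plain Python. The helper names `like_to_regex` and `sql_like` are hypothetical, not Spark APIs, and escape sequences such as '\%' are deliberately not handled.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into a compiled regular expression.

    Hypothetical helper: '%' matches any run of characters (including none)
    and '_' matches exactly one character; every other character is literal.
    Escape sequences are not supported in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # DOTALL so that '%' and '_' can also match newline characters.
    return re.compile("".join(parts), re.DOTALL)


def sql_like(value: str, pattern: str) -> bool:
    # LIKE must match the whole value, hence fullmatch rather than search.
    return like_to_regex(pattern).fullmatch(value) is not None


print(sql_like("bi-page", "bi_page"))          # True: '_' matched '-'
print(sql_like("/x/bi_page?y=1", "bi_page"))   # False: no '%', whole value must match
print(sql_like("/x/bi_page?y=1", "%bi_page%")) # True
```

To search for a literal substring anywhere in the column, the pattern therefore needs surrounding `%` wildcards, as in the string filter "request like '%page_row%'" above.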
What is the code to run this filter, and what are some recommended ways to
learn the Spark SQL syntax?
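One way to express the "first 20 characters" condition is to take the substring before applying LIKE. In PySpark 1.3 the Column API has `substr(startPos, length)` and `like(pattern)`, so something like `site_logs.filter(site_logs.request.substr(1, 20).like('%page_row%'))` should be close, though it has not been checked against this exact dataset. Alternatively, `site_logs.rdd.filter(lambda r: 'page_row' in r.request[:20])` runs the original lambda on the underlying RDD of Rows, at the cost of getting an RDD back instead of a DataFrame. The plain-Python sketch below (hypothetical helpers `sql_substr` and `keep_row`, sample request strings invented for illustration) shows why `substr(1, 20)` lines up with the slice `[:20]`: SQL positions are 1-based.

```python
def sql_substr(value: str, pos: int, length: int) -> str:
    """Model of SQL SUBSTR(value, pos, length) for pos >= 1.

    SQL positions are 1-based, so SUBSTR(request, 1, 20) corresponds to
    Python's request[0:20]. Zero or negative positions are not modelled.
    """
    if pos < 1:
        raise ValueError("this sketch only handles positions >= 1")
    start = pos - 1
    return value[start:start + length]


def keep_row(request: str, needle: str = "page_row", prefix_len: int = 20) -> bool:
    # Same predicate as SUBSTR(request, 1, 20) LIKE '%page_row%',
    # and as the original lambda: 'page_row' in r.request[:20]
    return needle in sql_substr(request, 1, prefix_len)


# Hypothetical sample values for the request column.
requests = [
    "GET /page_row?id=1",              # 'page_row' within the first 20 chars
    "GET /a/very/long/path/page_row",  # 'page_row' starts after char 20
    "GET /index.html",                 # no 'page_row' at all
]
kept = [r for r in requests if keep_row(r)]
print(kept)  # ['GET /page_row?id=1']
```

Because the substring is cut before the containment test, a 'page_row' that straddles the 20-character boundary is excluded in both the SQL form and the slice form, so the two stay equivalent.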
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-filter-if-column-substring-does-not-contain-a-string-tp25385.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.