Solving Pyspark Error – ‘NoneType’ object has no attribute ‘_jvm’

‘NoneType’ object has no attribute ‘_jvm’ is a PySpark error that typically appears after a wildcard import such as from pyspark.sql.functions import *.

This post is a guide showing you why you are getting this error and how to get rid of it in the most efficient way possible, along with some alternative fixes that may help you.

Exploring the Error : ‘NoneType’ object has no attribute ‘_jvm’

Reproducing the error is easy : a wildcard import like from pyspark.sql.functions import *, followed by a call to one of the shadowed built-ins such as sum in code where no SparkContext is active, is enough to break your program.
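Here is a minimal sketch that triggers the failure. It assumes a local SparkSession, the function and variable names are just examples, and the exact traceback may differ slightly depending on your PySpark version.

                                                                       #
from pyspark.sql import SparkSession
from pyspark.sql.functions import *  # shadows built-ins like sum()

spark = SparkSession.builder.master("local[1]").getOrCreate()
rdd = spark.sparkContext.parallelize(range(10))

def count_partition(iterator):
    # sum here is pyspark.sql.functions.sum, not the built-in;
    # on an executor there is no active SparkContext, so sc is None
    # and accessing sc._jvm raises the AttributeError.
    return [sum(1 for _ in iterator)]

rdd.mapPartitions(count_partition).collect()  # AttributeError: 'NoneType' ...
                                                                       #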

Make sure your error message is very similar to the one below, in order to avoid any kind of confusion.

                                                                       #
n = sum(1 for _ in iterator)
  File "/home/dev/wk/pyenv/py3/lib/python3.5/site-packages/pyspark/python/lib/pyspark.zip/pyspark/sql/functions.py", line 40, in _
    jc = getattr(sc._jvm.functions, name)(col._jc if isinstance(col, Column) else col)
AttributeError: 'NoneType' object has no attribute '_jvm'
                                                                       #

Below is a tested solution, in two variants, that has worked for me.

Solution : replace import *.

First of all, I am not going to formulate this only as a solution, but also as a lesson in Python.

More precisely, a lesson about imports.

Using import * is bad for many reasons.

import * dumps a lot of names into your namespace, which makes it easy to shadow an object from a previous import, or even a Python built-in. Readability also suffers, because you can no longer tell where a name comes from. That is exactly what happens here : from pyspark.sql.functions import * shadows built-ins such as sum, min, and max with PySpark functions that need an active SparkContext. When one of the shadowed names is called where no SparkContext exists, for example on an executor, sc is None and accessing sc._jvm raises the error.
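You can see the shadowing directly in a Python shell. This is a minimal sketch, and the exact printed repr will vary with your PySpark version.

                                                                       #
import builtins
from pyspark.sql.functions import *

print(sum)                  # PySpark's SQL aggregate, not the built-in
print(builtins.sum)         # <built-in function sum> -- the one you wanted
print(sum is builtins.sum)  # False: the built-in has been shadowed
                                                                       #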

So, if you are using a line like this one, for example :

                                                                       #
from pyspark.sql.functions import *
                                                                       #

You should replace it with the line below.

                                                                       #
import pyspark.sql.functions as f
                                                                       #
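With the qualified import, every PySpark function is spelled out explicitly and the built-ins stay untouched. A minimal usage sketch, assuming a local SparkSession and an example column named x :

                                                                       #
import pyspark.sql.functions as f
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.createDataFrame([(1,), (2,), (3,)], ["x"])

df.select(f.sum("x")).show()  # PySpark's aggregate, fully qualified
print(sum([1, 2, 3]))         # the built-in sum still works: 6
                                                                       #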

Alternatively, you can import only the function you need, under an explicit alias :

                                                                       #
from pyspark.sql.functions import sum as sum_
                                                                       #
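The alias gives you the PySpark aggregate while leaving the built-in sum available. Another minimal sketch, with the same assumed example DataFrame :

                                                                       #
from pyspark.sql import SparkSession
from pyspark.sql.functions import sum as sum_

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.createDataFrame([(1,), (2,), (3,)], ["x"])

df.select(sum_("x")).show()  # PySpark aggregate under the alias
print(sum(range(5)))         # built-in sum is untouched: 10
                                                                       #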

Replacing the wildcard import is the way to solve this problem; as far as I know, there is no better alternative.

I hope this guide solved your issue; thank you for reading. If the solutions above helped you, consider supporting us on Kofi, any help is appreciated.

Summing-up

This is the end of our article. I hope you found it and our website useful. Never give up, keep creating, and keep coding. Errors are normal in our field, cheers.

If you want to learn more about Python, please check out the Python Documentation : https://docs.python.org/3/