Spark - Install Spark on Windows (PySpark)




Welcome to another post on Spark. Sometimes we need a quick setup to try out a few things.


If you want to run some quick experiments with Apache Spark, just follow these simple steps to set up PySpark on Windows.

1. Download and install Java


Download Link-

https://java.com/en/download/


Path Setup -

Add the bin path of the Java installation to the system environment variables, for example:

C:\Program Files\Java\jdk1.8.0_91\bin

2. Download and install Anaconda (Contains Python and Jupyter Notebook)


Download Link-

https://anaconda.org/anaconda/python


Path Setup -

Add the path to the system environment variables while installing Anaconda (the installer offers this as an option).
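After steps 1 and 2, it helps to confirm that the path setup worked before moving on. The snippet below is a small illustrative check, not part of the original setup: it uses Python's standard shutil.which to look for java, python and jupyter on the PATH, and the helper name missing_programs is made up for this post. Run it from a newly opened cmd window so that the updated environment variables are loaded.

```python
import shutil


def missing_programs(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # The three programs installed in steps 1 and 2.
    missing = missing_programs(["java", "python", "jupyter"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("java, python and jupyter are all on PATH")
```

If anything is reported as missing, re-check the corresponding entry in the system environment variables.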


3. Install Spark


Download Link-

http://spark.apache.org/downloads.html


Path Setup -

Add the path of the extracted Spark folder to the system environment variables (typically as SPARK_HOME, with its bin folder on PATH so that the pyspark command is found).
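A quick way to catch a wrong path is to check that the extracted folder really contains the launcher script. This is a minimal sketch, assuming the standard layout of the Spark download (a bin folder holding pyspark.cmd on Windows). The folder C:\spark in the usage lines is only a placeholder, not a path from this post, and the function name is hypothetical.

```python
from pathlib import Path


def has_pyspark_launcher(spark_dir):
    """True if <spark_dir>/bin contains the Windows pyspark launcher."""
    return (Path(spark_dir) / "bin" / "pyspark.cmd").is_file()


if __name__ == "__main__":
    spark_dir = r"C:\spark"  # placeholder: use your own extracted folder
    print("pyspark launcher found:", has_pyspark_launcher(spark_dir))
```

If this prints False, the path probably points one level above or below the extracted folder.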

4. Create a Hadoop folder and download winutils (required for Spark on Windows)

Create a Hadoop folder on the C: drive, with a bin folder inside it.


Download Link-

https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin


Download winutils.exe and save it inside the bin folder of the Hadoop folder created earlier.


Path Setup -


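Add the bin path to the system environment variables. Setting HADOOP_HOME to the Hadoop folder itself is usually needed as well, since Hadoop uses that variable to locate winutils.exe.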
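Since Spark on Windows fails at startup when it cannot find winutils.exe, it is worth confirming that the file landed in the right place. This is a minimal sketch under two assumptions: the Hadoop folder is exposed through the HADOOP_HOME environment variable, and C:\Hadoop is used as a fallback guess when it is not set. The helper name winutils_path is made up for illustration.

```python
import os
from pathlib import Path


def winutils_path(hadoop_home):
    """Return the path where winutils.exe is expected under the Hadoop folder."""
    return Path(hadoop_home) / "bin" / "winutils.exe"


if __name__ == "__main__":
    # Assumption: HADOOP_HOME points at the Hadoop folder created earlier;
    # the fallback below is only a guess at its location.
    hadoop_home = os.environ.get("HADOOP_HOME", r"C:\Hadoop")
    target = winutils_path(hadoop_home)
    print(target, "exists" if target.is_file() else "is missing")
```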

5. Start pyspark

pyspark

Type the above command in a cmd window and check the output.

6. Set up Spark on Jupyter Notebook

In the cmd window, set these two variables (Windows cmd uses set rather than export):

set PYSPARK_DRIVER_PYTHON=jupyter
set PYSPARK_DRIVER_PYTHON_OPTS=notebook

Then use the following command to start Spark from the command line.

pyspark --master local[4]

That's it! Just run the above command: Spark's startup messages will appear in the cmd window, and Jupyter Notebook will open in a new browser window.
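Once the notebook is open, a short cell can confirm that Spark actually runs jobs. The following is a minimal sketch rather than part of the original setup. The names expected_sum and run_smoke_test and the app name are made up, and it assumes Spark 2.0 or later (where SparkSession exists) with the pyspark package importable from the notebook. It sums the numbers 0 to n-1 on Spark and compares the result with the closed-form value n(n-1)/2.

```python
def expected_sum(n):
    """Closed-form sum of 0, 1, ..., n-1, used to check Spark's answer."""
    return n * (n - 1) // 2


def run_smoke_test(n=100):
    """Sum range(n) on a local Spark session and compare with expected_sum."""
    # Imported inside the function so expected_sum stays usable without Spark.
    from pyspark.sql import SparkSession

    # getOrCreate() returns the running session if there is one,
    # otherwise it starts a new local session with 4 worker threads.
    spark = (
        SparkSession.builder
        .master("local[4]")
        .appName("smoke-test")
        .getOrCreate()
    )
    total = spark.sparkContext.parallelize(range(n)).sum()
    return total == expected_sum(n)
```

Calling run_smoke_test() in a notebook cell returns True when Spark computes the sum correctly. An error at this point usually means one of the earlier path setup steps needs another look.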


About Author

Dattatray Shinde has 12+ years of experience in software design, development and maintenance of web-based applications, and has worked in the healthcare, insurance, e-commerce and learning management system domains. Over 6 years as a data scientist, working mainly on predictive analytics, survey analytics and risk analytics platforms.
