Weather Prediction Database

August 11th, 2015

One of my largest personal projects was the creation of a weather prediction database. I was looking to store 7 key metrics that could be used in the future to power Android and IPhone fitness based apps. The full program can be found on my GitHub here. I chose the National Weather Service’s NDFD database as my source since it had a free SOAP web service.

In the United States there are over 43,000 zip codes and the National Weather Service maintains weather prediction data for each of them. The broad goal of this project was to gather this data and place it into a local MySQL database where it would be easier to access.

Since this database was to power fitness based apps I only wanted to store weather prediction data that would affect someone exercising outside. Out of the available data I chose the following to insert into the MySQL database.

  • Temperature
  • Rainfall
  • Snowfall
  • Ice Accumulation
  • Humidity
  • Wind Speed
  • Wind Gust Speed

The following shows the SQL code used to create the database. While simple, the two tables provide data normalization since each zip codes latitude and longitude should be static content. Each WeatherPrediction row uses the combination of the zipcode and time_Applicable to guarantee a unique key.

                    CREATE TABLE WeatherPrediction(
                        zipcode INT NOT NULL,
                        time_Applicable VARCHAR(30) NOT NULL,
                        last_Updated VARCHAR(30) NOT NULL,
                        temp INT,
                        liquid_Precip DOUBLE,
                        wind_Speed INT,
                        ice_Accum DOUBLE,
                        snow_Amount DOUBLE,
                        gust_Speeds INT,
                        humidity INT,
                        PRIMARY KEY (zipcode, time_Applicable)

                    CREATE TABLE latLon(
                        zipcode INT NOT NULL,
                        lattitude DOUBLE NOT NULL,
                        longitude DOUBLE NOT NULL,
                        PRIMARY KEY (zipcode)

The full Java code used to populate the database can be found here. Since it is long I will not post all of it here. Instead I’ll give some highlights of the key components that make it work. Inside the main class I had to create a thread array of WeatherRequest objects. These objects are what call the NDFD web service and then sort the returned data into the database. These objects are threaded to run in parallel since the NDFD web service can take a variable amount of time to respond to requests ranging from around .5 to 5 seconds. Since the goal is to update the database each hour for all 43,000 zip codes we needed to run multiple threads.

                    WeatherRequest[] threadArray = new WeatherRequest[12];
                    for(int x = 0; x < threadArray.length; x++) {
                        threadArray[x] = new WeatherRequest(Integer.toString(x), Zips);
                        // Here we wait a small amount of time between starting each thread 
                        // so that we do not bombard the NDFD servers all at once.
                        try {
                        } catch (InterruptedException e) {
                            // Catch the interrupt

Inside the WeatherRequest objects you have to build custom SOAP requests based on each zip code. Java has a SOAP factory class that makes this a little easier and builds the actual XML file that is sent to the NDFD servers. An example of the xml document the NDFD web service returns can be found here.

                    URL url = new URL(wsdl);
                    QName serviceName = new QName(targetNS, "ndfdXML");
                    QName portName = new QName(targetNS, "ndfdXMLPort");
                    Service service = Service.create(url, serviceName);
                    /* Example of SOAP request being built
                     * NDFDgenLatLonList operation: gets weather data for a given
                     * latitude, longitude pair
                     * Format of the Message: 
                    SOAPFactory soapFactory = SOAPFactory.newInstance();
                    SOAPMessage getWeatherMsg = msgFactory.createMessage();
                    SOAPHeader header = getWeatherMsg.getSOAPHeader();
                    header.detachNode();  // no header needed
                    SOAPBody body = getWeatherMsg.getSOAPBody();
                    Name functionCall = soapFactory.createName("NDFDgenLatLonList", "schNS", nsSchema);
                    SOAPBodyElement fcElement = body.addBodyElement(functionCall);
                    Name attname = soapFactory.createName("encodingStyle", "S", soapSchema);
                    fcElement.addAttribute(attname, soapSchema);
                    SOAPElement geocodeElement = fcElement.addChildElement("listLatLon");
                    SOAPElement product = fcElement.addChildElement("product");
                    SOAPElement weatherParameters = fcElement.addChildElement("weatherParameters");
                    // make web service call using this SOAPMessage
                    Dispatch smDispatch = null;
                    smDispatch = service.createDispatch(portName, SOAPMessage.class, Service.Mode.MESSAGE);
                    SOAPMessage weatherMsg = smDispatch.invoke(getWeatherMsg);

The last part I wish to talk about is Java code responsible for inserting the weather prediction data into the MySQL database. When doing this it is good practice to use PreparedStatement provided in Java or something similar to protect your databases from SQL injections. My full SQL statement used creates a new row in the database only if one with the matching time applicable and zip code does not already exist. This way we update old prediction data with new prediction data if it is available. In order to speed the inserts we prepare statements in large batches that will execute at the same time.

                con = DriverManager.getConnection(url, user, password);
	            st = con.createStatement();
	            rs = st.executeQuery("SELECT VERSION()");

	            // Output versionquery to check for established connection
	            if ( {
	             * The following prepared statement creates a new weatherprediction row if the zipcode and time_Applicable
	             * do not exist in the current database. If it does exists the ON DUPLICATE KEY UPDATE activates and
	             * everything is updated with the current data except for the key which contains the zipcode and time_applicable.
	            PreparedStatement pst = con.prepareStatement("INSERT INTO weatherprediction(zipcode, time_Applicable, last_Updated"
	            		+ ", temp, liquid_Precip, wind_Speed, ice_Accum, snow_Amount, gust_Speeds, humidity) VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?) "
	            		+ "ON DUPLICATE KEY UPDATE zipcode = zipcode, time_Applicable = time_Applicable, last_Updated = ?, temp = ?,"
	            		+ " liquid_Precip = ?, wind_Speed = ?, ice_Accum = ?, snow_Amount = ?, gust_Speeds = ?, humidity = ?;");
	            //tempMasterHourPredictionIndex is a copy of masterHourPredictionIndex so that we can update
	            //the number of predictions we have already added to the database
	            int tempMasterHourPredictionIndex = 0;
	            // We have to create a statement for each line in the zip file
	            for(int x = 0; x < masterHourPrediction.size() - masterHourPredictionIndex; x++) {
		            HourPrediction currentHourPrediction = masterHourPrediction.get((masterHourPredictionIndex + x));
		             * For debugging
		             * currentHourPrediction.printWeather();
		            // setstring sets the ? values in the preparedStatement with the data from the zips arraylist
		            pst.setInt(1, currentHourPrediction.getZip());
		            pst.setString(2, currentHourPrediction.getTimeApplicable());
		            pst.setString(3, currentHourPrediction.getTimeUpdated().toString());
		            pst.setInt(4, currentHourPrediction.getTemperature());
		            pst.setDouble(5, currentHourPrediction.getPrecip());
		            pst.setInt(6, currentHourPrediction.getWindSpeed());
		            pst.setDouble(7, currentHourPrediction.getIce());
		            pst.setDouble(8, currentHourPrediction.getSnow());
		            pst.setInt(9, currentHourPrediction.getGust());
		            pst.setInt(10, currentHourPrediction.getHumidity());
		            pst.setString(11, currentHourPrediction.getTimeUpdated().toString());
		            pst.setInt(12, currentHourPrediction.getTemperature());
		            pst.setDouble(13, currentHourPrediction.getPrecip());
		            pst.setInt(14, currentHourPrediction.getWindSpeed());
		            pst.setDouble(15, currentHourPrediction.getIce());
		            pst.setDouble(16, currentHourPrediction.getSnow());
		            pst.setInt(17, currentHourPrediction.getGust());
		            pst.setInt(18, currentHourPrediction.getHumidity());
		            // we add each statement to a batch that is later executed
		            //For debugging
		            //System.out.println(x + " masterHourPrediction: " + masterHourPrediction.size() + " masterHourPredictionIndex: " + masterHourPredictionIndex);
	            //Update master index with how many predictions were just inserted
	            masterHourPredictionIndex = masterHourPredictionIndex + tempMasterHourPredictionIndex;
	            // Execute all batches added in the previous for loop