{"id":6675,"date":"2021-02-15T02:37:29","date_gmt":"2021-02-15T02:37:29","guid":{"rendered":"https:\/\/www.insightmind.com.my\/site\/?p=6675"},"modified":"2021-02-18T04:10:08","modified_gmt":"2021-02-18T04:10:08","slug":"ghostly-whitespaces-that-spooks-data-scientists","status":"publish","type":"post","link":"https:\/\/www.insightmind.com.my\/site\/ghostly-whitespaces-that-spooks-data-scientists\/","title":{"rendered":"Ghostly whitespaces that spook data scientists and what can save their day"},"content":{"rendered":"[vc_row type=&#8221;in_container&#8221; full_screen_row_position=&#8221;middle&#8221; column_margin=&#8221;default&#8221; column_direction=&#8221;default&#8221; column_direction_tablet=&#8221;default&#8221; column_direction_phone=&#8221;default&#8221; scene_position=&#8221;center&#8221; text_color=&#8221;dark&#8221; text_align=&#8221;left&#8221; row_border_radius=&#8221;none&#8221; row_border_radius_applies=&#8221;bg&#8221; overlay_strength=&#8221;0.3&#8243; gradient_direction=&#8221;left_to_right&#8221; shape_divider_position=&#8221;bottom&#8221; bg_image_animation=&#8221;none&#8221;][vc_column column_padding=&#8221;no-extra-padding&#8221; column_padding_tablet=&#8221;inherit&#8221; column_padding_phone=&#8221;inherit&#8221; column_padding_position=&#8221;all&#8221; background_color_opacity=&#8221;1&#8243; background_hover_color_opacity=&#8221;1&#8243; column_shadow=&#8221;none&#8221; column_border_radius=&#8221;none&#8221; column_link_target=&#8221;_self&#8221; gradient_direction=&#8221;left_to_right&#8221; overlay_strength=&#8221;0.3&#8243; width=&#8221;1\/1&#8243; tablet_width_inherit=&#8221;default&#8221; tablet_text_alignment=&#8221;default&#8221; phone_text_alignment=&#8221;default&#8221; column_border_width=&#8221;none&#8221; column_border_style=&#8221;solid&#8221; bg_image_animation=&#8221;none&#8221;][vc_column_text]\r\n<p>Ghostly whitespaces that spook data scientists and what can save their day.<\/p>\r\n<p>Data scientists often wonder whether whitespace in their code and data will cause problems, and whether it makes sense to replace it with \u201c_\u201d or \u201c-\u201d instead. Whitespace is a necessity in Python code, where indentation marks the bodies of functions and for loops, but its presence in data can be troublesome. Let\u2019s check, for example, whether a Pandas column name can tolerate whitespace.<\/p>\r\n<p>&nbsp;<\/p>\r\n[\/vc_column_text][vc_column_text]\r\n<h3><strong><u>Data Scientist Whitespaces Example 1:<\/u><\/strong><!-- \/wp:paragraph --><\/h3>\r\n<p><!-- wp:paragraph -->The first column name, \u2018Data Engineering\u2019, contains a whitespace. When we run the following in Jupyter Notebook, we get the desired output, confirming that whitespace is tolerated in column names.<\/p>\r\n[\/vc_column_text][vc_column_text]<!-- wp:image {\"id\":6661,\"sizeSlug\":\"large\"} --><\/p>\r\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"630\" height=\"318\" class=\"wp-image-6661\" src=\"https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode-1.png\" alt=\"Data scientists whitespaces\" srcset=\"https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode-1.png 630w, https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode-1-300x151.png 300w\" sizes=\"(max-width: 630px) 100vw, 630px\" \/><\/figure>\r\n[\/vc_column_text][vc_column_text]\r\n<p>Now let\u2019s check whether whitespace in list items and dataframe column values causes any problems. Adding the whitespace-separated \u2018data cleansing\u2019 as a new entry in the \u2018Data Engineering\u2019 column serves the purpose. 
Once again, we get the desired output without any error when we run it in Jupyter Notebook.<!-- \/wp:paragraph --><\/p>\r\n<!-- wp:image {\"id\":6662,\"sizeSlug\":\"large\"} -->\r\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"631\" height=\"346\" class=\"wp-image-6662\" src=\"https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode2.jpg\" alt=\"Data scientists whitespaces\" srcset=\"https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode2.jpg 631w, https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode2-300x165.jpg 300w\" sizes=\"(max-width: 631px) 100vw, 631px\" \/><\/figure>\r\n<!-- \/wp:image -->\r\n<p><!-- wp:paragraph -->But could these seemingly innocent whitespaces in our data give us nightmares when we process the data further, say, for visualization?<\/p>\r\n<p>&nbsp;<\/p>\r\n[\/vc_column_text][vc_column_text]\r\n<h3><strong><u>Data Scientist Whitespaces Example 2:<\/u><\/strong><!-- \/wp:paragraph --><\/h3>\r\n<p><!-- wp:paragraph -->Let\u2019s look at another example, where we need to produce a pivot table of keyword counts before plotting their sums as a stacked bar chart.<\/p>\r\n<figure class=\"wp-block-image size-large\"><img class=\"wp-image-6664\" src=\"https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode3.jpg\" alt=\"whitespaceCode3\" \/><\/figure>\r\n<p>When we run the code and inspect the dataframe, we get the desired output, including the correct keyword sums. Nothing in the pivot table hints at a whitespace problem, and our earlier observations suggest that nothing should go wrong. Unfortunately, when we start creating the traces for our stacked bar chart, referring to the keywords as if they contained no whitespace, we get a KeyError.<\/p>\r\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"633\" height=\"482\" class=\"wp-image-6664\" src=\"https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode4.jpg\" alt=\"whitespaceCode4\" srcset=\"https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode4.jpg 633w, https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode4-300x228.jpg 300w\" sizes=\"(max-width: 633px) 100vw, 633px\" \/><\/figure>\r\n<p>&nbsp;<\/p>\r\n[\/vc_column_text][vc_column_text]\r\n<h3><strong><u>The Solution<\/u><\/strong><\/h3>\r\n<p>The error tells us that something is wrong with our keyword names \u2018DE\u2019 and \u2018Python\u2019, yet nothing in it suggests a whitespace problem. In fact, our keywords contain whitespace, and we are referring to them without it. Let\u2019s see whether Pandas can reveal the problem; the following code exposes it. 
We can now clearly see trailing whitespace after both keywords.<\/p>\r\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"635\" height=\"118\" class=\"wp-image-6665\" src=\"https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode5.jpg\" alt=\"Data scientists whitespaces\" srcset=\"https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode5.jpg 635w, https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode5-300x56.jpg 300w\" sizes=\"(max-width: 635px) 100vw, 635px\" \/><\/figure>\r\n<p>Fortunately, a simple lambda function can rename our columns and strip the trailing whitespace at the same time.<\/p>\r\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"632\" height=\"22\" class=\"wp-image-6666\" src=\"https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode6.jpg\" alt=\"Data scientists whitespaces\" srcset=\"https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode6.jpg 632w, https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode6-300x10.jpg 300w, https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespaceCode6-600x22.jpg 600w\" sizes=\"(max-width: 632px) 100vw, 632px\" \/><\/figure>\r\n<p>&nbsp;<\/p>\r\n[\/vc_column_text][\/vc_column][\/vc_row]","protected":false},"excerpt":{"rendered":"","protected":false},"author":5,"featured_media":6657,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[49],"tags":[56,57,58,55],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v15.2.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Ghostly Whitespaces that Spooks Data Scientists | Insight Mind<\/title>\n<meta name=\"description\" content=\"Data scientists often wonders if whitespaces in their codes and data would cause some problems and whether it makes sense to replace them\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.insightmind.com.my\/site\/ghostly-whitespaces-that-spooks-data-scientists\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Ghostly 
Whitespaces that Spooks Data Scientists | Insight Mind\" \/>\n<meta property=\"og:description\" content=\"Data scientists often wonders if whitespaces in their codes and data would cause some problems and whether it makes sense to replace them\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.insightmind.com.my\/site\/ghostly-whitespaces-that-spooks-data-scientists\/\" \/>\n<meta property=\"og:site_name\" content=\"Insight Mind\" \/>\n<meta property=\"article:published_time\" content=\"2021-02-15T02:37:29+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-02-18T04:10:08+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespace.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1080\" \/>\n\t<meta property=\"og:image:height\" content=\"721\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\">\n\t<meta name=\"twitter:data1\" content=\"Dr Mohd Rais Ahmad\">\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\">\n\t<meta name=\"twitter:data2\" content=\"2 minutes\">\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.insightmind.com.my\/site\/#website\",\"url\":\"https:\/\/www.insightmind.com.my\/site\/\",\"name\":\"Insight Mind\",\"description\":\"Inspire Insight\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/www.insightmind.com.my\/site\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/www.insightmind.com.my\/site\/ghostly-whitespaces-that-spooks-data-scientists\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/www.insightmind.com.my\/site\/wp-content\/uploads\/2021\/02\/whitespace.jpg\",\"width\":1080,\"height\":721},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.insightmind.com.my\/site\/ghostly-whitespaces-that-spooks-data-scientists\/#webpage\",\"url\":\"https:\/\/www.insightmind.com.my\/site\/ghostly-whitespaces-that-spooks-data-scientists\/\",\"name\":\"Ghostly Whitespaces that Spooks Data Scientists | Insight Mind\",\"isPartOf\":{\"@id\":\"https:\/\/www.insightmind.com.my\/site\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.insightmind.com.my\/site\/ghostly-whitespaces-that-spooks-data-scientists\/#primaryimage\"},\"datePublished\":\"2021-02-15T02:37:29+00:00\",\"dateModified\":\"2021-02-18T04:10:08+00:00\",\"author\":{\"@id\":\"https:\/\/www.insightmind.com.my\/site\/#\/schema\/person\/1e02e3c52308bfca0ff3ff94b03e0789\"},\"description\":\"Data scientists often wonders if whitespaces in their codes and data would cause some problems and whether it makes sense to replace 
them\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.insightmind.com.my\/site\/ghostly-whitespaces-that-spooks-data-scientists\/\"]}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.insightmind.com.my\/site\/#\/schema\/person\/1e02e3c52308bfca0ff3ff94b03e0789\",\"name\":\"Dr Mohd Rais Ahmad\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/www.insightmind.com.my\/site\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/7da333265819bf54cee0a89a0ccfc183?s=96&d=blank&r=g\",\"caption\":\"Dr Mohd Rais Ahmad\"},\"description\":\"He is a dependable leader who has developed a mature approach to innovation and staff development, and demonstrates strong problem-solving capabilities to exceed expectation\",\"sameAs\":[\"https:\/\/insightmind.com.my\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","_links":{"self":[{"href":"https:\/\/www.insightmind.com.my\/site\/wp-json\/wp\/v2\/posts\/6675"}],"collection":[{"href":"https:\/\/www.insightmind.com.my\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.insightmind.com.my\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.insightmind.com.my\/site\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.insightmind.com.my\/site\/wp-json\/wp\/v2\/comments?post=6675"}],"version-history":[{"count":12,"href":"https:\/\/www.insightmind.com.my\/site\/wp-json\/wp\/v2\/posts\/6675\/revisions"}],"predecessor-version":[{"id":6680,"href":"https:\/\/www.insightmind.com.my\/site\/wp-json\/wp\/v2\/posts\/6675\/revisions\/6680"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.insightmind.com.my\/site\/wp-json\/wp\/v2\/media\/6657"}],"wp:attachment":[{"href":"https:\/\/www.insightmind.com.my\/site\/wp-json\/wp\/v2\/media?parent=6675"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.insightmind.com.my\/site\/wp-json\/wp\/v2\/categories?
post=6675"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.insightmind.com.my\/site\/wp-json\/wp\/v2\/tags?post=6675"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}