Comma-separated values (CSV) is a widely used file format that stores tabular data (numbers and text) as plain text. Its popularity and viability are due to the fact that a great many programs and applications support CSV files, at least as an alternative import / export format. Moreover, the CSV format allows users to glance at the file and immediately diagnose any problems with the data, change the CSV delimiter, quoting rules, etc.
All this is possible because a CSV file is plain text, and even a novice can understand it without any learning curve. In this article, we will look at quick and efficient ways to export data from Excel to CSV and learn how to convert Excel to CSV keeping all special characters and foreign symbols intact. The methods below work for all versions of Excel 2016, 2013, 2010 and 2007.

How to convert Excel file to CSV

If you need to export an Excel file to some other application, e.g. to the Outlook Address book or an Access database, you can convert your Excel worksheet to CSV first and then import the .csv file into the other program. Below you will find step-by-step instructions for exporting an Excel workbook to the CSV format by using Excel's Save As command.

- In your Excel workbook, switch to the File tab, and then click Save As. Alternatively, you can press F12 to open the same Save As dialog.
- In the Save as type box, choose to save your Excel file as CSV (Comma delimited).

Besides CSV (comma delimited), a few other CSV formats are available to you:

- CSV (comma delimited). This format saves an Excel file as comma-separated text that can be used in another Windows program or another version of the Windows operating system.
- CSV (Macintosh). This format saves your Excel workbook as a comma-separated file for use on the Mac operating system.
UTF-8 is similar to UTF-16, except that its code units are all one byte (8 bits) long, with each character represented by between one and four code units. Plain text (i.e. ASCII) characters are all represented by a single byte, in a manner identical to normal non-Unicode strings.
- CSV (MS-DOS). Saves your Excel file as a comma-separated file for use on the MS-DOS operating system.
- Unicode Text (.txt). Unicode is a computing industry standard supported by almost all current operating systems including Windows, Macintosh, Linux and Solaris Unix. It can handle characters of almost all modern languages and even some ancient ones. So, if your Excel file contains data in a foreign language, save it in the Unicode Text format first and then convert it to CSV, as explained later in this article.

All of the above-mentioned formats save only the active Excel sheet.

- Choose the destination folder where you want to save your Excel file in the CSV format, and then click Save.

After you click Save, Excel will display two dialogs. Don't worry, these are not error messages and everything is going right. The first dialog reminds you that only the active Excel spreadsheet will be saved to the CSV file format.
If this is what you are looking for, click OK. If you need to save the contents of all the sheets your workbook contains, click Cancel and then save each spreadsheet individually as a separate Excel file (workbook). After that save each Excel file as CSV.
Clicking OK in the first dialog will display a second message informing you that your worksheet may contain features unsupported by the CSV format. This is fine, so simply click Yes. This is how you convert Excel to CSV. The process is quick and straightforward, and you are unlikely to run into any hurdles along the way.
Export Excel to CSV with UTF-8 or UTF-16 encoding

If your Excel spreadsheets contain special symbols, foreign characters (tildes, accents etc.) or hieroglyphs, then converting Excel to CSV in the way described above won't work. The point is that the Save As CSV command distorts any characters other than ASCII (American Standard Code for Information Interchange). And if your Excel file has smart quotes or long dashes (e.g. inherited from an original Word document that was copied / pasted to Excel), these characters will be mangled too. An easy alternative is saving the Excel workbook as a Unicode Text (.txt) file and then converting it to CSV. In this way you will keep all non-ASCII characters undamaged. Before we proceed further, let me briefly point out the main differences between UTF-8 and UTF-16 encodings so that you can choose the right format in each particular case.
UTF-8 is a more compact encoding since it uses 1 to 4 bytes for each symbol. Generally, this format is recommended if ASCII characters are most prevalent in your file because most such characters are stored in one byte each. Another advantage is that a UTF-8 file containing only ASCII characters has absolutely the same encoding as an ASCII file.
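To make these sizes concrete, here is a minimal Python sketch that counts the bytes the same text needs in each encoding (the UTF-16 figures are discussed right below). The sample strings and the helper name `encoded_sizes` are my own illustrations, not data from the workbook in this article.

```python
# Compare how many bytes the same text needs in UTF-8 and UTF-16.
# The sample strings are illustrative assumptions, not data from the article.
samples = {
    "ascii": "Excel",
    "accented": "café",
    "japanese": "日本語",
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, not counting a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for label, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{label:9} {text!r:8} utf-8={utf8_len:2} bytes  utf-16={utf16_len:2} bytes")
```

For the ASCII sample UTF-8 needs half the bytes of UTF-16, while for the Japanese sample UTF-16 is the smaller of the two, which is the trade-off to weigh when picking an encoding.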
UTF-16 uses 2 to 4 bytes to store each symbol. However, a UTF-16 file does not always require more storage than UTF-8. For example, Japanese characters take 3 to 4 bytes in UTF-8 and 2 to 4 bytes in UTF-16. So, you may want to use UTF-16 if your Excel data contains any Asian characters, including Japanese, Chinese or Korean. A noticeable disadvantage of this format is that it is not compatible with ASCII files and requires Unicode-aware programs to display them. Please keep this in mind if you are going to import the resulting file somewhere outside of Excel.

How to convert Excel to CSV UTF-8

Suppose you have an Excel worksheet with some foreign characters, Japanese names in our case. To export this Excel file to CSV keeping all the characters intact, follow the steps below:
- In your Excel worksheet, go to File > Save As.
- Name the file and choose Unicode Text (.txt) from the drop-down list next to 'Save as type', and then click Save.
- Open the Unicode .txt file using your preferred text editor, for example Notepad.
- Replace the tabs with commas. If you do not need exactly a comma-separated file, just any CSV file that Excel can understand, you can skip this step because Microsoft Excel handles tab-separated files fine. If you do want a comma-delimited CSV file, proceed with Notepad in the following way: select a tab character, right click it and choose Copy from the context menu, or simply press CTRL+C. Press CTRL+H to open the Replace dialog and paste the copied tab (CTRL+V) in the Find what field. When you do this, the cursor will move rightwards indicating that the tab was pasted. Type a comma in the Replace with field and click Replace All.
- Click File > Save As, enter a file name and change the encoding to UTF-8. Then click the Save button.
- Open Windows Explorer and change the file extension from .txt to .csv. An alternative way is to change the .txt extension to .csv directly in Notepad's Save as dialog and choose All files (*.*) next to Save as type.
- Open the CSV file from Excel by clicking File > Open > Text files (.prn, .txt, .csv) and verify that the data is okay.
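If you would rather not do the Notepad part by hand, the same conversion can be sketched in a few lines of Python using only the standard library. This is a rough sketch under stated assumptions, not an official Excel workflow: it assumes the Unicode Text file that Excel saved is tab-delimited UTF-16 and starts with a byte order mark, and the function name `unicode_txt_to_csv` and the file names in the usage note are hypothetical.

```python
import csv


def unicode_txt_to_csv(src_path, dst_path, delimiter=","):
    """Convert a tab-delimited UTF-16 text file into a delimited UTF-8 CSV file.

    Assumption: the source file starts with a byte order mark, which the
    "utf-16" codec uses to pick the right byte order.
    """
    # newline="" leaves line-ending handling to the csv module.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter=delimiter)
        for row in reader:
            # csv.writer quotes any field that itself contains the delimiter.
            writer.writerow(row)
```

A call such as `unicode_txt_to_csv("names.txt", "names.csv")` would cover the replace, re-encode and rename steps in one go. One difference from a plain find-and-replace is worth knowing: a cell that already contains a comma is written in quotes, so it is not split into two columns when the CSV file is read back.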
If your file is intended for use outside of Excel and the UTF-8 format is a must, do not make any edits in the worksheet or save the CSV file from Excel, because this may cause encoding problems. If some of the data does not appear right in Excel, open the file in Notepad and fix the data there. Remember to save the file in the UTF-8 format again.

How to convert an Excel file to CSV UTF-16

Exporting an Excel file as CSV UTF-16 is much quicker and easier than converting to UTF-8.
This is because Excel automatically employs the UTF-16 format when saving a file as Unicode Text (.txt). So, all you do is click File > Save As in Excel, select the Unicode Text (.txt) file format, and then change the file extension to .csv in Windows Explorer. If you need a comma-separated or semicolon-separated CSV file, replace all tabs with commas or semicolons, respectively, in Notepad or any other text editor of your choosing (see the tab-replacement step in the UTF-8 instructions above for full details).

Other ways to convert Excel files to CSV

The methods of exporting Excel to CSV (UTF-8 and UTF-16) we have just described are universal, meaning they work for all special characters and in any Excel version, from 2003 to Excel 2016.
There are a handful of other ways to convert Excel data to CSV. Unlike the previous solutions, they won't produce a pure UTF-8 CSV file (except for one that allows exporting Excel files to several UTF encodings), but in most cases the output will contain the correct characters, which you can painlessly convert to the UTF-8 format later using any text editor.
Convert Excel to CSV using Google spreadsheets

Using a Google spreadsheet to export an Excel file to CSV is a very simple workaround. Assuming that you already have Google Drive installed, perform the following 5 easy steps.

- On Google Drive, click the Create button and choose Spreadsheet.
- Click Import from the File menu.
- Click Upload and choose a file from your computer.
- Choose Replace spreadsheet and then click Import. If you have a relatively small Excel file, you can simply copy / paste the data to a Google sheet to save time.
- Go to File > Download as, select Comma separated values (CSV, current sheet) and save the CSV file to your computer.

Finally, open the CSV file in a text editor to make sure all of the characters are saved correctly. Regrettably, the CSV files converted in this way do not always display correctly in Excel. Kudos to Google and shame on Microsoft :)

Save .xlsx to .xls and then convert to .csv file

This method of converting Excel to CSV hardly needs any further explanation because the heading says it all.
I came across this solution on one of the Excel forums; I cannot remember now which one exactly. To be honest, this method has never worked for me, but many users reported that special characters, which got lost when saving .xlsx directly to .csv, were preserved if they first saved the .xlsx file as .xls, and then saved the .xls as .csv in Excel using the Save As command described earlier.
Anyway, you can try this method of exporting Excel to CSV on your side, and if it works, it can be a real time-saver.

Save Excel as CSV using OpenOffice

OpenOffice, an open-source suite of applications, includes a spreadsheet application named Calc that is really good at exporting Excel data to CSV format. In fact, it provides more options to convert spreadsheets to CSV files (encodings, delimiters etc.) than Excel and Google Sheets combined. You simply open your Excel file with OpenOffice Calc, click File > Save as and save the file as the Text CSV (.csv) type.

In the next step, you will have a choice of various Character sets (encodings) and Field delimiters. Naturally, you select Unicode (UTF-8) and comma if your goal is a CSV UTF-8 file (or whatever encoding and separator character you need) and click OK. Typically, the Text delimiter will remain the default quotation mark (").

You can also use another application of the same kind to perform fast and painless Excel to CSV conversions. It would be really nice if Microsoft Excel provided similar options with regard to CSV encodings, agree?

These are the ways of converting Excel to CSV I am aware of.
If you know other more efficient methods to export an Excel file to CSV, please do share in comments. Thank you for reading!

Hi there, They send me a CSV file in which the information is separated into columns. Once I open it on my PC, the information per line is displayed in one cell per line and all columns are distinguished with a comma between them (i.e. a comma delimited CSV file), although the person that created it had the information separated into columns. I do not want to use the function Data > Text to Columns every time in order to bring the information back to columns. What is wrong with the Excel 2007 that I have installed, that the CSVs sent to me are only opened in this comma delimited mode?

Hello, Sorry about being a bit off the topic, but I really need help. I am trying to save an Excel spreadsheet to a tab delimited text file. Information in the spreadsheet contains French accent symbols. They appear all right in the spreadsheet, but after saving a tab delimited file these symbols turn into question marks. Now the most interesting part - I have changed a setting in Tools (next to the Save button) to Unicode and Unicode UTF-8, but neither of them changed a thing. I would really appreciate it if someone could help me out with this. Thanks, Alex.

This week I was working on a project that uses Excel sheets, dbf and ASCII formatted data, and I was a bit challenged as to how to convert data from Excel 2007 to the other formats it does not directly support by itself. Finally, I went to Google with the question I had and typed it into the search bar. My problems were resolved by reading the suggestions from the respective experts in the subject matter; I am done with my problem and I am also comfortable using Excel 2007. Thank you so much, and I will come to you with questions as I proceed with my project in the time ahead. With Regards, Habtamu.

I am trying to convert Excel to .csv format so I can import into a database. I followed all the steps you listed.
However, when I open the .csv file in Notepad to make sure the formatting is correct, I find after the last line of data there are hundreds of commas running down the left side. A co-worker does not have this problem when she tries it on her PC, which makes me think there is a setting of some kind in Excel or an add-in not enabled that I need on my PC. Our IT staff are not familiar with the 'inner workings' of Excel and can't really help with this issue. Can you help?
I believe there are a lot of good articles about this around the Web, but here is a short summary. Both UTF-8 and UTF-16 are variable length encodings. However, in UTF-8 a character may occupy a minimum of 8 bits, while in UTF-16 character length starts with 16 bits.

Main UTF-8 pros:

- Basic ASCII characters like digits, Latin characters with no accents, etc. occupy one byte, which is identical to the US-ASCII representation. This way all US-ASCII strings become valid UTF-8, which provides decent backwards compatibility in many cases.
- No null bytes, which allows the use of null-terminated strings; this introduces a great deal of backwards compatibility too.
- UTF-8 is independent of byte order, so you don’t have to worry about the Big Endian / Little Endian issue.
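These three points are easy to check for yourself. Below is a small Python sketch; the sample string is an arbitrary ASCII phrase chosen for illustration, and the variable names are mine.

```python
text = "Excel to CSV"  # arbitrary ASCII-only sample

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")

# Pure ASCII text produces exactly the same bytes in UTF-8.
same_as_ascii = utf8_bytes == ascii_bytes

# This UTF-8 output contains no zero bytes ...
utf8_has_null = b"\x00" in utf8_bytes
# ... while UTF-16 pads every ASCII character with one.
utf16_has_null = b"\x00" in utf16_le

# UTF-16 comes in two byte orders; UTF-8 has a single serialization.
byte_orders_differ = utf16_le != utf16_be

print(same_as_ascii, utf8_has_null, utf16_has_null, byte_orders_differ)
# prints: True False True True
```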
Main UTF-8 cons:

- Many common characters have different lengths, which slows indexing by codepoint and calculating a codepoint count terribly.
- Even though byte order doesn’t matter, sometimes UTF-8 still has a BOM (byte order mark), which serves to signal that the text is encoded in UTF-8 but also breaks compatibility with ASCII software even if the text only contains ASCII characters. Microsoft software (like Notepad) especially likes to add a BOM to UTF-8.
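The BOM point can be seen directly in Python, whose standard `utf-8-sig` codec writes the mark when encoding and drops it when decoding. This is only a sketch: the sample CSV text and the helper name `strip_bom` are my own assumptions.

```python
import codecs

text = "id,name\n1,Zoë\n"  # illustrative CSV content

plain = text.encode("utf-8")         # no byte order mark
with_bom = text.encode("utf-8-sig")  # same bytes with EF BB BF in front


def strip_bom(data):
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")


print(with_bom.startswith(codecs.BOM_UTF8))      # the mark is there
print(repr(with_bom.decode("utf-8")[:3]))        # a plain decode keeps U+FEFF
print(strip_bom(with_bom) == strip_bom(plain))   # both decode to the same text
```

Decoding with `utf-8-sig` is a convenient default when you do not know whether the program that wrote a file added the mark, because it accepts both variants.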
Main UTF-16 pros:

- BMP (basic multilingual plane) characters, including Latin, Cyrillic, most Chinese (the PRC made support for some codepoints outside the BMP mandatory) and most Japanese, can be represented with 2 bytes. This speeds up indexing and calculating the codepoint count in case the text does not contain supplementary characters.
- Even if the text has supplementary characters, they are still represented by pairs of 16-bit values, which means that the total length is still divisible by two and allows a 16-bit char to be used as the primitive component of the string.

Main UTF-16 cons:

- Lots of null bytes in US-ASCII strings, which means no null-terminated strings and a lot of wasted memory.
- Using it as a fixed-length encoding “mostly works” in many common scenarios (especially in US / EU / countries with Cyrillic alphabets / Israel / Arab countries / Iran and many others), often leading to broken support where it doesn’t. This means that programmers have to be aware of surrogate pairs and handle them properly in cases where it matters!
- It’s variable length, so counting or indexing codepoints is costly, though less so than in UTF-8.

In general, UTF-16 is usually better for in-memory representation because BE/LE is irrelevant there (just use native order) and indexing is faster (just don’t forget to handle surrogate pairs properly). UTF-8, on the other hand, is extremely good for text files and network protocols because there is no BE/LE issue and null-termination often comes in handy, as well as ASCII-compatibility.

They’re simply different schemes for representing Unicode characters. Both are variable-length: UTF-16 uses 2 bytes for all characters in the basic multilingual plane (BMP), which contains most characters in common use. UTF-8 uses between 1 and 3 bytes for characters in the BMP and 4 bytes for characters above it, covering the current Unicode range of U+0000 to U+10FFFF; the original design was extensible up to U+7FFFFFFF, but notably all ASCII characters are represented in a single byte each. For the purposes of a message digest it won’t matter which of these you pick, so long as everyone who tries to recreate the digest uses the same option. (Note that all Java chars are UTF-16 code units; to represent characters above U+FFFF you need to use surrogate pairs in Java.)
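To see what a surrogate pair looks like in practice, here is a short Python sketch. The helper names `utf16_code_units` and `split_surrogates` are hypothetical, and the two sample characters are my own picks: one inside the BMP and one above U+FFFF.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units that encode text in UTF-16."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def split_surrogates(code_point):
    """Split a code point above U+FFFF into its (high, low) surrogate pair."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


bmp_char = "\u8a9e"           # U+8A9E, a CJK character inside the BMP
supplementary = "\U0001F600"  # U+1F600, an emoji outside the BMP

print([hex(u) for u in utf16_code_units(bmp_char)])       # one code unit
print([hex(u) for u in utf16_code_units(supplementary)])  # two code units
print([hex(u) for u in split_surrogates(0x1F600)])        # the same pair, computed by hand
```

Code that treats UTF-16 as fixed-length would count the second sample as two characters, which is the kind of breakage described above.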