
Günther asks Tirthankar #1 (Dec 2017)

I write because I don't know what I think
until I read what I say.
Mary Flannery O'Connor


Interview: Tirthankar Lahiri* & Günther Stürner, September 2017

Interview #2-2017

*Oracle Disclaimer: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Günther: Hello Tirthankar, thank you for your time! I have a few questions for you…

Tirthankar: Of course, no problem. As long as they are not hard questions…

Günther: In the last few years you have visited Germany, German customers, and several DOAG conferences, and I am sure many of our readers know you. For all the others, could you please describe your current role within the Oracle database development organization?

Tirthankar: I am a Vice President in the Oracle Database organization, responsible for a group known as Data and In-Memory Technologies. This group is responsible for the Oracle Database “engine”, including transactions, tables, indexes, and space management, as well as Database In-Memory. I am also responsible for the TimesTen In-Memory Database, which is a separate database product.

Günther: In Germany you are known as “Mr. In-Memory”, but besides that there are several other important topics on your agenda and development groups you have to manage. Is it fair to say that in-memory technologies are the most exciting area these days?

Tirthankar: This is truly a golden age for databases! There is explosive growth in data volumes, increasingly stringent real-time requirements for data processing, and new and emerging areas for data management such as IoT, machine learning, and cloud. In-Memory is definitely a key enabler for this renaissance, but data will continue to live on disk and flash, and efficient cross-tier mechanisms are required to enable the most cost-effective solutions. Another very interesting new technology space is non-volatile memory. Previously the realm of academic research, it is now close to market: Intel is about to productize its “3D XPoint” non-volatile memory, which looks just like DRAM except that its contents are safe from power failures. So there is a lot of activity in the database engine, spanning memory, non-volatile memory, flash, and conventional storage.

Günther: Database In-Memory is the youngest member of the Oracle Database options family, but from its launch in June 2014 it has been a kind of ‘star’ and very successful.

Tirthankar: Yes, Database In-Memory is one of the most successful new technology offerings in Oracle Database. It is motivated by the increasing need for real-time analytics on any data source, and with the amount of memory available on modern hardware today, it is very attractive to store hot subsets of a database completely in memory.

Günther: May I ask you to briefly describe the basic principles of the Database In-Memory technology?

Tirthankar: The basic idea is actually very simple and elegant. It has been known for a very long time that transaction processing (OLTP) workloads benefit most from a row-based format, where the contents of a row are stored contiguously. This is because OLTP workloads typically access a small number of rows but process many columns (e.g., insert a new order, retrieve a customer record by customer id). Oracle already has a highly efficient in-memory row store, the database buffer cache, for the fastest possible OLTP.

However, for analytics, the row format is not optimal: most analytic reports access a few columns in a table but a very large number of rows, e.g., find the total sales in Munich. This query needs to access only the city column and the sale price column, but it touches all rows in the sales table. It is much faster to store the data in columnar format, where the city column can be scanned and filtered for values matching Munich and the corresponding sale price value is added to the aggregate.

Oracle’s unique approach is therefore to create the first “Dual Format” database, in which data is stored in memory in both the row format (the existing buffer cache) and the column format (the new in-memory column store). The column store is a new, purely in-memory structure, constructed from the row-based representation of the table on storage. It can be scanned at billions of rows per second using the SIMD vector instructions available on modern CPUs (such as Intel’s AVX instruction set). Another very important attribute of Database In-Memory is that it is not necessary for the entire database to live in memory. You can choose the important subsets of the database on which real-time analytics is required, such as the most recent partition(s) of the SALES table.
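To make this concrete, here is a minimal Oracle SQL sketch of the idea (the SALES table, partition name, and columns are invented for illustration):

    -- Populate only the hottest partition of SALES into the in-memory
    -- column store; older partitions stay on disk and flash.
    ALTER TABLE sales MODIFY PARTITION sales_2017_q3 INMEMORY;

    -- A typical analytic query: it touches only two columns (city,
    -- amount) but scans many rows, which the columnar format and
    -- SIMD vector scans accelerate.
    SELECT SUM(amount)
      FROM sales
     WHERE city = 'Munich';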

Günther: Oracle was not the first database with a columnar in-memory technology. Today there are several players in that segment, with different architectures and features. What are the main benefits of Oracle Database In-Memory compared to these competitors?

Tirthankar: The main benefit of Oracle’s approach is that, since the in-memory column store was natively built into the database engine, all of Oracle Database’s advanced high-availability, security, and reliability features work without compromise. Many of our competitors “added” in-memory columnar capabilities as a separate engine that was later incorporated into the main database. This creates long-term issues of maintainability and compatibility. Oracle Database In-Memory is probably the first in-memory technology that from day one worked seamlessly with all class-leading enterprise features like RAC, Flashback, GoldenGate, Advanced Security, etc.

Günther: Application transparency is extremely important for customers. This brings me to SAP. The vast majority of SAP application customers run their apps with Oracle as their database. How can they benefit from the Database In-Memory technology on their existing platform?

Tirthankar: SAP certified Oracle Database In-Memory less than a year after its initial release, a testament to how ready and robust the technology was. Using Database In-Memory is fairly trivial: all you need to do is identify hot candidate tables and issue DDL ALTER commands to populate them into the column store. Once this is done, analytic queries transparently benefit from the in-memory format. SAP has also enhanced the schema of its Business Warehouse (BW) application to take advantage of Database In-Memory through a greater level of denormalization. (Since the in-memory format can be scanned at enormous speeds, it is often advantageous to replace joins with scans of denormalized tables.)
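As a sketch of this workflow (the table name is invented; V$IM_SEGMENTS is the standard view for checking population):

    -- Mark a hot table for population into the column store.
    ALTER TABLE orders INMEMORY PRIORITY HIGH;

    -- Verify population; BYTES_NOT_POPULATED = 0 means the segment
    -- is fully populated in memory.
    SELECT segment_name, populate_status, bytes_not_populated
      FROM v$im_segments;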

Günther: Oracle Database In-Memory organizes data in columnar format in main memory. This format is known to be excellent for analytic workloads but is sub-optimal, to be polite, for transactional workloads. What are the best-fitting use cases for Oracle Database In-Memory?

Tirthankar: What you say is true of pure columnar databases, but as I said earlier, we have a dual-format architecture for the fastest possible performance for both OLTP and analytic workloads. Transactions run against the row store, and changes made by transactions are applied through purely in-memory operations to the column store. Since the column store is not persistent, maintaining it is very lightweight, cheaper than maintaining indexes. In fact, you can drop analytic indexes altogether once you have the column store, since the column store gives you fast analytics on all columns. This can speed up DML operations considerably, since they no longer have to maintain those indexes. Dropping analytic indexes can also help reduce the overall size of the database. We have customers who have used Database In-Memory in OLTP-heavy environments because they can replace their analytic indexes with the in-memory column store. Customers like Bosch have found that with the in-memory column store, not only do their reporting queries run faster, but their DML and batch operations also speed up due to reduced index maintenance.
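A cautious way to try this index replacement (a sketch only; the index name is hypothetical and not from the interview) is to make an analytic index invisible first, so the optimizer falls back to the column store before the index is actually dropped:

    -- Hide the analytic index from the optimizer; queries should now
    -- be satisfied by the in-memory column store instead.
    ALTER INDEX sales_city_ix INVISIBLE;

    -- Once reports are confirmed to still run fast, drop the index
    -- for good to eliminate its DML maintenance overhead.
    DROP INDEX sales_city_ix;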

Günther: What is the best way for an Oracle customer to identify the applications and databases that are best suited for In-Memory?

Tirthankar: We do have an In-Memory Advisor to assist with this, but the basic thing to look for is a database or application that spends a large fraction of its time on analytic data access (queries that access a large amount of data and perform index range scans or full table scans). This is of course true for most data warehouse or data mart applications, but it is also often true even for OLTP workloads. It is also important to consider how much time the application spends on maintaining analytic indexes: do DML operations run slowly because of the large number of indexes on a table? If so, replacing the indexes with the column store would help speed them up. We have a lot of information in AWR reports, segment-level statistics, SQL Monitor reports, etc. that can be used to identify candidate applications and database objects.
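As a rough starting point (a sketch under the assumption that scan-heavy segments make good candidates; verify the exact statistic names in your release), segment-level statistics can surface the tables that are scanned most often:

    -- Segments with the most full scans are natural candidates for
    -- the in-memory column store.
    SELECT owner, object_name, value AS scan_count
      FROM v$segment_statistics
     WHERE statistic_name = 'segment scans'
     ORDER BY value DESC
     FETCH FIRST 10 ROWS ONLY;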

Günther: We talked about external competitors and the benefits of the Oracle Database In-Memory implementation. But what about internal competitors like Exadata? Why should someone decide for an Exadata machine when Database In-Memory makes their data warehouse so fast on conventional hardware, and why should an Exadata customer use In-Memory?

Tirthankar: Exadata is not a competitor, and it is unfortunate that it is sometimes seen as such. Exadata is the optimal platform for Oracle Database and the optimal platform for Database In-Memory. In-Memory complements the benefits of Exadata Smart Scan: for the hottest data, it is fastest to keep it in the DRAM of the Exadata database nodes and use the full power of memory bandwidth (which is much higher than I/O bandwidth) to process it. For data that is too large to fit in memory, or not that hot, Exadata Smart Scans provide very efficient access. From Oracle Database 12.2 onwards, if you have the In-Memory option, the Exadata storage flash cache is also formatted with the same in-memory format as on the database nodes. This allows offloaded Smart Scans to achieve performance similar to the scans that run on the database nodes. To use a car-based analogy, you can think of Exadata as a high-end sports car, and Database In-Memory as a supercharger that makes it run even faster.

Günther: In spring 2017, when 12.2.0.1 came out, you and your team released some cool new features for Database In-Memory. My personal favorites are In-Memory Expressions, In-Memory Virtual Columns, and In-Memory JSON. It seems this is something really unique. Could you please explain this cool stuff?

Tirthankar: Yes, these are definitely unique capabilities made possible by our dual-format approach. Normally, the in-memory column store holds data that is present in table columns. But workloads often need to query a derived value such as NET_PRICE, computed as PRICE * (1 - DISCOUNT_RATE) * (1 + TAX_RATE). A query that selects the net price must calculate this expression for every row it accesses. With the In-Memory Expressions feature, this net price can be stored as an in-memory virtual column in the column store: when the table is populated into memory, the net price column is also pre-populated using the values present within the same row. This avoids the need to recompute the values later on. Also, as the base values are modified, the in-memory expression columns are automatically recomputed as the column store is maintained. We have seen that this can help complex queries run 3x-5x faster by eliminating repeated computation. Note that the computed values are not persisted in the database; they exist only within the in-memory column store.

The same idea can be used for JSON. JSON is a flexible, text-based format for processing web data, but it is very inefficient to query in its native text form. Oracle Database 12c added native JSON support, but JSON was previously stored in the column store in that text-based representation. In Oracle Database 12.2, we leveraged the In-Memory Expressions infrastructure to create an in-memory optimized binary format for JSON within the column store. With this format, searches on JSON documents (such as “find all the movie titles that contain the word ‘Rogue’”) run up to 60x faster!
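A minimal sketch of the NET_PRICE example (assuming a SALES table with PRICE, DISCOUNT_RATE, and TAX_RATE columns; in 12.2 the INMEMORY_VIRTUAL_COLUMNS parameter controls whether virtual columns are materialized in the column store):

    -- Allow virtual columns to be populated into the column store.
    ALTER SYSTEM SET inmemory_virtual_columns = ENABLE;

    -- Define NET_PRICE as a virtual column; with the table marked
    -- INMEMORY, its values are computed once at population time
    -- instead of on every query.
    ALTER TABLE sales ADD (
      net_price AS (price * (1 - discount_rate) * (1 + tax_rate))
    );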

Günther: I am sure some people are now very excited and are looking for a test environment to try it out. What is the best way for them to get started?

Tirthankar: Any Oracle Database Enterprise Edition installation from release 12.1.0.2 onwards has the Database In-Memory functionality. Oracle offers free trial periods for evaluating the technology. It is also possible to get a starter experience of Database In-Memory with the Exadata Express Service on Oracle Public Cloud. This is a low-cost way to enable In-Memory with a small column store (up to 10 GB, which is adequate for many test and dev, even small departmental, use cases). And of course any customer can use the local Oracle resources from the Sales Consulting groups in Germany, Switzerland, or Austria, or one of our partners. I am sure they are happy to help, and maybe DOAG plans some In-Memory test labs (smile). We also have a lot of online resources, including best-practice documents, whitepapers, an In-Memory blog, and a very active In-Memory product management team that can help as needed.
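For a local installation, enabling the column store is a one-parameter change (the size here is illustrative; INMEMORY_SIZE is a static parameter, so a restart is required):

    -- Reserve part of the SGA for the in-memory column store.
    ALTER SYSTEM SET inmemory_size = 4G SCOPE=SPFILE;

    -- SQL*Plus commands: restart the instance to activate the change.
    SHUTDOWN IMMEDIATE
    STARTUP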

Günther: As you mentioned before, Database In-Memory is extremely helpful for analytic workloads. For ten years Oracle has had a high-transaction-rate in-memory database: Oracle TimesTen. Are there plans to enhance this stand-alone product, and second, are there plans to integrate it into the Oracle Database, similar to Database In-Memory, which is deeply integrated?

Tirthankar: TimesTen is a completely separate in-memory product for a very different use case: latency-critical OLTP workloads. TimesTen is used by thousands of customers and is pretty much the gold standard among in-memory databases for OLTP. The use case for TimesTen is primarily when the application needs response times within microseconds, such as prepaid billing workloads for telco cellular applications, real-time stock trading, and even real-time fraud detection (should my credit card charge be authorized?). TimesTen is very much an active development product, and our plan is to extend it into a scale-out in-memory database for OLTP. At Oracle OpenWorld, which concluded on October 5, we previewed the next generation of TimesTen, currently referred to as “Velocity Scale” and currently in beta. With Velocity Scale we showed that it is possible to run at previously unheard-of performance: 1.2 billion SQL SELECT statements per second! This is why we refer to TimesTen as the world’s fastest OLTP database. TimesTen is really a separate product, since it runs best when embedded with the application. There are no plans as of now to integrate it within Oracle Database, which typically runs on hardware that is separate from the application tier.

Günther: It seems that Oracle TimesTen lives a bit in the shadow of the very successful columnar-oriented Database In-Memory. Is there less need for a high-transaction-rate database, or is Oracle Database so fast that a specific near-real-time transaction engine is not needed?

Tirthankar: Oracle Database is a very fast OLTP database, but it is not able to deliver microsecond response times, since it requires a network round trip to the application. TimesTen customers in verticals like telecommunications and financial services have very strict SLAs and often require sub-millisecond database response times. This is a specialized segment of the OLTP market for which TimesTen is a perfect fit. If Oracle Database were a high-end sports car, TimesTen would be a high-end motorcycle: even faster, but not as full-featured.

Günther: Currently there is a discussion about the future of database architectures, especially when it comes to NVRAM (non-volatile RAM). Some people expect that the end of disks as the main storage medium for databases is not too far away. How will such a development influence the future Oracle database architecture?

Tirthankar: NVRAM is a very disruptive technology that will fundamentally influence core database designs. All I can say now is: please stay tuned!

Günther: Oracle 18 is knocking at the door, and I am sure you and your team have some ideas for this next version of the Oracle Database. It would be great if you could share some of the new features with us.

Tirthankar: I can’t go into too much detail at present, but there are some very interesting new in-memory features planned for Oracle 18c that enhance both performance and ease of use. For instance, in keeping with the autonomous theme of Oracle Database 18c, we are planning to deliver an Automatic In-Memory mechanism that automatically identifies the ideal in-memory candidate tables in a customer workload. There are many additional planned improvements for hybrid (OLTP + analytics) workloads to further reduce the overhead of column store maintenance in the presence of heavy OLTP activity. Furthermore, there are also many planned improvements for complex query processing, such as automatic parallelization, highly optimized arithmetic operations, etc. More details later!

Günther: This is a good statement on which to finish our talk. Thank you very much.

Tirthankar: You’re welcome!



This interview was published in Red Stack Magazin, the member magazine of DOAG, SOUG, and AOUG. Red Stack No. 6, Dec 2017, ISSN 2366-7915.