Question on Avro schema management - NoSQL Database

Hi,
in our use case, we need to generate Avro schema on the fly (given some business object structure) and install with Oracle NoSQL. I have
the following questions:
1. Is there an API to install the Avro schema in NoSQL. Ideally we'd like to avoid using the command line tool for this (and also creating
a .avsc file)
2. Any recommended way to make the Avro schema available on the client? Ideally we don't want to use any file system operations
for this. Would it be a reasonable way to store the Avro schema itself as a String in NoSQL and then when the client connects to NoSQL,
first thing it does is read the schemata stored in NoSQL and parse them?
On another topic, are there any performance penalties using JsonAvroBinding vs. GenericAvroBinding. Our objects are all JSON so we'd
like to use JsonAvroBinding, however we'd go the extra mile and use GenericAvroBinding if that performs better.
Best Regards and thanks in advance for your answer,
Ralf 

Hello Ralf,
> 1. Is there an API to install the Avro schema in NoSQL. Ideally we'd like to avoid using the command line tool for this (and also creating a .avsc file)
No, there is no administrative API currently available. For now, I think this would have to be done using a script that is invoked by your application and that uses the NoSQL DB admin CLI.
> 2. Any recommended way to make the Avro schema available on the client? Ideally we don't want to use any file system operations for this. Would it be a reasonable way to store the Avro schema itself as a String in NoSQL and then when the client connects to NoSQL, first thing it does is read the schemata stored in NoSQL and parse them?
This raises the question of how you'll do schema evolution. Will there ever be multiple versions of the same Avro schema (more than one version having the same schema name) in your application? If so, there will be additional complications with the approach you're taking.
If not (if there will only ever be one version of a given schema), you can use the latest (only) version of a schema on the client. In that case you can call AvroCatalog.getCurrentSchemas to get the schemas you need. If a schema you recently added is not in the returned map, call AvroCatalog.refreshSchemaCache followed by AvroCatalog.getCurrentSchemas.
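The lookup-then-refresh sequence described above could be sketched as follows. This is only an illustration of the control flow: `SchemaCatalog` is a hypothetical stand-in interface that borrows the two method names from the reply, and its parameter and return types are assumptions, so check the real AvroCatalog javadoc before adapting it. `FakeCatalog` exists only so the sketch can run without a store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical stand-in for the schema catalog. Only the two method names come
 * from the reply above; the signatures here are assumptions.
 */
interface SchemaCatalog {
    Map<String, String> getCurrentSchemas(); // schema name -> schema text (assumed shape)
    void refreshSchemaCache();
}

final class SchemaLookup {
    private SchemaLookup() {}

    /** Looks a schema up in the cached map, refreshing the cache once on a miss. */
    static Optional<String> find(SchemaCatalog catalog, String schemaName) {
        String schema = catalog.getCurrentSchemas().get(schemaName);
        if (schema == null) {
            // The schema may have been installed after the cache was last filled.
            catalog.refreshSchemaCache();
            schema = catalog.getCurrentSchemas().get(schemaName);
        }
        return Optional.ofNullable(schema);
    }
}

/** In-memory fake used only to exercise the lookup logic. */
final class FakeCatalog implements SchemaCatalog {
    private final Map<String, String> stored = new HashMap<>(); // what the store holds
    private Map<String, String> cached = new HashMap<>();       // what the client currently sees
    int refreshCount = 0;

    void install(String name, String schemaText) {
        stored.put(name, schemaText);
    }

    @Override
    public Map<String, String> getCurrentSchemas() {
        return cached;
    }

    @Override
    public void refreshSchemaCache() {
        cached = new HashMap<>(stored);
        refreshCount++;
    }
}
```

Refreshing only on a miss keeps the common path cheap; a schema that is still missing after one refresh is reported as absent rather than retried.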
> On another topic, are there any performance penalties using JsonAvroBinding vs. GenericAvroBinding. Our objects are all JSON so we'd like to use JsonAvroBinding, however we'd go the extra mile and use GenericAvroBinding if that performs better.
They should have similar performance characteristics (they do essentially the same thing), but I would guess that in general JsonAvroBinding is slightly faster than GenericAvroBinding, based on my knowledge of the Avro and Jackson code. However, we have not done any performance testing to compare them. It is possible that the answer depends on the schema you're using. So you should do your own testing, if you're concerned about minor performance differences.
--mark

Related

capture-schema and Informix

As I wrote in a response to the topic http://softwareforum.sun.com/NASApp/jive/thread.jsp?forum=58&thread=13755, the capture-schema utility behaves very strangely with an Informix database. I just read in an article (Free SunOne ID: 8343) that the capture-schema is supposed to work with the databases MS-SQL, Oracle, Pointbase and Sybase. No mention of Informix. Does that mean that I can't expect it to work with Informix? If that is the case, would that mean that I can't expect CMP to work at all with an Informix database? Surely not, but I really need some answer.
We are building a pretty big system, consisting of a few hundred EAR modules, of which maybe a hundred are using CMP with maybe 800 tables in an Informix database.
I started to deploy a few core modules and all went well until I tried to deploy a slightly larger one, with 20+ entity beans. That's when really strange things started to happen. During deployment I got errors saying that a certain column didn't exist in a table. When I examined the schema file, I found that a column reference in the table was pointing all wrong. I got that table to work by changing the order of the "-table" options, but then the problem just moved to other tables.
The documentation indicates that there is no way to deploy without a schema file, which basically means that I'm stuck. Should I give up on Sun ONE? Please say no!
Hi Gunnar,
Informix is not an officially supported database vendor
(this is why you didn't see it in the list).
Here is what you can try to do:
Create a similar schema on any other database (preferably the one that is the closest in behavior to Informix) and capture schema there.
The CMP runtime uses this data to construct the sql statements and identify pk/fk dependencies.
If the generated sql has any problems, let us know and we will try to help.
Thank you,
Marina
Thanks for your reply. By using a newer JDBC driver from Informix (IBM), I have already managed to produce a schema that lets me deploy the modules. I haven't replied to this topic because now I get strange runtime errors that I have written about in thread 58/19184.
Very sorry to hear that Informix is not officially supported. That might be the reason why there has been no reply to that other thread. We are building a very large system, and changing database is not an option. I would have hoped that SunONE could have been a strong candidate when we later this year decide what app servers to buy. Thanks anyway.
In my previous reply I wrote that capturing schema now seems to work. I was wrong. After adding a simple column to a table, the schema produced is again broken. REFERENCE links point to ID links for completely unrelated elements.
Since using CMP with Sun ONE is impossible without a way to produce a correct schema, I would think that fixing the capture-schema utility would have pretty high priority. The file format isn't very "human friendly", so it is very difficult to correct the errors manually. This is very frustrating, since I am trying to evaluate SunONE and I really like almost everything else, so far, but this is not something that you can work around. What to do?
Hi Gunnar,
S1 Studio CE allows you to browse dbschema files. You can see tables, columns, indexes and foreign keys captured in the schema. Just point S1 Studio to your dbschema files. It also allows you to capture dbschemas and is much more robust than the command line interface.
Regards,
-- markus. 
Hi Markus,
Thanks for your reply. I might give it a try, but I really want to be able to do it from ant. Also, the cli utility works fine (so far) with SQL Server, and the Informix evaluation is put on ice until the char problem is solved.
I have some more information that might help track down the capture-schema bug:
When I place the -table options in such an order that there are no FK dependencies from a table to another table that comes before it in the argument list, there are much fewer possibilities for capture-schema to use REFERENCEID, since most column references will be to columns that it has not seen yet. The schema file will then be much larger but (so far) always correct.
Disregard the previous reply. Today capture-schema messed up again. By moving around the order of the table options, I can make it produce a correct schema, but I can't find a system to it.
I have now tried using S1 Studio to capture the schema, and so far it works. Still, it is not a real solution since the Studio capturing can't be run by ant.
Obviously Sun has access to a working implementation of the capturing schema logic. Couldn't the AS people politely ask the Studio people if they might get a copy of the code, and then create a new capture-schema utility?
Just an idea.
I am confused by your need to capture the schema in an ant script. Is the database schema that you are coding against changing that dynamically?
Based on your description, I would assume that the database schema is a static entity? If it is fairly static, why not capture the schema once, and put that into yfscr (your favorite source code repository)?
vbk 
As you say, the schema is fairly static, but not static enough. Our database contains around 800 tables. The system is divided into service projects, where each service handles up to around 20 beans, with 5 as the average. Each such service project contains a schema file for the beans that it handles. We already have meta information in the database that describes all entities, their attributes and relationships. Each entity is also associated with a single service that is allowed to update the bean, and zero or more services that can read selected attributes from it. From this information we automatically create all Ant build files, all entity bean implementations, session facades, test cases etc. Sometimes tables are changed or a bean moves from one service to another. Having to manually maintain the schema files would be irritating as well as error prone. Our system is complex enough already!

How to manage large quantities of XML data efficiently in a web app

Hello,
I am planning the development of a web application/web service which asks users questions according to a script represented in XML. There are likely to be many hundreds of such XML scripts.
I would like some advice as to the best way to handle this XML data, particularly in terms of whether I should store this information in a database, or whether I am better off treating them as separate files much like text or HTML files.
I would like to know the sort of things that should influence a developer's decision, and what sort of solutions and database-related technologies there are to support this XML storage problem.
Thank you very much for any advice.
Greg 
If you have a collection of documents, and you have people who are editing those documents and creating new ones, then you might need a Content Management System to manage the process. You could of course write your own management system, but it would still be worthwhile to have a look at existing CMSes to see which of their ideas you could use.
Thank you for this, but it doesn't answer my original question. I would like to understand better the issues behind storing XML content. When would it be better to use a database? When should I just store them in files? If it is useful, for my application, the XML files will be read much more often than they are written/edited. I would also like to be able to search through the data. 
Well, the issues are those that are addressed by content management systems in various ways. The fact that the data happens to be XML is (I think) not all that important. The main issue is that you have lumps of text that need to be stored and retrieved.
For one project I did, I decided to store data in the database rather than in files because it was going to be accessed from several application servers, and it was easier to make them all access the same database than it was to make them all access the same file system.
That's the sort of question that arises. But it's nothing to do with XML per se. The point I'm trying to make here is that it's a general content management decision you are trying to make, so you need to read up on content management in general. I'm no expert in that field but that's what I would do if I had to answer the question. 
You probably don't need a full blown CMS solution for XML document management but you may want to consider an XML DB. Storing your documents as flat files may work while your app is simple, but businesses change and software that was supposed to be "simple" has a way of becoming complex. A database may give your software more flexibility (and thus increase its survivability). Generally, the approach you should take will depend on how you use your data.
If your data use is document-centric (meaning you only work with whole documents and your queries / updates don't span documents), then you may be able to get away with a flat file format. Keep in mind that you'll also have to deal with partial reads / writes (consider an unexpected power outage or failure) that a DB could manage for you. Relational DBs with XQuery support layered on top tend to perform better for document-centric queries vs native XML DBs.
If your use is data-centric (meaning your queries / updates span multiple documents), then I would recommend a native XML DB. The XQuery support offered by these DBs will greatly simplify your code as they take care of the queries and updates for you. In addition, these DBs tend to improve performance over their relational counterparts (and definitely over flat files) because of the way they index and store the XML documents. Try to avoid DBs that layer XQuery on top of a relational DB in this case. These DBs must shred the documents into tables and the performance will not be as good as a true native XML DB.
I hope this helps,
Alex

using XML persistence during database outage

Hello,
I am in the middle of designing error handling for a java rmi application. Certain types of transactions on the client are not dependent on the database being available. However, an audit of the transaction needs to be placed in the database for reporting at a later time.
I am trying to architect this scenario where clients can still process a subset of transactions and track the transactions locally until the database, assuming it is down, becomes available. At that point, I would commit all transactions from the client's local temporary store to the database.
I think this would be a great use for an xml file. I have not yet utilized xml from java and am wondering if this will be a difficult task.
Can you please suggest what tools I would need to integrate into this solution or if there is a better known pattern for handling this type of scenario?
Thanks in advance for any thoughts.
Todd
Using XML in Java isn't very difficult. However, XML isn't really a great replacement for a relational DB. You can find all the info you need on using XML in Java here or by searching on Google. The only advantage I can think of is that you could use schemas to enforce the DB data validation to a degree.
Look at the Memento pattern and Serialization.
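A minimal sketch of the store-and-forward idea from the question, using plain Java serialization for the local store as suggested above rather than XML. Everything named here (`AuditRecord`, `AuditSpool`, the record fields, the `sender` callback that stands in for the database insert) is a hypothetical illustration, not part of any library.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** One audit entry; the fields are illustrative assumptions. */
final class AuditRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    final String transactionId;
    final String detail;

    AuditRecord(String transactionId, String detail) {
        this.transactionId = transactionId;
        this.detail = detail;
    }
}

/** Keeps unsent audit records in a local file until the database accepts them. */
final class AuditSpool {
    private final Path file;

    AuditSpool(Path file) {
        this.file = file;
    }

    /** Appends a record by rewriting the whole spool file (fine for small backlogs). */
    synchronized void add(AuditRecord record) {
        List<AuditRecord> pending = load();
        pending.add(record);
        save(pending);
    }

    /**
     * Sends pending records in order and stops at the first one the sender rejects.
     * The sender returns true when the database accepted the record.
     * A crash in the middle can resend records, so the insert should be idempotent.
     */
    synchronized int flush(Predicate<AuditRecord> sender) {
        List<AuditRecord> pending = load();
        int sent = 0;
        while (sent < pending.size() && sender.test(pending.get(sent))) {
            sent++;
        }
        save(new ArrayList<>(pending.subList(sent, pending.size())));
        return sent;
    }

    synchronized int pendingCount() {
        return load().size();
    }

    @SuppressWarnings("unchecked")
    private List<AuditRecord> load() {
        if (Files.notExists(file)) {
            return new ArrayList<>();
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (List<AuditRecord>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Cannot read spool file " + file, e);
        }
    }

    private void save(List<AuditRecord> pending) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new ArrayList<>(pending));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The client would call `add` whenever the database is unreachable and call `flush` periodically, passing a callback that performs the real insert and returns false on failure.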

Storing XML in database

We have a server application which stores its configuration in an XML file. We are migrating to J2EE, and there is a thought going on that we should store the configuration in a database and use EJBs to read and write the data.
I don't want to change the existing interfaces of the server that cater to data requests. I just want to change the persistence part of the server so that it writes/reads the configuration to/from the database instead of an XML file. Is there a nice and efficient way of storing XML in a database? Can anyone help me with the design of the database schema and the issues involved?
Thanx 
The big question here is what database do you want to use? Some databases directly support XML input/output, others allow you to store XML as a String (varchar), others put limitations on varchar sizes. If you want to maintain a level of database independence, I would recommend storing the XML document as a BLOB in your database. This allows you to store any kind of binary data, XML docs, Java objects, images, etc.
Good luck 
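As a rough sketch of the BLOB approach using only standard JDBC calls: the table `app_config(name, body)` and the class `XmlConfigStore` are hypothetical names chosen for illustration. Some drivers may need `setBinaryStream` or `getBlob` instead of `setBytes`/`getBytes` for large values, so treat this as a starting point rather than a portable guarantee.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Stores an XML configuration document as binary data.
 * Assumed (hypothetical) table: app_config(name VARCHAR primary key, body BLOB).
 */
final class XmlConfigStore {
    private XmlConfigStore() {}

    /** Encodes the document as UTF-8; the XML declaration should name the same encoding. */
    static byte[] encode(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    /** Updates the row for this name, or inserts it if no row exists yet. */
    static void save(Connection connection, String name, String xml) throws SQLException {
        byte[] body = encode(xml);
        try (PreparedStatement update = connection.prepareStatement(
                "UPDATE app_config SET body = ? WHERE name = ?")) {
            update.setBytes(1, body);
            update.setString(2, name);
            if (update.executeUpdate() > 0) {
                return;
            }
        }
        try (PreparedStatement insert = connection.prepareStatement(
                "INSERT INTO app_config (name, body) VALUES (?, ?)")) {
            insert.setString(1, name);
            insert.setBytes(2, body);
            insert.executeUpdate();
        }
    }

    /** Returns the stored document, or null when no row has this name. */
    static String load(Connection connection, String name) throws SQLException {
        try (PreparedStatement select = connection.prepareStatement(
                "SELECT body FROM app_config WHERE name = ?")) {
            select.setString(1, name);
            try (ResultSet rows = select.executeQuery()) {
                return rows.next() ? decode(rows.getBytes(1)) : null;
            }
        }
    }
}
```

Because the server's existing interfaces only see strings going in and out, swapping the file-based persistence for calls like these should not require changing them.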
If you are using Oracle8i or higher you can use CLOB.
Oracle provides samples on how to do this.
If you have Oracle 9i, you should even use XMLType rather than CLOB...
Hi,
If you are looking at cost rather than (possibly) speed, then MySQL can handle BLOB data as well. There is also a newish(?) storage approach using JDO (Java Data Objects), written in Java, which I found through the IBM developerWorks tutorials or something similar. There is also the Xindice open source project.
I have never used Oracle, so I have no comparison. But these new technologies are more interesting than using what is already written or drafted, I think.
Anyway,if you are developing and testing, try some of these out, else if in production, maybe Oracle is better.
best
kev

Urgently!! Is this possible?

Hi,
I read the XML specification a long time ago but did not get the opportunity to work with it at the time. Now I want to create an XML file that will replace one of the tables in our database, and using Java I should be able to view, insert, delete and update entries in the XML file.
A friend of mine worked on this earlier and was unable to write updates back to the XML file.
As far as I can tell it should be possible, and I would like feedback from anyone who has already worked with the same approach. If possible, please send sample code so that I can verify it.
I need a solution urgently, as my work is held up because of this.
regards vicky 
Yes, it's possible. I don't have any sample code because I haven't done it. But it sounds like an idea which is not very useful. One of the problems with it: Every time you do an update you have to read in the entire XML file, change it, then write out the new version. So you have performance problems when the file gets large, and consistency problems when simultaneous updates occur.
However, for small tables that rarely change, it could be useful. 
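The read-change-rewrite cycle described above could look roughly like this with the DOM and transformer APIs that ship with the JDK. The `<table>`/`<row id="...">` layout and the class name `XmlTable` are assumptions made up for the example. Note that every operation parses or rewrites the whole file, which is exactly the cost mentioned above, and that `synchronized` only protects callers sharing one instance in one JVM.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** A tiny "table" kept in one XML file: <table><row id="1">value</row>...</table>. */
final class XmlTable {
    private final Path file;

    XmlTable(Path file) {
        this.file = file;
    }

    /** Returns the value of the row with this id, or null if there is none. */
    synchronized String get(String id) {
        Element row = findRow(load(), id);
        return row == null ? null : row.getTextContent();
    }

    /** Updates the row with this id, inserting it if it does not exist yet. */
    synchronized void upsert(String id, String value) {
        Document doc = load();
        Element row = findRow(doc, id);
        if (row == null) {
            row = doc.createElement("row");
            row.setAttribute("id", id);
            doc.getDocumentElement().appendChild(row);
        }
        row.setTextContent(value);
        save(doc);
    }

    /** Removes the row with this id; returns false if there was no such row. */
    synchronized boolean delete(String id) {
        Document doc = load();
        Element row = findRow(doc, id);
        if (row == null) {
            return false;
        }
        row.getParentNode().removeChild(row);
        save(doc);
        return true;
    }

    /** Parses the whole file, or starts an empty table if the file is missing. */
    private Document load() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            if (Files.notExists(file)) {
                Document doc = builder.newDocument();
                doc.appendChild(doc.createElement("table"));
                return doc;
            }
            try (InputStream in = Files.newInputStream(file)) {
                return builder.parse(in);
            }
        } catch (Exception e) {
            throw new IllegalStateException("Cannot read " + file, e);
        }
    }

    /** Writes the whole document back, replacing the previous file contents. */
    private void save(Document doc) {
        try (OutputStream out = Files.newOutputStream(file)) {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot write " + file, e);
        }
    }

    private static Element findRow(Document doc, String id) {
        NodeList rows = doc.getElementsByTagName("row");
        for (int i = 0; i < rows.getLength(); i++) {
            Element row = (Element) rows.item(i);
            if (id.equals(row.getAttribute("id"))) {
                return row;
            }
        }
        return null;
    }
}
```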
Yes, I totally agree with doc. It is possible, and newer technologies such as XPath used with DOM really make this a pleasure to work on. But the biggest problem at the moment is performance, which drops sharply as the file grows. As the technologies evolve, I personally believe this is going to be the future.
I hope you are not suggesting that relational databases will be replaced with XML files in the future.
This just will not happen.
Not in the true sense. But yes, XML is gonna do a lot of the jobs databases used to do.
Don't worry, the database guys are not gonna lose their jobs... just kidding.
A good example in the direction is (check the website)
http://www.ixiasoft.com/
I forgot to mention:
I read this in their FAQ, and I quote:
"Why should I use a native XML Database instead of a relational database?
Three reasons: Superior Performance, Reduced Development Efforts and Greater Flexibility.
Relational databases are structured with a rigid columns and rows model, which is poorly adapted to represent the variety and the richness of the relationships found in the structure of XML documents. Using relational databases for storing and retrieving XML documents involves complex and time-consuming bi-directional transformations. They deconstruct XML documents and store their elements in tables resulting in database inefficiencies. In order to "shoehorn" the XML content into tables, one must programmatically map the data, and in many cases re-program this process, in the event that the content, or the application that it supports, changes. Conversely, creating tables that indiscriminately seize all information contained in XML documents creates volumes of empty results spaces, which in turn produce database inefficiencies.
Native XML databases store XML documents in their original format, which eliminates the mapping processes and gives rise to superior performance. The structure of native XML databases will evolve to reflect changes in the structure of the XML content, or changes to the applications they support without exhaustive re-programming."
I agree. IMHO XML looks set to replace a pretty large subset of data which should never have been stored in relational databases in the first place, because the object-to-relational mapping is just so, argh, lousy:
- System Data
- Configuration Data
- Meta Data
- Document Data
Yes, you can.
I have developed a JDBC driver for XML documents. It lets you perform standard SQL operations like SELECT, INSERT, UPDATE and DELETE.
You can use it from a Java program through the JDBC API.
I can send a PDF document. Is there a way to post docs/files?
regards
Rajesh
